Use sed to replace the last space in each line with a comma, then remove all spaces

I have a two-column, space-delimited .txt file, but the first column contains stray spaces (which are errors). I need to convert it to a CSV, but I can't just replace all the spaces with commas.

Example input:

gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1

Desired output:

gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1

How can I use sed (or anything else) to replace the last space in a row with a comma, then remove all remaining spaces? Would that effectively create a CSV file?

4 Answers

Something like:

sed -r 's/(.*) /\1,/; s/ //g'

Because .* is greedy, the first substitution matches everything up to the last space on the line and replaces that space with a comma. The second then deletes every space that remains.
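As a quick way to try this on more than one line, here is a minimal sketch that wraps the same two substitutions in a function. The name last_space_to_comma and the sample lines are made up for illustration, and -E is used as the spelling of -r that BSD/macOS sed accepts as well.

```shell
# Hypothetical helper: reads lines on stdin, writes "column1,column2" on stdout.
last_space_to_comma() {
  # 1st command: greedy (.*) runs up to the final space, which becomes a comma.
  # 2nd command: every space still left belongs to column one and is deleted.
  sed -E 's/(.*) /\1,/; s/ //g'
}

# Invented sample input with stray spaces in the first column.
printf '%s\n' 'name with spaces 1' 'another bad name 22' | last_space_to_comma
```

This should print namewithspaces,1 and anotherbadname,22. A line with trailing spaces would get its comma at the very end, so it may be worth trimming those first.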

This would do the job:

sed -r "s/\s([0-9]+$)/,\1/" filename.txt | tr -d ' '

or:

sed -r "s/\s([0-9]+$)/,\1/; s/\s//g" filename.txt

Input example:

gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1

Output:

gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
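Anchoring on the trailing digits behaves differently from "whatever the last space is" when a line does not end in a number. The sketch below shows that, under a few assumptions: the function name numeric_tail_csv is hypothetical, [[:space:]] stands in for \s so the expression does not rely on GNU extensions, and the sample lines are invented.

```shell
# Hypothetical helper: comma goes before a trailing run of digits only.
numeric_tail_csv() {
  # If the line does not end in "<whitespace><digits>", the first command
  # does nothing and the second still strips all whitespace.
  sed -E 's/[[:space:]]([0-9]+)$/,\1/; s/[[:space:]]//g'
}

# First line ends in a number, second one does not.
printf '%s\n' 'x y 7' 'x y NA' | numeric_tail_csv
```

The first line comes out as xy,7, while the second comes out as xyNA with no comma at all, so lines missing a comma in the output point at rows whose last column was not numeric.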

Here's a geeky way - with a sed loop.

1. If the pattern space contains exactly one space, replace it with a comma.
2. Otherwise, delete the first space and go back to step 1.

which we can write in GNU sed as

sed -e :1 -e '/^[^ ]* [^ ]*$/ s/ /,/' -e 's/ //; t1'

Testing:

$ echo 'gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1' | sed -e :1 -e '/^[^ ]* [^ ]*$/ s/ /,/' -e 's/ //; t1'
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
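To watch the loop work step by step, here is a rough plain-shell imitation that prints the line after every change. It is only an illustration of the idea, not what sed does internally; trace_loop is a hypothetical name and the input is invented.

```shell
# Hypothetical tracer: mimics "delete the first space until one is left,
# then turn that one into a comma", echoing each intermediate state.
trace_loop() {
  line=$1
  # Two or more spaces left: drop the first one (the 's/ //; t1' part).
  while case $line in *' '*' '*) true ;; *) false ;; esac; do
    line="${line%% *}${line#* }"
    printf '%s\n' "$line"
  done
  # Exactly one space left: replace it with a comma (the addressed 's/ /,/').
  case $line in
    *' '*) line="${line%% *},${line#* }" ;;
  esac
  printf '%s\n' "$line"
}

trace_loop 'a b c 1'
```

For 'a b c 1' this prints ab c 1, then abc 1, then abc,1, which is the same sequence of states the sed loop goes through.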

Perl

$ perl -lne 's/\s//g; s/^(.*)([[:digit:]])$/$1,$2/; print' input.txt
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1

or shorter:

perl -lpe 's/\s//g; s/^(.*)([[:digit:]])$/$1,$2/' input.txt

Effectively, this is the opposite of muru's approach: we get rid of all whitespace first, then capture everything before the last character (group $1) and the last character itself (group $2, which happens to be a digit), and rewrite the line as $1 and $2 separated by a comma. The -l switch strips each line's newline before the substitutions run and puts it back on output; without it, s/\s//g would delete the newlines too and join every line of the file together.

Note that ([[:digit:]]) can be changed to (.) to match any final character, in case the last character is not necessarily a digit, or to ([[:graph:]]) to accept only visible characters. Either way the second column is assumed to be a single character: once the spaces are gone, a multi-digit value such as 15 would be split as 1,5.
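The order of the two steps matters once the second column can be longer than one character. The comparison below is a small sketch of that, assuming perl and sed are both installed; the function names spaces_first and last_space_first and the sample line are invented for illustration.

```shell
# Hypothetical helper: delete all whitespace, then put a comma before
# the final digit (the Perl approach).
spaces_first() {
  perl -lpe 's/\s//g; s/^(.*)([[:digit:]])$/$1,$2/'
}

# Hypothetical helper: turn the last space into a comma, then delete
# the remaining spaces (the greedy sed approach).
last_space_first() {
  sed -E 's/(.*) /\1,/; s/ //g'
}

# Same invented line through both: the second column here is "15".
printf 'a b 15\n' | spaces_first
printf 'a b 15\n' | last_space_first
```

The first command gives ab1,5 because the column boundary is lost as soon as the spaces are removed, while the second gives ab,15. For single-character second columns, as in the question's example, both produce the same result.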
