Use sed to replace the last space in each line with a comma, then remove all spaces
I have a two-column, space-delimited .txt file, but the first column contains spaces (which are errors). I need to convert it to CSV, but I can't just replace all the spaces with commas.
Example input:
gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1
Desired output:
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
How can I use sed (or anything else) to replace the last space in a row with a comma, then remove all remaining spaces? Would that effectively create a CSV file?
4 Answers
Something like:
sed -r 's/(.*) /\1,/; s/ //g'
The first substitution is greedy, so (.*) matches everything up to the last space, and only that last space is replaced with a comma. The second substitution then removes all the remaining spaces.
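To see why the order of the two substitutions matters, here is a minimal sketch that runs them one at a time on the sample line from the question. It assumes a sed that accepts -E (GNU and BSD sed both do; in GNU sed it is equivalent to -r). The function names last_space_to_comma and strip_spaces are made up for illustration.

```shell
#!/bin/sh
# Sample line from the question: stray spaces in the first column,
# followed by the one space that really separates the two columns.
line='gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1'

# Step 1: the greedy (.*) swallows everything up to the LAST space,
# so only that space is turned into a comma.
last_space_to_comma() { sed -E 's/(.*) /\1,/'; }

# Step 2: delete every space that is still left.
strip_spaces() { sed 's/ //g'; }

# After step 1 only: the stray spaces are still present, the delimiter is a comma.
printf '%s\n' "$line" | last_space_to_comma

# After both steps: the desired output.
printf '%s\n' "$line" | last_space_to_comma | strip_spaces
```

Running the steps in the other order would delete the delimiter along with the stray spaces, which is why the comma has to go in first.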
This would do the job:
sed -r "s/\s([0-9]+$)/,\1/" filename.txt | tr -d ' '
or:
sed -r "s/\s([0-9]+$)/,\1/; s/\s//g" filename.txt
Input example:
gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1
Output:
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
Here's a geeky way - with a sed loop.
- if the pattern space (the current line) contains exactly one space, replace that space with a comma
- (otherwise) delete the first space and jump back to the first step (label 1 in the script)
which we can write in GNU sed as
sed -e :1 -e '/^[^ ]* [^ ]*$/ s/ /,/' -e 's/ //; t1'
Testing:
$ echo 'gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1' | sed -e :1 -e '/^[^ ]* [^ ]*$/ s/ /,/' -e 's/ //; t1'
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
Perl
$ perl -lne 's/\s//g;s/^(.*)([[:digit:]])$/\1,\2/;print' input.txt
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
or shorter:
perl -lpe 's/\s//g;s/^(.*)([[:digit:]])$/\1,\2/' input.txt
Effectively this is the opposite of muru's approach: we get rid of all spaces first, then capture everything before the last character (group \1) and the last character itself (group \2, which here happens to be a digit). The line is then rewritten as group \1 and group \2 separated by a comma. The -l switch chomps each line's newline before \s can delete it and adds it back on output; without it, the lines of a multi-line file would be joined together. This relies on the second column being a single character, as in the example.
Note that ([[:digit:]]) can be changed to (.) to match any character, if necessary (i.e., if the last character may be of any type), or to ([[:graph:]]) to match only visible (non-space printable) characters.