Use sed to replace the last space in each line with a comma, then remove all spaces

By Mia Morrison

I have a two-column, space-delimited .txt file, but the first column contains spaces (which are errors). I need to convert it to a CSV, but I can't just replace all the spaces with commas.

Example input:

gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1

Desired output:

gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1

How can I use sed (or anything else) to replace the last space in a row with a comma, then remove all remaining spaces? Would that effectively create a CSV file?

4 Answers

Something like:

sed -r 's/(.*) /\1,/; s/ //g'

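Because .* is greedy, the first substitution matches everything up to and including the last space on the line: the group captures the text before that space, and the space itself is replaced with a comma. The second substitution then deletes the remaining spaces.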
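To see the two substitutions separately, here is a minimal sketch. The function name last_space_to_csv and the sample line 'a b c 1' are made up for illustration; -E is used as the portable spelling of GNU sed's -r.

```shell
# Hypothetical helper: reads lines on stdin, writes converted lines on stdout.
last_space_to_csv() {
  sed -E 's/(.*) /\1,/; s/ //g'
}

# First substitution only: the greedy group stops at the last space.
printf 'a b c 1\n' | sed -E 's/(.*) /\1,/'    # prints: a b c,1

# Both substitutions: the remaining spaces are removed as well.
printf 'a b c 1\n' | last_space_to_csv        # prints: abc,1
```

Note that a line ending in a trailing space would get its comma at the very end, so it may be worth trimming such lines first.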

This would do the job:

sed -r "s/\s([0-9]+$)/,\1/" filename.txt | tr -d ' '

or:

sed -r "s/\s([0-9]+$)/,\1/; s/\s//g" filename.txt

Input example:

gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1

Output:

gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
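Unlike a plain "last space" match, this version only inserts the comma when the line ends in a space followed by digits. The sketch below illustrates that difference; the function name digits_to_csv and the sample lines are assumptions, and [[:space:]] is used here as the POSIX spelling of GNU sed's \s.

```shell
# Hypothetical helper: comma before a trailing run of digits, then strip whitespace.
digits_to_csv() {
  sed -E 's/[[:space:]]([0-9]+$)/,\1/; s/[[:space:]]//g'
}

# Numeric last column: the comma lands before the whole number.
printf 'a b 12\n' | digits_to_csv    # prints: ab,12

# Non-numeric last column: no comma is inserted, but spaces are still removed.
printf 'a b x\n' | digits_to_csv     # prints: abx
```

So this approach fits files where the second column is known to be an integer; other lines lose their spaces without gaining a delimiter.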

Here's a geeky way - with a sed loop.

  1. if the pattern contains only a single space, replace it with a comma
  2. (otherwise) replace the first space with nothing and goto 1

which we can write in GNU sed as

sed -e :1 -e '/^[^ ]* [^ ]*$/ s/ /,/' -e 's/ //; t1'

Testing:

$ echo 'gi|118592783|ref|ZP_01550172.1|_biphenyl-2 3-diol_1 2-dioxygenase_[Stappia_aggregata_IAM_12614] 1' | sed -e :1 -e '/^[^ ]* [^ ]*$/ s/ /,/' -e 's/ //; t1'
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1
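The same loop can be spread over several lines with comments, which may make the control flow easier to follow. This is a sketch under the assumption of GNU sed (as in the answer); the function name strip_spaces_loop and the label name again are made up.

```shell
# Hypothetical helper wrapping the loop; reads stdin, writes stdout.
strip_spaces_loop() {
  sed '
    :again
    # Exactly one space left on the line: turn it into the comma.
    /^[^ ]* [^ ]*$/ s/ /,/
    # Otherwise delete the first remaining space.
    s/ //
    # Jump back if either substitution succeeded since the last test.
    t again
  '
}

printf 'a b c 1\n' | strip_spaces_loop    # prints: abc,1
```

The t command also fires once after the comma substitution, so the loop makes one extra pass that changes nothing before the line is printed.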

Perl

$ perl -lne 's/\s//g;s/^(.*)([[:digit:]])$/$1,$2/;print' input.txt
gi|118592783|ref|ZP_01550172.1|_biphenyl-23-diol_12-dioxygenase_[Stappia_aggregata_IAM_12614],1

or shorter:

perl -lpe 's/\s//g;s/^(.*)([[:digit:]])$/$1,$2/' input.txt

Effectively, this is the opposite of muru's approach: we get rid of all whitespace first, then capture everything before the last character (group $1) and the last character itself (group $2, which here happens to be a digit), and rewrite the line as $1 and $2 separated by a comma. The -l switch strips each line's newline before the substitutions run and adds it back on output; without it, s/\s//g would also delete the newline and the output lines would run together.

Note that ([[:digit:]]) can be changed to (.) to match any character, in case that's necessary (i.e., if the last character may be of any type), or to ([[:graph:]]) to match only visible, non-space characters. In every case the second group is a single character, so this works only when the second column is one character wide.
