Replace spaces in a matched substring with underscores
I'm new to the forum, so please forgive any syntactical errors in my question.
I'm trying to replace spaces with underscores in a matched substring only. I figured sed would be the best tool for this, but I can't find the right command to do it.
Sample line from file1 below:
Some text before pattern to match href="./Dynamic Directory name - Junk_files/irrelevant stuff after match">
I would like to change it to this:
Some text before pattern to match href="./Dynamic_Directory_name_-_Junk_files/irrelevant stuff after match">
I thought I was close with this:
cat file1 | sed '/\.\/.*. Junk_files/ { s/ /_/g; }'
but all it did was replace every space on the matched line with underscores.
Any help with this would be greatly appreciated. Thanks!
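Why the attempt above edits the whole line: in sed, a /regex/ address in front of a command only decides whether that command runs on the line; the s/ /_/g inside the braces still operates on the entire line. The Python sketch below models that behavior (it is an illustration, not sed itself; the function names are made up) and contrasts it with editing only the span the regex actually matched.

```python
import re

LINE = ('Some text before pattern to match '
        'href="./Dynamic Directory name - Junk_files/irrelevant stuff after match">')

def sed_style(line):
    # Models: sed '/\.\/.*. Junk_files/ { s/ /_/g; }'
    # The address regex only decides WHETHER the line is edited...
    if re.search(r'\./.*. Junk_files', line):
        # ...and s/ /_/g then rewrites every space on that line.
        return line.replace(' ', '_')
    return line

def matched_span_only(line):
    # Locate the span from "./" up to the next "/" and edit only that slice.
    m = re.search(r'\./[^/]*/', line)
    if m is None:
        return line
    return line[:m.start()] + m.group().replace(' ', '_') + line[m.end():]
```

sed_style(LINE) turns every space into an underscore, while matched_span_only(LINE) only touches "Dynamic Directory name - Junk_files". The answers below are different ways of getting sed-like tools to do the second thing.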
Answers
Try this; it takes the text between the first pair of slashes and replaces every space in it with an underscore:
awk -F'/' 'NF > 1 { gsub(/ /, "_", $2) } 1' OFS='/'
Example:
file='href="./Dynamic Directory name - Junk_files/irrelevant stuff after match">'
echo "$file" | awk -F'/' 'NF > 1 { gsub(/ /, "_", $2) } 1' OFS='/'
# Output:
href="./Dynamic_Directory_name_-_Junk_files/irrelevant stuff after match">
Through Python:
$ echo 'href="./Dynamic Directory name - Junk_files/irrelevant stuff after match"' |
> python3 -c "import re
> import sys
> sys.stdout.write(re.sub(r'(?<=\./).*?(?=/)', lambda m: m.group().replace(' ', '_'), sys.stdin.read()))
> "
href="./Dynamic_Directory_name_-_Junk_files/irrelevant stuff after match"
Through Perl:
$ echo 'href="./Dynamic Directory name - Junk_files/irrelevant stuff' | perl -pe '
> s/\s(?=(?:(?!\.\/).)*?\/)/_/g
> '
href="./Dynamic_Directory_name_-_Junk_files/irrelevant stuff
It's better to use an HTML (or XML) parser.
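If you insist on using sed, and assuming the pattern stays consistent:
sed -r 's#^([^/]+/[^ ]+) ([^ ]+) ([^ ]+) - ([^ ]+/)#\1_\2_\3_-_\4#' file.txt
This replaces the spaces in a directory name of exactly this shape (three space-separated words, then " - ", then one more word ending in /) following the first forward slash; names with a different number of words are left unchanged. As the input contains /, I have used # as the delimiter for the s command.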
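Because that sed expression hard-codes the number of words, it is worth checking what it does with other names before running it on real files. Here is a rough Python port (an assumption: Python's re treats these particular groups the same way as sed -r does), which makes the limitation easy to test:

```python
import re

# Same groups and replacement as the sed -r expression; count=1 mirrors
# sed's s command without the g flag (first match per line only).
FIXED_SHAPE = re.compile(r'^([^/]+/[^ ]+) ([^ ]+) ([^ ]+) - ([^ ]+/)')

def fixed_shape(line):
    # Lines whose directory name has a different shape do not match
    # and are returned unchanged.
    return FIXED_SHAPE.sub(r'\1_\2_\3_-_\4', line, count=1)
```

The sample line from the question is rewritten as expected, but something like href="./Two Words/x"> or href="./A B C D - E/x"> passes through untouched, which is the "assuming the pattern stays consistent" caveat in practice.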
Example:
% sed -r 's#^([^/]+/[^ ]+) ([^ ]+) ([^ ]+) - ([^ ]+/)#\1_\2_\3_-_\4#' <<<'Some text before pattern to match href="./Dynamic Directory name - Junk_files/irrelevant stuff after match">'
Some text before pattern to match href="./Dynamic_Directory_name_-_Junk_files/irrelevant stuff after match">
That's HTML, and unless you have a very well-defined and simple enough subset of HTML in your file, parsing it with regular expressions is a pretty bad idea.
This Perl one-liner works for replacing that substring in that specific context:
printf 'Some text before pattern to match href="./Dynamic Directory name - Junk_files/irrelevant stuff after match">\n' | perl -pe 'if (/(.*?")(.*\/)(.*)/s) { my ($x, $y, $z) = ($1, $2, $3); $y =~ s/ /_/g; $_ = "$x$y$z" }'
Meaning: it replaces spaces with underscores only between the first " and the last / on each line (the (.*\/) group is greedy), and because of -p, lines that don't match are printed unchanged and the trailing newline is kept. But that's about it. If you're parsing a complex document, don't use it. You could make the pattern stricter (for example /(.*?href=")(.*\/)(.*)/s, which only starts at an href=" value), but that could still fail on any other text that happens to match it, such as an href inside a comment or in plain text.
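To see what "from the first " to the last /" means in practice, here is a small Python port of the same split-into-three-groups idea. GREEDY mirrors the Perl pattern; FIRST_SEGMENT is a hypothetical stricter variant (not from the answer) that stops at the next "/" or closing quote after href="./.

```python
import re

# Port of /(.*?")(.*\/)(.*)/s: the greedy (.*/) runs from the first
# double quote to the LAST slash on the line.
GREEDY = re.compile(r'(.*?")(.*/)(.*)', re.S)
# Hypothetical stricter variant: only the text between href="./ and the
# next "/" or closing quote is edited.
FIRST_SEGMENT = re.compile(r'(.*?href="\./)([^/"]*)(.*)', re.S)

def three_part_replace(line, pattern):
    # Split the line into prefix / middle / suffix and edit only the middle.
    m = pattern.search(line)
    if m is None:
        return line  # like perl -p: non-matching lines pass through
    x, y, z = m.groups()
    return x + y.replace(" ", "_") + z
```

On the question's sample both patterns give the same result, but for href="./A B/c d/e"> the greedy version also rewrites "c d", while FIRST_SEGMENT leaves it alone.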
Unless you're parsing a very well-defined and simple enough subset of HTML in your file, and you're sure something like that won't fail, just use an HTML parser.
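If you go the parser route, here is a rough sketch using Python's standard-library html.parser (a swapped-in choice; neither answer names a specific parser). The parser is only used to find real start tags that carry an href, so text that merely looks like href="./..." outside a tag is left alone. The rewrite rule (underscore_first_dir, a hypothetical helper) mirrors the question: spaces are replaced only in the first path segment after ./. The edit is applied with str.replace on the raw tag text, so an identical tag string inside a comment would also change; treat this as a starting point, not a complete solution.

```python
from html.parser import HTMLParser

class StartTagCollector(HTMLParser):
    """Collects the raw source text of every start tag that has an href."""

    def __init__(self):
        super().__init__()
        self.raw_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lower-cased names.
        if any(name == "href" for name, _ in attrs):
            # get_starttag_text() returns the tag exactly as written.
            self.raw_tags.append(self.get_starttag_text())

def underscore_first_dir(tag_text):
    # Hypothetical rule: in href="./NAME/..., replace spaces in NAME only.
    prefix = 'href="./'
    start = tag_text.find(prefix)
    if start == -1:
        return tag_text
    start += len(prefix)
    slash = tag_text.find("/", start)
    quote = tag_text.find('"', start)
    if slash == -1 or (quote != -1 and quote < slash):
        return tag_text  # no directory component inside the quoted value
    return tag_text[:start] + tag_text[start:slash].replace(" ", "_") + tag_text[slash:]

def fix_hrefs(html_text):
    collector = StartTagCollector()
    collector.feed(html_text)
    collector.close()
    for raw in set(collector.raw_tags):
        html_text = html_text.replace(raw, underscore_first_dir(raw))
    return html_text
```

For a whole file, something like fix_hrefs(open("file1", encoding="utf-8").read()) returns the rewritten text; writing it back is left out of the sketch.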