1. Tutorial

sed works with four spaces: the input stream, the output stream, the pattern space, and the hold buffer.

Sed operates on the input stream and produces an output stream. Lines from input stream are placed into the pattern space (where they can be modified) and then pattern space is sent to output stream. The hold buffer can be used for temporary storage.

GNU sed manual
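To make the roles of the pattern space and the hold buffer concrete, here is a minimal sketch that reverses the order of the input lines. It relies on the G and h commands described in section 2, and the three-line sample input is made up for illustration.

```shell
# On every line but the first, G appends the hold buffer (all earlier lines,
# newest first) to the pattern space; h saves the result back to the hold
# buffer; on the last line, p prints everything that was collected.
reversed=$(printf '%s\n' first second third | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

Nothing is printed until the last line, because -n turns off automatic output and p is restricted to the `$` address.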

2. Command

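sed commands fall into three classes by how they are addressed: zero-address, one-address, and address-range:
zero-address: no address is given, so the command applies to every line of the input
one-address: the command applies to a single address, either a line number or /pattern/
address-range: the command applies to a range of lines:
#,# #,+x #,/pattern/ /pattern/,# /pattern/,+#
#,~N: from line # through the next line whose number is a multiple of N (GNU sed)

When one address should trigger several commands, wrap the commands in {...} and separate them with ;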
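The address forms above can be tried on a small input. This is only a sketch: the five sample lines are hypothetical, and the `2,+1` form assumes GNU sed.

```shell
# Hypothetical sample input: five numbered lines.
input=$(printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5')

# zero-address: the command runs on every line
all=$(printf '%s\n' "$input" | sed 's/line/L/')
# one-address: only line 2
one=$(printf '%s\n' "$input" | sed -n '2p')
# address range ending at a regex: line 2 through the next line matching /4/
range=$(printf '%s\n' "$input" | sed -n '2,/4/p')
# GNU form #,+x: line 2 and the one line after it
plus=$(printf '%s\n' "$input" | sed -n '2,+1p')
# several commands for one address: group with {...}, separate with ;
grouped=$(printf '%s\n' "$input" | sed -n '3,4{s/line/>>/;p;}')

printf '%s\n' "$grouped"
```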

2.1 Copy(g)/Append(G) hold space to pattern space

g copies the hold space to the pattern space, overwriting whatever the pattern space held;
G appends the hold space to the pattern space:
it adds a newline to the pattern space, followed by the contents of the hold buffer:
pattern_space + newline + hold_buffer

    # Case 1: double-space a file
    sed G
    # Case 2: double-space a file that already contains blank lines
    sed '/^$/d;G'

Case 1: Once all the commands have been executed (here just the G command), sed sends the contents of the pattern space to the output stream, followed by a newline. Every line is now followed by two newlines: one added by the G command (the hold buffer is empty, so G contributes only the newline) and one added on output. The file has been double-spaced.

Case 2: Before matching the regular expression, sed reads the input line into the pattern space and strips its trailing newline. An empty line consists of just that newline, so once it has been stripped the pattern space is empty.
The regular expression /^$/ matches an empty pattern space, and sed applies the d command to it (see section 2.2), so existing blank lines are dropped before G double-spaces the rest.
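Both cases can be checked on a tiny input. In this sketch the sample text is made up, and `tr` turns each newline into `|` so that the blank lines become visible.

```shell
# Case 1: every line is followed by an extra empty line.
case1=$(printf 'a\nb\n' | sed G | tr '\n' '|')
# Case 2: the blank line already in the input is deleted first,
# so the result is double-spaced rather than triple-spaced.
case2=$(printf 'a\n\nb\n' | sed '/^$/d;G' | tr '\n' '|')
printf '%s\n%s\n' "$case1" "$case2"
```

Both commands print `a||b||`: each text line is followed by its own newline plus the one appended by G.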

2.2 Delete pattern space d and D

The d command deletes the whole pattern space, reads the next input line into the pattern space, and restarts the cycle without running the remaining commands and without printing the pattern space.
The D command deletes the text in the pattern space up to and including the first newline (only the first line, if there are several) and restarts the cycle with what remains, without reading new input, without running the remaining commands, and without printing the pattern space. If the pattern space contains no newline, D behaves like d.
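The difference between d and D shows up once the pattern space holds two lines. The sketch below uses N (section 2.3) and P (section 2.5) to build and print such a pattern space; the input 1 to 4 is arbitrary.

```shell
# d: P prints the first line of each pair, then d throws the whole pair away.
with_d=$(printf '%s\n' 1 2 3 4 | sed -n '$!N;P;d')
# D: P prints the first line, then D removes only that line and restarts
# the script with the rest still in the pattern space.
with_D=$(printf '%s\n' 1 2 3 4 | sed -n '$!N;P;D')
printf 'd: %s\n' $with_d
printf 'D: %s\n' $with_D
```

With d only lines 1 and 3 are printed, while with D every line is printed once, because the second line of each pair survives into the next pass.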

2.3 Read(n)/Append(N) the next input line to pattern space

The n command prints the current pattern space (unless the -n flag is used), replaces it with the next line of input, and then continues with the command that follows n. It does **not** stop and restart execution at the first sed command.
The N command appends a newline and the next line of input to the pattern space, then continues with the following command.

    # print the even-numbered lines of input
    sed -n 'n;p'
    # print every line of input, two at a time (under -n, GNU sed drops an unpaired last line)
    sed -n 'N;p'
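A quick way to see the difference between n and N is to run both on the same four-line input. The input and the `+` separator in this sketch are arbitrary choices.

```shell
# n replaces line 1 with line 2 (nothing is printed because of -n),
# then p prints it; the same happens for lines 3 and 4.
evens=$(printf '%s\n' 1 2 3 4 | sed -n 'n;p')
# N joins two lines with an embedded newline, which s then turns into '+'.
pairs=$(printf '%s\n' 1 2 3 4 | sed 'N;s/\n/+/')
printf '%s\n' "$evens" "$pairs"
```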

2.4 Exchange hold and pattern space(x)

The x command exchanges the hold buffer and the pattern buffer.

    # Insert a blank line above every line that matches "regex"
    sed '/regex/ {x;p;x}'
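The one-liner works because the hold buffer starts out empty. A small sketch with made-up input, again using `tr` to make the newlines visible:

```shell
# On the matching line: x swaps in the (empty) hold buffer, p prints it as a
# blank line, and the second x swaps the original line back for normal output.
out=$(printf '%s\n' alpha regex omega | sed '/regex/{x;p;x;}' | tr '\n' '|')
printf '%s\n' "$out"
```

This assumes no earlier command has stored anything in the hold buffer; otherwise that content would be printed instead of a blank line.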

2.5 Print the current pattern space (p/P)

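Unless sed was started with the -n option, the p command duplicates the input, because each line is printed once by p and once automatically at the end of the cycle.
The p command prints the entire pattern space.
The P command prints only the first line of the pattern space (up to the first newline).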

2.6 Copy(h)/Append(H) pattern space to hold space

The h command copies the pattern space to the hold space, overwriting the hold space;
The H command appends a newline and then the pattern space to the hold space.
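As an illustration of H, the following sketch collects every input line in the hold space and joins them with commas at the end. The input and the comma separator are arbitrary choices.

```shell
# H appends "\n" + line to the hold space on every line. On the last line,
# x brings the collected text into the pattern space, the first s turns the
# embedded newlines into commas, and the second s drops the leading comma
# that comes from the very first H.
joined=$(printf '%s\n' a b c | sed -n 'H;${x;s/\n/,/g;s/^,//;p;}')
printf '%s\n' "$joined"
```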

2.7 Print current line number (=)

Zero-address or one-address command.
The = command prints the current line number, followed by a newline, to standard output.

    # tc.txt
    abc
    def
    ghi
    sed = tc.txt | sed 'N;s/\n/\t/'
    # output, with a tab between the number and the text
    1 abc
    2 def
    3 ghi

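The first sed prints each line number on a line of its own, so its output is:
1\n
abc\n
2\n
def\n
3\n
ghi\n

In the second sed, after the N command has run, the pattern space holds in turn:
1\nabc
2\ndef
3\nghi

Replacing the first \n with \t then turns each pair into one line of the final output:
1\tabc\n
2\tdef\n
3\tghi\n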

2.8 s/regex/repl/

The substitute command replaces the first occurrence of the regular expression (regex) on each line with repl(acement); flags such as g (see section 2.12) change which occurrences are replaced.

    # Number every line of the file
    sed = tc.txt | sed 'N; s/^/ /; s/ *\(.*\)\n/\1 /'
    1 abc
    2 def
    3 ghi
    # The same with awk:
    awk '{print NR" "$0}' tc.txt

    # tc.txt (abc, def and ghi sit on lines 1, 3 and 6; the lines between them are blank)
    abc
    def
    ghi
    # Number only the non-blank lines, keeping the layout of the source file
    sed '/./=' tc.txt | sed '/./N; s/\n/ /'
    1 abc
    3 def
    6 ghi
    # The same with awk:
    awk '/^$/; !/^$/ {print NR" "$0}' tc.txt

2.9 $

Match the last line.

2.10 call shell for help

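    # Convert Unix newlines (LF) to DOS/Windows newlines (CRLF)
    sed "s/$/`echo '\r'`/"

Note: the sed script must be wrapped in double quotes here, so that the shell performs the `echo` command substitution.

Or, with GNU sed, which understands \r itself:

    sed 's/$/\r/'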
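Carriage returns are invisible in a terminal, so one way to confirm the conversion is to count them. This sketch assumes GNU sed, which interprets `\r` in the replacement text, and uses a made-up two-line input.

```shell
# Convert two LF-terminated lines, then delete every byte except CR
# and count what is left: one CR per input line is expected.
crs=$(printf 'a\nb\n' | sed 's/$/\r/' | tr -cd '\r' | wc -c)
echo "carriage returns: $crs"
```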

2.11 t label

You can branch whenever a pattern is found, but you may want to branch only if a substitution was actually made. The "t label" command branches to label only if a substitute command has changed the pattern space since the current input line was read or since the last t branch was taken.

    # Align lines right on a 79-column width.
    sed -e :a -e 's/^.\{1,78\}$/ &/;ta'

While the line is 1 to 78 characters long, the regex of the s command matches and the line is replaced by one space followed by the matched text (written as &); t then jumps back to :a. The loop stops once the line is 79 characters long and the regex no longer matches.

& stands for the string matched by the regex.
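The same loop is easier to follow with a narrower target. This sketch right-aligns to 5 columns instead of 79 (the width and the sample lines are arbitrary), so the interval becomes `\{1,4\}`.

```shell
# "ab" gains one leading space per pass until it is 5 characters wide;
# "abcde" is already 5 wide, so the regex never matches and t never branches.
aligned=$(printf '%s\n' ab abcde | sed -e :a -e 's/^.\{1,4\}$/ &/;ta')
printf '[%s]\n' "$aligned"
```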

    # Center all text in the middle of 79-column width.
    sed -e :a -e 's/^.\{1,77\}$/ & /; ta'
    # or: first pad on the left to the full width, then halve the padding
    sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/'

For comparison, awk can do the same with the help of its built-in functions:

    # right-align
    awk '{ printf "%79s\n", $0 }'
    # center
    awk '{ l=length(); s=int((79-l)/2); printf "%"(s+l)"s\n", $0 }'

2.12 s/regex/repl/(flag)

The substitute command replaces occurrences of the regular expression (regex) with repl(acement) on each line.
The flag selects which occurrence is replaced; the default is 1, the first occurrence.
To replace all occurrences on each line, set the flag to g.

    # echo "this is foo and another foo quux"
    # replace the first foo with bar
    sed 's/foo/bar/'
    # replace the last foo with bar
    sed 's/\(.*\)foo/\1bar/'
    # replace foo with bar only on lines that contain baz
    sed '/baz/s/foo/bar/g'
    # replace foo with bar only on lines that do not contain baz
    sed '/baz/!s/foo/bar/g'
    # Change text "scarlet", "ruby" or "puce" to "red".
    sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g'
    # If you are using GNU sed, then you can do it simpler:
    sed 's/scarlet\|ruby\|puce/red/g'
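The effect of the flag is easiest to see when one line contains the pattern several times. A minimal sketch on the made-up line `foo foo foo`:

```shell
line='foo foo foo'
first=$(printf '%s\n' "$line" | sed 's/foo/bar/')          # default: 1st occurrence
second=$(printf '%s\n' "$line" | sed 's/foo/bar/2')        # numeric flag: 2nd occurrence
every=$(printf '%s\n' "$line" | sed 's/foo/bar/g')         # g: all occurrences
last=$(printf '%s\n' "$line" | sed 's/\(.*\)foo/\1bar/')   # greedy .* leaves only the last foo
printf '%s\n' "$first" "$second" "$every" "$last"
```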

2.13 Write pattern space to filename(w)

w filename
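A small sketch of w in use: lines matching a pattern are written to a file while -n keeps them off standard output. The temporary file from `mktemp` and the sample lines are assumptions made for this example.

```shell
tmp=$(mktemp)
# The filename runs to the end of the script, so w must be the last command.
shown=$(printf '%s\n' 'ok 1' 'error 2' 'ok 3' | sed -n "/error/w $tmp")
saved=$(cat "$tmp")
rm -f "$tmp"
echo "stdout: [$shown]  file: [$saved]"
```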

3. Option

3.1 Silent mode -n

Do not send the line to the output automatically after it has been processed in the pattern space.
With the -n switch on, the only way to make sed output anything is to use a command that writes to the output directly (these commands are '=', 'a', 'c', 'i', 'l', 'p', 'P', 'r' and 'w').

3.2 Chaining multiple sed scripts -e

Add the script to the commands to be executed

sed -e script1 -e script2 …

3.3 In-place editing -i

Edit files in place.

3.4 Extended regular expressions -E

Use extended regular expressions in the script.
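The options -n, -e and -E can be compared side by side on a one-word input. This is a sketch with made-up data; the bracket decoration in the last command is only there to make the captured group visible.

```shell
# -n: nothing is printed automatically; the p flag prints only changed lines
only=$(printf '%s\n' aab xyz | sed -n 's/a/A/p')
# -e: the scripts run in order on each line
two=$(printf '%s\n' aab | sed -e 's/a/1/' -e 's/a/2/')
# -E: grouping and + need no backslashes
ext=$(printf '%s\n' aab | sed -E 's/(a+)b/[\1]/')
printf '%s\n' "$only" "$two" "$ext"
```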

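4. Common commands

  1. Delete the lines that match a pattern, but keep certain lines even if they match (GNU sed)

    # Among the lines that match regex, delete those outside lines x through y
    # (x and y stand for line numbers)
    sed '/regex/{x,y!d}'
    # Broken down:
    # delete lines x through y
    sed 'x,yd'
    # keep only lines x through y
    sed 'x,y!d'
    # or
    sed '{x,y!d}'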
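With concrete numbers in place of x and y, the nested address can be tried directly. In this sketch the four input lines are made up, the protected range is lines 2 to 3, and GNU sed is assumed.

```shell
# Lines 1, 2 and 4 match /x/. Line 2 lies inside 2,3 and is kept;
# lines 1 and 4 lie outside and are deleted; line 3 never matched /x/.
out=$(printf '%s\n' 'x one' 'x two' 'y three' 'x four' | sed '/x/{2,3!d;}')
printf '%s\n' "$out"
```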

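Practice commands:

    # Delete leading whitespace from each line
    sed 's/^[ \t]*//'
    # Delete trailing whitespace from each line
    sed 's/[ \t]*$//'
    # Delete both leading and trailing whitespace from each line
    sed 's/^[ \t]*//; s/[ \t]*$//'
    # If a line ends with a backslash, join the next line to it
    sed -e :a -e '/\\$/N; s/\\\n//; ta'
    # If a line begins with "=", join it to the previous line
    sed -e :a -e '$!N; s/\n=/ /; ta' -e 'P; D'
    # Digit grouping
    # 12345 1234 123 -> 12,345 1,234 123
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
    # Another regex approach, using the position assertions \B (matches where there is
    # no word boundary) and \b (matches at a word boundary, here the end of the word, like \>).
    # A single pass inserts at most one comma per number.
    sed -r -e 's/\B[0-9]{3}\b/,&/g'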
    # Run a command every N lines (here: add a blank line after every 5th line)
    sed '0~5G'
    # Print the last 2 lines
    sed '$!N;$!D'
    # or
    sed 'N;$!D'
    # Print the last 10 lines
    sed -e :a -e '$q;$!N;11,$D;ba'
    # awk handles one record at a time, so it is a poor fit for tasks like this that need to buffer several lines
    # Print the second-to-last line
    sed -n '$!{h;d};x;p'
    # or
    sed -n '$!N;$!D;P'
    # Print the Z-th line from the end (replace Z+1 with the computed number)
    sed -n -e :a -e '${P;q};$!N;Z+1,$D;ba'
    # Print the line just before each line that matches regex
    sed -n '/regex/{g;1!p;};h'
    # Print a paragraph that contains "AAA".
    sed -n -e '/./{H;$!d}' -e 'x;/AAA/{s/\n//;p}'
    # Print section of a file from a regex to end of file
    sed -n '/regex/,$ p'
    # or
    sed -n -e '/regex/!d' -e :a -e '$!{N;ba' -e '}' -e 'p'
    # Delete duplicate, consecutive lines from a file
    sed -n -r '$!N; /^(.*)\n\1$/!P; D'
    # Delete all lines except duplicate consecutive lines
    sed -n -r '$!N; /^(.*)\n\1$/P; D'
    # Delete the last 2 lines of the file
    sed -n '$!N; $!P; $!D; q'
    # Delete the last 10 lines of a file
    sed -n -e :a -e '$q;N;2,10ba;' -e 'P;D'
    # or as follows: 'condition!{action; D or d to restart}; action', an if/else idiom
    sed -n -e :a -e '1,10!{P;N;D;};N;ba'
    # Delete all the leading blank lines
    sed -n '/./,$!d;p'
    # Delete all the trailing blank lines
    sed -n -e :a -e '/^\n*$/N;/\n$/ba;p'
    # Delete the last line of each paragraph.
    sed -n '/^$/{p;h}; /./{x;/./p;}'
    # Extract subject from an email message.
    sed '/^Subject: */!d; s///; q' # s/// -> s/^Subject: *//
    # Extract email address from a "Name Surname < email@domain.com > XXX" string.
    sed -n -r 's/.*< *| *>.*//gp' # -> email@domain.com
    # Strip HTML tags, including tags that span several lines.
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
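A few of the practice commands can be checked against small inputs. The sketch below assumes GNU sed (for `\t` inside brackets and for -r); all sample data is made up.

```shell
# Trim leading and trailing blanks and tabs.
trimmed=$(printf '  \thello \t\n' | sed 's/^[ \t]*//; s/[ \t]*$//')
# Digit grouping: the loop inserts one comma per pass, working from the right.
grouped=$(printf '%s\n' '1234567 12' | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')
# Print the last 2 lines.
tail2=$(printf '%s\n' 1 2 3 4 5 | sed '$!N;$!D')
# Delete duplicate, consecutive lines.
deduped=$(printf '%s\n' a a b a | sed -n -r '$!N; /^(.*)\n\1$/!P; D')
printf '%s\n' "$trimmed" "$grouped" "$tail2" "$deduped"
```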

In sed, an empty regex // stands for the most recently used regex.

    sed '/^$/N;/\n$/N;//D'
    # // means /\n$/
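To see what this one-liner does, it can be run on an input with a long run of blank lines. In this sketch (made-up input, GNU sed assumed) four blank lines between `a` and `b` are reduced to two, and `tr` makes the newlines visible.

```shell
# Each time the pattern space holds two blank lines, //D (that is, /\n$/D)
# drops one of them and restarts, so at most two consecutive blank lines
# survive.
squeezed=$(printf 'a\n\n\n\n\nb\n' | sed '/^$/N;/\n$/N;//D' | tr '\n' '|')
printf '%s\n' "$squeezed"
```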

An Important Comment About Ranges!

I have an important comment about ranges. Ranges in form "/start/,/finish/" always match 2 lines or more. If "/finish/" is on the same line as "/start/" it will not work. Please see the Sed FAQ 3.3 for more details. (Source: https://catonmat.net/sed-one-liners-explained-part-two)
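The behaviour described here can be reproduced with a three-line input in which the first line contains both patterns. The input in this sketch is made up.

```shell
# sed starts looking for /finish/ on the line AFTER the one that matched
# /start/. No later line matches, so the range runs to the end of the input.
out=$(printf '%s\n' 'start finish' middle end | sed -n '/start/,/finish/p')
printf '%s\n' "$out"
```

All three lines are printed, not just the first one.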

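Open issues

  1. When the source file and the destination file are the same file, the file ends up empty after the edit, because the shell truncates 2.txt for the > redirection before cat has read it:

    cat 2.txt | sed 's/$/\r/' > 2.txt
    # sed's -i option (edit files in place) makes in-place editing work
    sed -i 's/$/\r/' 2.txt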