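Goals
$ myprog.py  prints help
$ myprog.py -h/--help  prints help
$ myprog.py input.txt  reads the input file, writes to standard output
$ myprog.py input.txt -o output.txt  reads the input file, writes to a file
$ cat input.txt | myprog.py  reads standard input, writes to standard output
$ cat input.txt | myprog.py -o output.txt  reads standard input, writes to a file
$ cat input.txt | myprog.py | less -N  output piped into another command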
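Key points
When no input file is given, the program has to decide whether anything is being fed to sys.stdin: if nothing is, print the help text; otherwise read from stdin.
The sys.stdin.isatty() method is enough for this check. It returns True when sys.stdin is attached to a terminal, which means nothing was piped or redirected into the program.
In addition, signal.signal(signal.SIGPIPE, signal.SIG_DFL) avoids IOError: [Errno 32] Broken pipe when the command on the receiving end of a pipe exits early.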
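The two checks above can be isolated into small helpers. This is a minimal sketch rather than the author's code: `should_print_help`, `restore_default_sigpipe`, and `FakeStdin` are hypothetical names introduced here for illustration, and the `hasattr` guard is an added assumption for platforms without SIGPIPE.

```python
import signal
import sys


def should_print_help(argv, stdin):
    """Return True when no arguments were given and stdin is a terminal."""
    # isatty() is True when stdin is attached to an interactive terminal,
    # i.e. nothing was piped or redirected into the program.
    return len(argv) == 1 and stdin.isatty()


def restore_default_sigpipe():
    """Restore the default SIGPIPE action so a closed pipe ends the process quietly."""
    # SIGPIPE is not defined on every platform (e.g. Windows), so guard the lookup.
    if hasattr(signal, "SIGPIPE"):
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)


class FakeStdin(object):
    """Hypothetical stand-in for sys.stdin, used only to demonstrate the check."""

    def __init__(self, tty):
        self._tty = tty

    def isatty(self):
        return self._tty
```

In a real script the call would be `should_print_help(sys.argv, sys.stdin)`. Passing the stream in as a parameter keeps the decision easy to exercise without an actual terminal.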
Example code
#!/usr/bin/env python2
# encoding: utf-8
'''\033[1;3;32m
normalizing file with multiple encodings to utf8
\033[0m'''
import sys
import signal
import textwrap

import chardet

reload(sys)
sys.setdefaultencoding('utf8')

# restore the default SIGPIPE action so a closed pipe does not raise IOError
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

__author__ = 'suqingdong'
__version__ = '1.0'


def main(infile=None, outfile=None):
    infile = sys.stdin if infile in ('-', 'stdin') else open(infile)
    outfile = sys.stdout if outfile in ('-', 'stdout') else open(outfile, 'w')

    with infile as inf, outfile as out:
        for line in inf:
            # detect the encoding line by line, falling back to gbk
            enc = chardet.detect(line)['encoding'] or 'gbk'
            line = line.decode(enc)
            out.write(line)


if __name__ == '__main__':

    import argparse

    epilog = textwrap.dedent('''
        \033[36mexample:
            %(prog)s input.txt
            %(prog)s input.txt -o output.txt
            cat input.txt | %(prog)s
            cat input.txt | %(prog)s -o output.txt
        \033[33mcontact: {__author__}@novogene.com\033[0m
    ''').format(**globals())

    parser = argparse.ArgumentParser(
        prog='norm_encoding',
        description=__doc__,
        epilog=epilog,
        formatter_class=argparse.RawTextHelpFormatter)

    parser.add_argument(
        'infile',
        nargs='?',
        default='stdin',
        help='the input file, which may contain multiple encodings [default: %(default)s]')

    parser.add_argument(
        '-o',
        '--outfile',
        default='stdout',
        help='the output file [default: %(default)s]')

    # no arguments and nothing piped into stdin: just print the help
    if (len(sys.argv) == 1) and sys.stdin.isatty():
        parser.print_help()
        sys.exit()

    args = vars(parser.parse_args())

    main(**args)
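The script above targets Python 2 and relies on chardet. For readers on Python 3, the per-line normalization step could look roughly like the sketch below. It is an illustration under stated assumptions, not a port of the author's code: `decode_line` and `normalize` are hypothetical names, and encoding detection with chardet is swapped for a simpler "try utf-8, then fall back to gbk" rule so that the example needs only the standard library.

```python
import io


def decode_line(raw, encodings=("utf-8", "gbk")):
    """Decode one line of bytes, trying each candidate encoding in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding and mark undecodable bytes.
    return raw.decode(encodings[0], errors="replace")


def normalize(in_stream, out_stream):
    """Read bytes line by line and write every line back as UTF-8 bytes."""
    for raw in in_stream:
        out_stream.write(decode_line(raw).encode("utf-8"))


# Demonstration on an in-memory stream that mixes utf-8 and gbk lines.
mixed = io.BytesIO(
    u"abc\n".encode("utf-8")
    + u"中文\n".encode("gbk")
    + u"中文\n".encode("utf-8")
)
result = io.BytesIO()
normalize(mixed, result)
```

On Python 3 the same function can be fed the binary streams `sys.stdin.buffer` and `sys.stdout.buffer`, which keeps the stdin/stdout behaviour described in the goals while working on raw bytes.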