Case Study - 73 Three Hundred Tang Poems (唐诗三百首) - 《Python程序设计数字教程》

Description

As the saying goes, “熟读唐诗三百首，不会吟诗也会吟” (read the Three Hundred Tang Poems thoroughly and you will be able to chant verse even if you cannot compose it). Analyze the attached text file of the Three Hundred Tang Poems and implement the following features (some of them require the third-party library jieba):
1. Count the author of each poem. If the first line of input is “作者” (author), the second line is an integer n. Output the n authors who appear most often, one per line, as the name and the count separated by a space, then end the program.
2. Count the personal names that appear. If the first line of input is “人物” (person), the second line is an integer n. Output the n names that appear most often, one per line, as the name and the count separated by a space, then end the program.
Note: some poets use other poets' names in a title or a verse, for example “梦李白二首之一”. The results of items 1 and 2 may therefore differ.
3. If the input is a poem number given as a string in the range “010” to “320” (the test cases guarantee that the number exists), output the poem with that number.
Output format: drop the poem number from the first line; everything else is printed exactly as it appears in the file.
4. If the input is “唐诗” (Tang poems), output the number of poems in the file, then end the program.
5. Feihualing (飞花令): if the first line of input is “飞花”, the second line is a single Chinese character s (length 1). Output, in the order they appear in the file, the verses that contain this character (only verses of at most 7 characters), one verse per line.
6. For any other input, output “输入错误” (input error), then end the program.
This problem requires Chinese word segmentation, which the jieba library provides. Install it with:

pip install jieba

Import the Chinese word segmentation library:

import jieba

Read the file into a string:

def readfile(filename):    # read the file into a string
    with open(filename, 'r', encoding='utf-8') as file:
        txt = file.read()  # read the entire file as one string
    return txt             # return the string

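Get the list of authors:

def author(txt):
    ls = txt.strip().split()                                    # split the string on whitespace into a list
    authorname = [name[3:] for name in ls if name[0].isdigit()] # tokens starting with a digit are "number + author"; drop the 3-digit number
    return authorname                                           # return the author list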
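
To make the slicing step concrete, here is a minimal sketch run on a tiny in-memory sample. The sample layout (a heading line of the form "three-digit number + author, whitespace, title") is an assumption inferred from the code above, not a copy of poem.txt, and the numbers 010 and 011 as well as the helper name `extract_authors` are hypothetical.

```python
# Hypothetical two-poem sample in the ASSUMED layout "NNN<author> <title>".
SAMPLE = (
    "010李白 静夜思\n"
    "床前明月光 疑是地上霜\n"
    "举头望明月 低头思故乡\n"
    "011王之涣 登鹳雀楼\n"
    "白日依山尽 黄河入海流\n"
    "欲穷千里目 更上一层楼\n"
)


def extract_authors(txt):
    """Return the author of every poem, in file order."""
    tokens = txt.split()  # split on any whitespace, including newlines
    # A heading token starts with a three-digit number; the rest is the author.
    return [tok[3:] for tok in tokens if tok[:3].isdigit()]


authors = extract_authors(SAMPLE)
print(authors)
```

Checking the first three characters (`tok[:3]`) rather than only the first one is slightly stricter than the version above; on the assumed layout both select the same tokens.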

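Count how many times each author appears:

def authorcount(authorname,num):    # store each author and their count as a key-value pair
    authorcounts = {}               # empty dictionary
    for name in authorname:         # iterate over the author list
        authorcounts[name]=authorcounts.get(name,0)+1  # count how many times each author appears
    authorcounts = sorted(authorcounts.items(),key=lambda item:item[1],reverse=True)[:num] # sort by count, keep the top num
    for name in authorcounts:       # print the name and count of the top num authors
        print(name[0],name[1])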
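
As an alternative to the hand-written dictionary, the same tally can be sketched with `collections.Counter` from the standard library. This is a swapped-in technique rather than the tutorial's method; the function name `top_authors` and the demo list are made up for illustration.

```python
from collections import Counter


def top_authors(author_names, n):
    """Return the n most frequent authors as (name, count) pairs."""
    # most_common(n) sorts by count in descending order and keeps n entries.
    return Counter(author_names).most_common(n)


# Hypothetical author list, for illustration only.
demo = ["杜甫", "李白", "杜甫", "王维", "李白", "杜甫"]
for name, count in top_authors(demo, 2):
    print(name, count)
```

On Python 3.7 and later, authors with equal counts should come out in first-seen order, which matches what the stable `sorted(..., reverse=True)` call above produces.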

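Count how many times each person appears:

def name(txt,num,authorname):       # count occurrences of author names, including those inside titles and verses
    words = jieba.lcut(txt)         # segment the text in jieba's accurate mode
    namecounts = {}                 # store each name and its count as a key-value pair
    for name in words:              # iterate over the list of segmented words
        if name in authorname:      # if the current word is one of the authors, count it
            namecounts[name] = namecounts.get(name, 0) + 1
    namecounts = sorted(namecounts.items(), key=lambda item: item[1], reverse=True)[:num]
    for name in namecounts:         # print the name and count of the top num names
        print(name[0], name[1])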
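
The counting step can be tried without installing jieba by passing in a token list directly. The sketch below assumes the tokens have already been produced elsewhere; the list shown is hand-written and hypothetical, and real `jieba.lcut` output may split the text differently. It also turns the author list into a set, since the author list contains one entry per poem and a set lookup avoids scanning it for every word.

```python
def count_names(words, author_names, n):
    """Count how often each known author name appears among the tokens."""
    known = set(author_names)  # set membership test instead of a list scan
    counts = {}
    for word in words:
        if word in known:
            counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical, hand-written tokens standing in for jieba.lcut(txt).
tokens = ["010", "杜甫", "梦", "李白", "二首", "之一", "011", "李白", "静夜思"]
print(count_names(tokens, ["杜甫", "李白"], 2))
```

Here 李白 is counted twice, once as an author and once inside a title, which is the difference between items 1 and 2 that the problem statement points out.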

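Count the poems. Every poem has an author, so the length of the author list is the number of poems:

def countPoem(authorname):
    return len(authorname)        # the length of the author list is the number of poems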

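Print the poem with a given number:

def poemout(txt,poemid):          # take a poem number and print the corresponding poem
    start = txt.index(poemid)     # locate where the number appears; this is the starting point
    stop = start + 3              # the end point starts 3 characters later, skipping the number
    for c in txt[start+3:]:       # iterate over each character
        if c.isdigit() == False:  # a non-digit character is still part of the current poem
            stop = stop + 1       # move the end point one character forward
        else:                     # stop at the first digit, which begins the next poem's number
            break
    print(txt[start + 3:stop])    # print the string between the start point and the end point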
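
The same "from this number up to the next number" idea can also be sketched with the standard `re` module, anchoring both searches at the start of a line so that a digit inside a title or verse would not end the poem early. This is an alternative technique, not the tutorial's code; the sample text, its numbering, and the name `get_poem_text` are assumptions made for illustration.

```python
import re

# Hypothetical sample in the ASSUMED layout "NNN<author> <title>".
SAMPLE = (
    "010李白 静夜思\n"
    "床前明月光 疑是地上霜\n"
    "举头望明月 低头思故乡\n"
    "011王之涣 登鹳雀楼\n"
    "白日依山尽 黄河入海流\n"
    "欲穷千里目 更上一层楼\n"
)

NEXT_HEADING = re.compile(r"^\d{3}", re.MULTILINE)


def get_poem_text(txt, poem_id):
    """Return the text after the heading number poem_id, up to the next heading."""
    head = re.search(rf"^{re.escape(poem_id)}", txt, flags=re.MULTILINE)
    if head is None:
        return None  # number not found at the start of any line
    start = head.end()  # first character after the number
    # The poem ends at the next line starting with three digits, or at end of text.
    nxt = NEXT_HEADING.search(txt, start)
    end = nxt.start() if nxt else len(txt)
    return txt[start:end]


print(get_poem_text(SAMPLE, "010"), end="")
```

Because the end of the text is used when no later heading exists, the last poem in the file needs no special case.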

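Feihualing: iterate over the file and collect the verse lines into a list, skipping every line that starts with a digit. Joining the list elements with spaces and then splitting again on whitespace separates the two halves of each couplet. Finally, iterate over the verses and print each one that contains the target character and is at most 7 characters long.

def flyingflower(word,filename):
    with open(filename, 'r', encoding='utf-8') as file:
        ls = [line.strip() for line in file if line[0].isdigit() == False] # collect the non-heading lines
        ls = ' '.join(ls).split()                                          # join, then split again on whitespace
        for poem in ls:                                                    # iterate over the verses
            if word in poem and len(poem) <=7:                             # the verse contains the character and has at most 7 characters
                print(poem)                                                # print the verse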
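
The filtering described above can be exercised on a list of lines without touching the file. The following is a small sketch under the same layout assumption (heading lines start with a digit, verses on a line are separated by whitespace); `find_verses` and the sample lines are hypothetical names and data.

```python
def find_verses(lines, char, max_len=7):
    """Return verses of at most max_len characters that contain char, in order."""
    verses = []
    for line in lines:
        if line[:1].isdigit():       # skip heading lines; also safe on empty strings
            continue
        verses.extend(line.split())  # one line may hold several verses
    return [v for v in verses if char in v and len(v) <= max_len]


sample_lines = [
    "010李白 静夜思",
    "床前明月光 疑是地上霜",
    "举头望明月 低头思故乡",
]
print(find_verses(sample_lines, "月"))
```

Returning a list instead of printing inside the loop keeps the search separate from the output, so the caller decides how to display the result.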

Complete reference code:

import jieba
def readfile(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        txt = file.read()
    return txt
def author(txt):
    ls = txt.strip().split()
    authorname = [name[3:] for name in ls if name[0].isdigit()] # build the author list
    return authorname
def authorcount(authorname,num):
    authorcounts = {}  # store each author and their count as a key-value pair
    for name in authorname:
        authorcounts[name]=authorcounts.get(name,0)+1
    authorcounts = sorted(authorcounts.items(),key=lambda item:item[1],reverse=True)[:num]
    for name in authorcounts:
        print(name[0],name[1])
def name(txt,num,authorname):
    words = jieba.lcut(txt) # segment the text in jieba's accurate mode
    namecounts = {}  # store each name and its count as a key-value pair
    for name in words:
        if name in authorname:
            namecounts[name] = namecounts.get(name, 0) + 1
    namecounts = sorted(namecounts.items(), key=lambda item: item[1], reverse=True)[:num]
    for name in namecounts:
        print(name[0], name[1])
def countPoem(authorname):
    return len(authorname)
def poemout(txt,poemid):   # take a poem number and print the corresponding poem
    start = txt.index(poemid)
    stop = start + 3
    for c in txt[start+3:]:
        if c.isdigit() == False:
            stop = stop + 1
        else:
            break
    print(txt[start + 3:stop])
def flyingflower(word,filename):
    with open(filename, 'r', encoding='utf-8') as file:
        ls = [line.strip() for line in file if line[0].isdigit() == False]
        ls = ' '.join(ls).split()
        for poem in ls:
            if word in poem and len(poem) <=7:
                print(poem)
if __name__ == '__main__':
    filename = 'poem.txt'
    txt = readfile(filename)   # read the file contents into a string
    authorname = author(txt)
    choice = input()
    if choice =='作者':
        n = int(input())
        authorcount(authorname,n)
    elif choice =='人物':
        num = int(input())
        name(txt, num, authorname)
    elif choice.isdigit() == True and len(choice) == 3 and 10 <= int(choice) <= 320:
        poemout(txt, choice)
    elif choice == '飞花':
        word = input()
        if len(word) ==1:
            flyingflower(word,filename)
    elif choice == '唐诗':
        print(countPoem(authorname))
    else:
        print('输入错误')

2022 rewrite
poem.txt

def read_file():
    """Read the Tang poem file and return its contents as a string."""
    with open('poem.txt', 'r', encoding='UTF-8') as f:
        return f.read()


def author_dic(num):
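    """Count the poems by each author and return the top num (author, count) pairs, sorted by count in descending order."""
    author_ls = [x[3:] for x in poem_str.split() if x[:3].isdigit()]
    author_set = sorted(set(author_ls), key=author_ls.index)  # remove duplicate author names, keeping first-appearance order
    author_dict = {x: author_ls.count(x) for x in author_set}  # map each author to their number of poems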
    author = sorted(author_dict.items(), key=lambda x: x[1], reverse=True)[:num]
    return author


def person_dic(num):
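    """Count how often each author's name appears anywhere in the text and return the top num (author, count) pairs, sorted by count in descending order."""
    author_ls = [x[3:] for x in poem_str.split() if x[:3].isdigit()]
    author_set = sorted(set(author_ls), key=author_ls.index)  # remove duplicate author names, keeping first-appearance order
    person_dict = {x: poem_str.count(x) for x in author_set}  # map each author to the number of times the name occurs in the text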
    person = sorted(person_dict.items(), key=lambda x: x[1], reverse=True)[:num]
    return person


def count_poems():
    """Return the number of poems; each line starting with three digits marks one poem."""
    return len([x[:3] for x in poem_str.split('\n') if x[:3].isdigit()])


def flying_flower(word):
    """Take a string and print, in file order, each verse that contains it and is at most 7 characters long."""
    with open('poem.txt', 'r', encoding='UTF-8') as f:
        poem = sum([x.strip().split() for x in f if not x[:3].isdigit() and x], [])  # flatten into a one-dimensional list
    for line in poem:
        if word in line and len(line) <= 7:
            print(line)

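# def get_poem(num):
#     """When the numbers are consecutive: take a poem number and return the poem's text by slicing"""
#     start = poem_str.index(num) + 3
#     if num != '320':
#         end = poem_str.index(str(int(num) + 1)) - 2  # right boundary: where the next poem's number appears
#     else:  # for number 320 there is no later number, so the end of the text is the right boundary
#         end = len(poem_str)
#     return poem_str[start:end]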
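def get_poem(num):
    """Take a poem number and return the poem's text by slicing; also works when the numbers are not consecutive."""
    start = poem_str.index(num) + 3  # skip the three-digit number itself
    poem = poem_str[start:]
    end = len(poem_str)  # default right boundary: end of the text, used for the last poem
    for i in range(len(poem)):
        if poem[i:i+3].isdigit():  # the next run of digits begins the following poem
            end = start + i
            break
    return poem_str[start: end]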


def judge(choice):
    if choice == '作者':
        n = int(input())
        for i in author_dic(n):
            print(i[0], i[1])
    elif choice == '人物':
        n = int(input())
        for i in person_dic(n):
            print(i[0], i[1])
    elif choice.isdigit() and len(choice) == 3 and '010' <= choice <= '320':
        print(get_poem(choice))
    elif choice == '唐诗':
        print(count_poems())
    elif choice == '飞花':
        word = input()
        flying_flower(word)
    else:
        print('输入错误')


if __name__ == '__main__':
    poem_str = read_file()
    operation = input()
    judge(operation)
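
The number check in `judge` compares the input as a string. For strings of equal length that contain only the ASCII digits 0 to 9, lexicographic order coincides with numeric order, so no conversion to `int` is needed. The sketch below packages that check as a hypothetical helper, `is_valid_poem_id`; the extra `isascii()` test is an added precaution, not part of the tutorial, because `str.isdigit()` also accepts non-ASCII digit characters such as full-width digits.

```python
def is_valid_poem_id(choice, low="010", high="320"):
    """Return True for a three-character ASCII digit string within [low, high]."""
    return (
        len(choice) == 3
        and choice.isascii()      # reject full-width and other non-ASCII digits
        and choice.isdigit()
        and low <= choice <= high  # same-length digit strings compare like numbers
    )


for candidate in ["010", "320", "009", "321", "12", "作者"]:
    print(candidate, is_valid_poem_id(candidate))
```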