Description

As the saying goes, "Read three hundred Tang poems until you know them well, and even if you cannot compose poems, you can still chant them." Analyze the attached text file of the 300 Tang Poems.
Implement the following features (some of them require the third-party library jieba):
1. Count the author of each poem. If the first input line is '作者' (author), the second line is an integer n: output the n most frequent authors, one per line, each as the name and its count separated by a space, then end the program.
2. Count the names of people that appear. If the first input line is '人物' (person), the second line is an integer n: output the n most frequently occurring names, one per line, each as the name and its count separated by a space, then end the program.
Note: some poets use another poet's name in a title or a verse, e.g. "梦李白二首之一" (Dreaming of Li Bai, first of two). The results of items 1 and 2 may therefore differ.
3. If the input is a poem number string, formatted as three digits in the range "010" to "320" (the test cases guarantee that the number exists), output the poem with that number.
Output format: drop the poem number from the first line; everything else keeps the layout used in the file.
4. If the input is '唐诗' (Tang poems), output the number of poems in the file, then end the program.
5. Flying-flower game (飞花令): if the first input line is '飞花', the second line is a single Chinese character s. Output, in the order they appear in the file, every verse in the 300 Tang Poems file that contains that character and is at most 7 characters long, one per line.
6. For any other input, output "输入错误" (input error) and end the program.
This task requires Chinese word segmentation, so it uses the jieba library. Install it with:

```shell
pip install jieba
```

Import the segmentation library:

```python
import jieba
```

Read the file into a string:

```python
def readfile(filename):  # read the file into a string
    with open(filename, 'r', encoding='utf-8') as file:
        txt = file.read()  # read the whole file as one string
    return txt  # return the string
```

Get the list of authors:

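```python
def author(txt):
    ls = txt.strip().split()  # split the string on whitespace into a list
    # Tokens that start with a digit are "number + author"; drop the 3-digit number
    authorname = [name[3:] for name in ls if name[0].isdigit()]
    return authorname  # return the author list
```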
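To see what `author()` extracts, here is a minimal sketch run on a small hypothetical sample. It assumes, as the slice `name[3:]` implies, that each poem begins with a token made of the 3-digit number immediately followed by the author's name (e.g. `001李白`), and that the title and verses are separate whitespace-delimited tokens; the real `poem.txt` may differ in detail. `SAMPLE` is made-up data for illustration.

```python
# Hypothetical three-poem sample in the assumed format:
# "NNN" + author, a space, the title; verse halves separated by a space.
SAMPLE = """001李白 静夜思
床前明月光 疑是地上霜
举头望明月 低头思故乡

002杜甫 梦李白二首之一
死别已吞声 生别常恻恻

003李白 月下独酌
花间一壶酒 独酌无相亲
"""


def author(txt):
    """Return one author name per poem, in file order."""
    tokens = txt.strip().split()  # split on any whitespace
    # Header tokens begin with the 3-digit number; the rest is the author.
    return [tok[3:] for tok in tokens if tok[0].isdigit()]


if __name__ == '__main__':
    authors = author(SAMPLE)
    print(authors)       # one entry per poem
    print(len(authors))  # poem count, as countPoem computes it
```

Because each poem contributes exactly one header token, the list length doubles as the poem count.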

Count how many poems each author wrote:

```python
def authorcount(authorname, num):  # store each author and their count as key-value pairs
    authorcounts = {}  # empty dictionary
    for name in authorname:  # iterate over the author list
        authorcounts[name] = authorcounts.get(name, 0) + 1  # count each author's poems
    # Sort by count in descending order and keep the top num
    authorcounts = sorted(authorcounts.items(), key=lambda item: item[1], reverse=True)[:num]
    for name in authorcounts:  # print the top num authors and their counts
        print(name[0], name[1])
```
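The same top-n count can be written with `collections.Counter` from the standard library. `Counter.most_common(n)` orders elements with equal counts by first appearance, which matches the stable `sorted(..., reverse=True)` over an insertion-ordered dict used above, so the output should be the same. `top_authors` is a hypothetical helper name introduced for this sketch.

```python
from collections import Counter


def top_authors(authorname, num):
    """Return the num most frequent authors as (name, count) pairs."""
    # Ties keep first-encountered order, like a stable sort over an
    # insertion-ordered dict.
    return Counter(authorname).most_common(num)


def authorcount(authorname, num):
    """Print one 'name count' line per author, most frequent first."""
    for name, count in top_authors(authorname, num):
        print(name, count)


if __name__ == '__main__':
    authorcount(['李白', '杜甫', '李白', '王维'], 2)
```

Run as a script, this prints `李白 2` followed by `杜甫 1`.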

Count how often each person's name appears:

```python
import jieba


def name(txt, num, authorname):  # count author names, including mentions in titles and verses
    words = jieba.lcut(txt)  # segment the text in accurate mode
    namecounts = {}  # map each name to its number of occurrences
    for name in words:  # iterate over the segmented words
        if name in authorname:  # if the word is one of the authors, count it
            namecounts[name] = namecounts.get(name, 0) + 1
    namecounts = sorted(namecounts.items(), key=lambda item: item[1], reverse=True)[:num]
    for name in namecounts:  # print the top num names and their counts
        print(name[0], name[1])
```
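The filtering step does not depend on jieba itself, so it can be sketched on a hand-written token list. Here `words` is a hypothetical stand-in for the output of `jieba.lcut(txt)`; real segmentation may split names differently (a less common name could be cut into single characters and then never match). Converting `authorname` to a set makes each membership test constant time instead of a scan of the list. `count_names` is a hypothetical helper name.

```python
from collections import Counter


def count_names(words, authorname, num):
    """Return the num most frequent author names found in a token list."""
    authors = set(authorname)  # O(1) membership tests instead of a list scan
    counts = Counter(w for w in words if w in authors)
    return counts.most_common(num)


if __name__ == '__main__':
    # Hand-written tokens standing in for jieba.lcut output (hypothetical).
    words = ['梦', '李白', '二首', '之一', '杜甫', '李白', '明月']
    for name, count in count_names(words, ['李白', '杜甫', '李白'], 2):
        print(name, count)
```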

Count the poems. Since every poem has exactly one author, the length of the author list is the number of poems:

```python
def countPoem(authorname):
    return len(authorname)  # one author per poem, so the list length is the poem count
```

Print the poem with a given number:

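```python
def poemout(txt, poemid):  # take a poem number and print the corresponding poem
    start = txt.index(poemid)  # locate the number; the poem starts here
    stop = start + 3  # the end marker starts 3 characters later, skipping the number
    for c in txt[start + 3:]:  # walk through the following characters
        if not c.isdigit():  # a non-digit still belongs to the current poem
            stop = stop + 1  # move the end marker forward by one character
        else:  # a digit starts the next poem's number, so stop walking
            break
    print(txt[start + 3:stop])  # print the text between the number and the end marker
```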
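The character-by-character scan in `poemout` is searching for the first digit after the number. A sketch of the same idea with the standard `re` module: a compiled pattern's `search(string, pos)` starts looking at `pos`, and when no digit follows (the last poem) the poem runs to the end of the text. `poem_text` is a hypothetical name; it returns the string instead of printing it so the result can be checked, and `SAMPLE` is made-up data in the format assumed earlier.

```python
import re

DIGIT = re.compile(r'\d')  # any digit marks the start of the next poem number


def poem_text(txt, poemid):
    """Return poem `poemid` without its 3-digit number (hypothetical helper)."""
    start = txt.index(poemid) + 3  # skip over the number itself
    match = DIGIT.search(txt, start)  # first digit after the number
    stop = match.start() if match else len(txt)  # last poem: run to the end
    return txt[start:stop]


SAMPLE = "001李白 静夜思\n床前明月光 疑是地上霜\n\n002杜甫 梦李白二首之一\n死别已吞声 生别常恻恻\n"

if __name__ == '__main__':
    print(poem_text(SAMPLE, '002'))
```

Chinese numerals such as 二 in a title are not decimal digits, so they do not end the poem early.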

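Flying-flower game: iterate over the file and collect the verses into a list, skipping every line that starts with a digit (the header lines). Joining the list with spaces and splitting it again on whitespace separates each line into its first and second halves. Then check each half-line and print it if it contains the target character and is at most 7 characters long.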

```python
def flyingflower(word, filename):
    with open(filename, 'r', encoding='utf-8') as file:
        ls = [line.strip() for line in file if not line[0].isdigit()]  # keep all non-header lines
    ls = ' '.join(ls).split()  # join, then re-split into half-lines
    for poem in ls:  # iterate over the half-lines
        if word in poem and len(poem) <= 7:  # contains the character and is at most 7 characters
            print(poem)  # print the half-line
```
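For testing without a file, the same steps can be applied to a string. This sketch assumes that header lines start with the poem number and also carry the title, and that the two halves of each line are separated by whitespace; if titles sit on their own lines in the real file, they would be treated as verses too. `flyingflower_lines` and `SAMPLE` are hypothetical.

```python
def flyingflower_lines(word, text):
    """Return half-lines of at most 7 characters that contain `word`, in order."""
    # Keep non-empty lines that do not start with a digit (header lines do).
    body = [line.strip() for line in text.splitlines() if line and not line[0].isdigit()]
    halves = ' '.join(body).split()  # one item per half-line
    return [h for h in halves if word in h and len(h) <= 7]


SAMPLE = ("001李白 静夜思\n床前明月光 疑是地上霜\n举头望明月 低头思故乡\n\n"
          "002杜甫 梦李白二首之一\n死别已吞声 生别常恻恻\n")

if __name__ == '__main__':
    for verse in flyingflower_lines('月', SAMPLE):
        print(verse)
```

Searching for 白 finds nothing here, because its only occurrence is in a header line, which is skipped.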

Complete reference code:

```python
import jieba


def readfile(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        txt = file.read()
    return txt


def author(txt):
    ls = txt.strip().split()
    authorname = [name[3:] for name in ls if name[0].isdigit()]  # build the author list
    return authorname


def authorcount(authorname, num):
    authorcounts = {}  # map each author to the number of poems
    for name in authorname:
        authorcounts[name] = authorcounts.get(name, 0) + 1
    authorcounts = sorted(authorcounts.items(), key=lambda item: item[1], reverse=True)[:num]
    for name in authorcounts:
        print(name[0], name[1])


def name(txt, num, authorname):
    words = jieba.lcut(txt)  # segment the text in accurate mode
    namecounts = {}  # map each author name to its number of occurrences
    for name in words:
        if name in authorname:
            namecounts[name] = namecounts.get(name, 0) + 1
    namecounts = sorted(namecounts.items(), key=lambda item: item[1], reverse=True)[:num]
    for name in namecounts:
        print(name[0], name[1])


def countPoem(authorname):
    return len(authorname)


def poemout(txt, poemid):  # take a poem number and print the corresponding poem
    start = txt.index(poemid)
    stop = start + 3
    for c in txt[start + 3:]:
        if not c.isdigit():
            stop = stop + 1
        else:
            break
    print(txt[start + 3:stop])


def flyingflower(word, filename):
    with open(filename, 'r', encoding='utf-8') as file:
        ls = [line.strip() for line in file if not line[0].isdigit()]
    ls = ' '.join(ls).split()
    for poem in ls:
        if word in poem and len(poem) <= 7:
            print(poem)


if __name__ == '__main__':
    filename = 'poem.txt'
    txt = readfile(filename)  # read the file contents into a string
    authorname = author(txt)
    choice = input()
    if choice == '作者':
        n = int(input())
        authorcount(authorname, n)
    elif choice == '人物':
        num = int(input())
        name(txt, num, authorname)
    elif choice.isdigit() and len(choice) == 3 and 10 <= int(choice) <= 320:
        poemout(txt, choice)
    elif choice == '飞花':
        word = input()
        if len(word) == 1:
            flyingflower(word, filename)
    elif choice == '唐诗':
        print(countPoem(authorname))
    else:
        print('输入错误')
```

Rewritten in 2022
Data file: poem.txt

```python
def read_file():
    """Read the Tang poem file and return its contents as a string."""
    with open('poem.txt', 'r', encoding='UTF-8') as f:
        return f.read()


def author_dic(num):
    """Count poems per author (author -> number of poems); return the top num pairs in descending order."""
    author_ls = [x[3:] for x in poem_str.split() if x[:3].isdigit()]
    author_set = sorted(set(author_ls), key=author_ls.index)  # de-duplicate authors, keeping first-appearance order
    author_dict = {x: author_ls.count(x) for x in author_set}  # author -> number of poems
    author = sorted(author_dict.items(), key=lambda x: x[1], reverse=True)[:num]
    return author


def person_dic(num):
    """Count how often each author's name occurs in the whole text; return the top num pairs in descending order."""
    author_ls = [x[3:] for x in poem_str.split() if x[:3].isdigit()]
    author_set = sorted(set(author_ls), key=author_ls.index)  # de-duplicate authors, keeping first-appearance order
    person_dict = {x: poem_str.count(x) for x in author_set}  # author -> occurrences of the name anywhere in the text
    person = sorted(person_dict.items(), key=lambda x: x[1], reverse=True)[:num]
    return person


def count_poems():
    """Return the number of poems: every line that starts with 3 digits begins a poem."""
    return len([x for x in poem_str.split('\n') if x[:3].isdigit()])


def flying_flower(word):
    """Print every half-line of at most 7 characters that contains word."""
    with open('poem.txt', 'r', encoding='UTF-8') as f:
        # Skip header lines and flatten the remaining half-lines into one list
        poem = [half for x in f if not x[:3].isdigit() for half in x.split()]
    for line in poem:
        if word in line and len(line) <= 7:
            print(line)

# def get_poem(num):
#     """Earlier version for consecutive numbers: slice out the poem with the given number."""
#     start = poem_str.index(num) + 3
#     if num != '320':
#         end = poem_str.index(str(int(num) + 1).zfill(3)) - 2  # the next poem's number is the right edge
#     else:  # 320 has no following number, so the end of the text is the right edge
#         end = len(poem_str)
#     return poem_str[start:end]
def get_poem(num):
    """Return the text of poem num without its number; works even if the numbers are not consecutive."""
    start = poem_str.index(num) + 3  # skip the 3-digit number
    end = len(poem_str)  # default: the last poem runs to the end of the file
    for i in range(start, len(poem_str) - 2):
        if poem_str[i:i + 3].isdigit():  # the next poem's number marks the end
            end = i
            break
    return poem_str[start:end]


def judge(choice):
    if choice == '作者':
        n = int(input())
        for i in author_dic(n):
            print(i[0], i[1])
    elif choice == '人物':
        n = int(input())
        for i in person_dic(n):
            print(i[0], i[1])
    elif choice.isdigit() and choice <= '320' and len(choice) == 3:
        print(get_poem(choice))
    elif choice == '唐诗':
        print(count_poems())
    elif choice == '飞花':
        word = input()
        flying_flower(word)
    else:
        print('输入错误')


if __name__ == '__main__':
    poem_str = read_file()
    operation = input()
    judge(operation)
```
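The note under item 2 explains why the `作者` and `人物` results can differ; a tiny sketch makes this concrete. `author_counts` counts header tokens (as `author_dic` does), while `mention_counts` counts every occurrence of each name with `str.count` (as `person_dic` does), so a name that appears in another poet's title is counted once more. `dict.fromkeys` is used as an order-preserving de-duplication, equivalent in result to `sorted(set(ls), key=ls.index)` but linear in time. The function names and `SAMPLE` are hypothetical, and `str.count` matches substrings, so a short name inside a longer word would also be counted.

```python
SAMPLE = "001李白 静夜思\n床前明月光 疑是地上霜\n\n002杜甫 梦李白二首之一\n死别已吞声 生别常恻恻\n"


def header_authors(text):
    """Author names from header tokens ('NNN' + author), one per poem."""
    return [tok[3:] for tok in text.split() if tok[:3].isdigit()]


def author_counts(text):
    """Poems per author, in first-appearance order."""
    authors = header_authors(text)
    return {a: authors.count(a) for a in dict.fromkeys(authors)}


def mention_counts(text):
    """Occurrences of each author's name anywhere in the text."""
    return {a: text.count(a) for a in dict.fromkeys(header_authors(text))}


if __name__ == '__main__':
    print(author_counts(SAMPLE))   # each poet wrote one poem here
    print(mention_counts(SAMPLE))  # 李白 is also named in 杜甫's title
```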