Goal

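Scrape the films currently showing on Douban Movies and collect each one's details: title, rating, release date, running time, production country/region, director, cast, and poster URL. Each film's details are stored in a dictionary, and the results are then saved to a local file so they can be looked up later.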

Code

#!/usr/bin/python3
# -*- coding:utf-8 -*-
# @Time    : 2018-11-15 8:24
# @Author  : Manu
# @Site    :
# @File    : doubanMovie.py
# @Software: PyCharm
"""Scrape the films currently showing on Douban Movies and save them to a file."""
import pprint

import requests
from lxml import etree

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
    # The HTTP header is spelled "Referer"; a "Refer" header is simply ignored
    'Referer': 'https://movie.douban.com/',
}

url = 'https://movie.douban.com/cinema/nowplaying/'
response = requests.get(url, headers=HEADERS, timeout=10)
response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
# print(response.text)  # uncomment to inspect the raw HTML
html = etree.HTML(response.text)

# The first <ul class="lists"> on the page holds the films now showing
ul = html.xpath("//ul[@class='lists']")[0]
movies = []
for li in ul.xpath('./li'):
    # Each <li> carries the film's metadata in data-* attributes
    movie = {
        '电影名': li.xpath('@data-title')[0],      # title
        '评分': li.xpath('@data-score')[0],        # rating
        '上映时间': li.xpath('@data-release')[0],  # release date
        '片长': li.xpath('@data-duration')[0],     # running time
        '制片国家': li.xpath('@data-region')[0],   # production country/region
        '导演': li.xpath('@data-director')[0],     # director(s)
        '演员表': li.xpath('@data-actors')[0],     # cast
        '海报': li.xpath('.//img/@src')[0],        # poster thumbnail URL
    }
    movies.append(movie)

pprint.pprint(movies)

# Write one "label:value" line per field, with a blank line between films
with open('豆瓣正在上映.txt', 'w', encoding='utf-8') as movie_file:
    for movie in movies:
        for label, value in movie.items():  # dicts keep insertion order
            movie_file.write(label + ':' + value + '\n')
        movie_file.write('\n')
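The extraction step relies on each `<li>` inside `<ul class="lists">` carrying the film's details as `data-*` attributes. The sketch below exercises that logic offline on a small hand-written snippet, so it can be tried without contacting Douban. It rests on assumptions: the sample markup (`SAMPLE_HTML`) and the helper `parse_movies` are hypothetical and only imitate the shape the XPath above expects, and it swaps in the standard library's `xml.etree.ElementTree`, which accepts only well-formed XML. The real page still needs lxml's HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup imitating the now-playing list; not real Douban data
SAMPLE_HTML = """
<div>
  <ul class="lists">
    <li data-title="Movie A" data-score="8.1" data-release="2018"
        data-duration="120分钟" data-region="中国大陆"
        data-director="Director A" data-actors="Actor 1 / Actor 2">
      <img src="https://example.com/a.jpg"/>
    </li>
    <li data-title="Movie B" data-score="0" data-release="2018">
      <img src="https://example.com/b.jpg"/>
    </li>
  </ul>
</div>
"""

# data-* attribute -> output label (same labels as the scraper)
FIELDS = {
    'data-title': '电影名',
    'data-score': '评分',
    'data-release': '上映时间',
    'data-duration': '片长',
    'data-region': '制片国家',
    'data-director': '导演',
    'data-actors': '演员表',
}


def parse_movies(markup):
    """Return one dict per <li>, using '' for any missing attribute."""
    root = ET.fromstring(markup.strip())
    # ElementTree supports a limited XPath subset: descendant search + attribute predicate
    ul = root.find(".//ul[@class='lists']")
    movies = []
    for li in ul.findall('li'):
        # .get() falls back to a default instead of raising like xpath(...)[0]
        movie = {label: li.get(attr, '') for attr, label in FIELDS.items()}
        img = li.find('.//img')
        movie['海报'] = img.get('src', '') if img is not None else ''
        movies.append(movie)
    return movies


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie['电影名'], movie['评分'], movie['导演'] or '(unknown)')
```

The second sample film deliberately omits several attributes: with `.get(attr, '')` it still parses, whereas `li.xpath('@data-director')[0]` in the scraper would raise `IndexError` on such an entry.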

Result

Scraping Douban's now-playing movie information - Figure 1
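The goal was to keep the results in a local file for later lookup. Because the script writes each film as `label:value` lines followed by a blank line, the file can be loaded back into dictionaries. The sketch below assumes exactly that layout and that no value contains a newline; `load_movies` and `find_movie` are hypothetical helper names, and the sample text is made up.

```python
import io


def load_movies(lines):
    """Parse the 'label:value' blocks written by the scraper into a list of dicts."""
    movies, current = [], {}
    for raw in lines:
        line = raw.rstrip('\n')
        if not line:
            # A blank line ends the current film's block
            if current:
                movies.append(current)
                current = {}
            continue
        # Split on the first colon only: values such as poster URLs contain ':'
        label, value = line.split(':', 1)
        current[label] = value
    if current:  # the last block may not be followed by a blank line
        movies.append(current)
    return movies


def find_movie(movies, title):
    """Return the first film whose 电影名 (title) matches exactly, or None."""
    return next((m for m in movies if m.get('电影名') == title), None)


# Made-up contents in the same layout; for the real file use:
#   with open('豆瓣正在上映.txt', encoding='utf-8') as f: movies = load_movies(f)
sample = io.StringIO(
    '电影名:Movie A\n评分:8.1\n海报:https://example.com/a.jpg\n\n'
    '电影名:Movie B\n评分:7.0\n\n'
)
movies = load_movies(sample)
print(find_movie(movies, 'Movie A')['海报'])  # prints https://example.com/a.jpg
```

If lookups become frequent, writing the list with `json.dump(movies, f, ensure_ascii=False)` instead of hand-formatted lines would avoid the custom parser entirely.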