Day1

Web Scraping Basics

    import requests

    url = 'https://www.baidu.com/'
    # headers is a dict, stored as key-value pairs
    headers = {
        "Cookie": 'dBIDUPSID=BB53A6DDC9F7BA5782320D71FFD89911; PSTM=1644676835; BAIDUID=BB53A6DDC9F7BA579591DB130C95E10E:FG=1; BD_UPN=123253; __yjs_duid=1_e335e53b7dd5b5505fa8e3dc4fca37bc1644676872577; BDSFRCVID_BFESS=dCLOJeC62GzRBsnD3FI-h_5rWkngKp3TH6aow3oSM89su-fJVgadEG0P5x8g0KCM3_9PogKKymOTHuKF_2uxOjjg8UtVJeC6EG0Ptf8g0f5; H_BDCLCKID_SF_BFESS=tb-eoKPhfI03HJRxM-Laq4kVMMjHKD62aKDs0pO1BhcqEIL4Qn6CLRtHQablBCr3XbTH-bv85hv2hUbSj4Qo5Tte0PFOqRJtHCcZ-J3q5p5nhMtG257JDMPdXHbdqlOy523iXR6vQpnhOpQ3DRoWXPIqbN7P-p5Z5mAqKl0MLPbtbb0xb6_0-nDSHHL8q68f3j; sugstore=1; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_PSSID=35839_34429_35105_31253_35768_34584_35842_35948_35931_35954_35984_35320_26350_35940; H_PS_645EC=d270Q18YwpdaTf3NmICutgnE5dhkspXYvObRVENjvJyRJiWZ%2FS%2FbR45zhQ0; BA_HECTOR=840l8405a404a1a52t1h1u2820r',
        # If a string contains both single and double quotes, wrap it in triple quotes
        "User-Agent": '''Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.109 Safari/537.36''',
    }
    response = requests.get(url, headers=headers)
    response.encoding = response.apparent_encoding  # use the encoding detected from the content
    print(response)       # prints the Response object, e.g. <Response [200]>, which shows the status code
    print(response.text)  # the decoded page body

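To make the request look like it comes from a real browser, inspect the page in the browser's developer tools, find the Cookie and User-Agent values under Request Headers, and copy them into the code.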

Python Basics

To list Python's reserved keywords:

    import keyword
    print(keyword.kwlist)

Control + left-click a package name in the IDE to view the package's source.

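Data types:
Number types: integer, float, boolean, complex
String literals: single quotes, double quotes, or triple quotes (three single or three double quotes both work)
A backslash escapes a single character; prefixing the string with r or R makes it a raw string, so backslashes are output literally
type( ) returns the type of a value
Braces { } with elements define a set; because set elements are unique, duplicates are dropped, so print shows each element only once (empty braces {} create a dict, so use set() for an empty set)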
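
The points above can be checked in a short script. This is only a minimal sketch; the variable names (escaped, raw, unique) are illustrative and do not come from the original notes.

```python
# type() reports the type of a value
print(type(1))        # <class 'int'>
print(type(1.5))      # <class 'float'>
print(type(True))     # <class 'bool'>
print(type(1 + 2j))   # <class 'complex'>

# A backslash escapes the next character; an r/R prefix turns escaping off
escaped = "a\tb"      # 'a', a real tab character, 'b'
raw = r"a\tb"         # 'a', a backslash, 't', 'b'
print(len(escaped), len(raw))   # 3 4

# A set keeps only unique elements
unique = {1, 2, 2, 3, 3, 3}
print(len(unique))              # 3

# Empty braces create a dict, not a set
print(type({}), type(set()))    # <class 'dict'> <class 'set'>
```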

Day2

Debugging with Breakpoints
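Click next to a line number to set a breakpoint, then right-click the code and start debugging. The arrow buttons in the panel below step through the code one line at a time and show the details of the line currently being executed.

    a, b, c = 1, 2, 3  # multiple assignment
    print(a, b, c)
    a, b = b, a        # swap the two values
    print(a, b, c)

Output:
1 2 3
2 1 3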

Sum of the integers from 0 to 100:

    total = 0  # named total so the built-in sum() is not shadowed
    i = 0
    while i <= 100:
        total += i
        i += 1
    print(total)

Sum of all even numbers from 0 to 100:

    total = 0  # named total so the built-in sum() is not shadowed
    i = 0
    while i <= 100:
        if i % 2 == 0:
            total += i
        else:
            pass  # do nothing for odd numbers
        i += 1
    print(total)

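Print five rows of asterisks, with one more asterisk on each row:

    # Method 1: repeat the string with the * operator
    i = 0
    while i < 5:
        i += 1
        print(i * '*', end="")
        print("")  # end the row

    # Method 2: nested loops, printing one asterisk at a time
    x = 1
    while x <= 5:
        y = 1
        while x >= y:
            print('*', end="")
            y += 1
        print("")  # end the row
        x += 1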

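Iterating over a list and a dictionary with a for loop:

    # Iterate over a list and print each element
    names = ["alice", "cindy", "witherc"]  # named names so the built-in list is not shadowed
    for name in names:
        print(name)

    # Iterate over a dictionary and print each key-value pair
    mapping = {
        "a": 1,
        "b": 2
    }
    for key in mapping:  # iterating over a dict yields its keys
        print(key, mapping[key])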

The range function:

    # Form: range(start, stop, step)
    for i in range(10):        # 0 to 9; start defaults to 0
        print(i)
    for i in range(2, 10):     # 2 to 9; from start up to stop - 1
        print(i)
    for i in range(2, 10, 3):  # step of 3, giving 2 5 8
        print(i)
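
One way to confirm what each form of range produces is to turn it into a list. The sketch below also redoes the two while-loop sums from earlier using range and the built-in sum(); the names total_all and total_even are illustrative only.

```python
print(list(range(10)))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(2, 10)))     # [2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(2, 10, 3)))  # [2, 5, 8]

# The stop value is excluded, so 101 is needed to include 100
total_all = sum(range(101))         # 0 + 1 + ... + 100
total_even = sum(range(0, 101, 2))  # 0 + 2 + ... + 100
print(total_all, total_even)        # 5050 2550
```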

Print the numbers from 0 to 100, skipping 33, 55, and 77:

    i = 0
    while i <= 100:
        if i == 33 or i == 55 or i == 77:
            i += 1  # skip this number without printing it
        else:
            print(i, " ", end="")
            i += 1

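Defining and calling a user-defined function:

    # Without default values
    def func(num1, num2):  # takes two parameters and prints their sum
        print(num1 + num2)

    func(1, 2)  # pass the arguments when calling the function
    # To use default values, assign them in the definition, e.g. def func(num1=1, num2=2)
    # A value passed when calling a function with defaults overrides the default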

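Arguments passed without keywords are matched to parameters by position, so passing them in the wrong order can cause errors. To avoid this, pass them by keyword:
func(num1=1, num2=2)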
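
The difference between positional and keyword arguments, and how defaults are overridden, can be illustrated with a small sketch. The function subtract and its default value are hypothetical, chosen because the order of the arguments changes the result.

```python
def subtract(num1, num2=1):
    """Return num1 - num2; num2 defaults to 1 (illustrative default)."""
    return num1 - num2

print(subtract(5))                # 4, num2 falls back to its default
print(subtract(5, 2))             # 3, the default is overridden
print(subtract(2, 5))             # -3, swapping positional arguments changes the result
print(subtract(num2=2, num1=5))   # 3, keyword arguments can be given in any order
```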

Keyword and non-keyword collecting parameters:
Non-keyword (positional) collecting parameter: *arg
Keyword collecting parameter: **arg

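    def func(num1, num2):  # takes two parameters and prints their sum
        print(num1 + num2)

    func(num1=1, num2=2)  # pass the arguments by keyword

    # Non-keyword collecting parameter: arguments are passed without keywords
    def func2(*arg):
        for i in arg:  # arg is a tuple of the positional arguments
            print(i)

    func2(1, 2, 3, 4, 5, 6)

    # Keyword collecting parameter: arguments are passed by keyword
    def func3(**arg):
        print(arg)

    func3(a="1", b="2", c="3")  # prints a dict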

Function return values:

    def func(num1, num2):
        num3 = num1 + num2
        return num3  # hand the result back to the caller

    result = func(num1=2, num2=3)
    print(result)