Abstract

Selenium with Python, Chinese translation of the documentation: https://selenium-python-zh.readthedocs.io/en/latest/

Selenium tutorial on testclass.net: http://www.testclass.net/selenium_python/

Worked examples on GitHub: https://github.com/xuyichenmo/selenium-document

And, of course, my own hands-on Selenium notes: https://www.yuque.com/jhongtao/zr9a1x/fo1

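Selenium and PhantomJS have parted ways. What now?

Downgrading Selenium

pip show selenium reports that the version installed by default is 3.8.1.
Uninstall it with pip uninstall selenium, then reinstall a pinned older version with pip install selenium==2.48.0.
Run the script again: the error is gone. Done!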
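
To confirm which version actually ended up installed after the downgrade, the same check that pip show performs can be done from Python. This is a minimal sketch using only the standard library; installed_version and version_tuple are illustrative helper names of my own, not part of Selenium or pip.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '3.8.1' into (3, 8, 1) so that versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


if __name__ == "__main__":
    current = installed_version("selenium")
    if current is None:
        print("selenium is not installed")
    elif version_tuple(current) > version_tuple("2.48.0"):
        print(f"selenium {current} is newer than 2.48.0")
    else:
        print(f"selenium {current} is at or below 2.48.0")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.5.0" would sort after "2.48.0", which is the wrong way round.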

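Using a headless browser

Selenium + Headless Firefox

The only difference between Selenium + Headless Firefox and plain Selenium + Firefox is that the -headless argument is added when the Options object is created.

Prerequisites:
- Firefox is installed locally.
- The geckodriver executable is available locally. If it is not on the PATH environment variable, its location must be passed explicitly via the executable_path parameter.
Example code: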

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
def main():
    options = Options()
    options.add_argument('-headless')
    # geckodriver.exe lives in E:\ProgramFiles\firefox\geckodriver on my machine; adjust executable_path to match your setup
    driver = Firefox(executable_path='E:/ProgramFiles/firefox/geckodriver/geckodriver.exe', firefox_options=options)
    driver.get("https://www.baidu.com")
    print(driver.page_source)
    driver.close()
if __name__ == '__main__':
    main()

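Selenium + Headless Chrome

This works much the same way as Firefox, so here it is.

Prerequisites:
- Chrome is installed locally.
- The chromedriver executable is available locally. The official download may be unreachable from mainland China without a proxy; if that is a problem, Firefox is the easier choice. If chromedriver is not on the PATH environment variable, its location must be passed explicitly via the executable_path parameter.
- ChromeDriver mirror in China: http://npm.taobao.org/mirrors/chromedriver/
Example: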

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
def main():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path='./chromedriver', chrome_options=chrome_options)
    driver.get("https://www.baidu.com")
    print(driver.page_source)
    driver.close()
if __name__ == '__main__':
    main()

Commonly used imports

from selenium import webdriver  # the webdriver module
from selenium.webdriver import ActionChains  # action chains queue up mouse actions and run them together
from selenium.webdriver.common.keys import Keys  # keyboard keys, normally used together with send_keys
from selenium.webdriver.support.select import Select  # all drop-down list handling goes through Select
from selenium.webdriver.common.by import By  # locator strategies for finding elements
from selenium.webdriver.support.ui import WebDriverWait  # explicit waits
from selenium.webdriver.support import expected_conditions as EC  # ready-made conditions for explicit waits

Creating the browser object (driver)

from selenium.webdriver import Firefox    # the Firefox driver class
from selenium.webdriver.firefox.options import Options    # Firefox launch options
options = Options()    # create the Options object
options.add_argument('-headless')    # run without a window, which speeds up crawling
# geckodriver.exe lives in E:\ProgramFiles\firefox\geckodriver on my machine; adjust executable_path to match your setup
executable_path = "E:/ProgramFiles/firefox/geckodriver/geckodriver.exe"
driver = Firefox(executable_path=executable_path, firefox_options=options)    # create the driver (browser) object

Common driver methods and attributes

driver.get(url): load a page

driver.get(url)    # url is the address of the page to crawl

driver.page_source: get the page source

print(driver.page_source)    # print the page's HTML source

Getting the text inside a tag

driver.find_element_by_id("id_name").text    # look the element up by its id and read its text

driver.title: get the page title

print(driver.title)    # print the page title

driver.save_screenshot("img_name.png"): save the current page as an image

driver.save_screenshot("img_name.png")    # save the rendered page as a PNG; img_name.png is the output file name

button.click(): simulate a mouse click

button = driver.find_element_by_id("id_name")    # get the button whose id is id_name
button.click()    # simulate a left click

input_box.send_keys(keyword): type text into an input box

input_box = driver.find_element_by_id("id_name")    # get the element whose id is id_name
keyword = "百度"    # the text to type
input_box.send_keys(keyword)    # type keyword into the input box

element.send_keys(Keys.XXX, 'letter'): simulate keyboard input

send_keys(Keys.BACK_SPACE)      Backspace
send_keys(Keys.SPACE)           Space
send_keys(Keys.TAB)             Tab
send_keys(Keys.ESCAPE)          Esc
send_keys(Keys.ENTER)           Enter
send_keys(Keys.CONTROL, 'a')    select all (Ctrl+A)
send_keys(Keys.CONTROL, 'c')    copy (Ctrl+C)
send_keys(Keys.CONTROL, 'x')    cut (Ctrl+X)
send_keys(Keys.CONTROL, 'v')    paste (Ctrl+V)

# Import the Keys class first
from selenium.webdriver.common.keys import Keys

# Simulate Ctrl+A to select everything in a text input
input_box = driver.find_element_by_id("id_name")    # get the element whose id is id_name
input_box.send_keys(Keys.CONTROL, 'a')    # Ctrl+A

# Simulate Ctrl+X to cut the selection
input_box.send_keys(Keys.CONTROL, 'x')    # Ctrl+X

# Type a keyword and press Enter
keyword = "百度"    # the text to type
input_box.send_keys(keyword)    # type keyword into the input box
input_box.send_keys(Keys.RETURN)    # Keys.RETURN simulates the Enter key

input_box.clear(): clear an input box

input_box = driver.find_element_by_id("id_name")    # get the element whose id is id_name
input_box.clear()    # clear the input box

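driver.get_cookies(): get the cookies of the current page

print(driver.get_cookies())    # print the current page's cookies

driver.current_url: get the URL of the current page

print(driver.current_url)    # print the current page's URL

driver.close(): close the current page

# Close the current page once you are done with it, to reduce memory use.
# If this is the only page open, closing it also closes the browser window.
driver.close()

driver.quit(): quit the browser

driver.quit()    # always quit the browser when you have finished with it; this ends the session and the driver process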

driver methods and attributes in detail

Locating elements

There are two ways to locate an element:

1. Directly, through a find_element_by_* method named after the attribute.

2. Through By, which requires from selenium.webdriver.common.by import By. (The find_element_by_* methods were removed in Selenium 4.3, so on recent versions this is the only form available.)

# Locate an element by id
# HTML: <div id="id_name"></div>
div = driver.find_element_by_id("id_name")
# The By form
from selenium.webdriver.common.by import By
div = driver.find_element(by=By.ID, value="id_name")    # form 1: keyword arguments
div = driver.find_element(By.ID, "id_name")    # form 2: positional arguments
# Locate an element by its name attribute
# HTML: <input name="name" type="text"/>
input_box = driver.find_element_by_name("name")
# Locate an element by tag name
# HTML: <iframe src="#"></iframe>
iframe = driver.find_element_by_tag_name("iframe")
# Locate elements by XPath
# HTML: <input type="text" name="example"/>
#       <INPUT type="text" name="other"/>
inputs = driver.find_elements_by_xpath("//input")
# Locate a link by its full text
# HTML: <a href="#">百度</a>
baidu = driver.find_element_by_link_text("百度")
# Locate a link by part of its text
# HTML: <a href="#">baidu google sogou</a>
google = driver.find_element_by_partial_link_text("google")
# Locate an element with a CSS selector
# HTML: <div id="food">
#           <span class="dairy">milk</span>
#           <span class="dairy aged">cheese</span>
#       </div>
cheese = driver.find_element_by_css_selector("#food span.dairy.aged")
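
Because By.ID, By.XPATH and the other By constants are plain strings, a locator is just a (strategy, value) tuple that can be built and stored before any browser is running. The sketch below is one hypothetical way to keep locators in configuration as "strategy=value" strings. parse_locator and the STRATEGIES table are illustrative names of my own, not part of Selenium, and the strategy strings are assumed to match the By constants of the installed version.

```python
# Locator strategy strings, assumed to match the values of Selenium's By constants.
STRATEGIES = {
    "id": "id",
    "name": "name",
    "tag": "tag name",
    "class": "class name",
    "css": "css selector",
    "xpath": "xpath",
    "link": "link text",
    "partial_link": "partial link text",
}


def parse_locator(spec):
    """Turn a 'strategy=value' string into a (by, value) locator tuple."""
    # partition splits on the first '=' only, so '=' inside an XPath survives.
    strategy, separator, value = spec.partition("=")
    if not separator or strategy not in STRATEGIES:
        raise ValueError(f"unsupported locator: {spec!r}")
    return STRATEGIES[strategy], value


if __name__ == "__main__":
    print(parse_locator("id=id_name"))
    print(parse_locator("css=#food span.dairy.aged"))
    print(parse_locator("xpath=//input[@name='example']"))
```

The resulting tuple can be unpacked straight into the By form, for example driver.find_element(*parse_locator("id=id_name")), or passed as the locator argument of an expected condition.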

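Working with elements

To act on an element, first locate it, then call the method you need: element = driver.find_element_by_xxx(value)
Click                     element.click()
Clear an input box        element.clear()
Type into an input box    element.send_keys(data)
Get the text (the content between the opening and closing tags)    element.text
Get an attribute value (here, the element's value attribute)       element.get_attribute("value")

get_attribute() arguments

get_attribute('textContent')    # the element's text content (in the author's example, the label "文章管理")
get_attribute('innerHTML')      # all of the HTML inside the element
get_attribute('outerHTML')      # the HTML of the element itself, including its own tag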

Mouse actions with the ActionChains class

# Important: import the ActionChains class before using it
from selenium.webdriver import ActionChains
# Move the mouse onto an element
# 1. Locate the element
button = driver.find_element_by_id("id_name")    # get the button whose id is id_name
# 2. Move the mouse over the button
# Arguments: ActionChains takes the driver object, move_to_element takes the element
ActionChains(driver).move_to_element(button).perform()
# Click on an element
# Arguments: ActionChains takes the driver object; move_to_element and click both take the element to click
# Sequence: locate the element, move onto it, then click it
ActionChains(driver).move_to_element(button).click(button).perform()
# Press and hold the mouse button on an element; same pattern as a click
ActionChains(driver).move_to_element(button).click_and_hold(button).perform()
# Double-click an element; same pattern as a click
ActionChains(driver).move_to_element(button).double_click(button).perform()
# Right-click an element; same pattern as a click
ActionChains(driver).move_to_element(button).context_click(button).perform()
# Drag element A and drop it onto element B
e_a = driver.find_element_by_id("A")
e_b = driver.find_element_by_id("B")
ActionChains(driver).drag_and_drop(e_a, e_b).perform()

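Filling in forms with the Select class

# HTML:
# <select id="status">
#     <option value="0">北京</option>
#     <option value="1">上海</option>
#     <option value="2">深圳</option>
# </select>
# Import the Select class
from selenium.webdriver.support.ui import Select
# Working with a drop-down list
# 1. Locate the <select> element
select_element = driver.find_element_by_id("status")
# 2. Create a Select object, passing the <select> element to its constructor
select = Select(select_element)
# Choose an option
select.select_by_index(0)              # by index, counting from 0
select.select_by_value("1")            # by the option's value attribute
select.select_by_visible_text("深圳")    # by the option's visible text
# Clear the selection (only valid for a <select multiple>; raises NotImplementedError otherwise)
select.deselect_all()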

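Handling pop-up alerts with switch_to.alert

alert = driver.switch_to.alert    # get the alert currently shown on the page (switch_to_alert() is the older, deprecated spelling)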
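
Getting hold of the alert is only the first step; usually you also want to read its message and dismiss it. A minimal sketch, assuming driver is an already created WebDriver with an alert open; read_and_accept_alert is an illustrative helper name of my own, not a Selenium function.

```python
def read_and_accept_alert(driver):
    """Switch to the open alert, return its message, and click OK."""
    alert = driver.switch_to.alert   # raises NoAlertPresentException if none is open
    message = alert.text             # the text shown in the dialog
    alert.accept()                   # click OK; alert.dismiss() would click Cancel
    return message
```

For a prompt dialog, alert.send_keys("some text") can be called before accept() to fill in the input field.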

Switching between browser pages

# Method 1:
driver.switch_to.window("window_name")    # window_name is the name (or handle) of the target window
# Method 2:
# Use the window_handles attribute to get a handle for every open window
for handle in driver.window_handles:
    driver.switch_to.window(handle)
# Example
from selenium import webdriver
browser = webdriver.Chrome()
browser.get("http://xdyc.echehua.com/login/index")
js = "window.open('http://xdyc.echehua.com/login/verify')"    # this opens a new tab, not a new window
browser.execute_script(js)
browser.close()    # close the first page

Multi-window switching example

# Switching between several windows
import time
from selenium import webdriver
browser = webdriver.Chrome()
# Open Baidu in the current window
browser.get('https://www.baidu.com')
# Open a second window by running JavaScript
js = 'window.open("https://www.sogou.com");'
browser.execute_script(js)
# Handle of the current window (Baidu)
baidu_handle = browser.current_window_handle
# Handles of all open windows (a list)
handles = browser.window_handles
print(handles)  # print the list of handles
# ['CDwindow-E9B85270B51D42AF7369D81B9AA70FFE',
# 'CDwindow-90004FD79A0F59EE057846B34D0E7327']
# Find the Sogou window: the handle that is not Baidu's
sogou_handle = None
for handle in handles:
    if handle != baidu_handle:
        sogou_handle = handle
# Print the handle being switched to (Sogou)
print('switch to ', sogou_handle)
browser.switch_to.window(sogou_handle)
time.sleep(10)
browser.close()  # close the current window (Sogou)
# Switch back to the Baidu window
browser.switch_to.window(baidu_handle)
time.sleep(10)
browser.quit()
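
The handle search in the middle of this example can be pulled out into a small reusable function. This is a sketch of my own rather than anything Selenium ships: switch_to_new_window is a hypothetical helper that relies only on the window_handles attribute and the switch_to.window() method used in the example.

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window whose handle is not in known_handles.

    Returns the new handle, or None (without switching) if no new window exists.
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None
```

To use it, record the handles before opening the window (before = browser.window_handles), run the JavaScript, then call switch_to_new_window(browser, before). Comparing against the handles that existed beforehand avoids relying on the order of the list.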

Going forward and back

# forward() moves forward one page in the history
driver.forward()
# back() moves back one page in the history
driver.back()

Working with page cookies

# Print the name of every cookie on the page
for cookie in driver.get_cookies():
    print(cookie['name'])
# Deleting cookies
# 1. Delete one cookie by name
driver.delete_cookie("cookie_name")    # cookie_name is the name of the cookie to delete
# 2. Delete all cookies for the page
driver.delete_all_cookies()
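
get_cookies() returns a list of dictionaries, one per cookie, each with at least a name and a value key. A common next step in crawling is to hand those cookies to a plain HTTP client, which needs them in a flatter form. A minimal sketch; cookies_to_dict and cookie_header are illustrative names of my own, and the sample data is made up.

```python
def cookies_to_dict(cookies):
    """Collapse Selenium-style cookie records into a {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookie_header(cookies):
    """Format the cookies as the value of an HTTP Cookie request header."""
    pairs = cookies_to_dict(cookies).items()
    return "; ".join(f"{name}={value}" for name, value in pairs)


if __name__ == "__main__":
    # Made-up sample in the shape returned by driver.get_cookies()
    sample = [
        {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
        {"name": "lang", "value": "zh", "domain": "example.com", "path": "/"},
    ]
    print(cookies_to_dict(sample))
    print(cookie_header(sample))
```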

Waiting for pages

Implicit waits
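
An implicit wait is set once on the driver and then applies to every later element lookup for the lifetime of that driver: if an element is not found immediately, WebDriver keeps polling the DOM for up to the given number of seconds before raising NoSuchElementException. A minimal sketch, assuming driver is an already created WebDriver; open_with_implicit_wait is an illustrative helper name of my own, and the default of 5 seconds simply mirrors the driver.implicitly_wait(5) call in the expected_conditions example.

```python
def open_with_implicit_wait(driver, url, seconds=5):
    """Set an implicit wait on the driver, then load the page."""
    # From now on, every find_element* call keeps polling the DOM for up to
    # `seconds` before raising NoSuchElementException.
    driver.implicitly_wait(seconds)
    driver.get(url)
    return driver
```

The wait only stretches lookups that would otherwise fail; an element that is already present is returned straight away.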

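Explicit waits with the WebDriverWait class

An explicit wait names a condition and a maximum waiting time. If the condition is still not met when the time runs out (for example, the element has not been found), a TimeoutException is raised.

WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)
# Parameters
# driver: the WebDriver instance (the browser driver)
# timeout: the maximum time to wait, in seconds
# poll_frequency: the pause between checks, 0.5 s by default
# ignored_exceptions: exceptions to swallow while polling; by default only NoSuchElementException is ignored
# A WebDriverWait object is normally used with its until() or until_not() method:
#     Both are usually given a condition from expected_conditions:
#         from selenium.webdriver.support import expected_conditions as EC
#     until(EC.xxx(...)): calls the condition with the driver repeatedly until it returns something other than False
#     until_not(EC.xxx(...)): calls the condition with the driver repeatedly until it returns False
#     The available conditions are listed in the expected_conditions section.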
# Example
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
# WebDriverWait runs the polling loop
from selenium.webdriver.support.ui import WebDriverWait
# expected_conditions supplies the conditions to wait for
from selenium.webdriver.support import expected_conditions as EC
executable_path = "E:/ProgramFiles/firefox/geckodriver/geckodriver.exe"
url = "https://www.baidu.com/"
options = Options()  # create the Options object
options.add_argument('-headless')  # run without a window, which speeds up crawling
# geckodriver.exe lives in E:\ProgramFiles\firefox\geckodriver on my machine; adjust executable_path to match your setup
driver = Firefox(executable_path=executable_path, firefox_options=options)  # create the driver (browser) object
driver.get(url)
try:
    # Wait for the search box (id='kw') and return it as soon as it is found; if it has not appeared within 10 s, an exception is raised
    search_box = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'kw')))
finally:
    driver.quit()
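
To make the roles of timeout, poll_frequency and ignored_exceptions concrete, here is a simplified model of the polling loop behind until(). It is my own sketch of the idea, not Selenium's source code: wait_until and WaitTimeout are made-up names standing in for WebDriverWait.until and TimeoutException.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for Selenium's TimeoutException in this sketch."""


def wait_until(driver, condition, timeout, poll_frequency=0.5, ignored_exceptions=()):
    """Call condition(driver) until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition(driver)
            if value:
                return value          # e.g. the located element
        except ignored_exceptions:
            pass                      # treated the same as "not yet"
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)
```

Two things follow from this shape: the wait returns as soon as the condition holds rather than always sleeping for the full timeout, and whatever truthy value the condition produces (such as a WebElement) becomes the return value of the wait.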

expected_conditions reference

class selenium.webdriver.support.expected_conditions.alert_is_present

Whether an alert is present.

class selenium.webdriver.support.expected_conditions.element_located_selection_state_to_be(locator, is_selected)

Locates an element and checks whether its selection state matches the expected one. locator is a (by, path) tuple; is_selected is a boolean.

class selenium.webdriver.support.expected_conditions.element_located_to_be_selected(locator)

Whether the located element is selected.

class selenium.webdriver.support.expected_conditions.element_selection_state_to_be(element, is_selected)

Checks whether the selection state of an element matches the expected one. element is a WebElement; is_selected is a boolean.

class selenium.webdriver.support.expected_conditions.element_to_be_clickable(locator)

Whether the element is visible and enabled, so that it can be clicked. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.element_to_be_selected(element)

Whether the element is selected. element is a WebElement.

class selenium.webdriver.support.expected_conditions.frame_to_be_available_and_switch_to_it(locator)

Whether the frame can be switched into. If it can, switches into it and returns True; otherwise returns False. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.invisibility_of_element_located(locator)

Whether the element is either invisible or absent from the DOM. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.new_window_is_opened(current_handles)

Whether a new window has been opened. current_handles is the list of window handles that existed beforehand.

class selenium.webdriver.support.expected_conditions.number_of_windows_to_be(num_windows)

Whether the number of open windows equals num_windows.

class selenium.webdriver.support.expected_conditions.presence_of_all_elements_located(locator)

Whether at least one element matching the locator is present on the page. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.presence_of_element_located(locator)

Whether the element is present in the DOM, which does not necessarily mean it is visible. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.staleness_of(element)

Waits until the element is no longer attached to the DOM, i.e. has been removed. Returns False while it is still in the DOM, True otherwise. element is a WebElement.

class selenium.webdriver.support.expected_conditions.text_to_be_present_in_element(locator, text_)

Whether the element's text contains the given string. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.text_to_be_present_in_element_value(locator, text_)

Whether the element's value attribute contains the given string. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.title_contains(title)

Whether the current page title contains the expected string. Returns True if it does, False otherwise.

class selenium.webdriver.support.expected_conditions.title_is(title)

Whether the current page title exactly equals the expected string. Returns True if it does, False otherwise.

class selenium.webdriver.support.expected_conditions.visibility_of(element)

Whether the element is visible. Visible means that it is displayed and also has a width and height greater than 0. element is a WebElement.

class selenium.webdriver.support.expected_conditions.visibility_of_any_elements_located(locator)

Whether at least one element matching the locator is visible on the page. locator is a locator tuple.

class selenium.webdriver.support.expected_conditions.visibility_of_element_located(locator)

Whether the element is present in the DOM and visible. Visible means that it is displayed and also has a width and height greater than 0. locator is a locator tuple.
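
Each entry in this list is a callable that until() or until_not() invokes with the driver, so a condition that is not in the list can be written by hand in the same shape. A minimal sketch; title_starts_with is a hypothetical condition of my own, not part of expected_conditions.

```python
class title_starts_with:
    """Custom wait condition: truthy once the page title begins with `prefix`."""

    def __init__(self, prefix):
        self.prefix = prefix

    def __call__(self, driver):
        # until() passes the driver in; a falsy result means "keep waiting".
        return driver.title.startswith(self.prefix)
```

It is used exactly like the built-in conditions, for example WebDriverWait(driver, 10).until(title_starts_with("百度")). A plain function or lambda taking the driver works just as well; the class form is only convenient when the condition needs parameters.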

expected_conditions example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
base_url = "http://www.baidu.com"
driver = webdriver.Firefox()
# When an implicit and an explicit wait are both set, the effective timeout is roughly the longer of the two;
# the Selenium documentation advises against mixing them because the wait times become unpredictable
driver.implicitly_wait(5)
locator = (By.ID, 'kw')
driver.get(base_url)
# title_is: is the title exactly this string? Returns a boolean
WebDriverWait(driver, 10).until(EC.title_is("百度一下，你就知道"))
# title_contains: does the title contain this string? Returns a boolean
WebDriverWait(driver, 10).until(EC.title_contains("百度一下"))
# presence_of_element_located: has the element been added to the DOM tree (it need not be visible)? Returns the WebElement once located
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'kw')))
# visibility_of_element_located: is the element in the DOM and visible, i.e. displayed with width and height greater than 0?
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'su')))
# visibility_of: is this element visible? Returns the element if so
WebDriverWait(driver, 10).until(EC.visibility_of(driver.find_element(by=By.ID, value='kw')))
# presence_of_all_elements_located: is at least one matching element in the DOM tree? Returns a list once located
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.mnav')))
# visibility_of_any_elements_located: is at least one matching element visible on the page? Returns a list once located
WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.CSS_SELECTOR, '.mnav')))
# text_to_be_present_in_element: does the element's text contain the expected string? Returns a boolean
WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element((By.XPATH, "//*[@id='u1']/a[8]"), '设置'))
# text_to_be_present_in_element_value: does the element's value attribute contain the expected string? Returns a boolean
WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element_value((By.CSS_SELECTOR, '#su'), '百度一下'))
# frame_to_be_available_and_switch_to_it: can the frame be switched into? If so, switches and returns True, otherwise returns False
# (left commented out because this page has no frame to switch into)
#WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it(locator))
# invisibility_of_element_located: is the element invisible or absent from the DOM? Returns False while it is visible
# (#swfEveryCookieWrap is a hidden element on this page)
WebDriverWait(driver, 10).until(EC.invisibility_of_element_located((By.CSS_SELECTOR, '#swfEveryCookieWrap')))
# element_to_be_clickable: is the element visible and enabled, i.e. clickable?
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='u1']/a[8]"))).click()
driver.find_element_by_xpath("//*[@id='wrapper']/div[6]/a[1]").click()
#WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='wrapper']/div[6]/a[1]"))).click()
# staleness_of: waits for an element to be removed from the DOM tree
# (left commented out because I could not find a suitable example)
#WebDriverWait(driver, 10).until(EC.staleness_of(driver.find_element(By.ID, 'su')))
# element_to_be_selected: is the element selected? Typically used with drop-down lists
WebDriverWait(driver, 10).until(EC.element_to_be_selected(driver.find_element(By.XPATH, "//*[@id='nr']/option[1]")))
# element_selection_state_to_be: does the element's selection state match the expected one?
WebDriverWait(driver, 10).until(EC.element_selection_state_to_be(driver.find_element(By.XPATH, "//*[@id='nr']/option[1]"), True))
# element_located_selection_state_to_be: the same check, with the element given as a locator
WebDriverWait(driver, 10).until(EC.element_located_selection_state_to_be((By.XPATH, "//*[@id='nr']/option[1]"), True))
driver.find_element_by_xpath(".//*[@id='gxszButton']/a[1]").click()
# alert_is_present: is an alert open on the page? If so, switches to it and returns the Alert object
instance = WebDriverWait(driver, 10).until(EC.alert_is_present())
print(instance.text)
instance.accept()
driver.close()

Selenium Notes