Web Scraping in Java - 《👨‍🔧 Hands-On Notes》

Selenium: browser automation and testing (Java version)
jsoup: HTML parsing

To get the HTML of a page whose content is loaded asynchronously, load the page with Selenium first, then take the resulting HTML and parse it with jsoup.
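
The parsing half of this can be tried on its own, without a browser. Below is a minimal sketch, assuming jsoup is on the classpath. The class name JsoupParseSketch, the helper textFor, and the HTML snippet are all made up for illustration and do not reflect the real markup of any site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupParseSketch {
    // Hypothetical helper: returns the text of the first element whose
    // data-value attribute equals the given value, or null if there is none.
    static String textFor(String html, String dataValue) {
        Document doc = Jsoup.parse(html);
        Element first = doc.getElementsByAttributeValue("data-value", dataValue).first();
        return first == null ? null : first.text();
    }

    public static void main(String[] args) {
        // Made-up markup standing in for HTML obtained from a rendered page.
        String html = "<ul>"
                + "<li data-value=\"USD/CNY\">USD/CNY sample</li>"
                + "<li data-value=\"EUR/CNY\">EUR/CNY sample</li>"
                + "</ul>";
        System.out.println(textFor(html, "USD/CNY"));
    }
}
```

In the Selenium flow, the string passed to the helper would be whatever driver.getPageSource() returns, so the lookup logic can be checked against saved HTML before a browser is involved.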

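package com.meshop.crm;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import java.io.IOException;

public class DemoTest {
    public static void main(String[] args) throws IOException {
        // Starts a local Chrome instance, so Chrome must be installed on this machine.
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://www.chinamoney.com.cn/chinese/sddshl/");
            // Fetching with jsoup directly returns only the initial HTML, without the asynchronously loaded data:
            // Document doc = Jsoup.connect("https://www.chinamoney.com.cn/chinese/sddshl/").get();
            // Parse the HTML as the browser has rendered it so far.
            final Document doc = Jsoup.parse(driver.getPageSource());
            // Match elements whose data-value attribute equals "USD/CNY".
            // getElementsByAttribute takes only an attribute name, so the name/value variant is needed here.
            final Elements elementsByAttribute = doc.getElementsByAttributeValue("data-value", "USD/CNY");
            System.out.println(elementsByAttribute);
        } finally {
            // Always close the browser, even if loading or parsing fails.
            driver.quit();
        }
    }
}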

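The example above works this way, but it has one drawback: it depends on a (headless) browser program installed locally, so you also need to download and configure a browser build that matches your system. The code above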