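4.1 Overview
The previous three chapters covered the basics of using ES. This chapter puts them into practice with a small project that imitates JD product search.
Project requirements:
- Retrieve product information (name, price, image, etc.) accurately through analyzed (tokenized) search
- Highlight the search term in the results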
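4.2 Project introduction
Because the project imitates JD product search, a crawler first pulls some product data from JD and stores it in ES so that it can be searched later. The project also needs a front end to interact with, so basic Vue is introduced. The rough steps are:
- Crawl JD product data and parse it (jsoup is enough for a basic crawler)
- Load the data into ES (bulk insert)
- Write and expose the ES query endpoint (full result set or paginated)
- Write the front-end code
- Integrate the front end and back end (Axios is introduced for the front end's Ajax requests)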
4.3 Building the project
4.3.1 Project environment
Software used:
- Java: 1.8
- Elasticsearch: 7.16.3
- Kibana: 7.16.3
- elasticsearch-head-master
- node
- vue
- axios
4.3.2 Backend
4.3.2.1 Maven
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.es</groupId>
    <artifactId>springboot-es-jd</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>springboot-es-jd</name>
    <description>springboot-es-jd</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.3</version>
        <relativePath/>
    </parent>

    <properties>
        <java.version>1.8</java.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <jsoup.version>1.14.3</jsoup.version>
        <fast-json.version>1.2.79</fast-json.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <!-- Web -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Java crawler -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>${jsoup.version}</version>
        </dependency>
        <!-- ES -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <!-- JSON -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>${fast-json.version}</version>
        </dependency>
        <!-- Lombok -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <!-- Common base dependencies -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
```
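4.3.2.2 Configuration file
Edit the configuration file to turn off Thymeleaf's page cache. The project also needs an ElasticSearchConfig configuration class for ES, but it is identical to the one in chapter 3 (3_SpringBoot集成ES, integrating ES with Spring Boot), so it is not repeated here.
```yaml
spring:
  thymeleaf:
    cache: false
```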
4.3.2.3 Crawling JD product data
jsoup is used for simple page crawling. To go further, you could first crawl and parse the page's pagination information, then add the page number to the request URL and crawl page by page in a loop. On the query side, ES supports from and size for paginating the results.
```java
package com.es.jsoup;

import com.es.data.ESDataOperator;
import com.es.model.Goods;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Crawls and parses JD search results.
 */
@Component
public class JDData {

    @Autowired
    private ESDataOperator esDataOperator;

    private static final String SEARCH_URL = "https://search.jd.com/Search?enc=utf-8&keyword=";

    /**
     * Fetches the search result page for a keyword.
     *
     * @param keyword search term
     * @return the parsed page
     */
    private Document getJDData(String keyword) throws IOException {
        // Encode the keyword so spaces and non-ASCII characters survive in the URL
        String encoded = URLEncoder.encode(keyword, "UTF-8");
        return Jsoup.parse(new URL(SEARCH_URL + encoded), 30 * 1000);
    }

    /**
     * Parses the result page into a list of products.
     *
     * @param keyword search term
     * @return the products found on the first page
     */
    private List<Goods> analysis(String keyword) throws IOException {
        Document document = getJDData(keyword);
        // Container of the product list
        Element goodsElement = document.getElementById("J_goodsList");
        if (goodsElement == null) {
            // The list is missing when the response is not a normal result page
            return Collections.emptyList();
        }
        // Each li is one product
        List<Goods> goodsList = new ArrayList<>();
        for (Element element : goodsElement.getElementsByTag("li")) {
            goodsList.add(parse2Goods(element));
        }
        return goodsList;
    }

    /**
     * Converts one li node into a Goods object.
     *
     * @param element the li node
     * @return the parsed product
     */
    private Goods parse2Goods(Element element) {
        Goods goods = new Goods();
        // Image: the URL is read from the data-lazy-img attribute
        Element img = element.getElementsByTag("img").first();
        goods.setImg(img == null ? "" : img.attr("data-lazy-img"));
        // Name: text of the .p-name block
        goods.setName(element.select(".p-name").text());
        // Price: text of the .p-price block
        goods.setPrice(element.select(".p-price").text());
        return goods;
    }

    public List<Goods> search(String keyword) throws IOException {
        return esDataOperator.search(keyword);
    }

    /**
     * Crawls the products for a keyword and stores them in ES.
     *
     * @param keyword search term
     */
    public void add2ES(String keyword) throws IOException {
        // Create the index if it does not exist yet
        if (!esDataOperator.indexExists()) {
            esDataOperator.createIndex();
        }
        // Crawl and parse, then bulk-insert (an empty bulk request is rejected by the client)
        List<Goods> goodsList = analysis(keyword);
        if (!goodsList.isEmpty()) {
            esDataOperator.bulkAdd(goodsList);
        }
    }
}
```
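The page-by-page crawl mentioned at the start of this section could start from a helper like the one below. This is only a sketch: JdPageUrls is a hypothetical class, and the page query parameter is an assumed name that the original does not confirm, so check the real search URL before relying on it. The base URL is the one used by JDData.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds one search URL per result page. */
class JdPageUrls {

    // Same base URL as in JDData
    private static final String BASE = "https://search.jd.com/Search?enc=utf-8&keyword=";

    /**
     * @param keyword search term, URL-encoded before it is appended
     * @param pages   number of result pages to crawl, starting at page 1
     * @return one URL per page, in page order
     */
    static List<String> build(String keyword, int pages) {
        String encoded;
        try {
            encoded = URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM
            throw new IllegalStateException(e);
        }
        List<String> urls = new ArrayList<>();
        for (int page = 1; page <= pages; page++) {
            // "page" is an assumed parameter name, not taken from the original
            urls.add(BASE + encoded + "&page=" + page);
        }
        return urls;
    }
}
```

Each URL could then be passed to Jsoup.parse in a loop, and the parsed products of all pages collected before the bulk insert.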
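4.3.2.4 Product entity class
```java
package com.es.model;

import lombok.Data;
import lombok.ToString;

/**
 * Product entity stored in ES.
 */
@Data
@ToString
public class Goods {
    private String name;
    private String price;
    private String img;
}
```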
4.3.2.5 ES data operations
```java
package com.es.data;

import com.alibaba.fastjson.JSON;
import com.es.model.Goods;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * ES data operations for the goods index.
 */
@Component
public class ESDataOperator {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    private static final String GOODS_INDEX = "goods";

    /**
     * Checks whether the goods index exists.
     */
    public boolean indexExists() throws IOException {
        GetIndexRequest indexRequest = new GetIndexRequest(GOODS_INDEX);
        return restHighLevelClient.indices().exists(indexRequest, RequestOptions.DEFAULT);
    }

    /**
     * Creates the goods index.
     *
     * @return true if ES acknowledged the request
     */
    public boolean createIndex() throws IOException {
        CreateIndexRequest indexRequest = new CreateIndexRequest(GOODS_INDEX);
        CreateIndexResponse response = restHighLevelClient.indices().create(indexRequest, RequestOptions.DEFAULT);
        return response.isAcknowledged();
    }

    /**
     * Bulk-inserts products into ES.
     *
     * @param goodsList products to index
     * @return true if no item failed
     */
    public boolean bulkAdd(List<Goods> goodsList) throws IOException {
        if (goodsList == null || goodsList.isEmpty()) {
            // An empty bulk request fails validation, so there is nothing to send
            return true;
        }
        BulkRequest bulkRequest = new BulkRequest();
        for (Goods goods : goodsList) {
            IndexRequest indexRequest = new IndexRequest(GOODS_INDEX);
            indexRequest.source(JSON.toJSONString(goods), XContentType.JSON);
            bulkRequest.add(indexRequest);
        }
        BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !response.hasFailures();
    }

    /**
     * Searches products by name and highlights the matched terms.
     */
    public List<Goods> search(String keyword) throws IOException {
        MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("name", keyword);

        // Highlighting: wrap matched terms in a red span
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        highlightBuilder.field("name");

        // Assemble the request (no from/size set, so ES returns its default of 10 hits)
        SearchSourceBuilder requestBuilder = new SearchSourceBuilder();
        requestBuilder.query(queryBuilder);
        requestBuilder.highlighter(highlightBuilder);
        SearchRequest searchRequest = new SearchRequest(GOODS_INDEX);
        searchRequest.source(requestBuilder);

        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        List<Goods> goodsList = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            Map<String, Object> sourceMap = hit.getSourceAsMap();
            Goods goods = new Goods();
            // Use the highlighted name; without highlighting this would be (String) sourceMap.get("name")
            goods.setName(highlighter(hit, "name"));
            goods.setImg((String) sourceMap.get("img"));
            goods.setPrice((String) sourceMap.get("price"));
            goodsList.add(goods);
        }
        return goodsList;
    }

    /**
     * Returns the highlighted text of a field, or the stored value if the field has no highlight.
     *
     * @param searchHit one search hit
     * @param fieldName field to read
     */
    private String highlighter(SearchHit searchHit, String fieldName) {
        HighlightField highlightField = searchHit.getHighlightFields().get(fieldName);
        if (highlightField == null) {
            Object raw = searchHit.getSourceAsMap().get(fieldName);
            return raw == null ? "" : raw.toString();
        }
        StringBuilder sb = new StringBuilder();
        for (Text text : highlightField.fragments()) {
            sb.append(text.toString());
        }
        return sb.toString();
    }
}
```
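The highlight handling comes down to one rule: join the highlight fragments of a field, and fall back to the stored value when the field has none. The sketch below isolates that rule with plain collections in place of the ES SearchHit and HighlightField types, so it runs without a cluster. HighlightMerge and its parameters are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the highlight handling, using plain collections. */
class HighlightMerge {

    /**
     * @param highlights field name mapped to its highlight fragments (what ES returns per hit)
     * @param source     the stored document, field name mapped to value
     * @param fieldName  field to resolve
     * @return the joined fragments, or the stored value if the field was not highlighted
     */
    static String resolve(Map<String, List<String>> highlights,
                          Map<String, Object> source,
                          String fieldName) {
        List<String> fragments = highlights.get(fieldName);
        if (fragments == null || fragments.isEmpty()) {
            // No highlight for this field: return the stored value, or "" if absent
            Object raw = source.get(fieldName);
            return raw == null ? "" : raw.toString();
        }
        // Fragments already contain the pre/post tags, so they are joined as-is
        return String.join("", fragments);
    }
}
```

Without the fallback, a field that was not highlighted would come back as an empty string and the product would be rendered without that value.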
4.3.2.6 Backend service (add, search)
```java
package com.es.service;

import com.es.jsoup.JDData;
import com.es.model.Goods;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.List;

@Service
public class SearchService {

    @Autowired
    private JDData jdData;

    public List<Goods> search(String keyword) throws IOException {
        return jdData.search(keyword);
    }

    public void add2ES(String keyword) throws IOException {
        jdData.add2ES(keyword);
    }
}
```
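4.3.2.7 Backend endpoints
Two endpoints are provided:
- add2ES: given a search term, crawls part of the product data on the first JD result page and stores it in ES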
- search: given a search term, searches the data in the local ES service

```java
package com.es.controller;

import com.es.model.Goods;
import com.es.service.SearchService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

import java.io.IOException;
import java.util.List;

@RestController
public class SearchController {

    @Autowired
    private SearchService service;

    @GetMapping("/search/{keyword}")
    public List<Goods> search(@PathVariable("keyword") String keyword) throws IOException {
        return service.search(keyword);
    }

    @GetMapping("/{keyword}")
    public void add2ES(@PathVariable("keyword") String keyword) throws IOException {
        service.add2ES(keyword);
    }
}
```
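The search endpoint returns a single unpaged result set. A paginated variant would have to turn a page number and page size into the from and size values that ES expects. The sketch below shows only that arithmetic; PageWindow, pageNo and pageSize are hypothetical names, and the resulting values would be handed to SearchSourceBuilder.from(int) and SearchSourceBuilder.size(int).

```java
/** Hypothetical helper: turns a 1-based page request into ES from/size values. */
class PageWindow {

    // ES rejects requests where from + size exceeds index.max_result_window (10,000 by default)
    private static final int MAX_RESULT_WINDOW = 10_000;

    final int from;
    final int size;

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    /**
     * @param pageNo   page number, starting at 1
     * @param pageSize number of hits per page
     */
    static PageWindow of(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        // Use long arithmetic so large page numbers cannot overflow
        long from = (long) (pageNo - 1) * pageSize;
        if (from + pageSize > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds " + MAX_RESULT_WINDOW);
        }
        return new PageWindow((int) from, pageSize);
    }
}
```

For example, PageWindow.of(3, 20) yields from = 40 and size = 20, which is the third page of 20 hits.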
4.3.3 Frontend
The front end uses Vue for quick page development and Axios for sending Ajax requests. The files are laid out as follows: the scripts (vue.js, axios.min.js), the stylesheet (index.css) and the logo (logo.png) go under static/js, static/css and static/img, and the page itself goes under templates.
4.3.3.1 css
```css
/*app*/
#app {
    width: 100%;
    height: 100%;
}

/*header*/
.header {
    background: #e3e4e5;
    color: #999;
    height: 23px;
    padding-top: 5px;
    font-size: 6px;
}

.header .left {
    padding-left: 50px;
    float: left;
}

.header .left span {
    margin-right: 20px;
    font-size: 14px;
}

.header .right {
    float: right;
    padding-right: 50px;
}

.header .right span {
    font-size: 14px;
    margin-right: 15px;
}

/*search*/
.search {
    display: flex;
    padding-top: 10px;
    padding-bottom: 10px;
}

.search > .logo {
    width: 20%;
    margin: 20px 30px;
}

.logo img {
    width: 120px;
}

.search-content {
    display: grid;
    padding-top: 20px;
}

.search-component {
    display: flex;
}

.search-content .search-input input {
    height: 28px;
    width: 626px;
    border: solid red;
}

.search-content .search-btn button {
    height: 35px;
    width: 80px;
    border: solid red;
    background-color: red;
    color: white;
    font-size: 16px;
}

.host-keyword span {
    color: #999;
    padding-right: 10px;
    margin-right: 10px;
    border-right: solid #e3e4e5;
    font-size: 12px;
}

.line {
    height: 2px;
    background-color: red;
}

/*body*/
.body {
    display: flex;
    padding-top: 20px;
    flex-wrap: wrap;
    padding-left: 50px;
    padding-right: 50px;
}

.body .goods-item {
    width: 17%;
    padding: 15px;
    border: 2px solid rgba(100, 100, 100, 0);
}

.goods-item .book-price {
    color: red;
}

.goods-item .book-name {
    font-size: 12px;
    color: #666;
    padding: 2px 5px;
}

.body .goods-item:hover {
    border: solid 2px #e3e4e5;
    color: red;
}
```
4.3.3.2 html
The page must be placed under templates; otherwise Thymeleaf cannot resolve it.
```html
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
    <script th:src="@{/js/axios.min.js}"></script>
    <script th:src="@{/js/vue.js}"></script>
    <link type="text/css" rel="stylesheet" th:href="@{/css/index.css}"/>
    <!-- Relative paths for opening the file directly, without Thymeleaf: -->
    <!-- <script src="../static/js/vue.js"></script> -->
    <!-- <script src="../static/js/axios.min.js"></script> -->
    <!-- <link rel="stylesheet" href="../static/css/index.css"> -->
</head>
<body>
<div id="app">
    <div class="header">
        <div class="left">
            <span>JD Home</span>
            <span>Fujian</span>
        </div>
        <div class="right">
            <span>Hello, please log in</span>
            <span>Free registration</span>
            <span>My orders</span>
            <span>My JD</span>
            <span>JD membership</span>
            <span>Corporate procurement</span>
            <span>Customer service</span>
            <span>Site navigation</span>
        </div>
    </div>
    <div class="search">
        <div class="logo">
            <img th:src="@{/img/logo.png}">
            <!-- <img src="../static/img/logo.png"> -->
        </div>
        <div class="search-content">
            <div class="search-component">
                <div class="search-input">
                    <input v-model="keyword"/>
                </div>
                <div class="search-btn">
                    <button @click="search">Search</button>
                </div>
            </div>
            <div class="host-keyword">
                <span v-for="key in hostKeyword">{{key}}</span>
            </div>
        </div>
    </div>
    <div class="line"></div>
    <div class="body">
        <div v-for="item in result" class="goods-item">
            <div class="book-img">
                <img :src="item.img">
            </div>
            <div class="book-price">
                <div class="price-text">{{item.price}}</div>
            </div>
            <div class="book-name">
                <!-- item.name carries the highlight span from the backend, so it is rendered with v-html -->
                <div class="price-text" v-html="item.name"></div>
            </div>
        </div>
    </div>
</div>
<script>
    new Vue({
        el: "#app",
        data: {
            keyword: "",
            hostKeyword: ["python", "spring", "mysql", "spring boot", "java核心技术", "java web", "c++", "vue", "linux", "java从入门到精通"],
            result: []
        },
        methods: {
            search: function () {
                // Nothing to search for: /search/ without a keyword matches no endpoint
                if (!this.keyword.trim()) {
                    return;
                }
                // Encode the keyword so spaces and special characters are safe in the path
                let url = "http://localhost:8080/search/" + encodeURIComponent(this.keyword);
                axios.get(url).then(res => {
                    console.log(res.data)
                    this.result = res.data
                })
            }
        }
    })
</script>
</body>
</html>
```
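4.3.4 Testing
- Start ES
- Start the springboot-es-jd project
- In a browser, request http://localhost:8080/java to crawl some data from JD into ES; java can be replaced with any keyword
- Open http://localhost:8080/ in a browser and type a keyword into the search box; the matching products are listed with the matched terms highlighted in red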
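Keywords with spaces or special characters (for example spring boot or c++) need encoding before they are placed in the two request paths used above. A minimal sketch follows; EndpointUrls is a hypothetical helper, and the host is the http://localhost:8080 address used in the steps.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds the crawl and search URLs for a keyword. */
class EndpointUrls {

    private static final String HOST = "http://localhost:8080";

    /** Encodes a keyword for use as a single path segment. */
    static String encodeSegment(String keyword) {
        try {
            // URLEncoder targets form data and writes a space as "+";
            // in a path segment a space has to be "%20" instead.
            // A literal "+" is already encoded as "%2B", so the replace only touches spaces.
            return URLEncoder.encode(keyword, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM
            throw new IllegalStateException(e);
        }
    }

    /** URL that crawls JD for the keyword and stores the result in ES. */
    static String crawlUrl(String keyword) {
        return HOST + "/" + encodeSegment(keyword);
    }

    /** URL that searches the local ES data for the keyword. */
    static String searchUrl(String keyword) {
        return HOST + "/search/" + encodeSegment(keyword);
    }
}
```

Browsers usually apply this encoding on their own when a URL is typed into the address bar; the helper matters when the requests are issued from code.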
