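4.1 Overview

After the quick introduction in the previous three chapters, you should have a rough idea of how to use ES. This chapter puts that into practice with a small project that imitates JD.com product search.
Requirements:

  • Search product information (name, price, image, etc.) accurately, using analyzed (tokenized) matching
  • Highlight the search terms in the results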

The result looks roughly like this:
[Image: image.png]

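4.2 Project Introduction

To imitate JD.com product search, we first need to crawl some data from JD.com and store it in ES so that the project can search it later. The project also needs a front end to interact with, so basic Vue is introduced. The rough steps are:

  • Crawl JD.com product data and parse it (jsoup is enough for basic crawling)
  • Load the data into ES (bulk indexing)
  • Write and expose the ES query endpoints (full or paged)
  • Write the front-end code
  • Integrate the front end and back end (Axios is used for the front end's Ajax requests)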

4.3 Building the Project

4.3.1 Project Environment

Software used:

  • Java: 1.8
  • Elasticsearch: 7.16.3
  • Kibana: 7.16.3
  • elasticsearch-head-master
  • node
  • vue
  • axios


4.3.2 Back End

4.3.2.1 Maven

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.es</groupId>
        <artifactId>springboot-es-jd</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <name>springboot-es-jd</name>
        <description>springboot-es-jd</description>
        <parent>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-parent</artifactId>
            <version>2.6.3</version>
            <relativePath/>
        </parent>
        <properties>
            <java.version>1.8</java.version>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
            <jsoup.version>1.14.3</jsoup.version>
            <fast-json.version>1.2.79</fast-json.version>
        </properties>
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-thymeleaf</artifactId>
            </dependency>
            <!-- Web dependencies -->
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-web</artifactId>
            </dependency>
            <!-- Java crawler (HTML parsing) -->
            <dependency>
                <groupId>org.jsoup</groupId>
                <artifactId>jsoup</artifactId>
                <version>${jsoup.version}</version>
            </dependency>
            <!-- Elasticsearch dependencies -->
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
            </dependency>
            <!-- JSON -->
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>${fast-json.version}</version>
            </dependency>
            <!-- Lombok -->
            <dependency>
                <groupId>org.projectlombok</groupId>
                <artifactId>lombok</artifactId>
                <optional>true</optional>
            </dependency>
            <!-- General basic dependencies -->
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-devtools</artifactId>
                <optional>true</optional>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-test</artifactId>
                <scope>test</scope>
            </dependency>
        </dependencies>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.springframework.boot</groupId>
                    <artifactId>spring-boot-maven-plugin</artifactId>
                    <configuration>
                        <excludes>
                            <exclude>
                                <groupId>org.projectlombok</groupId>
                                <artifactId>lombok</artifactId>
                            </exclude>
                        </excludes>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>

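4.3.2.2 Configuration File

Edit the configuration file to disable Thymeleaf's page cache. There is also an ES configuration class, ElasticSearchConfig, but its content is identical to the one in 3_SpringBoot集成ES (Spring Boot integration with ES), so it is not repeated here.

    spring:
      thymeleaf:
        cache: false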

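4.3.2.3 Crawling JD.com Product Data

A simple page crawl is done with jsoup. To take this further, you could first crawl and parse the page's pagination information, then loop over the pages by setting the paging parameters in the request URL. ES also supports from and size for paging the later queries.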

    package com.es.jsoup;

    import com.es.data.ESDataOperator;
    import com.es.model.Goods;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;

    import java.io.IOException;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    /**
     * Crawls and parses JD.com data.
     */
    @Component
    public class JDData {

        @Autowired
        private ESDataOperator esDataOperator;

        static String url = "https://search.jd.com/Search?enc=utf-8&keyword=";

        /**
         * Fetches the search result page for a keyword.
         *
         * @param keyword search term
         * @return the parsed page
         * @throws IOException if the request fails or times out
         */
        private Document getJDData(String keyword) throws IOException {
            // Encode the keyword so spaces and non-ASCII characters are valid in the URL
            String encoded = URLEncoder.encode(keyword, "UTF-8");
            return Jsoup.parse(new URL(url + encoded), 1000 * 30);
        }

        /**
         * Parses the product list out of the result page.
         *
         * @param keyword search term
         * @return the parsed products, empty if the page has no product list
         */
        private List<Goods> analysis(String keyword) throws IOException {
            Document document = getJDData(keyword);
            // Product list container
            Element goodsElement = document.getElementById("J_goodsList");
            if (goodsElement == null) {
                // No product list on the page, so there is nothing to parse
                return Collections.emptyList();
            }
            // One li per product entry
            List<Element> goodsLi = goodsElement.getElementsByTag("li");
            List<Goods> goodsList = new ArrayList<>();
            for (Element element : goodsLi) {
                goodsList.add(parse2Goods(element));
            }
            return goodsList;
        }

        /**
         * Converts one li node into a Goods object.
         *
         * @param element the li node
         * @return the product
         */
        private Goods parse2Goods(Element element) {
            Goods goods = new Goods();
            // Image: the URL is read from the data-lazy-img attribute
            Elements imgElements = element.getElementsByTag("img");
            Element img = imgElements.first();
            goods.setImg(img == null ? "" : img.attr("data-lazy-img"));
            // Name: select() already returns the matching nodes.
            // Elements.tagName(...) renames tags and must not be used as a selector.
            goods.setName(element.select(".p-name").text());
            // Price
            goods.setPrice(element.select(".p-price").text());
            return goods;
        }

        public List<Goods> search(String keyword) throws IOException {
            return esDataOperator.search(keyword);
        }

        /**
         * Crawls the products for a keyword and stores them in ES.
         *
         * @param keyword search term
         */
        public void add2ES(String keyword) throws IOException {
            // Create the index if it does not exist yet
            Boolean exists = esDataOperator.indexExists();
            if (!exists) {
                esDataOperator.createIndex();
            }
            // Crawl and parse the first result page
            List<Goods> goodsList = analysis(keyword);
            // Bulk insert, skipped when nothing was parsed
            if (!goodsList.isEmpty()) {
                esDataOperator.bulkAdd(goodsList);
            }
        }
    }
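
The pagination idea mentioned at the start of this section (loop over the result pages by changing the request URL) could be sketched roughly as follows. This is a minimal plain-Java sketch, not part of the project code: the class JdSearchUrls, the method pageUrls, and especially the query parameter name page are assumptions made for illustration, so check the real site's paging parameter before relying on it. The keyword is URL-encoded so that spaces and Chinese characters survive the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Sketch: builds one search URL per result page for a keyword. */
final class JdSearchUrls {

    // Same base URL as the url field in JDData.
    static final String BASE_URL = "https://search.jd.com/Search?enc=utf-8&keyword=";

    // ASSUMPTION: the paging parameter name is a placeholder for illustration.
    static final String PAGE_PARAM = "page";

    /** Form-encodes the keyword for use in a query string (space becomes '+'). */
    static String encode(String keyword) {
        try {
            return URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every JVM, so this cannot normally happen.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the URLs for pages 1..pageCount (inclusive). */
    static List<String> pageUrls(String keyword, int pageCount) {
        List<String> urls = new ArrayList<>();
        String encoded = encode(keyword);
        for (int page = 1; page <= pageCount; page++) {
            urls.add(BASE_URL + encoded + "&" + PAGE_PARAM + "=" + page);
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String pageUrl : pageUrls("spring boot", 3)) {
            System.out.println(pageUrl);
        }
    }
}
```

Each returned URL could then be fetched in a loop with Jsoup.parse(new URL(...), timeout), the same call getJDData uses for a single page.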

4.3.2.4 Product Entity Class

    package com.es.model;

    import lombok.Data;

    // @Data already generates getters, setters and toString()
    @Data
    public class Goods {
        private String name;
        private String price;
        private String img;
    }

4.3.2.5 ES Data Operations

    package com.es.data;

    import com.alibaba.fastjson.JSON;
    import com.es.model.Goods;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.client.indices.CreateIndexResponse;
    import org.elasticsearch.client.indices.GetIndexRequest;
    import org.elasticsearch.common.text.Text;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.elasticsearch.index.query.MatchQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    /**
     * ES data operations.
     */
    @Component
    public class ESDataOperator {

        @Autowired
        private RestHighLevelClient restHighLevelClient;

        private final static String GOODS_INDEX = "goods";

        /**
         * Checks whether the goods index exists.
         *
         * @return true if the index exists
         * @throws IOException if the request fails
         */
        public Boolean indexExists() throws IOException {
            GetIndexRequest indexRequest = new GetIndexRequest(GOODS_INDEX);
            return restHighLevelClient.indices().exists(indexRequest, RequestOptions.DEFAULT);
        }

        /**
         * Creates the goods index.
         *
         * @return true if ES acknowledged the request
         * @throws IOException if the request fails
         */
        public Boolean createIndex() throws IOException {
            CreateIndexRequest indexRequest = new CreateIndexRequest(GOODS_INDEX);
            CreateIndexResponse response = restHighLevelClient.indices().create(indexRequest, RequestOptions.DEFAULT);
            return response.isAcknowledged();
        }

        /**
         * Adds products to ES in one bulk request.
         *
         * @param goodsList products to store
         * @return true if no item failed
         */
        public boolean bulkAdd(List<Goods> goodsList) throws IOException {
            if (goodsList == null || goodsList.isEmpty()) {
                // A bulk request without any action fails validation, so skip it
                return true;
            }
            BulkRequest bulkRequest = new BulkRequest();
            for (Goods goods : goodsList) {
                IndexRequest indexRequest = new IndexRequest(GOODS_INDEX);
                indexRequest.source(JSON.toJSONString(goods), XContentType.JSON);
                bulkRequest.add(indexRequest);
            }
            BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            return !response.hasFailures();
        }

        /**
         * Searches products by name and highlights the matched terms.
         *
         * @param keyword search term
         * @return matching products, with the name wrapped in highlight tags
         */
        public List<Goods> search(String keyword) throws IOException {
            SearchRequest searchRequest = new SearchRequest(GOODS_INDEX);
            SearchSourceBuilder requestBuilder = new SearchSourceBuilder();
            MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("name", keyword);
            // Highlighting
            HighlightBuilder highlightBuilder = new HighlightBuilder();
            highlightBuilder.preTags("<span style='color:red'>");
            highlightBuilder.postTags("</span>");
            highlightBuilder.field("name");
            // Assemble the query
            requestBuilder.highlighter(highlightBuilder);
            requestBuilder.query(queryBuilder);
            searchRequest.source(requestBuilder);
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            SearchHit[] searchHits = searchResponse.getHits().getHits();
            List<Goods> goodsList = new ArrayList<>();
            // Plain version without highlighting:
            // for (SearchHit hit : searchHits) {
            //     Goods goods = new Goods();
            //     Map<String, Object> sourceMap = hit.getSourceAsMap();
            //     goods.setName((String) sourceMap.get("name"));
            //     goods.setImg((String) sourceMap.get("img"));
            //     goods.setPrice((String) sourceMap.get("price"));
            //     goodsList.add(goods);
            // }
            // Version that replaces the name with its highlighted form
            for (SearchHit hit : searchHits) {
                Goods goods = new Goods();
                Map<String, Object> sourceMap = hit.getSourceAsMap();
                String highlighted = highlighter(hit, "name");
                // Fall back to the stored name if ES returned no highlight for this hit
                goods.setName(highlighted.isEmpty() ? (String) sourceMap.get("name") : highlighted);
                goods.setImg((String) sourceMap.get("img"));
                goods.setPrice((String) sourceMap.get("price"));
                goodsList.add(goods);
            }
            return goodsList;
        }

        /**
         * Joins the highlighted fragments of a field.
         *
         * @param searchHit a single hit
         * @param fieldName name of the highlighted field
         * @return the highlighted text, or an empty string if the field was not highlighted
         */
        private String highlighter(SearchHit searchHit, String fieldName) {
            Map<String, HighlightField> highlighterMap = searchHit.getHighlightFields();
            HighlightField highlightField = highlighterMap.get(fieldName);
            StringBuilder sb = new StringBuilder();
            if (highlightField != null) {
                for (Text text : highlightField.fragments()) {
                    sb.append(text.toString());
                }
            }
            return sb.toString();
        }
    }
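
One thing to watch in bulkAdd: each IndexRequest is created without an id, so ES generates a fresh id per document, and calling add2ES twice with the same keyword stores the same products twice. One possible way around this is to derive a repeatable id from the product's own fields and set it with IndexRequest.id(...). The sketch below is plain Java and only shows the id derivation. The class GoodsIds, the method stableId, and the choice of name plus image URL as the identity of a product are assumptions, not part of the original project.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: derives a repeatable document id from a product's fields. */
final class GoodsIds {

    /** Hex-encoded SHA-256 of "name|img"; equal inputs always give equal ids. */
    static String stableId(String name, String img) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest((name + "|" + img).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // %02x prints each byte as two lowercase hex digits
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample values, for illustration only
        String first = stableId("Java Core", "//img.example/1.jpg");
        String again = stableId("Java Core", "//img.example/1.jpg");
        System.out.println(first.equals(again)); // prints "true"
        System.out.println(first.length());      // prints "64"
    }
}
```

With such an id, bulkAdd could build each request as new IndexRequest(GOODS_INDEX).id(GoodsIds.stableId(goods.getName(), goods.getImg())), so that re-crawling a product overwrites the existing document instead of adding another copy.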

4.3.2.6 Back-End Business Logic (Add, Query)

    package com.es.service;

    import com.es.jsoup.JDData;
    import com.es.model.Goods;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    import java.io.IOException;
    import java.util.List;

    @Service
    public class SearchService {

        @Autowired
        private JDData jdData;

        // Query products from ES
        public List<Goods> search(String keyword) throws IOException {
            return jdData.search(keyword);
        }

        // Crawl products from JD.com and store them in ES
        public void add2ES(String keyword) throws IOException {
            jdData.add2ES(keyword);
        }
    }
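
The search method in ESDataOperator sets neither from nor size, so ES falls back to its defaults and returns at most the first 10 hits. For the paged variant mentioned in the project steps, a 1-based page number has to be converted into a from/size pair. The following is a minimal sketch under assumptions: the class PageWindow is hypothetical, and the 10,000 limit mirrors the default of the ES index setting index.max_result_window.

```java
/** Sketch: converts a 1-based page number into the from/size pair used by ES paging. */
final class PageWindow {

    // Mirrors the default value of the ES setting index.max_result_window.
    static final int MAX_RESULT_WINDOW = 10_000;

    final int from;
    final int size;

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    /** Page 1 starts at offset 0, page 2 at offset size, and so on. */
    static PageWindow of(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be at least 1");
        }
        // Use long arithmetic so a very large page number cannot overflow int.
        long from = (long) (page - 1) * size;
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds " + MAX_RESULT_WINDOW);
        }
        return new PageWindow((int) from, size);
    }

    public static void main(String[] args) {
        PageWindow third = PageWindow.of(3, 20);
        System.out.println("from=" + third.from + ", size=" + third.size); // prints "from=40, size=20"
    }
}
```

The two values would then be passed to SearchSourceBuilder.from(...) and SearchSourceBuilder.size(...) before the request is sent.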

4.3.2.7 Back-End Endpoints

Two endpoints are provided:

  • add2ES: crawls part of the product data on the first JD.com result page for the keyword and stores it in ES
  • search: searches the data in the local ES service for the keyword

```java
package com.es.controller;

import com.es.model.Goods;
import com.es.service.SearchService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;

@RestController
public class SearchController {

    @Autowired
    private SearchService service;

    // Search the local ES data
    @GetMapping("/search/{keyword}")
    public List<Goods> search(@PathVariable("keyword") String keyword) throws IOException {
        return service.search(keyword);
    }

    // Crawl JD.com for the keyword and store the result in ES
    @GetMapping("/{keyword}")
    public void add2ES(@PathVariable("keyword") String keyword) throws IOException {
        service.add2ES(keyword);
    }
}
```
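
Both endpoints take the keyword as a path variable, so a caller has to make sure that characters such as #, ? or % in the keyword do not change the meaning of the URL. The following plain-Java sketch shows one way a client might build the two URLs. The class EndpointUrls and its method names are hypothetical, and http://localhost:8080 is the address used in the test section of this chapter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch: builds the URLs of the two endpoints with a safely encoded keyword. */
final class EndpointUrls {

    static final String BASE = "http://localhost:8080";

    /** Percent-encodes a value for use as a single path segment. */
    static String encodePathSegment(String raw) {
        try {
            // URLEncoder targets form data and writes a space as '+';
            // inside a path a space has to be "%20" instead.
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every JVM, so this cannot normally happen.
            throw new IllegalStateException(e);
        }
    }

    /** URL of the search endpoint, /search/{keyword}. */
    static String searchUrl(String keyword) {
        return BASE + "/search/" + encodePathSegment(keyword);
    }

    /** URL of the add2ES (crawl) endpoint, /{keyword}. */
    static String crawlUrl(String keyword) {
        return BASE + "/" + encodePathSegment(keyword);
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("spring boot")); // prints "http://localhost:8080/search/spring%20boot"
        System.out.println(searchUrl("c#"));          // prints "http://localhost:8080/search/c%23"
        System.out.println(crawlUrl("java"));         // prints "http://localhost:8080/java"
    }
}
```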

4.3.3 Front End

The front end is built quickly with Vue and uses axios to send Ajax requests. The file structure is as follows:

![image.png](https://cdn.nlark.com/yuque/0/2022/png/25709938/1643340486657-ef846e74-895a-4934-b812-04b70d88ea1f.png#clientId=u85e1f7b6-f9fa-4&from=paste&height=200&id=u99f759b5&margin=%5Bobject%20Object%5D&name=image.png&originHeight=400&originWidth=646&originalType=binary&ratio=1&size=27889&status=done&style=none&taskId=u74562e91-7d11-40db-be77-2900800560d&width=323)

4.3.3.1 css

```css
/*app*/
#app {
    width: 100%;
    height: 100%;
}

/*header*/
.header {
    background: #e3e4e5;
    color: #999;
    height: 23px;
    padding-top: 5px;
    font-size: 6px;
}

.header .left {
    padding-left: 50px;
    float: left;
}

.header .left span {
    margin-right: 20px;
    font-size: 14px;
}

.header .right {
    float: right;
    padding-right: 50px;
}

.header .right span {
    font-size: 14px;
    margin-right: 15px;
}

/*search*/
.search {
    display: flex;
    padding-top: 10px;
    padding-bottom: 10px;
}

.search > .logo {
    width: 20%;
    margin: 20px 30px;
}

.logo img {
    width: 120px;
}

.search-content {
    display: grid;
    padding-top: 20px;
}

.search-component {
    display: flex;
}

.search-content .search-input input {
    height: 28px;
    width: 626px;
    border: solid red;
}

.search-content .search-btn button {
    height: 35px;
    width: 80px;
    border: solid red;
    background-color: red;
    color: white;
    font-size: 16px;
}

.host-keyword span {
    color: #999;
    padding-right: 10px;
    margin-right: 10px;
    border-right: solid #e3e4e5;
    font-size: 12px;
}

.line {
    height: 2px;
    background-color: red;
}

/*body*/
.body {
    display: flex;
    padding-top: 20px;
    flex-wrap: wrap;
    padding-left: 50px;
    padding-right: 50px;
}

.body .goods-item {
    width: 17%;
    padding: 15px;
    border: 2px solid rgba(100, 100, 100, 0);
}

.goods-item .book-price {
    color: red;
}

.goods-item .book-name {
    font-size: 12px;
    color: #666;
    padding: 2px 5px;
}

.body .goods-item:hover {
    border: solid 2px #e3e4e5;
    color: red;
}
```

4.3.3.2 html

The page must be placed in the templates directory, otherwise Thymeleaf will not find it.

    <!DOCTYPE html>
    <html lang="en" xmlns:th="http://www.thymeleaf.org">
    <head>
        <meta charset="UTF-8">
        <title>Title</title>
        <script th:src="@{/js/axios.min.js}"></script>
        <script th:src="@{/js/vue.js}"></script>
        <link type="text/css" rel="stylesheet" th:href="@{/css/index.css}"/>
        <!-- <script src="../static/js/vue.js"></script>-->
        <!-- <script src="../static/js/axios.min.js"></script>-->
        <!-- <link rel="stylesheet" href="../static/css/index.css">-->
    </head>
    <body>
    <div id="app">
        <div class="header">
            <div class="left">
                <span>JD Home</span>
                <span>Fujian</span>
            </div>
            <div class="right">
                <span>Hello, please log in</span>
                <span>Free registration</span>
                <span>My Orders</span>
                <span>My JD</span>
                <span>JD Membership</span>
                <span>Enterprise Purchasing</span>
                <span>Customer Service</span>
                <span>Site Navigation</span>
            </div>
        </div>
        <div class="search">
            <div class="logo">
                <img th:src="@{/img/logo.png}">
                <!-- <img src="../static/img/logo.png">-->
            </div>
            <div class="search-content">
                <div class="search-component">
                    <div class="search-input">
                        <input v-model="keyword"/>
                    </div>
                    <div class="search-btn">
                        <button @click="search">Search</button>
                    </div>
                </div>
                <div class="host-keyword">
                    <span v-for="key in hostKeyword">
                        {{key}}
                    </span>
                </div>
            </div>
        </div>
        <div class="line"></div>
        <div class="body">
            <div v-for="item in result" class="goods-item">
                <div class="book-img">
                    <img :src="item.img">
                </div>
                <div class="book-price">
                    <div class="price-text">{{item.price}}</div>
                </div>
                <div class="book-name">
                    <!-- v-html is needed so the highlight span returned by the back end is rendered -->
                    <div class="price-text" v-html="item.name"></div>
                </div>
            </div>
        </div>
    </div>
    <script>
        new Vue({
            el: "#app",
            data: {
                keyword: "",
                hostKeyword: ["python", "spring", "mysql", "spring boot", "java核心技术", "java web", "c++", "vue", "linux", "java从入门到精通"],
                result: []
            },
            methods: {
                search: function () {
                    // Encode the keyword so characters like # or ? do not break the URL
                    let url = "http://localhost:8080/search/" + encodeURIComponent(this.keyword);
                    axios.get(url).then(res => {
                        console.log(res.data)
                        this.result = res.data
                    })
                }
            }
        })
    </script>
    </body>
    </html>

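4.3.4 Testing

  • Start ES
  • Start the springboot-es-jd project
  • First request http://localhost:8080/java in a browser to crawl some data from JD.com into ES; java can be replaced with any keyword

Then open http://localhost:8080/ in a browser and type a keyword into the search box. The result looks like this:
[Image: image.png]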