1 Relevance Scoring Explained

1.1 The TF/IDF Scoring Mechanism

1.1.1 Algorithm Overview

  • Elasticsearch scores relevance with the term frequency/inverse document frequency algorithm, TF/IDF for short. TF: term frequency; IDF: inverse document frequency. (The exact formula ES 7 reports is written out at the end of this subsection.)
  • TF (term frequency): how often each term of the search text occurs in a document's field; the more occurrences, the more relevant the document.

[Figure: Term Frequency.png]

  • TF example: index: medicine, query: 阿莫西林 (amoxicillin).
    • doc1: 阿莫西林胶囊是什么。。。阿莫西林胶囊能做什么。。。阿莫西林胶囊结构。
    • doc2: 本药店有阿莫西林胶囊、红霉素胶囊、青霉素胶囊。。。。
    • Result: doc1 > doc2.
  • IDF (inverse document frequency): how many documents across the whole index each term of the search text occurs in; the more documents it occurs in, the less relevant it is.

[Figure: Inverse Document Frequency.png]

  • IDF example: index: medicine, query: A市阿莫西林胶囊.
    • doc1: A市健康大药房简介。本药店有阿莫西林胶囊、红霉素胶囊、青霉素胶囊。。。。
    • doc2: B市民生大药房简介。本药店有阿莫西林胶囊、红霉素胶囊、青霉素胶囊。。。。
    • doc3: C市未来大药房简介。本药店有阿莫西林胶囊、红霉素胶囊、青霉素胶囊。。。。
    • Conclusion: a term that is rare across the index carries a high relevance weight.
  • Field-length norm: the longer the field, the weaker the relevance.
  • Field-length norm example: index: medicine, query: A市阿莫西林胶囊.
    • doc1: {"title":"A市健康大药房简介","content":"本药店有红霉素胶囊、青霉素胶囊。。。一万字"}
    • doc2: {"title":"B市民生大药房简介","content":"本药店有阿莫西林胶囊、红霉素胶囊、青霉素胶囊。。。一万字"}
    • Conclusion: doc1 > doc2 (doc1 matches in the short title field, doc2 only in the very long content field).
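Written out, the per-term formula that the explain output in the next subsection reports (the BM25 flavor of these ideas used by ES 7) is:

score(term, doc) = boost * idf * tf
idf = log(1 + (N - n + 0.5) / (n + 0.5))
tf  = freq / (freq + k1 * (1 - b + b * dl / avgdl))

where N is the number of documents carrying the field, n the number of documents containing the term, freq the term's occurrences within the document, dl the field length, avgdl the average field length, and k1 (1.2) and b (0.75) the default tuning parameters; every one of these values appears in the explain details below.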

1.1.2 How is _score computed?

  • ① The keywords the user typed are analyzed into terms.
  • ② For each term, the tf and idf values are computed against every matching document.
  • ③ The per-term tf/idf values are combined by the formula to produce each document's total score.
  • Example:
  • Enable the explain execution plan:
GET /book/_search?explain=true
{
  "query": {
    "match": {
      "description": "java程序员"
    }
  }
}
  • Response:
{
  "took" : 655,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 2.2765667,
    "hits" : [
      {
        "_shard" : "[book][0]",
        "_node" : "MTYvUxUWRSqSE2GLsX8_zg",
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.2765667,
        "_source" : {
          "name" : "spring开发基础",
          "description" : "spring 在java领域非常流行,java程序员都在用。",
          "studymodel" : "201001",
          "price" : 88.6,
          "timestamp" : "2019-08-24 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [ "spring", "java" ]
        },
        "_explanation" : {
          "value" : 2.2765667,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.79664993,
              "description" : "weight(description:java in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.79664993,
                  "description" : "score(freq=2.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 2, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.7704485,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 2.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 16.0, "description" : "dl, length of field", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.18407845,
              "description" : "weight(description:程 in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.18407845,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.13353139,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 3, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.62660944,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 16.0, "description" : "dl, length of field", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.6479192,
              "description" : "weight(description:序 in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.6479192,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 2, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.62660944,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 16.0, "description" : "dl, length of field", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.6479192,
              "description" : "weight(description:员 in 1) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.6479192,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 2, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.62660944,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 16.0, "description" : "dl, length of field", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[book][0]",
        "_node" : "MTYvUxUWRSqSE2GLsX8_zg",
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.94958127,
        "_source" : {
          "name" : "Bootstrap开发",
          "description" : "Bootstrap是由Twitter推出的一个前台页面开发css框架,是一个非常流行的开发框架,此框架集成了多种页面效果。此开发框架包含了大量的CSS、JS程序代码,可以帮助开发者(尤其是不擅长css页面开发的程序人员)轻松的实现一个css,不受浏览器限制的精美界面css效果。",
          "studymodel" : "201002",
          "price" : 38.6,
          "timestamp" : "2019-08-25 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [ "bootstrap", "dev" ]
        },
        "_explanation" : {
          "value" : 0.94958127,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.13911864,
              "description" : "weight(description:程 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.13911864,
                  "description" : "score(freq=2.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.13353139,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 3, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.47356468,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 2.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 104.0, "description" : "dl, length of field (approximate)", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.48966968,
              "description" : "weight(description:序 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.48966968,
                  "description" : "score(freq=2.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 2, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.47356468,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 2.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 104.0, "description" : "dl, length of field (approximate)", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.3207929,
              "description" : "weight(description:员 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.3207929,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 2, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.31024224,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 104.0, "description" : "dl, length of field (approximate)", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      },
      {
        "_shard" : "[book][0]",
        "_node" : "MTYvUxUWRSqSE2GLsX8_zg",
        "_index" : "book",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.7534219,
        "_source" : {
          "name" : "java编程思想",
          "description" : "java语言是世界第一编程语言,在软件开发领域使用人数最多。",
          "studymodel" : "201001",
          "price" : 68.6,
          "timestamp" : "2019-08-25 19:11:35",
          "pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
          "tags" : [ "java", "dev" ]
        },
        "_explanation" : {
          "value" : 0.7534219,
          "description" : "sum of:",
          "details" : [
            {
              "value" : 0.5867282,
              "description" : "weight(description:java in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.5867282,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.47000363,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 2, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.567431,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 25.0, "description" : "dl, length of field", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value" : 0.16669367,
              "description" : "weight(description:程 in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.16669367,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    { "value" : 2.2, "description" : "boost", "details" : [ ] },
                    {
                      "value" : 0.13353139,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        { "value" : 3, "description" : "n, number of documents containing term", "details" : [ ] },
                        { "value" : 3, "description" : "N, total number of documents with field", "details" : [ ] }
                      ]
                    },
                    {
                      "value" : 0.567431,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        { "value" : 1.0, "description" : "freq, occurrences of term within document", "details" : [ ] },
                        { "value" : 1.2, "description" : "k1, term saturation parameter", "details" : [ ] },
                        { "value" : 0.75, "description" : "b, length normalization parameter", "details" : [ ] },
                        { "value" : 25.0, "description" : "dl, length of field", "details" : [ ] },
                        { "value" : 48.666668, "description" : "avgdl, average length of field", "details" : [ ] }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

1.1.3 Checking in production whether a specific document matches

  • Example:
GET /book/_explain/3
{
  "query": {
    "match": {
      "description": "java程序员"
    }
  }
}
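A sketch of the response shape, abridged and from memory (exact fields vary by version): besides the explanation tree seen in the previous subsection, it carries a matched flag saying whether the document matched at all.

{
  "_index" : "book",
  "_type" : "_doc",
  "_id" : "3",
  "matched" : true,
  "explanation" : { ... }
}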

1.2 Doc values

1.2.1 Overview

  • Searching relies on the inverted index; sorting relies on the forward index, which exposes every field of every document so documents can be ordered by field values. That forward index is precisely what doc values are.
  • At index time ES builds the inverted index for search, and alongside it the forward index, i.e. doc values, for sorting, aggregations, filtering, and similar operations.
  • Doc values are stored on disk; when memory allows, the OS keeps them hot in the file-system cache and access is very fast, and when memory is short they are read from disk. (A mapping sketch for controlling them follows.)
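Doc values are enabled by default for every field type except analyzed text. If a field will never be sorted, aggregated, or accessed from scripts, they can be disabled in the mapping to save disk space; a minimal sketch (session_log and url are made-up names, not from this document):

PUT /session_log
{
  "mappings": {
    "properties": {
      "url": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}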

1.2.2 The Inverted Index by Example

  • Example:
  • doc1: hello world you and me
  • doc2: hi, world, how are you

| term | doc1 | doc2 |
| --- | --- | --- |
| hello | ✓ | |
| world | ✓ | ✓ |
| you | ✓ | ✓ |
| and | ✓ | |
| me | ✓ | |
| hi | | ✓ |
| how | | ✓ |
| are | | ✓ |

  • Search steps:
  • ① A search for hello you analyzes the query into hello and you.
  • ② hello matches doc1.
  • ③ you matches doc1 and doc2.
  • Sorting on top of this structure is a problem, though: the inverted index maps terms to documents, not documents to field values.

1.2.3 The Forward Index by Example

  • Example:
  • doc1: { "name": "jack", "age": 27 }
  • doc2: { "name": "tom", "age": 30 }

| document | name | age |
| --- | --- | --- |
| doc1 | jack | 27 |
| doc2 | tom | 30 |

  • When the forward index is built, field values are stored per document, much like columns in a database table, without analysis; that is why sorting, aggregations, and filtering can run on it (see the sort sketch below).
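For instance, a sort like the following is answered from doc values rather than the inverted index (a sketch; people is a hypothetical index holding the two documents above):

GET /people/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": "desc"
    }
  ]
}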

1.3 The Ultimate Answer to ES Pagination

1.3.1 Background

  • Suppose there are 3 primary shards, P0, P1, and P2, each holding 100 million documents, and a user wants the documents whose name is java, starting at offset 9999 with a page size of 10.

[Figure: ES如何分页前言假设条件.png]

  • ES runs this in two phases:
    • query phase: each shard ships only lightweight information (id, score, etc.) for its top 10000 hits to the coordinating node, which sorts it all and keeps the ids/scores of the requested slice.
    • fetch phase: the coordinating node builds mget requests from those ids/scores, fetches the actual documents from the shards, and returns them to the client.

Why ES does not do it any other way: if the full content of the top 10000 documents were shipped to the coordinating node in one go, oversized documents could bring the coordinating node down. (A sketch of the request described here follows.)
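A minimal sketch of that request, using from/size paging (test_index is a placeholder name; the scenario above does not name its index):

GET /test_index/_search
{
  "from": 9999,
  "size": 10,
  "query": {
    "match": {
      "name": "java"
    }
  }
}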

1.3.2 query phase

  • query phase workflow:
    • The search request lands on some coordinate node, which builds a priority queue whose length is governed by the paging parameters from and size (10 by default).
    • The coordinate node forwards the request to every shard; each shard searches locally and builds its own priority queue.
    • Every shard returns its priority queue to the coordinate node, which merges them into a global priority queue.
  • How replica shards raise search throughput: each request has to hit one replica/primary of every shard, so when every shard has several replicas, concurrent search requests can be spread across the other replicas.

1.3.3 fetch phase

  • fetch phase workflow:
    • Once the coordinate node has built its priority queue, it sends mget requests to all shards to fetch the corresponding documents.
    • Each shard returns its documents to the coordinate node.
    • The coordinate node merges the documents and returns the result to the client.
  • An ordinary search without from and size returns the top 10 hits, sorted by _score.

1.4 Search Parameters in Brief

1.4.1 preference

  • The bouncing results problem: two documents tie on the sort field, different shards may order them differently, and successive requests round-robin over different replica shards, so the user sees a different ordering on every search.
  • Solution: set preference to a fixed string, for instance the user_id, so that each user's searches always run on the same replica shard and the bouncing results problem disappears (see the usage sketch after the example below).
  • The preference parameter decides which shards execute the search. Values include _primary, _primary_first, _local, _only_node:xyz, _prefer_node:xyz, and _shards:2,3.
GET /_search?preference=_shards:2,3
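Pinning a user to the same replica with an arbitrary string, as described above (user123 is a made-up user id):

GET /_search?preference=user123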

1.4.2 timeout

  • Bounds the query time and returns whatever partial results have been collected within the limit, so a query cannot run unboundedly long.
GET /_search?timeout=10ms

1.4.3 routing

  • Document routing: routing by _id is the default; a custom routing value can be supplied so that the same request always lands on the same shard.
GET /_search?routing=user123

2 Aggregations: Getting Started

2.1 Aggregation Queries

2.1.1 Grouping Queries

  • SQL version:
SELECT studymodel, count(*) AS `count`
FROM book
GROUP BY studymodel;
  • Query DSL version:
GET /book/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "count": {
      "terms": {
        "field": "studymodel"
      }
    }
  }
}

2.1.2 Terms Grouping Queries

  • Preparation (tags is a text field, so fielddata must be enabled before a terms aggregation can run on it):
PUT /book/_mapping
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
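Note that fielddata=true materializes the field in JVM heap and can be memory-hungry. A common alternative, assuming the index was created with dynamic mapping so that tags also has a keyword sub-field (an assumption, not shown in this document's mapping), is to aggregate on tags.keyword instead:

GET /book/_search
{
  "size": 0,
  "aggs": {
    "count": {
      "terms": {
        "field": "tags.keyword"
      }
    }
  }
}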
  • Example: group by tags
GET /book/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "count": {
      "terms": {
        "field": "tags"
      }
    }
  }
}

2.1.3 Terms Grouping with a Search Condition

  • Example: group by tags, restricted to documents whose name contains java
GET /book/_search
{
  "size": 0,
  "query": {
    "match": {
      "name": "java"
    }
  },
  "aggs": {
    "count": {
      "terms": {
        "field": "tags"
      }
    }
  }
}

2.1.4 Multi-field Grouping

  • Example: group by tags, then compute the average price within each tag
GET /book/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

2.1.5 Multi-field Grouping with Ordering

  • Example: group by tags, compute each tag's average price, and order the groups by average price descending
GET /book/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags",
        "order": {
          "avg_price": "desc"
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

2.1.6 Grouping by Specified Price Ranges

  • Example:
GET /book/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 40
          },
          {
            "from": 40,
            "to": 60
          },
          {
            "from": 60,
            "to": 80
          }
        ]
      }
    }
  }
}

2.2 Core Concepts: bucket and metric

2.2.1 bucket: a group of data

| city | name |
| --- | --- |
| 北京 | 张三 |
| 北京 | 李四 |
| 天津 | 王五 |
| 天津 | 赵六 |
| 天津 | 田七 |

  • Grouping by city is bucketing: select city,count(*) from table group by city
  • It produces two buckets: a 北京 bucket and a 天津 bucket.
  • 北京 bucket: 2 people, 张三 and 李四.
  • 天津 bucket: 3 people, 王五, 赵六, and 田七.

2.2.2 metric: a statistic computed over a bucket

  • A metric is some aggregate computation run against a bucket, such as an average, a maximum, or a minimum. (A Query DSL equivalent of the SQL below is sketched after this list.)
select city,count(*) from table group by city;
  • bucket: group by city puts rows with the same city into the same bucket.
  • metric: count(*) computes a statistic over all the data in each bucket.
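For comparison, a Query DSL sketch of the same group-by (table and city are the hypothetical names from the SQL above; each bucket's doc_count plays the role of count(*)):

GET /table/_search
{
  "size": 0,
  "aggs": {
    "group_by_city": {
      "terms": {
        "field": "city"
      }
    }
  }
}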

3 TV Sales Case Study

3.1 Preparation

  • Example: create the index and mapping
PUT /tvs
{
  "mappings": {
    "properties": {
      "price": {
        "type": "long"
      },
      "color": {
        "type": "keyword"
      },
      "brand": {
        "type": "keyword"
      },
      "sold_date": {
        "type": "date"
      }
    }
  }
}
  • Example: insert data
PUT /tvs/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "绿色", "brand" : "小米", "sold_date" : "2019-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "蓝色", "brand" : "TCL", "sold_date" : "2019-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "绿色", "brand" : "TCL", "sold_date" : "2019-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "红色", "brand" : "三星", "sold_date" : "2020-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "蓝色", "brand" : "小米", "sold_date" : "2020-02-12" }

3.2 Which TV Color Sells the Most

  • Example:
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "popular_color": {
      "terms": {
        "field": "color"
      }
    }
  }
}

3.3 Average Price per TV Color

  • Example:
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

3.4 Average Price per Color, and per Brand within Each Color

  • Example:
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_color_price": {
          "avg": {
            "field": "price"
          }
        },
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "avg_brand_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

3.5 Per-color Sales Count, Average, Maximum, Minimum, and Sum of Prices

  • Example:
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "max_price": {
          "max": {
            "field": "price"
          }
        },
        "min_price": {
          "min": {
            "field": "price"
          }
        },
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

3.6 Range Bucketing with histogram

  • Example: bucket the price in steps of 20 and compute each bucket's sales total
GET /tvs/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 20
      },
      "aggs": {
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

3.7 Grouping and Aggregating by Date

  • Example: number of units sold per month
GET /tvs/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2019-01-01",
          "max": "2019-12-31"
        }
      }
    }
  }
}
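A caveat: in later 7.x releases the combined interval parameter of date_histogram is deprecated in favor of calendar_interval (calendar units such as month or quarter) and fixed_interval (fixed units such as 30d). The same query written that way, as a sketch:

GET /tvs/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold_date",
        "calendar_interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2019-01-01",
          "max": "2019-12-31"
        }
      }
    }
  }
}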

3.8 Per-quarter Sales Total per Brand, plus Each Quarter's Overall Total

  • Example:
GET /tvs/_search
{
  "size": 0,
  "aggs": {
    "group_by_sold_date": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2019-01-01",
          "max": "2019-12-31"
        }
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "sum_price": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "total_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}

3.9 Per-color Sales of a Single Brand

  • Example: sales per color for the 小米 brand
GET /tvs/_search
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "小米"
      }
    }
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      }
    }
  }
}

3.10 Comparing One Brand's Average Price with All Brands

  • Example:
GET /tvs/_search
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "小米"
      }
    }
  },
  "aggs": {
    "single_brand_avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "all": {
      "global": {},
      "aggs": {
        "all_brand_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

3.11 Average Price of TVs Priced at 1200 or Above

  • Example:
GET /tvs/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 1200
          }
        }
      }
    }
  },
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

3.12 A Brand's Average Price over Recent Time Windows

  • Example (the aggregations compute 150-, 140-, and 130-day windows for the 小米 brand):
GET /tvs/_search
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "小米"
      }
    }
  },
  "aggs": {
    "recent_150d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-150d"
          }
        }
      },
      "aggs": {
        "recent_150d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "recent_140d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-140d"
          }
        }
      },
      "aggs": {
        "recent_140d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "recent_130d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-130d"
          }
        }
      },
      "aggs": {
        "recent_130d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

4 Aggregations with the Java API

4.1 Group by Color and Count Units Sold per Color

  • Example: Query DSL
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      }
    }
  }
}
  • Example: Java implementation
package com.sunxaiping.elk;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

/**
 * Query DSL
 */
@SpringBootTest
public class ElkApplicationTests {

    @Autowired
    private RestHighLevelClient client;

    /**
     * <pre>
     * GET /tvs/_search
     * {
     *   "size": 0,
     *   "query": {
     *     "match_all": {}
     *   },
     *   "aggs": {
     *     "group_by_color": {
     *       "terms": {
     *         "field": "color"
     *       }
     *     }
     *   }
     * }
     * </pre>
     */
    @Test
    public void test() throws IOException {
        // build the search request
        SearchRequest searchRequest = new SearchRequest("tvs");
        // request body
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.size(0);
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        searchSourceBuilder.aggregation(AggregationBuilders.terms("group_by_color").field("color"));
        searchRequest.source(searchSourceBuilder);
        // send the request
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // read the results
        Aggregations aggregations = response.getAggregations();
        Terms terms = aggregations.get("group_by_color");
        List<? extends Terms.Bucket> buckets = terms.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            String key = bucket.getKeyAsString();
            System.out.println("key = " + key);
            long docCount = bucket.getDocCount();
            System.out.println("docCount = " + docCount);
        }
    }
}

4.2 Group by Color: Units Sold and Average Price per Color

  • Example: Query DSL
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
  • Example: Java implementation
package com.sunxaiping.elk;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

/**
 * Query DSL
 */
@SpringBootTest
public class ElkApplicationTests {

    @Autowired
    private RestHighLevelClient client;

    /**
     * <pre>
     * GET /tvs/_search
     * {
     *   "size": 0,
     *   "query": {
     *     "match_all": {}
     *   },
     *   "aggs": {
     *     "group_by_color": {
     *       "terms": {
     *         "field": "color"
     *       },
     *       "aggs": {
     *         "avg_price": {
     *           "avg": {
     *             "field": "price"
     *           }
     *         }
     *       }
     *     }
     *   }
     * }
     * </pre>
     */
    @Test
    public void test() throws IOException {
        // build the search request
        SearchRequest searchRequest = new SearchRequest("tvs");
        // request body
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.size(0);
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        searchSourceBuilder.aggregation(AggregationBuilders.terms("group_by_color").field("color").subAggregation(AggregationBuilders.avg("avg_price").field("price")));
        searchRequest.source(searchSourceBuilder);
        // send the request
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // read the results
        Aggregations aggregations = response.getAggregations();
        Terms terms = aggregations.get("group_by_color");
        List<? extends Terms.Bucket> buckets = terms.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            String key = bucket.getKeyAsString();
            System.out.println("key = " + key);
            long docCount = bucket.getDocCount();
            System.out.println("docCount = " + docCount);
            Avg avgPrice = bucket.getAggregations().get("avg_price");
            double value = avgPrice.getValue();
            System.out.println("value = " + value);
        }
    }
}

4.3 Group by Color: Units Sold and the Average, Maximum, Minimum, and Sum of Prices per Color

  • Example: Query DSL
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "max_price": {
          "max": {
            "field": "price"
          }
        },
        "min_price": {
          "min": {
            "field": "price"
          }
        },
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
  • Example: Java implementation
package com.sunxaiping.elk;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.aggregations.metrics.Max;
import org.elasticsearch.search.aggregations.metrics.Min;
import org.elasticsearch.search.aggregations.metrics.Sum;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

/**
 * Query DSL
 */
@SpringBootTest
public class ElkApplicationTests {

    @Autowired
    private RestHighLevelClient client;

    /**
     * <pre>
     * GET /tvs/_search
     * {
     *   "size": 0,
     *   "query": {
     *     "match_all": {}
     *   },
     *   "aggs": {
     *     "group_by_color": {
     *       "terms": {
     *         "field": "color"
     *       },
     *       "aggs": {
     *         "avg_price": {
     *           "avg": {
     *             "field": "price"
     *           }
     *         },
     *         "max_price": {
     *           "max": {
     *             "field": "price"
     *           }
     *         },
     *         "min_price": {
     *           "min": {
     *             "field": "price"
     *           }
     *         },
     *         "sum_price": {
     *           "sum": {
     *             "field": "price"
     *           }
     *         }
     *       }
     *     }
     *   }
     * }
     * </pre>
     */
    @Test
    public void test() throws IOException {
        // build the search request
        SearchRequest searchRequest = new SearchRequest("tvs");
        // request body
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.size(0);
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms("group_by_color").field("color");
        termsAggregationBuilder.subAggregation(AggregationBuilders.avg("avg_price").field("price"));
        termsAggregationBuilder.subAggregation(AggregationBuilders.max("max_price").field("price"));
        termsAggregationBuilder.subAggregation(AggregationBuilders.min("min_price").field("price"));
        termsAggregationBuilder.subAggregation(AggregationBuilders.sum("sum_price").field("price"));
        searchSourceBuilder.aggregation(termsAggregationBuilder);
        searchRequest.source(searchSourceBuilder);
        // send the request
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // read the results
        Aggregations aggregations = response.getAggregations();
        Terms terms = aggregations.get("group_by_color");
        List<? extends Terms.Bucket> buckets = terms.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            String key = bucket.getKeyAsString();
            System.out.println("key = " + key);
            long docCount = bucket.getDocCount();
            System.out.println("docCount = " + docCount);
            Avg avgPrice = bucket.getAggregations().get("avg_price");
            System.out.println("avgPrice = " + avgPrice.getValue());
            Max maxPrice = bucket.getAggregations().get("max_price");
            System.out.println("maxPrice = " + maxPrice.getValue());
            Min minPrice = bucket.getAggregations().get("min_price");
            System.out.println("minPrice = " + minPrice.getValue());
            Sum sumPrice = bucket.getAggregations().get("sum_price");
            System.out.println("sumPrice = " + sumPrice.getValue());
        }
    }
}

4.4 Bucket the Sale Price in Steps of 2000 and Compute Each Bucket's Sales Total

  • Example: Query DSL
GET /tvs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 2000
      },
      "aggs": {
        "sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
  • Example: Java implementation
package com.sunxaiping.elk;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.histogram.Histogram;
import org.elasticsearch.search.aggregations.bucket.histogram.HistogramAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.Sum;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.List;

/**
 * Query DSL
 */
@SpringBootTest
public class ElkApplicationTests {

    @Autowired
    private RestHighLevelClient client;

    /**
     * <pre>
     * GET /tvs/_search
     * {
     *   "size": 0,
     *   "query": {
     *     "match_all": {}
     *   },
     *   "aggs": {
     *     "price": {
     *       "histogram": {
     *         "field": "price",
     *         "interval": 2000
     *       },
     *       "aggs": {
     *         "sum_price": {
     *           "sum": {
     *             "field": "price"
     *           }
     *         }
     *       }
     *     }
     *   }
     * }
     * </pre>
     */
    @Test
    public void test() throws IOException {
        // build the search request
        SearchRequest searchRequest = new SearchRequest("tvs");
        // request body
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.size(0);
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        HistogramAggregationBuilder histogramAggregationBuilder = AggregationBuilders.histogram("price").field("price").interval(2000);
        histogramAggregationBuilder.subAggregation(AggregationBuilders.sum("sum_price").field("price"));
        searchSourceBuilder.aggregation(histogramAggregationBuilder);
        searchRequest.source(searchSourceBuilder);
        // send the request
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // read the results
        Aggregations aggregations = response.getAggregations();
        Histogram histogram = aggregations.get("price");
        List<? extends Histogram.Bucket> buckets = histogram.getBuckets();
        for (Histogram.Bucket bucket : buckets) {
            long docCount = bucket.getDocCount();
            System.out.println("docCount = " + docCount);
            Sum sumPrice = bucket.getAggregations().get("sum_price");
            System.out.println("sumPrice = " + sumPrice.getValue());
        }
    }
}

5 ES7 New Feature: SQL

5.1 Quick Start

  • Example:
POST /_sql?format=txt
{
  "query": "SELECT * FROM tvs"
}

5.2 Ways to Retrieve Results

  • HTTP requests.
  • The CLI client: elasticsearch-sql-cli.bat.
  • Code, in a JDBC-like style.

5.3 Response Formats

[Figure: 响应数据格式.png]
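The figure is not reproduced here. As far as I recall from the SQL docs (verify against your version), the format parameter accepts txt (console-style table), csv, tsv, json, and yaml, among others; for example:

POST /_sql?format=json
{
  "query": "SELECT * FROM tvs"
}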

5.4 SQL Translation

  • Example:
  • Inspect how a SQL statement translates into Query DSL:
POST /_sql/translate
{
  "query": "SELECT * FROM tvs"
}
  • Response:
{
  "size" : 1000,
  "_source" : {
    "includes" : [
      "price"
    ],
    "excludes" : [ ]
  },
  "docvalue_fields" : [
    {
      "field" : "brand"
    },
    {
      "field" : "color"
    },
    {
      "field" : "sold_date",
      "format" : "epoch_millis"
    }
  ],
  "sort" : [
    {
      "_doc" : {
        "order" : "asc"
      }
    }
  ]
}

5.5 Combining SQL with Other DSL

  • Example:
POST /_sql?format=txt
{
  "query": "SELECT * FROM tvs",
  "filter": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 2000
      }
    }
  }
}

5.6 Implementing the SQL Feature in Java Code

5.6.1 Prerequisite

  • The platinum-tier features of ES must be enabled.

[Figure: 开启ES的白金版功能1.png]

[Figure: 开启ES的白金版功能2.png]

Large enterprises can purchase the platinum license, which adds machine learning and the advanced security of X-Pack.

5.6.2 Preparation

  • Import the dependency:
<dependency>
    <groupId>org.elasticsearch.plugin</groupId>
    <artifactId>x-pack-sql-jdbc</artifactId>
    <version>7.10.0</version>
</dependency>

5.6.3 Example Application

  • Example:
package com.sunxaiping.elk;

import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

import java.sql.*;

@SpringBootTest
public class ElkApplicationTests {

    @Test
    public void test() throws Exception {
        // register the ES JDBC driver and connect over HTTP
        Class.forName("org.elasticsearch.xpack.sql.jdbc.EsDriver");
        Connection connection = DriverManager.getConnection("jdbc:es://http://localhost:9200");
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery("SELECT brand,color,price,sold_date FROM tvs");
        while (resultSet.next()) {
            System.out.println(resultSet.getString(1));
            System.out.println(resultSet.getString(2));
            System.out.println(resultSet.getDouble(3));
            System.out.println(resultSet.getDate(4));
        }
    }
}
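The test above never closes the connection, statement, or result set. All three are AutoCloseable, so a cleaner sketch of the same query uses try-with-resources (plain JDBC, nothing ES-specific):

Class.forName("org.elasticsearch.xpack.sql.jdbc.EsDriver");
try (Connection connection = DriverManager.getConnection("jdbc:es://http://localhost:9200");
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery("SELECT brand,color,price,sold_date FROM tvs")) {
    while (resultSet.next()) {
        // columns can also be read by name instead of position
        System.out.println(resultSet.getString("brand"));
    }
}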