1. Launch Solr in SolrCloud mode:

```shell
# Unix/macOS
bin/solr start -e cloud

# Windows
bin\solr.cmd start -e cloud
```

To restart the two nodes later, start each one explicitly (if you have defined ZK_HOST in solr.in.sh/solr.in.cmd — see the instructions — you can omit -z):

```shell
./bin/solr start -c -p 8983 -s example/cloud/node1/solr
./bin/solr start -c -p 7574 -s example/cloud/node2/solr -z localhost:9983
```

  1. Because we are starting in SolrCloud mode, and did not define any details about an external ZooKeeper cluster, Solr launches its own ZooKeeper and connects both nodes to it.
  2. Let's name our collection "techproducts".
  3. A collection must have a configset, which at a minimum includes the two main configuration files for Solr: the schema file (named either managed-schema or schema.xml), and solrconfig.xml.
  4. sample_techproducts_configs is specifically designed to support the sample data we want to use.
  5. Check the nodes in the cloud view: http://localhost:8983/solr/#/~cloud?view=nodes
Index the Techproducts Data
Solr includes the bin/post tool to facilitate indexing various types of documents easily.

```shell
# Linux/Mac
solr-8.11.0:$ bin/post -c techproducts example/exampledocs/*

# Windows
C:\solr-8.11.0> java -jar -Dc=techproducts -Dauto example\exampledocs\post.jar example\exampledocs\*
```

Basic Searching

Solr can be queried via REST clients, curl, wget, Chrome POSTMAN, etc., as well as via native clients available for many programming languages.

Search for a Single Term

  • q=Foundation

http://localhost:8983/solr/techproducts/select?indent=true&q.op=OR&q=Foundation
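As a side note, these select URLs can be generated rather than typed by hand. A minimal Python sketch (the base URL and parameters come from the example above; nothing here is a Solr-specific API):

```python
# Build the same select URL as above; urlencode handles the percent-encoding.
from urllib.parse import urlencode

base = "http://localhost:8983/solr/techproducts/select"
params = {"indent": "true", "q.op": "OR", "q": "Foundation"}
url = base + "?" + urlencode(params)
print(url)
```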

  • fl=id,cat

http://localhost:8983/solr/techproducts/select?fl=id%2Ccat&indent=true&q.op=OR&q=*%3A*

Field Searches

  • q=cat:electronics

http://localhost:8983/solr/techproducts/select?indent=true&q.op=OR&q=cat%3Aelectronics

Phrase Search

  • q="multiple terms here"

http://localhost:8983/solr/techproducts/select?indent=true&q.op=OR&q=%22CAS%20latency%22

  • q=\"multiple+terms+here\"

curl "http://localhost:8983/solr/techproducts/select?q=\"CAS+latency\""

Combining Searches

In a URL, + must be percent-encoded (as %2B), while - needs no encoding.

  • q=+electronics +music

http://localhost:8983/solr/techproducts/select?indent=true&q.op=OR&q=%2Belectronics%20%2Bmusic

  • q=+electronics -music

http://localhost:8983/solr/techproducts/select?indent=true&q.op=OR&q=%2Belectronics%20-music

  • q=%2Belectronics%20%2Bmusic

curl "http://localhost:8983/solr/techproducts/select?q=%2Belectronics%20%2Bmusic"

  • q=%2Belectronics+-music

The + characters in the middle are left unencoded because they act as the URL substitute for the spaces between clauses.
curl "http://localhost:8983/solr/techproducts/select?q=%2Belectronics+-music"
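The encoding rule in this section can be checked offline. A small Python sketch (standard library only, not Solr-specific):

```python
# '+' is a query operator in Solr but also the URL escape for a space,
# so a mandatory-clause '+' must be sent as %2B, while '-' passes through.
from urllib.parse import quote_plus, unquote_plus

raw = "+electronics -music"
encoded = quote_plus(raw)   # '+' -> '%2B', space -> '+', '-' unchanged
print(encoded)              # %2Belectronics+-music
assert unquote_plus(encoded) == raw
```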

Delete collection, stop nodes, stop all

```shell
# delete this collection
bin/solr delete -c techproducts
# create a new collection:
bin/solr create -c <yourCollection> -s 2 -rf 2
```

The following command line will stop Solr and remove the directories for each of the two nodes that were created all the way back in Exercise 1:

```shell
bin/solr stop -all
rm -Rf example/cloud/
```

Exercise 2: Modify the Schema and Index Films Data

  • Solr has a feature called “field guessing”: while indexing, Solr attempts to guess what type of data is in a field, and automatically creates new fields in the schema for new fields that appear in incoming documents. This mode is called “Schemaless”.

    What is a “schema” and why do I need one?

    Solr’s schema is a single file (in XML) that stores the details about the fields and field types Solr is expected to understand. The schema defines not only the field or field type names, but also any modifications that should happen to a field before it is indexed. For example, if you want to ensure that a user who enters “abc” and a user who enters “ABC” can both find a document containing the term “ABC”, you will want to normalize (lower-case it, in this case) “ABC” when it is indexed, and normalize the user query to be sure of a match. These rules are defined in your schema.

    Earlier in the tutorial we mentioned copy fields, which are fields made up of data that originated from other fields. You can also define dynamic fields, which use wildcards (such as *_t or *_s) to dynamically create fields of a specific field type. These types of rules are also defined in the schema.
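The normalization idea above (lower-casing at both index and query time) can be sketched in a few lines of plain Python. This illustrates the concept only; it is not Solr's implementation:

```python
# A stand-in for a schema-defined analysis chain: tokenize on whitespace,
# then lower-case every term. The same chain runs at index and query time.
def analyze(text):
    return [t.lower() for t in text.split()]

inverted_index = {}
doc_id, doc_text = 1, "The ABC of Search"
for term in analyze(doc_text):
    inverted_index.setdefault(term, set()).add(doc_id)

# Both "abc" and "ABC" normalize to the same term, so both queries match.
assert analyze("abc") == analyze("ABC") == ["abc"]
assert inverted_index["abc"] == {1}
```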

Create a New Collection

Use a configset that has a very minimal schema and let Solr figure out from the data what fields to add. The _default configset is used if you don’t specify one at all.

```shell
# Linux
bin/solr create -c films -s 2 -rf 2
# Windows
bin\solr.cmd create -c films -s 2 -rf 2
```

Preparing Schemaless for the Films Data

Create the “names” Field

With the _default configset, two parallel things are happening:

  • A managed schema, which can only be modified by Solr’s Schema API.
  • Field guessing, which is configured in solrconfig.xml.

The schemaless features are fine to start with, but it is not recommended to go to production without a schema that you have defined yourself.

  • uses the Schema API

curl "http://localhost:8983/solr/films/schema" -X POST -H "Content-type:application/json" --data-binary "{\"add-field\": {\"name\":\"name\", \"type\":\"text_general\", \"multiValued\":false, \"stored\":true}}"
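The shell-escaped JSON body in that curl command is easier to read built as a Python dict. A sketch — the payload keys are the real Schema API fields from the command above; json.dumps just serializes them:

```python
# Same Schema API payload as the curl command, without the shell escaping.
import json

payload = {
    "add-field": {
        "name": "name",
        "type": "text_general",
        "multiValued": False,   # serialized as JSON false
        "stored": True,
    }
}
body = json.dumps(payload)
print(body)
```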

  • use the Admin UI to create fields


Create a “catchall” Copy Field

This is equivalent to q=foo searching across all fields.
In the last exercise’s configuration, fields were copied into a text field, and that field was the default when no other field was defined in the query.
Now, set up a “catchall field” by defining a copy field that will take all data from all fields and index it into a field named _text_.
curl "http://localhost:8983/solr/films/schema" -X POST -H "Content-type:application/json" --data-binary "{\"add-copy-field\" : {\"source\":\"*\",\"dest\":\"_text_\"}}"

Index Sample Film Data

```shell
# Linux/Mac (JSON)
bin/post -c films example/films/films.json
# Windows (JSON)
C:\solr-8.11.0> java -jar -Dc=films -Dauto example\exampledocs\post.jar example\films\*.json
```

  • -c films: this is the Solr collection to index data to.

```shell
# XML
bin/post -c films example/films/films.xml
C:\solr-8.11.0> java -jar -Dc=films -Dauto example\exampledocs\post.jar example\films\*.xml
```

```shell
# CSV
bin/post -c films example/films/films.csv -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
java -jar -Dc=films -Dparams="f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|" -Dauto .\example\exampledocs\post.jar .\example\films\*.csv
```

Note the CSV command includes extra parameters. This is to ensure multi-valued entries in the “genre” and “directed_by” columns are split by the pipe (|) character, used in this file as a separator. Telling Solr to split these columns this way will ensure proper indexing of the data.

In the CSV file, multiple values within a single field are separated by the | character.
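What the split parameters ask Solr to do can be shown in one line of Python (the sample cell value is hypothetical, shaped like the genre column):

```python
# One CSV cell containing pipe-separated values becomes a multi-valued field.
cell = "Comedy|Drama|Romance Film"
values = cell.split("|")
print(values)   # ['Comedy', 'Drama', 'Romance Film']
```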

Faceting

Field Facets

facet counts for the genre_str field
curl "http://localhost:8983/solr/films/select?q=*:*&rows=0&facet=true&facet.field=genre_str"
limit the facets to only those that contain a minimum number of documents (facet.mincount)
curl "http://localhost:8983/solr/films/select?q=*:*&facet.field=genre_str&facet.mincount=200&facet=on&rows=0"

Range Facets

numeric range and date range
curl 'http://localhost:8983/solr/films/select?q=*:*&rows=0&facet=true&facet.range=initial_release_date&facet.range.start=NOW-20YEAR&facet.range.end=NOW&facet.range.gap=%2B1YEAR'

Pivot Facets

“decision trees”: the so-called navigational (drill-down) queries.
curl "http://localhost:8983/solr/films/select?q=*:*&rows=0&facet=on&facet.pivot=genre_str,directed_by_str"
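A pivot facet is essentially nested counting. A toy Python sketch — the field names follow the films example above, but the rows are invented:

```python
# Count by genre first, then by director within each genre, mirroring
# facet.pivot=genre_str,directed_by_str.
from collections import Counter, defaultdict

rows = [
    {"genre_str": "Drama", "directed_by_str": "Director A"},
    {"genre_str": "Drama", "directed_by_str": "Director A"},
    {"genre_str": "Comedy", "directed_by_str": "Director B"},
]
pivot = defaultdict(Counter)
for row in rows:
    pivot[row["genre_str"]][row["directed_by_str"]] += 1

assert pivot["Drama"]["Director A"] == 2
assert pivot["Comedy"]["Director B"] == 1
```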

Exercise 3: Index Your Own Data

Create Your Own Collection

./bin/solr create -c localDocs -s 2 -rf 2

Indexing Ideas

Local Files with bin/post

DataImportHandler

Import from a database or other structured data sources via the Data Import Handler (DIH). There are several examples included for feeds, GMail, and a small HSQL database.
The README.txt file in example/example-DIH will give you details on how to start working with this tool.

SolrJ

The Java API approach. Maven dependency:

```xml
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>8.11.0</version>
</dependency>
```

Documents Screen

Use the Admin UI Documents tab

Updating Data

  • uniqueKey (defaults to id)

the example Solr schema (a file named either managed-schema or schema.xml) specifies a uniqueKey field called id. Whenever you POST commands to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you
You can verify this in the core-specific Overview section of the Solr Admin UI.
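The replace-on-same-uniqueKey behaviour can be pictured as a dictionary keyed by id. A plain-Python sketch — the id is a real techproducts document id, the name values are illustrative:

```python
# Adding a document whose id matches an existing one replaces it,
# rather than creating a duplicate -- the essence of uniqueKey.
index = {}

def add(doc):
    index[doc["id"]] = doc

add({"id": "SP2514N", "name": "Samsung hard drive"})
add({"id": "SP2514N", "name": "Samsung hard drive (updated)"})
assert len(index) == 1
assert index["SP2514N"]["name"].endswith("(updated)")
```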

Deleting Data

Merely removing documents doesn’t change the underlying field definitions. Essentially, this allows you to reindex your data after making changes to fields for your needs.

  • Execute the following command to delete a specific document:

bin/post -c localDocs -d "<delete><id>SP2514N</id></delete>"

  • To delete all documents, you can use the “delete-by-query” command:

bin/post -c localDocs -d "<delete><query>*:*</query></delete>"

Newly Learned Searching

q.op

  • q.op=OR
  • q.op=AND

http://localhost:8983/solr/techproducts/select?indent=true&q.op=OR&q=%2B(id%3A0812521390%20id%3A0441385532)
The space between the two ids means AND or OR, as determined by q.op.

fq

fq results are cached independently of the main query. When a later query uses the same filter, there’s a cache hit.
fq=popularity:[10 TO *]&fq=section:0
fq=+popularity:[10 TO *] +section:0
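A toy sketch of the caching idea — plain Python keyed by the fq string, not how Solr’s filterCache is actually implemented:

```python
# The document set matched by a filter query is cached under the fq string,
# so a later query that reuses the same fq string is a cache hit.
filter_cache = {}

def eval_fq(fq, docs, predicate):
    if fq not in filter_cache:                      # cache miss: compute once
        filter_cache[fq] = {d["id"] for d in docs if predicate(d)}
    return filter_cache[fq]

docs = [{"id": 1, "popularity": 12}, {"id": 2, "popularity": 3}]
hit1 = eval_fq("popularity:[10 TO *]", docs, lambda d: d["popularity"] >= 10)
hit2 = eval_fq("popularity:[10 TO *]", docs, lambda d: d["popularity"] >= 10)
assert hit1 == hit2 == {1}
assert len(filter_cache) == 1    # second call was a cache hit
```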

filter

The filter() syntax has the same effect as fq.

  • Support for a special filter(…) syntax to indicate that some query clauses should be cached in the filter cache (as a constant score boolean query). This allows sub-queries to be cached and re-used in other queries. For example inStock:true will be cached and re-used in all three of the queries below:
    • q=features:songs OR filter(inStock:true)
    • q=+manu:Apple +filter(inStock:true)
    • q=+manu:Apple & fq=inStock:true
  • This can even be used to cache individual clauses of complex filter queries. In the first query below, 3 items will be added to the filter cache (the top level fq and both filter(…) clauses) and in the second query, there will be 2 cache hits, and one new cache insertion (for the new top level fq):
    • q=features:songs & fq=+filter(inStock:true) +filter(price:[* TO 100])
    • q=manu:Apple & fq=-filter(inStock:true) -filter(price:[* TO 100])

Demo

(orderType: ("8" OR "1108") AND (*:* NOT deleteDatetime:*) AND (*:* NOT closedDate:*) AND (*:* NOT salesRelDate:*) AND (*:* NOT creditRelDate:*) AND !fromRefType:("370" OR "390") AND boExpireDate:[* TO NOW] )

Other

http://localhost:8983/solr/techproducts/browse