inotgo.com

Step 1: About the JDK version
Step 2: The Lucene concept
Step 3: Run first, see the effect, then learn
Step 4: Imitation and troubleshooting
Step 5: Lucene version
Step 6: JAR packages
Step 7: TestLucene.java
Step 8: Analyzer (word segmenter)
Step 9: Create the index
Step 10: Create a query
Step 11: Perform the search
Step 12: Display the query results
Step 13: Run results
Step 14: Differences from LIKE
Step 15: Workflow diagram

Step 1: About the JDK version

Use JDK 8 or later. If you do not have it yet, download JDK 8 or a newer release: Download and configure the JDK environment

Step 2: The Lucene concept

Lucene is an open-source project that lets Java developers easily build search features similar to those of search engines such as Google and Baidu.

Step 3: Run first, see the effect, then learn

As usual, first download the runnable project from the upper right corner, configure it, and run it. Once you have confirmed that it works, study the steps that produce this effect.
Run the TestLucene class; you should see the effect shown in the figure.
There are 10 records in total, and the keyword query returns 6 hits. Different hits have different match scores. The first hit, for example, scores high because it contains both "eye protection" and "with light source". The other hits score lower because they do not match the "eye protection" keyword and match only the "light source" keyword.
Step 4: Imitation and troubleshooting

After making sure the runnable project runs correctly, follow the steps in the tutorial strictly and write the code again yourself.
While imitating, your code will inevitably differ somewhere, so you may not get the expected result. When that happens, compare the correct answer (the runnable project) with your own code to locate the problem.
Learning this way is effective and troubleshooting is efficient; it noticeably speeds up learning and gets you past the obstacles along the way.

It is recommended to use the DiffMerge software to compare folders: compare your own project folder with my runnable project folder.
The tool is very handy: it shows which files in the two folders differ and marks the differences clearly.
A portable version and a usage tutorial are available here: DiffMerge download and usage tutorial

Step 5: Lucene version

The Lucene version used here is 7.2.1, the latest release as of 2018.3.9.

Step 6: JAR packages

All the required JAR packages are already included in the project and can be used directly, including a Chinese analyzer compatible with Lucene 7.2.1.

Step 7: TestLucene.java

This is the complete code of TestLucene.java; it is explained in detail in the following steps.

package com.how2java;

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TestLucene {

    public static void main(String[] args) throws Exception {
        // 1. Prepare the Chinese analyzer
        IKAnalyzer analyzer = new IKAnalyzer();

        // 2. Index
        List<String> productNames = new ArrayList<>();
        productNames.add(" Philips led Light bulb: e27 Screw mouth warm white bulb lamp household lighting super bright energy-saving bulb to color temperature bulb ");
        productNames.add(" Philips led Light bulb: e14 Screw candle bulb 3W Sharp bubble tail pull energy-saving bulb warm yellow light source Lamp");
        productNames.add(" Rex lighting LED Light bulb: e27 Large screw mouth energy-saving lamp 3W Bulb Lamp led Energy saving light bulbs ");
        productNames.add(" Philips led Light bulb: e27 Screw mouth household 3w Warm white bulb lamp, energy-saving lamp 5W Light bulb: LED Single lamp 7w");
        productNames.add(" Philips led Vesicle e14 Screw mouth 4.5w Transparent style led Energy saving bulb lighting source lamp Single lamp ");
        productNames.add(" Philips dandelion eye protection table lamp work, learn to read energy-saving lamps 30508 With light source ");
        productNames.add(" Opp lighting led Light bulbs, candles, energy-saving light bulbs e14 Screw mouth bulb lamp super bright lighting single lamp light source ");
        productNames.add(" Opp lighting led Energy saving bulb super bright light source e14e27 Spiral mouth small ball bubble warm yellow ");
        productNames.add(" Juop lighting led Energy saving light bulb e27 Screw mouth ball bubble led Lighting single lamp super bright light source ");
        Directory index = createIndex(analyzer, productNames);

        // 3. Query parser
        String keyword = " Eye protection with light source ";
        Query query = new QueryParser("name", analyzer).parse(keyword);

        // 4. Search
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        int numberPerPage = 1000;
        System.out.printf(" At present, there are %d Data bar %n",productNames.size());
        System.out.printf(" The query keyword is :\"%s\"%n",keyword);
        ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;

        // 5. Display query results
        showSearchResults(searcher, hits, query, analyzer);

        // 6. Close the reader
        reader.close();
    }

    private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer)
            throws Exception {
        System.out.println(" find " + hits.length + " Hits .");
        System.out.println(" Serial number \t Match score \t result ");
        for (int i = 0; i < hits.length; ++i) {
            ScoreDoc scoreDoc= hits[i];
            int docId = scoreDoc.doc;
            Document d = searcher.doc(docId);
            List<IndexableField> fields = d.getFields();
            System.out.print((i + 1));
            System.out.print("\t" + scoreDoc.score);
            for (IndexableField f : fields) {
                System.out.print("\t" + d.get(f.name()));
            }
            System.out.println();
        }
    }

    private static Directory createIndex(IKAnalyzer analyzer, List<String> products) throws IOException {
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(index, config);
        for (String name : products) {
            addDoc(writer, name);
        }
        writer.close();
        return index;
    }

    private static void addDoc(IndexWriter w, String name) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        w.addDocument(doc);
    }
}

Step 8: Analyzer (word segmenter)

Prepare the Chinese analyzer. Analyzer concepts are explained in detail in the "Analyzer concept" tutorial; for now, just use it.

// 1. Prepare the Chinese analyzer
IKAnalyzer analyzer = new IKAnalyzer();
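
To give a feel for what an analyzer does, here is a minimal sketch that splits a product name into lower-case terms. It is not IKAnalyzer and does not use the Lucene API: ToyAnalyzer and tokenize are hypothetical names, and splitting on spaces and punctuation is an assumption that only suits space-separated text. A Chinese analyzer has to find word boundaries in text that has no spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy class for illustration; not part of Lucene or IKAnalyzer.
class ToyAnalyzer {

    // Splits text into lower-case terms on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) { // skip the empty piece produced by leading separators
                terms.add(part);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [philips, led, bulb, e27, screw, base]
        System.out.println(tokenize("Philips LED bulb, e27 screw base"));
    }
}
```

The terms produced this way are what an index stores and what a query is matched against, which is why the same analyzer is normally used for indexing and for querying.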

Step 9: Create the index

1. First prepare the data. The tutorial counts 10 records, although the listing reproduced here adds 9 product-name strings. Each record is a string, equivalent to a row in a product table.
2. Add the records to the index through the createIndex method.

Create an in-memory index. Why can Lucene be faster than a database query? In this example the index is held in memory, so lookups avoid disk access; Lucene also looks up terms in a prebuilt index instead of scanning every row.

Directory index = new RAMDirectory();

Create the configuration object from the Chinese analyzer:

IndexWriterConfig config = new IndexWriterConfig(analyzer);

Create the index writer:

IndexWriter writer = new IndexWriter(index, config);

Iterate over the records and add them to the index one by one:

for (String name : products) {
    addDoc(writer, name);
}

Create a Document for each record and add that Document to the index. The Document has a single field called "name". When the query is created (Step 10), this same field is specified as the query field.

private static void addDoc(IndexWriter w, String name) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("name", name, Field.Store.YES));
    w.addDocument(doc);
}

// 2. Index
List<String> productNames = new ArrayList<>();
productNames.add(" Philips led Light bulb: e27 Screw mouth warm white bulb lamp household lighting super bright energy-saving bulb to color temperature bulb ");
productNames.add(" Philips led Light bulb: e14 Screw candle bulb 3W Sharp bubble tail pull energy-saving bulb warm yellow light source Lamp");
productNames.add(" Rex lighting LED Light bulb: e27 Large screw mouth energy-saving lamp 3W Bulb Lamp led Energy saving light bulbs ");
productNames.add(" Philips led Light bulb: e27 Screw mouth household 3w Warm white bulb lamp, energy-saving lamp 5W Light bulb: LED Single lamp 7w");
productNames.add(" Philips led Vesicle e14 Screw mouth 4.5w Transparent style led Energy saving bulb lighting source lamp Single lamp ");
productNames.add(" Philips dandelion eye protection table lamp work, learn to read energy-saving lamps 30508 With light source ");
productNames.add(" Opp lighting led Light bulbs, candles, energy-saving light bulbs e14 Screw mouth bulb lamp super bright lighting single lamp light source ");
productNames.add(" Opp lighting led Energy saving bulb super bright light source e14e27 Spiral mouth small ball bubble warm yellow ");
productNames.add(" Juop lighting led Energy saving light bulb e27 Screw mouth ball bubble led Lighting single lamp super bright light source ");
Directory index = createIndex(analyzer, productNames);

private static Directory createIndex(IKAnalyzer analyzer, List<String> products) throws IOException {
    Directory index = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter writer = new IndexWriter(index, config);
    for (String name : products) {
        addDoc(writer, name);
    }
    writer.close();
    return index;
}

private static void addDoc(IndexWriter w, String name) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("name", name, Field.Store.YES));
    w.addDocument(doc);
}
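
One common way to picture what such an index holds is an inverted index: a map from each term to the ids of the documents that contain it. The sketch below is a simplified stand-in written with plain Java collections, not Lucene's actual data structure. ToyInvertedIndex, addDoc, and lookup are hypothetical names, and splitting on whitespace is an assumption made to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class for illustration; not Lucene's real index format.
class ToyInvertedIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stores the text and records, for every term, which document it occurs in.
    int addDoc(String text) {
        int docId = docs.size(); // ids are assigned in insertion order
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of all documents containing the term, without scanning the texts.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDoc("eye protection table lamp with light source");
        index.addDoc("energy saving bulb light source");
        index.addDoc("candle bulb warm yellow");
        System.out.println(index.lookup("source")); // prints [0, 1]
        System.out.println(index.lookup("bulb"));   // prints [1, 2]
    }
}
```

A lookup is a single map access regardless of how many documents were added, which is the intuition behind indexing first and searching afterwards.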

Step 10: Create a query

Build the query from the keyword "Eye protection with light source" against the "name" field. This is the same "name" field that each Document was given in the Create index step; it is comparable to a column name in a table.

		String keyword = " Eye protection with light source ";
		Query query = new QueryParser("name", analyzer).parse(keyword);

Step 11: Perform the search

Then perform the search.
Create the index reader:

IndexReader reader = DirectoryReader.open(index);

Create a searcher based on the reader:

IndexSearcher searcher = new IndexSearcher(reader);

Specify the maximum number of hits to return, which the tutorial treats as the number of records per page:

int numberPerPage = 1000;

Perform the search:

ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;

		// 4.  Search 
		IndexReader reader = DirectoryReader.open(index);
		IndexSearcher searcher = new IndexSearcher(reader);
		int numberPerPage = 1000;
		System.out.printf(" At present, there are %d Data bar %n",productNames.size());
		System.out.printf(" The query keyword is :\"%s\"%n",keyword);
		ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
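
The idea of "score every matching document, sort by score, keep the top n" can be sketched without Lucene. In this toy version the score is simply the number of distinct query terms a document contains; Lucene's real scoring formula is more elaborate. ToyRankedSearch, Hit, and search are hypothetical names, and whitespace tokenization is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy class for illustration; not the Lucene IndexSearcher.
class ToyRankedSearch {

    // One search result: which document matched and how well.
    static final class Hit {
        final int docId;
        final int score;

        Hit(int docId, int score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Distinct lower-case terms of a text, split on whitespace.
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // Returns at most n hits, best score first; ties keep document order.
    static List<Hit> search(List<String> docs, String keyword, int n) {
        Set<String> queryTerms = terms(keyword);
        List<Hit> hits = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            Set<String> shared = terms(docs.get(docId));
            shared.retainAll(queryTerms); // keep only terms that also occur in the query
            if (!shared.isEmpty()) {
                hits.add(new Hit(docId, shared.size()));
            }
        }
        hits.sort((a, b) -> a.score != b.score
                ? Integer.compare(b.score, a.score)
                : Integer.compare(a.docId, b.docId));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "philips eye protection table lamp with light source",
                "energy saving bulb light source",
                "candle bulb warm yellow");
        for (Hit hit : search(docs, "eye protection with light source", 1000)) {
            System.out.println(hit.docId + "\t" + hit.score + "\t" + docs.get(hit.docId));
        }
    }
}
```

As in the tutorial's result, a document that matches every keyword ranks above one that matches only "light source", and a document that matches nothing is not returned at all.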

Step 12: Display the query results

Each element of the ScoreDoc[] hits array is one search result. First, iterate over them:

for (int i = 0; i < hits.length; ++i) {
    ScoreDoc scoreDoc= hits[i];

Then get the docId of the current result. The docId plays the role of the record's primary key within the index:

int docId = scoreDoc.doc;

Then use the searcher to fetch the corresponding Document from the index by that docId:

Document d = searcher.doc(docId);

Then print the data in this Document. Although the Document currently has only the name field, the code still iterates over all fields and prints each value, so it needs no changes when a Document has several fields.
scoreDoc.score is the match score of the current hit; the higher the score, the better the match.

List<IndexableField> fields = d.getFields();
System.out.print((i + 1));
System.out.print("\t" + scoreDoc.score);
for (IndexableField f : fields) {
    System.out.print("\t" + d.get(f.name()));
}

	private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer)
			throws Exception {
		System.out.println(" find  " + hits.length + "  Hits .");
		System.out.println(" Serial number \t Match score \t result ");
		for (int i = 0; i < hits.length; ++i) {
			ScoreDoc scoreDoc= hits[i];
			int docId = scoreDoc.doc;
			Document d = searcher.doc(docId);
			List<IndexableField> fields = d.getFields();
			System.out.print((i + 1));
			System.out.print("\t" + scoreDoc.score);
			for (IndexableField f : fields) {
				System.out.print("\t" + d.get(f.name()));
			}
			System.out.println();
		}
	}
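
The point about iterating over all fields can be illustrated with a plain map standing in for a Document. This is a sketch under assumptions: ToyResultPrinter and formatHit are hypothetical names, the "price" field does not exist in the tutorial's data, and a LinkedHashMap is used only to keep the fields in insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy class for illustration; a Map stands in for a Lucene Document.
class ToyResultPrinter {

    // Builds one tab-separated row: serial number, score, then every field value.
    static String formatHit(int rank, float score, Map<String, String> fields) {
        StringBuilder row = new StringBuilder();
        row.append(rank).append('\t').append(score);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            row.append('\t').append(field.getValue());
        }
        return row.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", "eye protection table lamp");
        doc.put("price", "99"); // assumed extra field, added without touching formatHit
        System.out.println(formatHit(1, 0.5f, doc));
    }
}
```

Adding the second field changes the printed row but not the formatting code, which is the compatibility benefit the tutorial describes.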

Step 13: Run results

As shown in the figure, there are 10 records in total and the keyword query returns 6 hits. Different hits have different match scores. The first hit, for example, scores high because it contains both "eye protection" and "with light source". The other hits score lower because they do not match the "eye protection" keyword and match only the "light source" keyword.

Step 14: Differences from LIKE

LIKE can also run queries, so how is the Lucene approach different? There are two main points:
1. Relevance
Looking at the run results, you can see that Lucene returns results with different degrees of relevance. LIKE cannot do that.
2. Performance
With a small amount of data, LIKE also performs well. With a large amount of data, LIKE performs much worse. The next tutorial demonstrates a query over 140,000 records.
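
The relevance point can be made concrete with a small comparison. The sketch below approximates LIKE '%keyword%' with a case-insensitive substring test and contrasts it with matching on individual words. It does not measure performance and does not talk to a database; LikeVersusTerms, likeContains, and anyTermMatches are hypothetical names, and the three rows are made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy class for illustration; no database or Lucene involved.
class LikeVersusTerms {

    // Rough stand-in for: WHERE name LIKE '%keyword%' (whole keyword as one substring).
    static List<String> likeContains(List<String> rows, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String row : rows) {
            if (row.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(row);
            }
        }
        return matches;
    }

    // Term-based matching: a row qualifies if it shares at least one word with the keyword.
    static List<String> anyTermMatches(List<String> rows, String keyword) {
        List<String> queryTerms = Arrays.asList(keyword.toLowerCase(Locale.ROOT).trim().split("\\s+"));
        List<String> matches = new ArrayList<>();
        for (String row : rows) {
            for (String term : row.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (queryTerms.contains(term)) {
                    matches.add(row);
                    break; // one shared word is enough
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "eye protection table lamp with light source",
                "energy saving bulb light source",
                "candle bulb warm yellow");
        String keyword = "eye protection with light source";
        System.out.println(likeContains(rows, keyword).size());   // prints 0
        System.out.println(anyTermMatches(rows, keyword).size()); // prints 2
    }
}
```

The substring test finds nothing because no row contains the whole phrase, while term matching still finds the partially matching rows, which can then be ranked by how many terms they share.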

Step 15: Workflow diagram

Now that you have run Lucene yourself and have a feel for it, let's summarize the overall Lucene workflow.
1. First collect the data
The data can come from the file system, a database, the network, or manual input, or it can be written directly in memory as in this example
2. Create an index from the data
3. The user enters a keyword
4. Create a query from the keyword
5. Use the query to fetch data from the index
6. Display the query results to the user




Q & A area
2020-10-05 Like the earlier students, I read up on the basic concepts on another site first and then came back to read this one; the ideas are much clearer that way
sparksun007

It's not that the webmaster doesn't understand the material, but many basic concepts really are not explained clearly here, which makes the tutorial obscure for beginners. I referred to this blog post: https://blog.csdn.net/weixin_42633131/article/details/82873731 It is very basic and fills in the concepts that inotgo.com leaves out. With it you can see what each class does, and once you understand inverted indexing, the process of building and querying an index becomes very clear and the code is easier to read and write. I hope the webmaster can adopt this teaching approach when writing tutorials. I don't mean to complain; this is my honest impression and I hope you understand. We all want to learn well, and I'm sure the webmaster wants the tutorial to make sense too. In my experience this would help later learners more, and I hope the webmaster will take it as a reference!




1 answer

gjian707
Answered: 2021-06-24
May good people be safe all their lives







2019-12-19 Based only on the current tutorial; corrections welcome
hx1176406648

The documentation of the Document class contains this sentence: Documents are the unit of indexing and search. Searching operates on these Documents. A Document is a set of fields, and the data to be queried is stored in the fields. An index writer configuration object is created from the analyzer, and then the index writer is created; the index writer is used to write data into the index. A text field is created for each record, the field is put into a Document, and the Document is written into the index. DirectoryReader opens an index reader on the index, a searcher is created from the index reader, and the search is run with the query to obtain an array of ScoreDoc results. Each Document in the index has a unique identifier, and a ScoreDoc stores that identifier together with the match score between the query keyword and the specified query field of that Document.





2019-05-13 Personal thoughts
2019-04-18 Webmaster, an analyzer is used when creating the index, when creating the query, and when displaying the search results. Must all three steps use the analyzer, or is that just the convention?
2019-02-17 There are obviously only 9 records..

