Step 2: Lucene concept
Step 3: Run first, see the effect, then learn
Step 4: Imitation and troubleshooting
Step 5: Lucene version
Step 6: jar packages
Step 7: TestLucene.java
Step 8: Analyzer
Step 9: Create the index
Step 10: Create the query
Step 11: Perform the search
Step 12: Display the query results
Step 13: Run results
Step 14: Differences from LIKE
Step 15: Summary of the approach
This tutorial requires JDK 8 or later. If you do not have it yet, download it first:
Download and configure the JDK environment
Lucene is an open-source project that lets Java developers easily build search features similar to those of search engines such as Google and Baidu.
As usual, first download the runnable project (upper right corner of the page), configure it, and run it. Once you have confirmed that it works, study the steps that produce this result.
Run the TestLucene class; you should see the result shown in the figure. There are 10 records in total, and the keyword query hits 6 of them. Different hits have different match scores. The first hit, for example, scores high because it contains both "eye protection" and "with light source". The other hits score lower because they match only the "light source" keyword, not "eye protection".
Once the runnable project runs correctly, follow the steps in this tutorial strictly and write the code again yourself.
Your own version will inevitably differ from the original somewhere, and then the expected result does not appear. When that happens, compare the correct answer (the runnable project) with your own code to locate the problem. Learning this way is effective and troubleshooting is efficient; it noticeably speeds up learning and helps you get past the obstacles along the way. I recommend the diffmerge tool for comparing folders: compare your own project folder with my runnable project folder. It shows exactly which files in the two folders differ and marks the differences clearly. Here is a tutorial for the portable (no-install) version: diffmerge download and usage tutorial
The Lucene version used here is 7.2.1, the latest release as of 2018-03-09.
All required jar packages are already in the project and can be used directly, including a Chinese analyzer compatible with Lucene 7.2.1.
This is the complete code of TestLucene.java. It is explained in detail in the following steps.
    package com.how2java;

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexableField;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.wltea.analyzer.lucene.IKAnalyzer;

    public class TestLucene {

        public static void main(String[] args) throws Exception {
            // 1. Prepare the Chinese analyzer
            IKAnalyzer analyzer = new IKAnalyzer();

            // 2. Build the index
            List<String> productNames = new ArrayList<>();
            productNames.add("Philips led Light bulb: e27 Screw mouth warm white bulb lamp household lighting super bright energy-saving bulb to color temperature bulb");
            productNames.add("Philips led Light bulb: e14 Screw candle bulb 3W Sharp bubble tail pull energy-saving bulb warm yellow light source Lamp");
            productNames.add("Rex lighting LED Light bulb: e27 Large screw mouth energy-saving lamp 3W Bulb Lamp led Energy saving light bulbs");
            productNames.add("Philips led Light bulb: e27 Screw mouth household 3w Warm white bulb lamp, energy-saving lamp 5W Light bulb: LED Single lamp 7w");
            productNames.add("Philips led Vesicle e14 Screw mouth 4.5w Transparent style led Energy saving bulb lighting source lamp Single lamp");
            productNames.add("Philips dandelion eye protection table lamp work, learn to read energy-saving lamps 30508 With light source");
            productNames.add("Opp lighting led Light bulbs, candles, energy-saving light bulbs e14 Screw mouth bulb lamp super bright lighting single lamp light source");
            productNames.add("Opp lighting led Energy saving bulb super bright light source e14e27 Spiral mouth small ball bubble warm yellow");
            productNames.add("Juop lighting led Energy saving light bulb e27 Screw mouth ball bubble led Lighting single lamp super bright light source");
            Directory index = createIndex(analyzer, productNames);

            // 3. Build the query
            String keyword = "Eye protection with light source";
            Query query = new QueryParser("name", analyzer).parse(keyword);

            // 4. Search
            IndexReader reader = DirectoryReader.open(index);
            IndexSearcher searcher = new IndexSearcher(reader);
            int numberPerPage = 1000;
            System.out.printf("There are currently %d records%n", productNames.size());
            System.out.printf("The query keyword is: \"%s\"%n", keyword);
            ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;

            // 5. Display the query results
            showSearchResults(searcher, hits, query, analyzer);

            // 6. Close the reader
            reader.close();
        }

        private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer)
                throws Exception {
            System.out.println("Found " + hits.length + " hits.");
            System.out.println("No.\tMatch score\tResult");
            for (int i = 0; i < hits.length; ++i) {
                ScoreDoc scoreDoc = hits[i];
                int docId = scoreDoc.doc;
                Document d = searcher.doc(docId);
                List<IndexableField> fields = d.getFields();
                System.out.print((i + 1));
                System.out.print("\t" + scoreDoc.score);
                for (IndexableField f : fields) {
                    System.out.print("\t" + d.get(f.name()));
                }
                System.out.println();
            }
        }

        private static Directory createIndex(IKAnalyzer analyzer, List<String> products) throws IOException {
            Directory index = new RAMDirectory();
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter writer = new IndexWriter(index, config);
            for (String name : products) {
                addDoc(writer, name);
            }
            writer.close();
            return index;
        }

        private static void addDoc(IndexWriter w, String name) throws IOException {
            Document doc = new Document();
            doc.add(new TextField("name", name, Field.Store.YES));
            w.addDocument(doc);
        }
    }
Prepare the Chinese analyzer. Word segmentation is explained in detail in the "Analyzer concepts" tutorial; for now, simply use it.
    // 1. Prepare the Chinese analyzer
    IKAnalyzer analyzer = new IKAnalyzer();
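To make the idea of an analyzer more concrete, here is a minimal plain-Java sketch of what "splitting text into terms" means. It is not how IKAnalyzer works (IKAnalyzer segments Chinese text with a dictionary); the class name ToyTokenizer and its lowercase-and-split rule are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer: lowercases the text and splits it on
// every run of characters that are neither letters nor digits.
class ToyTokenizer {

    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // a leading separator yields an empty first piece
                terms.add(piece);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [eye, protection, with, light, source]
        System.out.println(tokenize("Eye protection with light source"));
    }
}
```

Both the indexed product names and the query keyword go through the same kind of splitting, which is why the same analyzer instance is passed to the index writer and to the query parser in TestLucene.java.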
1. First prepare the data (the text counts 10 records; the listing adds 9 product names). Each record is a string, comparable to a row in a product table.
2. Add the data to the index through the createIndex method.
Create an in-memory index. Why is Lucene faster than the database here? Because the lookup runs against an index held in memory, which is naturally much faster than querying the database.
    Directory index = new RAMDirectory();
Create the configuration object from the Chinese analyzer:
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
Create the index writer:
    IndexWriter writer = new IndexWriter(index, config);
Iterate over the records and add them to the index one by one:
    for (String name : products) {
        addDoc(writer, name);
    }
Create one Document per record and add that Document to the index. The Document has a single field called "name". The query created in step 3 of TestLucene.java specifies this same field name.
    private static void addDoc(IndexWriter w, String name) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        w.addDocument(doc);
    }
    // 2. Build the index
    List<String> productNames = new ArrayList<>();
    productNames.add("Philips led Light bulb: e27 Screw mouth warm white bulb lamp household lighting super bright energy-saving bulb to color temperature bulb");
    productNames.add("Philips led Light bulb: e14 Screw candle bulb 3W Sharp bubble tail pull energy-saving bulb warm yellow light source Lamp");
    productNames.add("Rex lighting LED Light bulb: e27 Large screw mouth energy-saving lamp 3W Bulb Lamp led Energy saving light bulbs");
    productNames.add("Philips led Light bulb: e27 Screw mouth household 3w Warm white bulb lamp, energy-saving lamp 5W Light bulb: LED Single lamp 7w");
    productNames.add("Philips led Vesicle e14 Screw mouth 4.5w Transparent style led Energy saving bulb lighting source lamp Single lamp");
    productNames.add("Philips dandelion eye protection table lamp work, learn to read energy-saving lamps 30508 With light source");
    productNames.add("Opp lighting led Light bulbs, candles, energy-saving light bulbs e14 Screw mouth bulb lamp super bright lighting single lamp light source");
    productNames.add("Opp lighting led Energy saving bulb super bright light source e14e27 Spiral mouth small ball bubble warm yellow");
    productNames.add("Juop lighting led Energy saving light bulb e27 Screw mouth ball bubble led Lighting single lamp super bright light source");
    Directory index = createIndex(analyzer, productNames);

    private static Directory createIndex(IKAnalyzer analyzer, List<String> products) throws IOException {
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(index, config);
        for (String name : products) {
            addDoc(writer, name);
        }
        writer.close();
        return index;
    }

    private static void addDoc(IndexWriter w, String name) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        w.addDocument(doc);
    }
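What "adding a document to the index" buys you can be sketched in plain Java: instead of scanning every product name at query time, keep a map from each term to the ids of the documents that contain it. ToyInvertedIndex, its method names, and its whitespace tokenization are hypothetical simplifications for illustration; Lucene's real index structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: term -> sorted set of ids of documents containing the term.
class ToyInvertedIndex {
    private final List<String> docs = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stores the document and records its id under every term it contains.
    int addDoc(String name) {
        int docId = docs.size();
        docs.add(name);
        for (String term : name.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Looks up one term directly in the map, without scanning the documents.
    Set<Integer> docsContaining(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDoc("eye protection table lamp with light source"); // id 0
        index.addDoc("energy saving bulb light source");             // id 1
        index.addDoc("warm white bulb lamp");                        // id 2
        System.out.println(index.docsContaining("light")); // prints [0, 1]
        System.out.println(index.docsContaining("eye"));   // prints [0]
    }
}
```

The doc ids here play the same role as the docId that appears later in the search results: a key for fetching the stored document back out of the index.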
Build a query from the keyword "Eye protection with light source" against the "name" field. This is the "name" field of each Document added in the Create index step, and it is comparable to a column name in a database table.
    String keyword = "Eye protection with light source";
    Query query = new QueryParser("name", analyzer).parse(keyword);
Then perform the search.
Create the index reader:
    IndexReader reader = DirectoryReader.open(index);
Create a searcher based on the reader:
    IndexSearcher searcher = new IndexSearcher(reader);
Specify how many records to show per page (this is the maximum number of hits returned):
    int numberPerPage = 1000;
Perform the search:
    ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
    // 4. Search
    IndexReader reader = DirectoryReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);
    int numberPerPage = 1000;
    System.out.printf("There are currently %d records%n", productNames.size());
    System.out.printf("The query keyword is: \"%s\"%n", keyword);
    ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
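In the listing, numberPerPage is passed as the second argument of searcher.search, which caps how many top hits come back. If real pagination is wanted, one simple approach is to slice the ranked hits afterwards. The sketch below shows only that slicing, in plain Java; HitPager and its 1-based page numbers are assumptions for illustration and are not part of Lucene.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: returns one page out of an already ranked list of hits.
class HitPager {

    // pageNumber is 1-based; a page past the end yields an empty list.
    static <T> List<T> page(List<T> rankedHits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long from = (long) (pageNumber - 1) * pageSize; // long avoids int overflow
        if (from >= rankedHits.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, (long) rankedHits.size());
        return rankedHits.subList((int) from, to);
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(hits, 2, 2)); // prints [c, d]
        System.out.println(page(hits, 3, 2)); // prints [e]
    }
}
```

With only a handful of records, asking for up to 1000 hits at once, as the tutorial does, is equally fine.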
hits is an array of ScoreDoc; each element is one search result. First iterate over it:
    for (int i = 0; i < hits.length; ++i) {
        ScoreDoc scoreDoc = hits[i];
Then get the docId of the current result. The docId is the primary key of the record in the index:
        int docId = scoreDoc.doc;
Then use the searcher to fetch the corresponding Document from the index by its docId:
        Document d = searcher.doc(docId);
Then print the data in this Document. Although the Document currently has only the name field, the code still iterates over all fields and prints each value, so it needs no changes when a Document has several fields. scoreDoc.score is the match score of the current hit; the higher the score, the better the match.
        List<IndexableField> fields = d.getFields();
        System.out.print((i + 1));
        System.out.print("\t" + scoreDoc.score);
        for (IndexableField f : fields) {
            System.out.print("\t" + d.get(f.name()));
        }
        System.out.println();
    }
    private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer)
            throws Exception {
        System.out.println("Found " + hits.length + " hits.");
        System.out.println("No.\tMatch score\tResult");
        for (int i = 0; i < hits.length; ++i) {
            ScoreDoc scoreDoc = hits[i];
            int docId = scoreDoc.doc;
            Document d = searcher.doc(docId);
            List<IndexableField> fields = d.getFields();
            System.out.print((i + 1));
            System.out.print("\t" + scoreDoc.score);
            for (IndexableField f : fields) {
                System.out.print("\t" + d.get(f.name()));
            }
            System.out.println();
        }
    }
As shown in the figure, there are 10 records in total and the keyword query hits 6 of them. Different hits have different match scores. The first hit, for example, scores high because it contains both "eye protection" and "with light source". The other hits score lower because they match only the "light source" keyword, not "eye protection".
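Why the record containing both keywords ranks first can be illustrated with a deliberately crude score: count how many distinct query terms appear in each document. This is not Lucene's formula (Lucene 7 scores with BM25 by default, which also weighs term rarity and document length). ToyScorer and the sample strings are hypothetical, and the hyphenated words stand in for the single Chinese terms of the original data.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical scorer: score = number of distinct query terms found in the document.
class ToyScorer {

    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty()) {
                result.add(term);
            }
        }
        return result;
    }

    static int score(String query, String doc) {
        Set<String> matched = terms(query);
        matched.retainAll(terms(doc)); // keep only the query terms the document contains
        return matched.size();
    }

    public static void main(String[] args) {
        String query = "eye-protection light-source";
        // prints 2: both query terms match
        System.out.println(score(query, "eye-protection table lamp with light-source"));
        // prints 1: only light-source matches
        System.out.println(score(query, "energy-saving bulb light-source"));
        // prints 0: not a hit at all
        System.out.println(score(query, "warm white bulb lamp"));
    }
}
```

Sorting by such a score in descending order puts the two-term match first and drops the zero-score records, which mirrors the shape of the output in the figure.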
LIKE can also be used for queries, so how does the Lucene approach differ? There are two main points:
1. Relevance: as the run results show, Lucene returns results with different degrees of relevance, ranked by score. LIKE cannot do that.
2. Performance: with a small amount of data, LIKE also performs well, but with a huge amount of data its performance is much worse. The next tutorial demonstrates querying 140,000 records.
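The relevance point can be made concrete. A SQL condition such as LIKE '%keyword%' behaves like a substring test on the whole keyword, so a record that matches only part of the keyword is missed, while term-based matching (the classic QueryParser combines terms with OR by default) still finds it. The plain-Java sketch below contrasts the two; LikeVsTerms, its method names, and the sample data are assumptions, and String.contains merely stands in for LIKE.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical comparison of substring matching and term matching.
class LikeVsTerms {

    // Stand-in for: WHERE name LIKE '%keyword%' (case-insensitive here by assumption).
    static boolean likeMatch(String name, String keyword) {
        return name.toLowerCase(Locale.ROOT).contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Term-based match: true if the name shares at least one term with the keyword.
    static boolean anyTermMatch(String name, String keyword) {
        Set<String> nameTerms = new HashSet<>(
                Arrays.asList(name.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        for (String term : keyword.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty() && nameTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String keyword = "eye-protection light-source";
        String name = "eye-protection table lamp with light-source";
        System.out.println(likeMatch(name, keyword));    // prints false: not one contiguous substring
        System.out.println(anyTermMatch(name, keyword)); // prints true: both terms occur
    }
}
```

The performance point is separate: a leading-wildcard LIKE generally has to examine every row, whereas an index lookup goes straight to the matching terms.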
Now that you have run Lucene yourself and have a feel for it, let's summarize the approach:
1. Collect the data first. The data can come from the file system, a database, the network, manual input, or, as in this example, be written directly in memory.
2. Create an index from the data.
3. The user enters a keyword.
4. Create a query from the keyword.
5. Use the query to fetch data from the index.
6. Display the query results to the user.
Q & A area
2020-10-05
Like the earlier readers, I also read up on the basic concepts elsewhere first and then read this page again; the idea is much clearer now.
2019-12-19
Based only on the current tutorial; corrections are welcome.
2019-05-13
Personal thoughts
2019-04-18
Site admin, here an analyzer is used to create the index, to create the query, and to display the search results. Must all three steps use an analyzer, or is that just the convention?
2019-02-17
There are obviously only 9 records..