Preface: an introduction to Lucene, based on version 7.0
Lucene 7.0 package directory
Lucene 7.0 official documentation
org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a Reader into a TokenStream, an enumeration of token Attributes. A TokenStream can be composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. analyzers-common provides a number of Analyzer implementations, including StopAnalyzer and the grammar-based StandardAnalyzer.
org.apache.lucene.codecs provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending upon application needs.
org.apache.lucene.document provides a simple Document class. A Document is simply a set of named Fields, whose values may be strings or instances of Reader.
org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
org.apache.lucene.search provides data structures to represent queries (ie TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the IndexSearcher which turns queries into TopDocs. A number of QueryParsers are provided for producing query structures from strings or xml.
org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, including FSDirectory, which uses a file system directory to store files, and RAMDirectory which implements files as memory-resident data structures.
org.apache.lucene.util contains a few handy data structures and util classes, ie FixedBitSet and PriorityQueue.
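The way a Tokenizer and TokenFilters are strung together can be pictured with a small plain-Java sketch. This is only an illustration of the chaining idea, not Lucene code: the class name ToyAnalysisChain, its methods, and the stop-word list are all hypothetical, and the real Analyzer API works on a TokenStream rather than on lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only: none of these types are Lucene classes.
class ToyAnalysisChain {

    // Hypothetical stop-word list, chosen just for this example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    // "Tokenizer" stage: split raw text on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // First "TokenFilter" stage: lowercase every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Second "TokenFilter" stage: drop tokens found in the stop-word list.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // "Analyzer": applies the filters, in order, to the tokenizer's output.
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Analysis of a TokenStream"));
    }
}
```

The order of the stages matters: lowercasing runs before stop-word removal so that "The" is matched against the lowercase stop-word list.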
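Explanation
analysis: defines an abstract analyzer API and provides a number of commonly used analyzers. During indexing, an analyzer's job is to split text into tokens, remove stop words, reduce words to their stems, and so on. (For a deeper look at analyzers, see chapter 4 of Lucene in Action (《Lucene實戰》), "Lucene's analysis process".)
codecs: provides an abstraction over the encoding and decoding of the inverted index structure, along with different implementations that can be chosen according to application needs.
document: provides a simple Document class. A Document is just a set of named Fields whose values may be strings or Reader instances.
index: provides two main classes: IndexWriter, which creates indexes and adds documents to them, and IndexReader, which accesses the data in an index.
search: provides data structures that represent queries (TermQuery, PhraseQuery, BooleanQuery) and the IndexSearcher, which runs a query and returns the results as TopDocs. It also provides QueryParsers that build query structures from strings or XML.
store: defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Several implementations are provided, including FSDirectory, which stores files in a file system directory, and RAMDirectory, which implements files as memory-resident data structures.
util: contains some useful data structures and utility classes, such as FixedBitSet and PriorityQueue.
geo: geospatial utility implementations for Lucene core.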
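To make the "inverted index" that the index and search packages revolve around more concrete, here is a minimal plain-Java sketch. It does not use Lucene: the class ToyInvertedIndex and its methods are hypothetical names. It only mimics the split between a write side (roughly the role of IndexWriter) and a read side (roughly a TermQuery and a two-clause AND BooleanQuery), with no scoring and no persistence.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java illustration only: not the Lucene index format or API.
class ToyInvertedIndex {

    // term -> sorted ids of the documents that contain the term (the "postings")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Write side: assign a document id and record it under every term of the text.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Read side: ids of the documents containing a single term.
    SortedSet<Integer> termQuery(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // Read side: ids of the documents containing both terms (intersection of postings).
    SortedSet<Integer> andQuery(String first, String second) {
        SortedSet<Integer> result = new TreeSet<>(termQuery(first));
        result.retainAll(termQuery(second));
        return result;
    }
}
```

Looking up a term is a single map access, no matter how many documents were added. That is the reason for inverting the document-to-terms relationship in the first place.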
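Note: my knowledge and experience are limited. If anything here is wrong, please point it out and I will gladly accept the correction!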