
The Field class is very important in Lucene (repost)

The org.apache.lucene.demo.IndexFiles class indexes files recursively. After constructing an IndexWriter, you add Document objects to it, and that is what actually performs the indexing. Because every directory may contain further directories, the traversal descends into each one, recursing until it reaches a leaf node (an ordinary file with an extension, such as my.txt), and then calls the following code:

static void indexDocs(IndexWriter writer, File file)
    throws IOException {
  // Only proceed if the file can be read
  if (file.canRead()) {
    if (file.isDirectory()) { // A directory may contain files, subdirectories, or nothing
      String[] files = file.list(); // Names of all entries in this directory (including subdirectories)
      if (files != null) { // list() returns null on an I/O error
        for (int i = 0; i < files.length; i++) { // Recurse into each entry
          indexDocs(writer, new File(file, files[i]));
        }
      }
    } else { // Reached a leaf node: an ordinary file, so index it
      System.out.println("adding " + file);
      try {
        writer.addDocument(FileDocument.Document(file));
      } catch (FileNotFoundException fnfe) {
        // Skip files that cannot be opened
      }
    }
  }
}
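The recursion pattern itself is independent of Lucene and can be exercised on its own. Below is a minimal stdlib-only sketch with the same control flow as indexDocs; the class and method names (LeafFileCollector, collectLeafFiles) are my own, not part of the demo:

```java
import java.io.File;
import java.util.List;

public class LeafFileCollector {

    // Recursively collect ordinary (non-directory) readable files,
    // mirroring the control flow of IndexFiles.indexDocs: recurse into
    // directories, and treat every other readable entry as a leaf
    static void collectLeafFiles(File file, List<File> out) {
        if (file.canRead()) {
            if (file.isDirectory()) {
                String[] names = file.list(); // null on an I/O error
                if (names != null) {
                    for (int i = 0; i < names.length; i++) {
                        collectLeafFiles(new File(file, names[i]), out);
                    }
                }
            } else {
                // Leaf node: this is where indexDocs would call addDocument
                out.add(file);
            }
        }
    }
}
```

Where indexDocs hands each leaf to the IndexWriter, this sketch simply collects it into a list.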

The line highlighted above,

writer.addDocument(FileDocument.Document(file));

does a great deal of work. When the recursion reaches a leaf node (an ordinary file rather than a directory, such as myWorld.txt), the following operations are performed on that file:

First, a File object f is constructed for myWorld.txt. Through f we obtain the file's details, such as its storage path and modification time, and construct several Field objects from them. These Fields are then gathered into a Document object. Finally, the Document is added to the IndexWriter; during indexing, the information aggregated in the Document's Fields is tokenized and filtered so that it can be searched.

The source code of the org.apache.lucene.demo.FileDocument class is as follows:

package org.apache.lucene.demo;

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class FileDocument {
  public static Document Document(File f)
      throws java.io.FileNotFoundException {

    // Instantiate a Document
    Document doc = new Document();
    // From the File f passed in, construct several Field objects and add them to the Document

    // Construct a Field from f's path, and set some of its properties:
    // "path" is the Field's name; the Field can be looked up by this name.
    // Field.Store.YES means the value is stored in the index;
    // Field.Index.UN_TOKENIZED means the value is indexed for retrieval but not tokenized
    doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));

    // Construct a Field named "modified" holding the last modification time
    doc.add(new Field("modified",
        DateTools.timeToString(f.lastModified(), DateTools.Resolution.MINUTE),
        Field.Store.YES, Field.Index.UN_TOKENIZED));

    // Construct a Field whose value is read from a file stream; the stream
    // opened on f must remain open until the Document has been indexed
    doc.add(new Field("contents", new FileReader(f)));
    return doc;
  }

  private FileDocument() {}
}
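The "modified" field above relies on DateTools.timeToString with Resolution.MINUTE, which, to my understanding of DateTools, renders the timestamp as a GMT "yyyyMMddHHmm" string. Because that format is lexicographically sortable, the value can stay UN_TOKENIZED and still support ordering and range queries. A stdlib-only sketch of that encoding (the class and method names are mine, and the exact pattern is my reading of DateTools, not code from the demo):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class MinuteResolution {

    // Approximation of DateTools.timeToString(millis, Resolution.MINUTE):
    // a GMT "yyyyMMddHHmm" string whose plain-text ordering matches
    // chronological ordering
    static String timeToMinuteString(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmm");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }
}
```

For example, epoch millisecond 0 encodes as "197001010000", and one minute later as "197001010001", so string comparison orders them correctly.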

From the code above you can see how central the Field class is, so it is worth understanding it thoroughly.

The Field class defines two useful static inner classes, Store and Index, which are used to set indexing-related attributes on a Field.

// Store is a static inner class that controls how a Field's value is stored
public static final class Store extends Parameter implements Serializable {

  private Store(String name) {
    super(name);
  }

  // Store the Field's value in the index in compressed form
  public static final Store COMPRESS = new Store("COMPRESS");

  // Store the Field's value in the index
  public static final Store YES = new Store("YES");

  // Do not store the Field's value in the index
  public static final Store NO = new Store("NO");
}

// Index controls whether and how a Field is indexed
public static final class Index extends Parameter implements Serializable {

  private Index(String name) {
    super(name);
  }

  // Do not index the Field, so it cannot be searched (on its own this is
  // usually pointless; it makes sense when Field.Store is set to
  // Field.Store.YES or Field.Store.COMPRESS, so the value can still be
  // retrieved with matching documents)
  public static final Index NO = new Index("NO");

  // Index the Field and tokenize its value (the Analyzer controls how it is tokenized)
  public static final Index TOKENIZED = new Index("TOKENIZED");

  // Index the Field but do not tokenize its value
  public static final Index UN_TOKENIZED = new Index("UN_TOKENIZED");

  // Index the Field without norms, and do not pass it through an Analyzer
  public static final Index NO_NORMS = new Index("NO_NORMS");

}
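Store, Index, and (as shown next) TermVector all follow the same pre-Java-5 "typesafe enum" idiom: a private constructor plus a fixed set of public static final instances, so only the declared constants can ever exist and they can be compared by identity. A stdlib-only sketch of the idiom; the class name StoreLike is illustrative, not Lucene's:

```java
import java.io.Serializable;

// Minimal stand-in for Lucene's Parameter-based constants: each instance
// carries a name, and the private constructor ensures no instances can be
// created outside this class
public class StoreLike implements Serializable {
    private final String name;

    private StoreLike(String name) {
        this.name = name;
    }

    public String toString() {
        return name;
    }

    // The only instances that can ever exist
    public static final StoreLike YES = new StoreLike("YES");
    public static final StoreLike NO = new StoreLike("NO");
}
```

Because the set of instances is closed, code such as `field.store == StoreLike.YES` is safe, which is exactly how these Lucene constants are meant to be used.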

The Field class has one more inner class, declared as follows:

public static final class TermVector extends Parameter implements Serializable

This class is related to terms. By specifying a TermVector when adding a Field, the terms of that Field are stored with the document and can be recovered at search time. Its constructor is:

private TermVector(String name) {
  super(name);
}

Each TermVector constant is constructed from a string name and specifies how term vectors are kept for the Field, as follows:

// Do not store term vectors
public static final TermVector NO = new TermVector("NO");

// Store one term vector per Document
public static final TermVector YES = new TermVector("YES");

// Store the term vector together with token position information
public static final TermVector WITH_POSITIONS = new TermVector("WITH_POSITIONS");

// Store the term vector together with token offset information
public static final TermVector WITH_OFFSETS = new TermVector("WITH_OFFSETS");

// Store the term vector together with both position and offset information
public static final TermVector WITH_POSITIONS_OFFSETS = new TermVector("WITH_POSITIONS_OFFSETS");
}

A Field's value can also take several forms; the Field class supports four value types: String, Reader, byte[], and TokenStream.

Next comes the construction of Field objects. The class has seven constructors:

public Field(String name, byte[] value, Store store)
public Field(String name, Reader reader)
public Field(String name, Reader reader, TermVector termVector)
public Field(String name, String value, Store store, Index index)
public Field(String name, String value, Store store, Index index, TermVector termVector)
public Field(String name, TokenStream tokenStream)
public Field(String name, TokenStream tokenStream, TermVector termVector)

Also note the declaration of the Field class itself:

public final class Field extends AbstractField implements Fieldable, Serializable

As the declaration shows, Field extends AbstractField, so that parent class is worth understanding as well. The following are the properties of the AbstractField class:

protected String name = "body";
protected boolean storeTermVector = false;
protected boolean storeOffsetWithTermVector = false;
protected boolean storePositionWithTermVector = false;
protected boolean omitNorms = false;
protected boolean isStored = false;
protected boolean isIndexed = true;
protected boolean isTokenized = true;
protected boolean isBinary = false;
protected boolean isCompressed = false;
protected boolean lazy = false;
protected float boost = 1.0f;
protected Object fieldsData = null;

Field also implements the Fieldable interface, which adds the methods a Document uses to manage and query the information in its Fields.
Category: Internet
Date: 2010-09-07
