Class LuceneAnalyzerUtil
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil
- All Implemented Interfaces:
Closeable, AutoCloseable
public class LuceneAnalyzerUtil extends org.apache.lucene.analysis.Analyzer
Creates a custom analyzer based on user provided inputs
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | LuceneAnalyzerUtil.StemFilterType | |
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, boolean useStopFilter) | Creates an analyzer instance based on Lucene default stopword set if the param useStopFilter is set to true |
| LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, List<String> stopWords, boolean addToDefault) | Creates an analyzer instance based on user provided stop words. |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents | createComponents(String fieldName) | |
Constructor Detail
LuceneAnalyzerUtil
public LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, boolean useStopFilter)
Creates an analyzer instance based on Lucene default stopword set if the param useStopFilter is set to true
- Parameters:
stemFilterType - a preferred LuceneAnalyzerUtil.StemFilterType to use. Can be one of LuceneAnalyzerUtil.StemFilterType.PORTERSTEM_FILTER, LuceneAnalyzerUtil.StemFilterType.ENGLISHMINIMALSTEM_FILTER, or LuceneAnalyzerUtil.StemFilterType.NONE
useStopFilter - if true use the default Lucene stopword set, false otherwise
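A minimal usage sketch for this constructor, assuming the Nutch and Lucene jars are on the classpath. The class name AnalyzerUsageSketch, the helper tokenize, the field name "content", and the sample sentence are hypothetical; tokenStream and CharTermAttribute come from the Lucene Analyzer API that this class extends. The exact terms printed depend on the Lucene version and on the tokenizer chosen inside createComponents, so no output is shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil;
import org.apache.nutch.scoring.similarity.util.LuceneAnalyzerUtil.StemFilterType;

public class AnalyzerUsageSketch {

  // Collects the terms the analyzer emits for the given text.
  static List<String> tokenize(LuceneAnalyzerUtil analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    // "content" is an arbitrary, hypothetical field name.
    try (TokenStream stream = analyzer.tokenStream("content", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();                    // required before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();                      // signals end of stream before close()
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // Porter stemming plus the default Lucene stopword set (useStopFilter = true).
    try (LuceneAnalyzerUtil analyzer =
        new LuceneAnalyzerUtil(StemFilterType.PORTERSTEM_FILTER, true)) {
      System.out.println(tokenize(analyzer, "The crawler is fetching pages"));
    }
  }
}
```

Passing StemFilterType.NONE and false instead would, per the parameter descriptions above, skip both the stemming and the stopword filtering.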
LuceneAnalyzerUtil
public LuceneAnalyzerUtil(LuceneAnalyzerUtil.StemFilterType stemFilterType, List<String> stopWords, boolean addToDefault)
Creates an analyzer instance based on user provided stop words. If the param addToDefault is set to true, then user provided stop words will be added to the Lucene default stopset.
- Parameters:
stemFilterType - a preferred LuceneAnalyzerUtil.StemFilterType to use. Can be one of LuceneAnalyzerUtil.StemFilterType.PORTERSTEM_FILTER, LuceneAnalyzerUtil.StemFilterType.ENGLISHMINIMALSTEM_FILTER, or LuceneAnalyzerUtil.StemFilterType.NONE
stopWords - a List of stop word Strings
addToDefault - if true the provided stop words will be added to the default Lucene stopword set, false otherwise
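The effect of addToDefault can be illustrated with a small sketch that uses only the Java standard library. It is not the class's implementation: DEFAULT_STOP_WORDS is a hypothetical three-word stand-in for the Lucene default stopword set, effectiveStopWords is a hypothetical helper, and the sketch assumes that addToDefault = false means only the supplied words are used.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StopWordMergeSketch {

  // Hypothetical stand-in for the Lucene default stopword set (NOT the real list).
  static final Set<String> DEFAULT_STOP_WORDS =
      new LinkedHashSet<>(Arrays.asList("a", "an", "the"));

  // Returns the stop words in effect under the documented addToDefault contract.
  static Set<String> effectiveStopWords(List<String> stopWords, boolean addToDefault) {
    Set<String> result = new LinkedHashSet<>();
    if (addToDefault) {
      result.addAll(DEFAULT_STOP_WORDS); // start from the defaults
    }
    result.addAll(stopWords);            // user-provided words are always included
    return result;
  }

  public static void main(String[] args) {
    List<String> custom = Arrays.asList("nutch", "crawl");
    System.out.println(effectiveStopWords(custom, true));  // prints [a, an, the, nutch, crawl]
    System.out.println(effectiveStopWords(custom, false)); // prints [nutch, crawl]
  }
}
```

With the real class, the equivalent calls would be new LuceneAnalyzerUtil(stemFilterType, custom, true) and new LuceneAnalyzerUtil(stemFilterType, custom, false).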
Method Detail
createComponents
protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName)
- Specified by:
createComponents in class org.apache.lucene.analysis.Analyzer