Class StaticFieldIndexer
- java.lang.Object
  - org.apache.nutch.indexer.staticfield.StaticFieldIndexer
- All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class StaticFieldIndexer extends Object implements IndexingFilter
A simple plugin, called at indexing time, that adds fields with static data. You can specify a list of fieldname:fieldcontent pairs per Nutch job. This is useful when collections cannot be defined by URL patterns, as in the subcollection plugin, and must instead be assigned on a per-job basis.
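To make the fieldname:fieldcontent idea concrete, here is a minimal, self-contained sketch of how such a list might be turned into a map of field names to values. It does not use any Nutch classes. The class name `StaticFieldSketch`, the helper `parseStaticFields`, the comma between pairs, and the sample field names are all assumptions made for illustration; the separators the plugin actually uses are defined in nutch-default.xml and may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaticFieldSketch {

  // Hypothetical helper, not part of Nutch: parses "name:content" pairs.
  // Assumes pairs are separated by ',' and name/content by the first ':'.
  static Map<String, List<String>> parseStaticFields(String spec) {
    Map<String, List<String>> fields = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return fields; // nothing configured, nothing to add
    }
    for (String pair : spec.split(",")) {
      int sep = pair.indexOf(':');
      if (sep <= 0) {
        continue; // skip entries without a separator or without a name
      }
      String name = pair.substring(0, sep).trim();
      String content = pair.substring(sep + 1).trim();
      if (name.isEmpty() || content.isEmpty()) {
        continue; // skip incomplete pairs
      }
      // Repeated names accumulate values instead of overwriting them.
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(content);
    }
    return fields;
  }

  public static void main(String[] args) {
    // Sample value only; the field names are made up for this example.
    System.out.println(parseStaticFields("collection:news,source:crawl-2024"));
  }
}
```

In a sketch like this, every document indexed by the job would receive the same parsed fields, which is what makes the approach job-based rather than URL-based.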
Field Summary
-
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
Constructor Summary
Constructors
Constructor / Description
StaticFieldIndexer()
-
Method Summary
Modifier and Type / Method / Description
NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): The StaticFieldIndexer filter object, which adds fields as per the configuration setting.
Configuration getConf(): Get the Configuration object.
protected String regexEscape(String in): Escapes any character that needs escaping so it can be used in a regexp.
void setConf(Configuration conf): Set the Configuration object.
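The regexEscape method has no detail entry on this page, so the following is only a sketch of what escaping a string for use in a regexp can look like in Java. It is not the plugin's implementation. The class name `RegexEscapeSketch`, the method `escapeForRegex`, and the chosen set of metacharacters are assumptions. One plausible use, also an assumption, is making a configured separator safe to pass to `String.split`, which treats its argument as a regular expression.

```java
import java.util.regex.Pattern;

public class RegexEscapeSketch {

  // Characters treated as regex metacharacters in this sketch (an assumed set).
  private static final String META = "\\.[]{}()<>*+-=!?^$|";

  // Hypothetical helper: prefixes each metacharacter with a backslash.
  static String escapeForRegex(String in) {
    StringBuilder out = new StringBuilder(in.length() * 2);
    for (int i = 0; i < in.length(); i++) {
      char c = in.charAt(i);
      if (META.indexOf(c) >= 0) {
        out.append('\\');
      }
      out.append(c);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Without escaping, "|" would be read as regex alternation by split().
    String[] parts = "a|b".split(escapeForRegex("|"));
    System.out.println(parts.length);

    // The escaped pattern matches only the literal text.
    System.out.println(Pattern.matches(escapeForRegex("a.b"), "a.b"));
    System.out.println(Pattern.matches(escapeForRegex("a.b"), "axb"));
  }
}
```

When the whole string should be treated literally, the standard library's `Pattern.quote` is a simpler alternative to per-character escaping.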
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The StaticFieldIndexer filter object, which adds fields as per the configuration setting. See index.static in nutch-default.xml.
- Specified by:
filter in interface IndexingFilter
- Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text
- Returns:
- filtered NutchDocument
- Throws:
IndexingException - if an error occurs during filtering
-
setConf
public void setConf(Configuration conf)
Set the Configuration object.
- Specified by:
setConf in interface Configurable
-
getConf
public Configuration getConf()
Get the Configuration object.
- Specified by:
getConf in interface Configurable