Class MetadataIndexer
- java.lang.Object
-
- org.apache.nutch.indexer.metadata.MetadataIndexer
-
- All Implemented Interfaces:
Configurable, IndexingFilter, Pluggable
public class MetadataIndexer extends Object implements IndexingFilter
Indexing filter that can be configured to extract metadata from the crawldb, the parse metadata, or the content metadata. Set the properties "index.db.md", "index.parse.md", or "index.content.md"; each value is a comma-delimited list of metadata keys, for example key1,key2,key3.
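The comma-delimited property format can be illustrated with a minimal sketch. The class `MetadataKeyListSketch` and its method `parseKeys` are hypothetical names introduced here for illustration. They are not part of Nutch and may not match how the plugin reads its configuration. Trimming whitespace and skipping empty entries are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: shows how a comma-delimited
// property value such as "key1,key2,key3" can be split into metadata keys.
class MetadataKeyListSketch {

    // Splits on commas, trims surrounding whitespace, and skips empty entries
    // (the trimming and skipping behavior is an assumption of this sketch).
    static List<String> parseKeys(String propertyValue) {
        List<String> keys = new ArrayList<>();
        if (propertyValue == null) {
            return keys; // unset property: no keys configured
        }
        for (String part : propertyValue.split(",")) {
            String key = part.trim();
            if (!key.isEmpty()) {
                keys.add(key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Example value as it might be given for a property such as index.parse.md.
        System.out.println(parseKeys("key1,key2,key3")); // prints [key1, key2, key3]
    }
}
```

Each resulting key names one metadata entry to look up in the corresponding source (crawldb, parse metadata, or content metadata).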
-
-
Field Summary
-
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor          Description
MetadataIndexer()
-
Method Summary
Modifier and Type, Method, Description
protected void add(NutchDocument doc, String key, String value)
NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): Adds fields or otherwise modifies the document that will be indexed for a parse.
Configuration getConf()
void setConf(Configuration conf)
-
-
-
Method Detail
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.
- Specified by:
filter in interface IndexingFilter
- Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page URL
datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
inlinks - page inlinks
- Returns:
- modified (or a new) document instance, or null (meaning the document should be discarded)
- Throws:
IndexingException - if an error occurs during filtering
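The return contract of filter (a modified or new document, or null to discard) can be sketched with stand-in types. This example does not use the Nutch classes: a `Map<String, String>` stands in for the document, and `FilterContractSketch`, `runFilters`, `rejectEmpty`, and `addAuthor` are hypothetical names introduced only for illustration. The way the caller chains filters is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Stand-in types only: a Map plays the role of the document being indexed.
class FilterContractSketch {

    // Applies each filter in order; a null result means the document is discarded.
    static Map<String, String> runFilters(Map<String, String> doc,
                                          List<UnaryOperator<Map<String, String>>> filters) {
        Map<String, String> current = doc;
        for (UnaryOperator<Map<String, String>> filter : filters) {
            current = filter.apply(current);
            if (current == null) {
                return null; // stop early: nothing will be indexed
            }
        }
        return current;
    }

    // Hypothetical filter: rejects documents that carry no fields at all.
    static Map<String, String> rejectEmpty(Map<String, String> doc) {
        return doc.isEmpty() ? null : doc;
    }

    // Hypothetical filter: returns a new document with one extra field.
    static Map<String, String> addAuthor(Map<String, String> doc) {
        Map<String, String> copy = new HashMap<>(doc);
        copy.put("author", "example");
        return copy;
    }

    // Builds the two-filter chain used in this sketch.
    static List<UnaryOperator<Map<String, String>>> sampleFilters() {
        List<UnaryOperator<Map<String, String>>> filters = new ArrayList<>();
        filters.add(FilterContractSketch::rejectEmpty);
        filters.add(FilterContractSketch::addAuthor);
        return filters;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Home");
        System.out.println(runFilters(doc, sampleFilters()).containsKey("author")); // prints true
        System.out.println(runFilters(new HashMap<>(), sampleFilters()) == null);   // prints true
    }
}
```

Returning a new map from `addAuthor` mirrors the "modified (or a new) document instance" wording, while `rejectEmpty` shows the null case.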
-
add
protected void add(NutchDocument doc, String key, String value)
-
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
-
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
-
-