Package org.apache.nutch.indexer.anchor
Class AnchorIndexingFilter
- java.lang.Object
-
- org.apache.nutch.indexer.anchor.AnchorIndexingFilter
-
- All Implemented Interfaces:
Configurable,IndexingFilter,Pluggable
public class AnchorIndexingFilter extends Object implements IndexingFilter
Indexing filter that offers an option to either index all inbound anchor text for a document or deduplicate anchors. Deduplication has drawbacks; see anchorIndexingFilter.deduplicate in nutch-default.xml.
-
-
Field Summary
-
Fields inherited from interface org.apache.nutch.indexer.IndexingFilter
X_POINT_ID
-
-
Constructor Summary
Constructors
- AnchorIndexingFilter()
-
Method Summary
Modifier and Type, Method, Description
- NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors.
- Configuration getConf(): Get the Configuration object.
- void setConf(Configuration conf): Set the Configuration object.
-
-
-
Method Detail
-
setConf
public void setConf(Configuration conf)
Set the Configuration object.
- Specified by:
setConf in interface Configurable
-
getConf
public Configuration getConf()
Get the Configuration object.
- Specified by:
getConf in interface Configurable
-
filter
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors. See anchorIndexingFilter.deduplicate in nutch-default.xml.
- Specified by:
filter in interface IndexingFilter
- Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text
- Returns:
- filtered NutchDocument
- Throws:
IndexingException - if an error occurs during filtering
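
The effect of the boolean deduplication setting can be illustrated with a minimal, self-contained sketch. It is not the Nutch source: the class AnchorDedupSketch and the method selectAnchors are hypothetical names, plain Java collections stand in for Inlinks and NutchDocument, and the case-insensitive comparison is an assumption about how duplicates are detected.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for the anchor-selection step; not the Nutch implementation. */
final class AnchorDedupSketch {

  private AnchorDedupSketch() {}

  /**
   * Returns the anchor texts that would be added to the indexed document.
   *
   * @param anchors inbound anchor texts (may be null)
   * @param deduplicate stand-in for the anchorIndexingFilter.deduplicate setting
   */
  static List<String> selectAnchors(String[] anchors, boolean deduplicate) {
    List<String> selected = new ArrayList<>();
    if (anchors == null) {
      return selected; // no inlinks: nothing to index
    }
    Set<String> seen = new HashSet<>();
    for (String anchor : anchors) {
      if (anchor == null) {
        continue; // skip missing anchor text
      }
      if (deduplicate) {
        // Assumption: duplicates are detected case-insensitively.
        String key = anchor.toLowerCase(Locale.ROOT);
        if (!seen.add(key)) {
          continue; // already kept an equivalent anchor
        }
      }
      selected.add(anchor); // keep the first spelling encountered
    }
    return selected;
  }
}
```

With deduplicate set to true, the input {"Home", "home", "About"} yields [Home, About]; with it set to false, all three anchors are kept. The trade-off is that repeated anchors no longer contribute to term frequency, which is one reason to consult the property description in nutch-default.xml before enabling it.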
-
-