Package org.apache.nutch.scoring.orphan
Class OrphanScoringFilter
java.lang.Object
  org.apache.nutch.scoring.AbstractScoringFilter
    org.apache.nutch.scoring.orphan.OrphanScoringFilter

All Implemented Interfaces:
Configurable, Pluggable, ScoringFilter
public class OrphanScoringFilter extends AbstractScoringFilter
Orphan scoring filter that determines whether a page has become orphaned, i.e. no other pages link to it any more. If a page has not been linked to for more than markGoneAfter seconds, the page is marked as gone and is then removed by an indexer. If a page has not been linked to for more than markOrphanAfter seconds, the page is removed from the CrawlDb.
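
The two thresholds described above can be illustrated with a small, self-contained sketch. It is not the Nutch implementation: the class `OrphanDecisionSketch`, the enum `Outcome` and the method `decide` are hypothetical names, the 30- and 40-day values are illustrative only, and the sketch assumes that markGoneAfter is no larger than markOrphanAfter and that the time of the last known inlink is available in seconds.

```java
public class OrphanDecisionSketch {

    /** Hypothetical outcome for a page that currently has no inlinks. */
    enum Outcome { KEEP, MARK_GONE, MARK_ORPHAN }

    /**
     * Decides what to do with a page, given when it was last linked to.
     * All arguments are in seconds. Assumes markGoneAfter <= markOrphanAfter.
     */
    static Outcome decide(long nowSeconds, long lastInlinkSeconds,
                          long markGoneAfter, long markOrphanAfter) {
        long elapsed = nowSeconds - lastInlinkSeconds;
        if (elapsed > markOrphanAfter) {
            // Longest threshold exceeded: candidate for removal from the CrawlDb.
            return Outcome.MARK_ORPHAN;
        }
        if (elapsed > markGoneAfter) {
            // Shorter threshold exceeded: mark as gone so an indexer can remove it.
            return Outcome.MARK_GONE;
        }
        // Linked to recently enough: leave the page untouched.
        return Outcome.KEEP;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60;
        long markGoneAfter = 30 * day;    // illustrative value, not a documented default
        long markOrphanAfter = 40 * day;  // illustrative value, not a documented default
        long now = 100 * day;

        System.out.println(decide(now, now - 10 * day, markGoneAfter, markOrphanAfter));
        System.out.println(decide(now, now - 35 * day, markGoneAfter, markOrphanAfter));
        System.out.println(decide(now, now - 45 * day, markGoneAfter, markOrphanAfter));
    }
}
```

With these illustrative values, a page last linked to 10 days ago is kept, one last linked to 35 days ago is marked as gone, and one last linked to 45 days ago is marked as orphaned.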
Field Summary
Modifier and Type | Field | Description
static Text | ORPHAN_KEY_WRITABLE |
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
Constructor Summary
Constructor | Description
OrphanScoringFilter() |
Method Summary
Modifier and Type | Method | Description
void | orphanedScore(Text url, CrawlDatum datum) | This method may change the score or status of CrawlDatum during CrawlDb update, when the URL is neither fetched nor has any inlinks.
void | setConf(Configuration conf) |
void | updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinks) | Used for orphan control.
Methods inherited from class org.apache.nutch.scoring.AbstractScoringFilter
distributeScoreToOutlinks, generatorSortValue, getConf, indexerScore, initialScore, injectedScore, passScoreAfterParsing, passScoreBeforeParsing
Field Detail
ORPHAN_KEY_WRITABLE
public static Text ORPHAN_KEY_WRITABLE
Method Detail
setConf
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
Overrides:
setConf in class AbstractScoringFilter
updateDbScore
public void updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinks) throws ScoringFilterException
Used for orphan control.
Specified by:
updateDbScore in interface ScoringFilter
Overrides:
updateDbScore in class AbstractScoringFilter
Parameters:
url - URL of the record
old - old CrawlDatum
datum - new CrawlDatum
inlinks - list of inlinked CrawlDatums
Throws:
ScoringFilterException - if there is a fatal error calculating a new score of CrawlDatum during CrawlDb update
orphanedScore
public void orphanedScore(Text url, CrawlDatum datum)
Description copied from interface: ScoringFilter
This method may change the score or status of CrawlDatum during CrawlDb update, when the URL is neither fetched nor has any inlinks.
Parameters:
url - URL of the page
datum - CrawlDatum for page