Class SimilarityScoringFilter
- java.lang.Object
  - org.apache.nutch.scoring.AbstractScoringFilter
    - org.apache.nutch.scoring.similarity.SimilarityScoringFilter
All Implemented Interfaces:
Configurable, Pluggable, ScoringFilter
public class SimilarityScoringFilter extends AbstractScoringFilter
Field Summary
Fields inherited from interface org.apache.nutch.scoring.ScoringFilter
X_POINT_ID
Constructor Summary
Constructor | Description
SimilarityScoringFilter() |
Method Summary
Modifier and Type | Method | Description
CrawlDatum | distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | Distribute score value from the current page to all its outlinked pages.
Configuration | getConf() |
void | passScoreAfterParsing(Text url, Content content, Parse parse) | Currently a part of score distribution is performed using only data coming from the parsing process.
void | setConf(Configuration conf) |
Methods inherited from class org.apache.nutch.scoring.AbstractScoringFilter
generatorSortValue, indexerScore, initialScore, injectedScore, passScoreBeforeParsing, updateDbScore
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.nutch.scoring.ScoringFilter
orphanedScore
Method Detail
getConf
public Configuration getConf()
Specified by:
getConf in interface Configurable
Overrides:
getConf in class AbstractScoringFilter
setConf
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
Overrides:
setConf in class AbstractScoringFilter
passScoreAfterParsing
public void passScoreAfterParsing(Text url, Content content, Parse parse) throws ScoringFilterException
Description copied from interface: ScoringFilter
Currently a part of score distribution is performed using only data coming from the parsing process. We need this method in order to ensure the presence of score data in these steps.
Specified by:
passScoreAfterParsing in interface ScoringFilter
Overrides:
passScoreAfterParsing in class AbstractScoringFilter
Parameters:
url - page url
content - original content. NOTE: modifications to this value are not persisted.
parse - target instance to copy the score information to. Implementations may modify this in-place, primarily by setting some metadata properties.
Throws:
ScoringFilterException - if there is a fatal error processing score data in subsequent steps after parsing
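The parameter contract above (read from content, write into parse) can be sketched roughly as follows. This is a minimal illustration and not Nutch code: plain Map objects stand in for the metadata carried by Content and Parse, and the names ParseScoreSketch, copyScore, and the key "sketch.score" are hypothetical, chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only; none of these names exist in Nutch.
class ParseScoreSketch {
    // Hypothetical metadata key used to carry the score between steps.
    static final String SCORE_KEY = "sketch.score";

    // Copies the score entry from the content metadata into the parse metadata.
    // The content side is only read, mirroring the note that changes to
    // content are not persisted; the parse side is modified in place.
    static void copyScore(Map<String, String> contentMeta, Map<String, String> parseMeta) {
        String score = contentMeta.get(SCORE_KEY);
        if (score != null) {
            parseMeta.put(SCORE_KEY, score);
        }
    }

    public static void main(String[] args) {
        Map<String, String> contentMeta = new HashMap<>();
        contentMeta.put(SCORE_KEY, "1.5");
        Map<String, String> parseMeta = new HashMap<>();

        copyScore(contentMeta, parseMeta);
        System.out.println(parseMeta.get(SCORE_KEY));
    }
}
```

The design point is the direction of the copy: anything a later step needs must end up on the parse side, because only that side is carried forward.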
distributeScoreToOutlinks
public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) throws ScoringFilterException
Description copied from interface: ScoringFilter
Distribute score value from the current page to all its outlinked pages.
Specified by:
distributeScoreToOutlinks in interface ScoringFilter
Overrides:
distributeScoreToOutlinks in class AbstractScoringFilter
Parameters:
fromUrl - url of the source page
parseData - ParseData instance, which stores relevant score value(s) in its metadata. NOTE: filters may modify this in-place, all changes will be persisted.
targets - <url, CrawlDatum> pairs. NOTE: filters can modify this in-place, all changes will be persisted.
adjust - a CrawlDatum instance, initially null, which implementations may use to pass adjustment values to the original CrawlDatum. When creating this instance, set its status to CrawlDatum.STATUS_LINKED.
allCount - number of all collected outlinks from the source page
Returns:
if needed, implementations may return an instance of CrawlDatum, with status CrawlDatum.STATUS_LINKED, which contains adjustments to be applied to the original CrawlDatum score(s) and metadata. This can be null if not needed.
Throws:
ScoringFilterException - if there is a fatal error distributing score data from the current page to all of its outlinks
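The calling contract described above (targets modified in place, adjust created lazily with the linked status, null returned when no adjustment is needed) can be sketched roughly as follows. This is an illustration under stated assumptions and not the logic of SimilarityScoringFilter: DatumSketch is a hypothetical stand-in for CrawlDatum, its STATUS_LINKED value is a placeholder rather than the real constant, and the even-split policy with leftover score handed back through adjust is invented purely to exercise the contract.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for CrawlDatum with only the fields this sketch needs.
class DatumSketch {
    static final int STATUS_LINKED = 1; // placeholder value, not Nutch's constant
    int status;
    float score;

    DatumSketch(int status, float score) {
        this.status = status;
        this.score = score;
    }
}

class OutlinkScoreSketch {
    // Splits parentScore evenly over allCount outlinks (assumed policy).
    // Targets are updated in place; any share belonging to outlinks that are
    // not among the targets is returned through adjust.
    static DatumSketch distribute(float parentScore,
                                  Collection<Map.Entry<String, DatumSketch>> targets,
                                  DatumSketch adjust,
                                  int allCount) {
        if (allCount <= 0) {
            return adjust; // nothing to distribute
        }
        float share = parentScore / allCount;
        for (Map.Entry<String, DatumSketch> target : targets) {
            target.getValue().score = share; // in-place modification
        }
        float leftover = parentScore - share * targets.size();
        if (leftover > 0f) {
            if (adjust == null) {
                // adjust starts out null; create it with the linked status
                adjust = new DatumSketch(DatumSketch.STATUS_LINKED, 0f);
            }
            adjust.score += leftover;
        }
        return adjust; // stays null when no adjustment was needed
    }

    public static void main(String[] args) {
        List<Map.Entry<String, DatumSketch>> targets = new ArrayList<>();
        targets.add(new AbstractMap.SimpleEntry<>("http://example.org/a", new DatumSketch(0, 0f)));
        targets.add(new AbstractMap.SimpleEntry<>("http://example.org/b", new DatumSketch(0, 0f)));

        // Four outlinks were collected, but only two are passed on as targets.
        DatumSketch adjust = distribute(1.0f, targets, null, 4);
        System.out.println("share per target: " + targets.get(0).getValue().score);
        System.out.println("adjust created: " + (adjust != null));
    }
}
```

Returning null in the common case keeps callers from applying an empty adjustment; a real filter would decide what, if anything, belongs in adjust according to its own scoring model.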