Package org.apache.nutch.segment
Class SegmentMergeFilters
java.lang.Object
    org.apache.nutch.segment.SegmentMergeFilters
public class SegmentMergeFilters extends Object
This class wraps all SegmentMergeFilter extensions in a single object so it is easier to operate on them. If any of the extensions returns false, this one will return false as well.
Constructor Summary
Constructor                                Description
SegmentMergeFilters(Configuration conf)
Method Summary
Modifier and Type: boolean
Method: filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
Description: Iterates over all SegmentMergeFilter extensions and, if any of them returns false, returns false as well.
Constructor Detail
SegmentMergeFilters
public SegmentMergeFilters(Configuration conf)
Method Detail
filter
public boolean filter(Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked)
Iterates over all SegmentMergeFilter extensions and, if any of them returns false, returns false as well.
Parameters:
key - the segment record key
generateData - directory and data produced by the generation phase
fetchData - directory and data produced by the fetch phase
sigData - directory and data produced by the parse phase
content - directory and data produced by the parse phase
parseData - directory and data produced by the parse phase
parseText - directory and data produced by the parse phase
linked - all LINKED values from the latest segment
Returns:
true if the values for this key (URL) should be merged into the new segment.
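The "any filter returns false, so the result is false" contract described above can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: KeyFilter is a hypothetical, simplified stand-in for a SegmentMergeFilter extension that looks only at the key, and filterAll is an assumed helper name, not part of the Nutch API.

```java
import java.util.Arrays;
import java.util.List;

public class MergeFilterChainSketch {

    /** Hypothetical stand-in for a SegmentMergeFilter extension (key only). */
    public interface KeyFilter {
        boolean filter(String key);
    }

    /**
     * Applies every filter in order and stops at the first rejection.
     * Returns true only if no filter returned false.
     */
    public static boolean filterAll(List<KeyFilter> filters, String key) {
        for (KeyFilter f : filters) {
            if (!f.filter(key)) {
                return false; // one rejection is enough to drop the record
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two illustrative filters; the rules themselves are made up.
        KeyFilter httpOnly = key -> key.startsWith("http");
        KeyFilter noTempFiles = key -> !key.endsWith(".tmp");
        List<KeyFilter> filters = Arrays.asList(httpOnly, noTempFiles);

        System.out.println(filterAll(filters, "http://example.org/page"));  // prints "true"
        System.out.println(filterAll(filters, "http://example.org/a.tmp")); // prints "false"
    }
}
```

In this sketch a record is kept only when every filter accepts it, so adding a filter can only make merging stricter, never more permissive.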