Package org.apache.nutch.crawl
Class DeduplicationJob.DedupReducer<K extends Writable>
- java.lang.Object
-
- org.apache.hadoop.mapreduce.Reducer<K,CrawlDatum,Text,CrawlDatum>
-
- org.apache.nutch.crawl.DeduplicationJob.DedupReducer<K>
-
- Enclosing class:
- DeduplicationJob
public static class DeduplicationJob.DedupReducer<K extends Writable> extends Reducer<K,CrawlDatum,Text,CrawlDatum>
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
Reducer.Context
-
-
Field Summary
Fields
Modifier and Type     Field          Description
protected String[]    compareOrder
-
Constructor Summary
Constructors
Constructor       Description
DedupReducer()
-
Method Summary
Modifier and Type       Method                                                                 Description
protected CrawlDatum    getDuplicate(CrawlDatum existingDoc, CrawlDatum newDoc)
void                    reduce(K key, Iterable<CrawlDatum> values, Reducer.Context context)
void                    setup(Reducer.Context context)
protected void          writeOutAsDuplicate(CrawlDatum datum, Reducer.Context context)
-
-
-
Field Detail
-
compareOrder
protected String[] compareOrder
-
-
Method Detail
-
setup
public void setup(Reducer.Context context)
- Overrides:
setup in class Reducer<K extends Writable,CrawlDatum,Text,CrawlDatum>
-
writeOutAsDuplicate
protected void writeOutAsDuplicate(CrawlDatum datum, Reducer.Context context) throws IOException, InterruptedException
- Throws:
IOException
InterruptedException
-
reduce
public void reduce(K key, Iterable<CrawlDatum> values, Reducer.Context context) throws IOException, InterruptedException
- Overrides:
reduce in class Reducer<K extends Writable,CrawlDatum,Text,CrawlDatum>
- Throws:
IOException
InterruptedException
-
getDuplicate
protected CrawlDatum getDuplicate(CrawlDatum existingDoc, CrawlDatum newDoc)
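This page does not describe how getDuplicate chooses between its two arguments. The signature and the compareOrder field suggest a walk over an ordered list of criteria, and the following is a minimal sketch of that idea only. It is not the Nutch implementation: the Doc class is a hypothetical stand-in for CrawlDatum, pickDuplicate is a hypothetical helper, and the criterion names ("score", "fetchTime", "urlLength") and their tie-breaking directions are assumptions made for illustration.

```java
public class DedupSketch {

    /** Hypothetical stand-in for CrawlDatum; not the Nutch class. */
    static final class Doc {
        final String url;
        final float score;
        final long fetchTime;

        Doc(String url, float score, long fetchTime) {
            this.url = url;
            this.score = score;
            this.fetchTime = fetchTime;
        }
    }

    /**
     * Walks the criteria in order and returns the document that loses the
     * first decisive comparison (the one to mark as duplicate), or null if
     * no criterion separates the two. Criterion names are assumptions.
     */
    static Doc pickDuplicate(Doc existingDoc, Doc newDoc, String[] compareOrder) {
        for (String criterion : compareOrder) {
            int cmp; // > 0 means existingDoc is preferred
            switch (criterion) {
                case "score":      // assumed: higher score is kept
                    cmp = Float.compare(existingDoc.score, newDoc.score);
                    break;
                case "fetchTime":  // assumed: more recent fetch is kept
                    cmp = Long.compare(existingDoc.fetchTime, newDoc.fetchTime);
                    break;
                case "urlLength":  // assumed: shorter URL is kept
                    cmp = Integer.compare(newDoc.url.length(), existingDoc.url.length());
                    break;
                default:           // unknown criterion: skip it
                    cmp = 0;
            }
            if (cmp != 0) {
                return cmp > 0 ? newDoc : existingDoc;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Doc a = new Doc("http://example.org/a", 2.0f, 100L);
        Doc b = new Doc("http://example.org/a/longer", 1.0f, 200L);
        Doc dup = pickDuplicate(a, b, new String[] {"score", "fetchTime"});
        System.out.println(dup == null ? "undecided" : dup.url);
    }
}
```

In a subclass of DedupReducer, a result like this would presumably be passed to writeOutAsDuplicate(CrawlDatum, Reducer.Context); that wiring is also an assumption, since this page documents signatures only.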
-
-