Package org.apache.nutch.hostdb
Class FetchOverdueCrawlDatumProcessor
java.lang.Object
    org.apache.nutch.hostdb.FetchOverdueCrawlDatumProcessor

All Implemented Interfaces:
CrawlDatumProcessor
public class FetchOverdueCrawlDatumProcessor extends Object implements CrawlDatumProcessor
Simple custom crawl datum processor that counts the number of records that are overdue for fetching, e.g. new unfetched URLs that haven't been fetched within two days.
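The two-day example above can be read as a cutoff comparison. The sketch below is a minimal, dependency-free illustration and not the Nutch implementation. It assumes that overDueTime is a timestamp computed as the current time minus overDueTimeLimit, and that a record counts as overdue when its scheduled fetch time lies before that cutoff. The class OverdueCutoffSketch and its method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical stand-alone sketch; not part of Apache Nutch.
final class OverdueCutoffSketch {

    // Assumed meaning of overDueTimeLimit: the allowed delay in milliseconds.
    static final long OVERDUE_TIME_LIMIT_MS = Duration.ofDays(2).toMillis();

    // Assumed meaning of overDueTime: the timestamp before which a scheduled
    // fetch counts as overdue.
    static long overDueTime(long nowMs, long limitMs) {
        return nowMs - limitMs;
    }

    // Counts the fetch times that lie strictly before the cutoff.
    static long countOverdue(long[] fetchTimesMs, long cutoffMs) {
        long numOverDue = 0;
        for (long fetchTime : fetchTimesMs) {
            if (fetchTime < cutoffMs) {
                numOverDue++;
            }
        }
        return numOverDue;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long cutoff = overDueTime(now, OVERDUE_TIME_LIMIT_MS);
        long[] fetchTimes = {
            now - Duration.ofDays(3).toMillis(),  // scheduled three days ago: overdue
            now - Duration.ofHours(1).toMillis(), // scheduled one hour ago: not overdue
            now + Duration.ofDays(1).toMillis()   // scheduled for tomorrow: not overdue
        };
        System.out.println("overdue records: " + countOverdue(fetchTimes, cutoff));
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the helper, keeps the comparison deterministic and easy to check.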
-
Field Summary
Fields
Modifier and Type          Field
protected Configuration    conf
protected long             numOverDue
protected long             overDueTime
protected long             overDueTimeLimit
-
Constructor Summary
Constructors
FetchOverdueCrawlDatumProcessor(Configuration conf)
-
Method Summary
Methods
Modifier and Type    Method                           Description
void                 count(CrawlDatum crawlDatum)     Process a single crawl datum instance to aggregate custom counts.
void                 finalize(HostDatum hostDatum)    Process the final host datum instance and store the aggregated custom counts in the HostDatum.
-
Field Detail
-
conf
protected final Configuration conf
-
overDueTimeLimit
protected long overDueTimeLimit
-
overDueTime
protected long overDueTime
-
numOverDue
protected long numOverDue
-
Constructor Detail
-
FetchOverdueCrawlDatumProcessor
public FetchOverdueCrawlDatumProcessor(Configuration conf)
-
Method Detail
-
count
public void count(CrawlDatum crawlDatum)
Description copied from interface: CrawlDatumProcessor
Process a single crawl datum instance to aggregate custom counts.
Specified by:
count in interface CrawlDatumProcessor
Parameters:
crawlDatum - CrawlDatum instance to count information from
-
finalize
public void finalize(HostDatum hostDatum)
Description copied from interface: CrawlDatumProcessor
Process the final host datum instance and store the aggregated custom counts in the HostDatum.
Specified by:
finalize in interface CrawlDatumProcessor
Parameters:
hostDatum - HostDatum instance to hold the aggregated custom counts
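Taken together, the two method descriptions suggest a per-host lifecycle: count is called for each crawl datum, and finalize is called once to write the totals. The sketch below illustrates that pattern under stated assumptions. SimpleDatum, SimpleHostDatum and SimpleProcessor are hypothetical stand-ins for CrawlDatum, HostDatum and CrawlDatumProcessor, so that the code runs without Nutch or Hadoop on the classpath. The metadata key num_overdue and the unfetched-only check are also assumptions made for illustration, not confirmed behaviour of this class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the count-then-finalize pattern; not the Nutch types.
final class ProcessorLifecycleSketch {

    // Stand-in for CrawlDatum: only the two properties this sketch needs.
    static final class SimpleDatum {
        final boolean unfetched;
        final long fetchTimeMs;

        SimpleDatum(boolean unfetched, long fetchTimeMs) {
            this.unfetched = unfetched;
            this.fetchTimeMs = fetchTimeMs;
        }
    }

    // Stand-in for HostDatum: a plain metadata map.
    static final class SimpleHostDatum {
        final Map<String, Long> metaData = new HashMap<>();
    }

    // Stand-in for CrawlDatumProcessor, mirroring its two documented methods.
    interface SimpleProcessor {
        void count(SimpleDatum datum);
        void finalize(SimpleHostDatum hostDatum);
    }

    static final class OverdueProcessor implements SimpleProcessor {
        private final long overDueTime; // assumed: cutoff timestamp in milliseconds
        private long numOverDue = 0;

        OverdueProcessor(long overDueTime) {
            this.overDueTime = overDueTime;
        }

        @Override
        public void count(SimpleDatum datum) {
            // Assumed rule: only unfetched records scheduled before the cutoff count.
            if (datum.unfetched && datum.fetchTimeMs < overDueTime) {
                numOverDue++;
            }
        }

        @Override
        public void finalize(SimpleHostDatum hostDatum) {
            // "num_overdue" is an illustrative key chosen for this sketch.
            hostDatum.metaData.put("num_overdue", numOverDue);
        }
    }

    // Feeds every datum of one host to the processor, then stores the totals.
    static SimpleHostDatum aggregate(List<SimpleDatum> data, SimpleProcessor processor) {
        for (SimpleDatum datum : data) {
            processor.count(datum);
        }
        SimpleHostDatum hostDatum = new SimpleHostDatum();
        processor.finalize(hostDatum);
        return hostDatum;
    }

    public static void main(String[] args) {
        List<SimpleDatum> data = Arrays.asList(
            new SimpleDatum(true, 500L),    // unfetched, before the cutoff: counted
            new SimpleDatum(true, 1_500L),  // unfetched, after the cutoff: ignored
            new SimpleDatum(false, 200L));  // already fetched: ignored
        SimpleHostDatum host = aggregate(data, new OverdueProcessor(1_000L));
        System.out.println(host.metaData);
    }
}
```

Because the processor holds its running total in a field, this sketch creates a fresh instance per host so that counts from one host do not leak into the next.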