Package org.apache.nutch.util
Class AbstractChecker
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.util.AbstractChecker
-
- All Implemented Interfaces:
Configurable, Tool
- Direct Known Subclasses:
CrawlDbReader, IndexingFiltersChecker, LinkDbReader, ParserChecker, URLFilterChecker, URLNormalizerChecker
public abstract class AbstractChecker extends Configured implements Tool
Scaffolding class for the various Checker implementations. Can process command-line input, stdin, and TCP connections.
- Author:
- Jurian Broertjes
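The scaffolding idea can be illustrated with a minimal, standalone sketch. It does not use the Nutch or Hadoop classes: `CheckerSketch`, `LineChecker`, `SchemeChecker`, `checkOne` and `checkAll` are hypothetical names invented for this illustration, and only the shape of `process(String line, StringBuilder output)` is taken from this page. The status-code convention (0 for success) is likewise an assumption. The base class owns the input handling (a single value, or a stream of lines such as stdin or a socket would provide), while a subclass supplies only the per-line check.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

public class CheckerSketch {

  /** Hypothetical stand-in for the scaffolding role; NOT the Nutch class. */
  abstract static class LineChecker {

    /**
     * Same shape as process(String, StringBuilder) on this page:
     * append the result to output and return a status code
     * (0 meaning success is an assumption of this sketch).
     */
    protected abstract int process(String line, StringBuilder output) throws Exception;

    /** Checks one input value, as for a single command-line argument. */
    String checkOne(String input) throws Exception {
      StringBuilder output = new StringBuilder();
      int status = process(input, output);
      return status == 0 ? output.toString() : "error " + status;
    }

    /** Checks every line from a reader, as for stdin or a socket stream. */
    String checkAll(Reader source) throws Exception {
      StringBuilder all = new StringBuilder();
      try (BufferedReader in = new BufferedReader(source)) {
        String line;
        while ((line = in.readLine()) != null) {
          all.append(checkOne(line)).append('\n');
        }
      }
      return all.toString();
    }
  }

  /** Example subclass: marks lines that start with an http(s) scheme. */
  static class SchemeChecker extends LineChecker {
    @Override
    protected int process(String line, StringBuilder output) {
      String trimmed = line.trim();
      boolean ok = trimmed.startsWith("http://") || trimmed.startsWith("https://");
      output.append(ok ? "+" : "-").append(trimmed);
      return 0;
    }
  }

  public static void main(String[] args) throws Exception {
    LineChecker checker = new SchemeChecker();
    // A StringReader stands in for stdin so the sketch runs without input.
    Reader input = new StringReader("https://nutch.apache.org/\nftp://example.org/\n");
    System.out.print(checker.checkAll(input));
  }
}
```

In this sketch, a new kind of check needs only a new `process` implementation; the input plumbing is written once in the base class.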
-
Field Summary
Fields
Modifier and Type    Field                Description
protected boolean    keepClientCnxOpen
protected boolean    stdin
protected int        tcpPort
protected String     usage
-
Constructor Summary
Constructors
Constructor          Description
AbstractChecker()
-
Method Summary
Modifier and Type           Method                                                                     Description
protected ProtocolOutput    getProtocolOutput(String url, CrawlDatum datum, boolean checkRobotsTxt)
protected int               parseArgs(String[] args, int i)
protected abstract int      process(String line, StringBuilder output)
protected int               processSingle(String input)
protected int               processStdin()
protected int               processTCP(int tcpPort)
protected int               run()
-
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Field Detail
-
keepClientCnxOpen
protected boolean keepClientCnxOpen
-
tcpPort
protected int tcpPort
-
stdin
protected boolean stdin
-
usage
protected String usage
-
Method Detail
-
process
protected abstract int process(String line, StringBuilder output) throws Exception
- Throws:
Exception
-
parseArgs
protected int parseArgs(String[] args, int i)
-
getProtocolOutput
protected ProtocolOutput getProtocolOutput(String url, CrawlDatum datum, boolean checkRobotsTxt) throws Exception
- Throws:
Exception
-