Package org.apache.nutch.crawl
Class LinkDb
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.util.NutchTool
      org.apache.nutch.crawl.LinkDb

All Implemented Interfaces:
Configurable, Tool
public class LinkDb extends NutchTool implements Tool
Maintains an inverted link map, listing incoming links for each URL.
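As an illustration of what an inverted link map is, the following minimal sketch turns a map of pages to their outgoing links into a map of targets to their incoming links. It uses plain Java collections only and is not how LinkDb itself is implemented; the class name InvertedLinkSketch, its invert helper, and the example URLs are hypothetical and do not come from Nutch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertedLinkSketch {

    /**
     * Builds target -> list of source pages from source -> list of targets.
     * Hypothetical helper for illustration; not part of the Nutch API.
     */
    public static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        Map<String, List<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String source = entry.getKey();
            for (String target : entry.getValue()) {
                // Record that 'source' links to 'target'.
                inlinks.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        // Example data (made up): a links to b and c, b links to c.
        Map<String, List<String>> outlinks = new TreeMap<>();
        outlinks.put("http://a.example/", List.of("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/", List.of("http://c.example/"));

        // Each key of the result is a URL; each value lists the pages linking to it.
        System.out.println(invert(outlinks));
    }
}
```

In this sketch a page with no incoming links simply does not appear as a key in the result.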

Nested Class Summary
Modifier and Type    Class
static class         LinkDb.LinkDbMapper

Field Summary
Modifier and Type    Field
static String        CURRENT_NAME
static String        IGNORE_EXTERNAL_LINKS
static String        IGNORE_INTERNAL_LINKS
static String        LOCK_NAME

Fields inherited from class org.apache.nutch.util.NutchTool
currentJob, currentJobNum, numJobs, results, status

Constructor Summary
Constructor
LinkDb()
LinkDb(Configuration conf)

Method Summary
Modifier and Type    Method
static void          install(Job job, Path linkDb)
void                 invert(Path linkDb, Path[] segments, boolean normalize, boolean filter, boolean force)
void                 invert(Path linkDb, Path segmentsDir, boolean normalize, boolean filter, boolean force)
static void          main(String[] args)
int                  run(String[] args)
Map<String,Object>   run(Map<String,Object> args, String crawlId)
                     Runs the tool, using a map of arguments.

Methods inherited from class org.apache.nutch.util.NutchTool
getProgress, getStatus, killJob, setConf, stopJob

Methods inherited from class org.apache.hadoop.conf.Configured
getConf

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Field Detail

IGNORE_INTERNAL_LINKS
public static final String IGNORE_INTERNAL_LINKS
See Also: Constant Field Values

IGNORE_EXTERNAL_LINKS
public static final String IGNORE_EXTERNAL_LINKS
See Also: Constant Field Values

CURRENT_NAME
public static final String CURRENT_NAME
See Also: Constant Field Values

LOCK_NAME
public static final String LOCK_NAME
See Also: Constant Field Values

Constructor Detail

LinkDb
public LinkDb()

LinkDb
public LinkDb(Configuration conf)

Method Detail

invert
public void invert(Path linkDb, Path segmentsDir, boolean normalize, boolean filter, boolean force) throws IOException, InterruptedException, ClassNotFoundException

invert
public void invert(Path linkDb, Path[] segments, boolean normalize, boolean filter, boolean force) throws IOException, InterruptedException, ClassNotFoundException

install
public static void install(Job job, Path linkDb) throws IOException
Throws:
IOException

run
public Map<String,Object> run(Map<String,Object> args, String crawlId) throws Exception
Description copied from class: NutchTool
Runs the tool, using a map of arguments. May return results, or null.
Specified by:
run in class NutchTool
Parameters:
args - a Map of arguments to be run with the tool
crawlId - a crawl identifier to associate with the tool invocation
Returns:
Map results object if tool executes successfully otherwise null
Throws:
Exception - if there is an error during the tool execution