Package org.apache.nutch.parse.tika
Class TikaParser
java.lang.Object
    org.apache.nutch.parse.tika.TikaParser
All Implemented Interfaces:
Configurable, Parser, Pluggable
public class TikaParser extends Object implements Parser
Wrapper for Tika parsers. Mimics the HTMLParser, but uses the XHTML representation returned by Tika as SAX events.
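To illustrate what consuming "the XHTML representation returned by Tika as SAX events" involves, here is a minimal, stdlib-only sketch: a SAX `DefaultHandler` that accumulates text and `<a href>` targets as events arrive. It is not TikaParser's implementation. It assumes XHTML input similar to what Tika emits, and it drives the handler with the JDK's own `SAXParser` instead of Tika; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlSaxSketch {

    /** Hypothetical handler: accumulates character data and anchor hrefs from SAX events. */
    public static class TextAndLinksHandler extends DefaultHandler {
        public final StringBuilder text = new StringBuilder();
        public final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Record the href of every <a> element, as an outlink extractor would.
            if ("a".equals(localName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    links.add(href);
                }
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks; appending handles that.
            text.append(ch, start, length);
        }
    }

    /** Parses an XHTML string with the JDK SAX parser (standing in for Tika here). */
    public static TextAndLinksHandler parse(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated for XHTML elements
            SAXParser parser = factory.newSAXParser();
            TextAndLinksHandler handler = new TextAndLinksHandler();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>Hello <a href=\"http://example.com/next\">next</a></p></body></html>";
        TextAndLinksHandler h = parse(xhtml);
        System.out.println(h.text);  // Hello next
        System.out.println(h.links); // [http://example.com/next]
    }
}
```

Because the handler only reacts to element and character events, the same code works whatever the original document format was, as long as something upstream emits XHTML SAX events. That is the property the class description relies on.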
Field Summary
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
Constructor Summary
Constructor       Description
TikaParser()
Method Summary
Modifier and Type    Method                         Description
Configuration        getConf()
ParseResult          getParse(Content content)      This method parses the given content and returns a map of <key, parse> pairs.
void                 setConf(Configuration conf)
Method Detail
getParse
public ParseResult getParse(Content content)
Description copied from interface: Parser
This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.
Note: Meta-redirects should be followed only when they come from the original URL. That is:
Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
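The meta-redirect rule can be sketched as a lookup keyed by the URL being fetched. This is a minimal illustration, not Nutch's fetcher code. A plain `Map<String, Status>` stands in for the ParseResult, and the `Status`/`StatusKind` types are hypothetical, simplified stand-ins for ParseStatus that carry only what this check needs.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaRedirectCheck {

    // Hypothetical, simplified stand-in for ParseStatus: only what this check needs.
    enum StatusKind { SUCCESS, REDIRECT }

    static final class Status {
        final StatusKind kind;
        final String redirectTarget; // only meaningful when kind == REDIRECT

        Status(StatusKind kind, String redirectTarget) {
            this.kind = kind;
            this.redirectTarget = redirectTarget;
        }
    }

    /**
     * Returns the redirect target to follow, or null. Only an entry keyed by the
     * URL that was actually fetched counts; redirects reported under any other
     * key in the result map are ignored.
     */
    static String redirectToFollow(String fetchedUrl, Map<String, Status> parseResult) {
        Status s = parseResult.get(fetchedUrl);
        if (s != null && s.kind == StatusKind.REDIRECT) {
            return s.redirectTarget;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Status> result = new HashMap<>();
        result.put("foo.bar.com/redirect.html",
                new Status(StatusKind.REDIRECT, "foo.bar.com/target.html"));
        result.put("foo.bar.com/inner.html",
                new Status(StatusKind.REDIRECT, "elsewhere.example/"));

        // Follows the redirect reported for the fetched URL itself.
        System.out.println(redirectToFollow("foo.bar.com/redirect.html", result)); // foo.bar.com/target.html
        // No entry for this URL, so nothing is followed.
        System.out.println(redirectToFollow("foo.bar.com/other.html", result));    // null
    }
}
```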
setConf
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
getConf
public Configuration getConf()
Specified by:
getConf in interface Configurable
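The summary pairs a no-argument constructor, TikaParser(), with setConf/getConf from Configurable. This suggests a lifecycle of construct, configure, then parse. The sketch below models that pattern under stated assumptions and is not Nutch code. `SimpleConfigurable` is a hypothetical stand-in for the Configurable interface, `java.util.Properties` replaces Hadoop's Configuration, and the `sketch.max.chars` key and the fail-fast check in `parse` are made up for illustration.

```java
import java.util.Properties;

public class ConfigurableSketch {

    // Hypothetical stand-in for Configurable, with Properties in place of Configuration.
    interface SimpleConfigurable {
        void setConf(Properties conf);
        Properties getConf();
    }

    static class SketchParser implements SimpleConfigurable {
        private Properties conf;

        // No-arg constructor: configuration is supplied afterwards via setConf.
        SketchParser() {}

        @Override
        public void setConf(Properties conf) {
            this.conf = conf; // keep the instance so getConf returns the same object
        }

        @Override
        public Properties getConf() {
            return conf;
        }

        // Hypothetical parse step that reads a made-up setting from the stored configuration.
        String parse(String content) {
            if (conf == null) {
                throw new IllegalStateException("setConf must be called before parse");
            }
            int maxChars = Integer.parseInt(conf.getProperty("sketch.max.chars", "10"));
            return content.length() <= maxChars ? content : content.substring(0, maxChars);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("sketch.max.chars", "5");

        SketchParser p = new SketchParser(); // 1. construct
        p.setConf(conf);                     // 2. configure
        System.out.println(p.getConf() == conf);    // true
        System.out.println(p.parse("hello world")); // hello (3. parse)
    }
}
```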