Package org.apache.nutch.parse
Class ParseUtil
java.lang.Object
    org.apache.nutch.parse.ParseUtil
public class ParseUtil extends Object
Constructor Summary
Constructor                      Description
ParseUtil(Configuration conf)    Overloaded constructor
Method Summary
Modifier and Type    Method
ParseResult          parse(Content content)
ParseResult          parseByExtensionId(String extId, Content content)
Constructor Detail
ParseUtil
public ParseUtil(Configuration conf)
Overloaded constructor.
Parameters:
    conf - a populated Configuration
Method Detail
parse
public ParseResult parse(Content content) throws ParseException
Performs a parse by iterating through a List of preferred Parsers until a successful parse is performed and a Parse object is returned. If the parse is unsuccessful, a message is logged at the WARNING level, and an empty parse is returned.
Parameters:
    content - The content to try to parse.
Returns:
    <key, Parse> pairs.
Throws:
    ParseException - If no suitable parser is found to perform the parse.
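The fallback behavior described for parse can be illustrated with a minimal, self-contained sketch. It is not Nutch's implementation: ParseFallbackSketch, SimpleParser, NoParserException, and EMPTY_PARSE are hypothetical stand-ins for ParseUtil, Parser, ParseException, and the empty parse, and content is reduced to a plain String so the example runs without Nutch on the classpath.

```java
import java.util.List;
import java.util.Optional;

// Illustrative model only; none of these types are Nutch classes.
class ParseFallbackSketch {

    /** Stand-in for ParseException: no parser is available at all. */
    static class NoParserException extends Exception {
        NoParserException(String message) {
            super(message);
        }
    }

    /** Hypothetical parser: returns the parsed text, or empty if it failed. */
    interface SimpleParser {
        Optional<String> tryParse(String content);
    }

    /** Stand-in for the empty parse returned when every parser fails. */
    static final String EMPTY_PARSE = "";

    static String parse(List<SimpleParser> preferredParsers, String content)
            throws NoParserException {
        // No suitable parser found: the caller gets an exception.
        if (preferredParsers.isEmpty()) {
            throw new NoParserException("No suitable parser found");
        }
        // Try the parsers in order of preference; the first success wins.
        for (SimpleParser parser : preferredParsers) {
            Optional<String> result = parser.tryParse(content);
            if (result.isPresent()) {
                return result.get();
            }
        }
        // Parsers existed but none succeeded: log a warning, return an empty parse.
        System.err.println("WARNING: unable to successfully parse content");
        return EMPTY_PARSE;
    }

    public static void main(String[] args) throws NoParserException {
        SimpleParser alwaysFails = c -> Optional.empty();
        SimpleParser upperCase = c -> Optional.of(c.toUpperCase());
        System.out.println(parse(List.of(alwaysFails, upperCase), "hello"));
    }
}
```

The sketch separates the two failure modes named in the description: an exception when no parser is available, and an empty parse (plus a warning) when parsers exist but none succeeds.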
parseByExtensionId
public ParseResult parseByExtensionId(String extId, Content content) throws ParseException
Parses a Content object using the Parser specified by the parameter extId, i.e., the Parser's extension ID. If a suitable Parser is not found, a WARNING level message is logged and a ParseException is thrown. If the parse is unsuccessful for any other reason, a WARNING level message is logged and a ParseStatus.getEmptyParse() is returned.
Parameters:
    extId - The extension implementation ID of the Parser to use to parse the specified content.
    content - The content to parse.
Returns:
    <key, Parse> pairs if the parse is successful, otherwise, a single <key, ParseStatus.getEmptyParse()> pair.
Throws:
    ParseException - If there is no suitable Parser found to perform the parse.
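The lookup-by-ID contract described for parseByExtensionId can be sketched in the same spirit. This is an illustrative model under stated assumptions, not Nutch code: ExtensionIdLookupSketch, SimpleParser, NoParserException, EMPTY_PARSE, and the Map-based registry are all hypothetical stand-ins, and the extension IDs in main are made up for the example.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative model only; none of these types are Nutch classes.
class ExtensionIdLookupSketch {

    /** Stand-in for ParseException: no parser registered under the ID. */
    static class NoParserException extends Exception {
        NoParserException(String message) {
            super(message);
        }
    }

    /** Hypothetical parser: returns the parsed text, or empty if it failed. */
    interface SimpleParser {
        Optional<String> tryParse(String content);
    }

    /** Stand-in for the empty parse returned when the chosen parser fails. */
    static final String EMPTY_PARSE = "";

    static String parseByExtensionId(Map<String, SimpleParser> parsersById,
                                     String extId, String content)
            throws NoParserException {
        SimpleParser parser = parsersById.get(extId);
        // Unknown extension ID: log a warning and throw.
        if (parser == null) {
            System.err.println("WARNING: no parser with extension ID " + extId);
            throw new NoParserException("No parser found for " + extId);
        }
        Optional<String> result = parser.tryParse(content);
        if (result.isPresent()) {
            return result.get();
        }
        // The parser was found but failed: log a warning, return an empty parse.
        System.err.println("WARNING: unable to successfully parse content");
        return EMPTY_PARSE;
    }

    public static void main(String[] args) throws NoParserException {
        // "example.trim" is a made-up extension ID for this sketch.
        Map<String, SimpleParser> registry =
                Map.of("example.trim", c -> Optional.of(c.trim()));
        System.out.println(parseByExtensionId(registry, "example.trim", "  hello  "));
    }
}
```

Unlike the preferred-parser loop in parse, there is no fallback here: exactly one parser is consulted, so a caller should be prepared both to catch the exception and to check for an empty parse.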