Package org.apache.nutch.parse
Class HtmlParseFilters
- java.lang.Object
  - org.apache.nutch.parse.HtmlParseFilters
public class HtmlParseFilters extends Object
Creates and caches plugins that implement HtmlParseFilter.
Field Summary
Modifier and Type: static String
Field: HTMLPARSEFILTER_ORDER
Constructor Summary
Constructor: HtmlParseFilters(Configuration conf)
Method Summary
Modifier and Type: ParseResult
Method: filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Description: Run all defined filters.
Field Detail
HTMLPARSEFILTER_ORDER
public static final String HTMLPARSEFILTER_ORDER
See Also: Constant Field Values
Constructor Detail
HtmlParseFilters
public HtmlParseFilters(Configuration conf)
Method Detail
filter
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Run all defined filters.
Parameters:
content - the Content for a given response
parseResult - the result of running one or more Parsers on the content
metaTags - a populated HTMLMetaTags object
doc - a DocumentFragment (DOM) which can be processed during filtering
Returns:
a filtered ParseResult
See Also:
Parser.getParse(Content)
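The idea of running all defined filters over a parse result can be illustrated with a minimal, self-contained sketch. It does not use Nutch classes: `FilterChainSketch`, `Result`, `Filter`, `runAll` and `demoFilters` are hypothetical stand-ins invented for this example, and the sketch assumes that each filter receives the output of the previous one. The real HtmlParseFilters may order its plugins differently and may apply additional checks between filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FilterChainSketch {

    /** Hypothetical stand-in for ParseResult: holds only extracted text. */
    static final class Result {
        final String text;
        Result(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for a single filter plugin. */
    interface Filter {
        Result filter(Result previous);
    }

    /** Applies each filter in list order, feeding it the previous filter's output. */
    static Result runAll(List<Filter> filters, Result initial) {
        Result current = initial;
        for (Filter f : filters) {
            current = f.filter(current);
        }
        return current;
    }

    /** Two toy filters used only for demonstration (assumption, not Nutch plugins). */
    static List<Filter> demoFilters() {
        List<Filter> filters = new ArrayList<>();
        filters.add(r -> new Result(r.text.trim()));
        filters.add(r -> new Result(r.text.toUpperCase(Locale.ROOT)));
        return filters;
    }

    public static void main(String[] args) {
        Result out = runAll(demoFilters(), new Result("  hello "));
        System.out.println(out.text); // prints "HELLO"
    }
}
```

Passing the running result from one filter to the next lets every filter build on earlier changes, which matches the shape of a `filter` method that both accepts and returns a ParseResult.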