Package org.apache.nutch.parse
The Parse interface and related classes.

Interface Summary

Interface                         Description
HtmlParseFilter                   Extension point for DOM-based HTML parsers.
Parse                             The result of parsing a page's raw content.
Parser                            A parser for content generated by a Protocol implementation.

Class Summary

Class                             Description
HTMLMetaTags                      This class holds the information about HTML "meta" tags extracted from a page.
HtmlParseFilters                  Creates and caches HtmlParseFilter implementing plugins.
Outlink                           An outgoing link from a page.
OutlinkExtractor                  Extractor to extract Outlinks / URLs from plain text using Regular Expressions.
ParseData                         Data extracted from a page's content.
ParseImpl                         The result of parsing a page's raw content.
ParseOutputFormat
ParserChecker                     Parser checker, useful for testing parser.
ParseResult                       A utility class that stores result of a parse.
ParserFactory                     Creates and caches Parser plugins.
ParseSegment
ParseSegment.ParseSegmentMapper
ParseSegment.ParseSegmentReducer
ParseStatus
ParseText
ParseUtil

Exception Summary

Exception                         Description
ParseException
ParserNotFound
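
The OutlinkExtractor entry in the Class Summary describes extracting URLs from plain text using regular expressions. The sketch below illustrates that general idea using only the JDK. It is not Nutch's implementation and does not call the Nutch API: the class name `OutlinkSketch`, the method `extractUrls`, and the URL pattern are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of org.apache.nutch.parse.
public class OutlinkSketch {

    // Assumed, deliberately simple pattern: an http(s) scheme followed by
    // any run of characters that are not whitespace, quotes, or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"']+");

    // Characters that usually belong to the surrounding sentence, not the URL.
    private static final String TRAILING_PUNCTUATION = ".,;:!?)";

    // Returns the URLs found in the text, in order of appearance.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL.matcher(text);
        while (matcher.find()) {
            String url = matcher.group();
            // Trim sentence punctuation that the greedy pattern swallowed.
            int end = url.length();
            while (end > 0 && TRAILING_PUNCTUATION.indexOf(url.charAt(end - 1)) >= 0) {
                end--;
            }
            urls.add(url.substring(0, end));
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "See http://example.com/a and https://nutch.apache.org.";
        for (String url : extractUrls(text)) {
            System.out.println(url);
        }
    }
}
```

A pattern this simple will miss URLs without a scheme and may accept malformed ones. For the behavior of the real class, consult the OutlinkExtractor Javadoc.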