Package org.apache.nutch.parse.js
Class JSParseFilter
java.lang.Object
  org.apache.nutch.parse.js.JSParseFilter
All Implemented Interfaces:
Configurable, HtmlParseFilter, Parser, Pluggable
public class JSParseFilter extends Object implements HtmlParseFilter, Parser
This class is a heuristic link extractor for JavaScript files and code snippets. The general idea of two-pass regex matching comes from Heritrix. Parts of the code come from OutlinkExtractor.java.
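The two-pass idea can be illustrated with a minimal, self-contained sketch. This is not the JSParseFilter source: the class name TwoPassLinkSketch, both regular expressions, and the example URLs are assumptions chosen for illustration, and the real patterns in Nutch or Heritrix may differ. The first pass collects quoted string literals, the second keeps only the literals that look like URLs or paths and resolves them against a base URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Nutch.
class TwoPassLinkSketch {

  // Pass 1 (assumed pattern): any single- or double-quoted string literal.
  // Escaped quotes inside a literal are not handled in this sketch.
  private static final Pattern STRING_LITERAL =
      Pattern.compile("([\"'])(.*?)\\1");

  // Pass 2 (assumed pattern): keep literals that look like an http(s) URL
  // or a path beginning with "/", "./" or "../".
  private static final Pattern URL_LIKE =
      Pattern.compile("(?:https?://\\S+|\\.{0,2}/\\S+)");

  // Returns the distinct link candidates found in the script, resolved
  // against baseUrl, in order of first appearance.
  static List<String> extract(String script, String baseUrl) {
    URI base = URI.create(baseUrl);
    Set<String> links = new LinkedHashSet<>();
    Matcher literals = STRING_LITERAL.matcher(script);
    while (literals.find()) {
      String candidate = literals.group(2);
      if (!URL_LIKE.matcher(candidate).matches()) {
        continue; // rejected by the second pass
      }
      try {
        links.add(base.resolve(candidate).toString());
      } catch (IllegalArgumentException e) {
        // Not a syntactically valid URI reference; skip it.
      }
    }
    return new ArrayList<>(links);
  }

  public static void main(String[] args) {
    String script = "var a = \"http://example.com/a.html\"; "
        + "window.open('/b/c.js'); var s = \"hello world\";";
    for (String link : extract(script, "http://example.org/dir/page.html")) {
      System.out.println(link);
    }
  }
}
```

Splitting the work this way keeps the first pattern cheap and permissive, while the stricter second pattern only runs over short candidate strings. Being heuristic, such an extractor can both miss links and report strings that are not links.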
Field Summary
Fields inherited from interface org.apache.nutch.parse.HtmlParseFilter
X_POINT_ID
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
Constructor Summary
Constructor          Description
JSParseFilter()
Method Summary
Modifier and Type    Method and Description
ParseResult          filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
                     Scan the JavaScript fragments of an HTML page looking for possible Outlinks.
Configuration        getConf()
ParseResult          getParse(Content c)
                     Parse a JavaScript file and extract outlinks.
static void          main(String[] args)
                     Main method, which can be run from the command line with the plugin option.
void                 setConf(Configuration conf)
Method Detail
-
filter
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
Scan the JavaScript fragments of an HTML page looking for possible Outlinks.
Specified by:
filter in interface HtmlParseFilter
Parameters:
content - page content
parseResult - parsed content, the result of running the HTML parser
metaTags - the HTMLMetaTags for the page
doc - the DocumentFragment object
Returns:
the ParseResult object with additional outlinks from JavaScript
See Also:
Parser.getParse(Content)
-
getParse
public ParseResult getParse(Content c)
Parse a JavaScript file and extract outlinks.
-
main
public static void main(String[] args) throws Exception
Main method, which can be run from the command line with the plugin option. The method takes two arguments, e.g. o.a.n.parse.js.JSParseFilter file.js baseURL
Parameters:
args - run with no args to get help
Throws:
Exception - if there is a fatal error running the class with the given input
-
setConf
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
-
getConf
public Configuration getConf()
Specified by:
getConf in interface Configurable