Class BasicURLNormalizer
java.lang.Object
    org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
All Implemented Interfaces:
Configurable, URLNormalizer
public class BasicURLNormalizer extends Object implements URLNormalizer
Converts URLs to a normal form:
- remove dot segments in path: /./ or /../
- remove default ports, e.g. 80 for protocol http://
- normalize percent-encoding in URL paths
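The three steps listed above can be illustrated with a minimal sketch that uses only the JDK's java.net.URI. This is not the Nutch implementation, and the real class may behave differently in edge cases. The class name UrlNormalFormSketch and its helper methods are hypothetical, and treating 443 as the default port for https is an assumption beyond the single example (80 for http) given above.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative sketch only; hypothetical class, not part of Nutch. */
final class UrlNormalFormSketch {

    /** Applies the three steps to an absolute URL; other input is returned unchanged. */
    static String normalize(String url) throws URISyntaxException {
        URI uri = new URI(url);
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return url; // relative or opaque URI: out of scope for this sketch
        }

        // Step: drop the port when it is the scheme's default (assumed: http=80, https=443).
        int port = uri.getPort();
        boolean defaultPort = ("http".equalsIgnoreCase(scheme) && port == 80)
                || ("https".equalsIgnoreCase(scheme) && port == 443);

        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host);
        if (port != -1 && !defaultPort) {
            sb.append(':').append(port);
        }
        // Step: normalize percent-encoding in the path only; query and fragment stay as-is.
        sb.append(normalizePercentEncoding(uri.getRawPath()));
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        // Step: remove dot segments ("/./" and "/../") from the path.
        return new URI(sb.toString()).normalize().toString();
    }

    /** Decodes escapes of unreserved characters and upper-cases the hex digits of the rest. */
    static String normalizePercentEncoding(String rawPath) {
        StringBuilder out = new StringBuilder(rawPath.length());
        for (int i = 0; i < rawPath.length(); i++) {
            char c = rawPath.charAt(i);
            if (c == '%' && i + 2 < rawPath.length()) {
                int hi = Character.digit(rawPath.charAt(i + 1), 16);
                int lo = Character.digit(rawPath.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    int value = hi * 16 + lo;
                    if (isUnreserved(value)) {
                        out.append((char) value);
                    } else {
                        out.append('%')
                           .append(Character.toUpperCase(Character.forDigit(hi, 16)))
                           .append(Character.toUpperCase(Character.forDigit(lo, 16)));
                    }
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Unreserved characters per RFC 3986: letters, digits, '-', '.', '_', '~'. */
    private static boolean isUnreserved(int v) {
        return (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z') || (v >= '0' && v <= '9')
                || v == '-' || v == '.' || v == '_' || v == '~';
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(normalize("http://example.com:80/a/./b/../c"));
    }
}
```

The sketch reassembles the URL from its raw components rather than using the multi-argument URI constructors, because those constructors always quote the percent character and would double-encode existing escapes.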
Field Summary
Fields
Modifier and Type    Field                          Description
static String        NORM_HOST_IDN
static String        NORM_HOST_TRIM_TRAILING_DOT
Fields inherited from interface org.apache.nutch.net.URLNormalizer
X_POINT_ID
Constructor Summary
Constructors
Constructor             Description
BasicURLNormalizer()
Method Summary
Modifier and Type    Method                                       Description
Configuration        getConf()
static void          main(String[] args)
String               normalize(String urlString, String scope)
void                 setConf(Configuration conf)
Field Detail
-
NORM_HOST_IDN
public static final String NORM_HOST_IDN
- See Also:
- Constant Field Values
-
NORM_HOST_TRIM_TRAILING_DOT
public static final String NORM_HOST_TRIM_TRAILING_DOT
- See Also:
- Constant Field Values
Method Detail
-
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
-
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
-
normalize
public String normalize(String urlString, String scope) throws MalformedURLException
- Specified by:
normalize in interface URLNormalizer
- Throws:
MalformedURLException
-
main
public static void main(String[] args) throws IOException
- Throws:
IOException