Class HostURLNormalizer
- java.lang.Object
  - org.apache.nutch.net.urlnormalizer.host.HostURLNormalizer
- All Implemented Interfaces:
Configurable, URLNormalizer
public class HostURLNormalizer extends Object implements URLNormalizer
URL normalizer for mapping hosts to their desired form. It takes a simple text file as its source, with one mapping per line in the format:
    example.org www.example.org
which maps all URLs of example.org to the www sub-domain. Wildcards can also be used to map all sub-domains to another host:
    *.example.org www.example.org
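The mapping format described above can be illustrated with a minimal, standalone sketch. It is not the Nutch implementation: the class name HostMappingSketch and its methods are hypothetical, the rules are passed in as a string rather than read from a configured file, and the wildcard handling (a rule "*.example.org" matching any host that ends in ".example.org") is an assumption about how such rules might be interpreted. It also uses java.net.URI for host extraction as a simplification.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Apache Nutch.
public class HostMappingSketch {

  /** Parses lines of the form "source target"; other lines are ignored. */
  public static Map<String, String> parseRules(String text) {
    Map<String, String> rules = new HashMap<>();
    for (String line : text.split("\n")) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length == 2) {
        rules.put(parts[0], parts[1]);
      }
    }
    return rules;
  }

  /** Rewrites the host of urlString if an exact or wildcard rule matches. */
  public static String normalize(Map<String, String> rules, String urlString) {
    String host = URI.create(urlString).getHost();
    if (host == null) {
      return urlString; // no host component, nothing to map
    }
    // An exact rule takes precedence over wildcard rules.
    String target = rules.get(host);
    // Assumed wildcard semantics: try "*.b.c" for host "a.b.c", then "*.c".
    int dot = host.indexOf('.');
    while (target == null && dot >= 0) {
      target = rules.get("*" + host.substring(dot));
      dot = host.indexOf('.', dot + 1);
    }
    if (target == null || target.equals(host)) {
      return urlString; // no rule, or host already in the desired form
    }
    // Replace the first occurrence of the host in the URL string.
    int start = urlString.indexOf(host);
    if (start < 0) {
      return urlString;
    }
    return urlString.substring(0, start) + target
        + urlString.substring(start + host.length());
  }
}
```

With the rules "example.org www.example.org" and "*.example.net www.example.net", this sketch would rewrite http://example.org/a to http://www.example.org/a and http://news.example.net/b to http://www.example.net/b, while leaving URLs on unlisted hosts unchanged.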
Field Summary
-
Fields inherited from interface org.apache.nutch.net.URLNormalizer
X_POINT_ID
Constructor Summary
Constructors
Constructor            Description
HostURLNormalizer()
-
Method Summary
Modifier and Type    Method    Description
Configuration        getConf()
String               normalize(String urlString, String scope)
protected String     replaceHost(String urlString, String host, String target)
void                 setConf(Configuration conf)
Method Detail
-
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
-
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
-
normalize
public String normalize(String urlString, String scope) throws MalformedURLException
- Specified by:
normalize in interface URLNormalizer
- Throws:
MalformedURLException