Class ProtocolURLNormalizer
- java.lang.Object
- org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer
- All Implemented Interfaces:
Configurable, URLNormalizer
public class ProtocolURLNormalizer extends Object implements URLNormalizer
URL normalizer to normalize the protocol for all URLs of a given host or domain, e.g. normalize http://nutch.apache.org/path/ to https://www.apache.org/path/ if it's known that the host nutch.apache.org supports https and http-URLs either cause duplicate content or are redirected to https. See org.apache.nutch.net.urlnormalizer.protocol for details and configuration.
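The idea described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: the class name ProtocolNormalizationSketch, the addRule method, and the in-memory host table are hypothetical, and the real normalizer reads its rules from the configuration described in the package documentation. The sketch also assumes that URLs with an explicit port are left untouched, since a non-default port is usually bound to a single protocol.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of per-host protocol normalization. */
public class ProtocolNormalizationSketch {

    // Hypothetical rule table: lower-cased host name -> preferred protocol.
    private final Map<String, String> protocolByHost = new HashMap<>();

    /** Registers the preferred protocol ("http" or "https") for a host. */
    public void addRule(String host, String protocol) {
        protocolByHost.put(host.toLowerCase(Locale.ROOT),
                protocol.toLowerCase(Locale.ROOT));
    }

    /**
     * Returns the URL with its protocol replaced by the one preferred for
     * its host, or the URL unchanged if no rule applies.
     */
    public String normalize(String url) throws MalformedURLException {
        String trimmed = url.trim();
        URL parsed = new URL(trimmed); // throws on malformed input

        String preferred =
                protocolByHost.get(parsed.getHost().toLowerCase(Locale.ROOT));
        if (preferred == null || preferred.equals(parsed.getProtocol())) {
            return trimmed; // no rule for this host, or already normalized
        }
        if (parsed.getPort() != -1) {
            return trimmed; // assumption: never rewrite URLs with an explicit port
        }
        // Swap only the scheme; host, path, query and fragment stay verbatim.
        return preferred + trimmed.substring(parsed.getProtocol().length());
    }
}
```

With a rule mapping nutch.apache.org to https, the sketch rewrites only the scheme of matching URLs and returns URLs of other hosts as they were passed in.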
Field Summary
-
Fields inherited from interface org.apache.nutch.net.URLNormalizer
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor    Description
ProtocolURLNormalizer()
-
Method Summary
Modifier and Type    Method    Description
Configuration    getConf()
String    normalize(String url, String scope)
void    setConf(Configuration conf)
Method Detail
-
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
-
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
-
normalize
public String normalize(String url, String scope) throws MalformedURLException
- Specified by:
normalize in interface URLNormalizer
- Throws:
MalformedURLException