Uses of Interface
org.apache.nutch.net.URLNormalizer
Packages that use URLNormalizer
org.apache.nutch.net.urlnormalizer.basic
URL normalizer performing basic normalizations: removes default ports and dot segments in the path.
org.apache.nutch.net.urlnormalizer.host
URL normalizer renaming hosts to a canonical form listed in the configuration file.
org.apache.nutch.net.urlnormalizer.pass
Dummy URL normalizer which does not change URLs.
org.apache.nutch.net.urlnormalizer.querystring
URL normalizer which sorts the elements in the query part to avoid duplicates caused by permutations.
org.apache.nutch.net.urlnormalizer.regex
URL normalizer with configurable rules based on regular expressions (Pattern).
Uses of URLNormalizer in org.apache.nutch.net.urlnormalizer.basic
Classes in org.apache.nutch.net.urlnormalizer.basic that implement URLNormalizer
class BasicURLNormalizer
Converts URLs to a normal form: removes dot segments in the path (/./ or /../) and removes default ports.
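The two normalizations named above can be illustrated with a minimal sketch built on java.net.URI. This is not the Nutch implementation: the class name BasicNormalizationSketch is hypothetical, and treating only 80 (http) and 443 (https) as default ports is an assumption made for the example.

```java
import java.net.URI;

/** Hypothetical illustration; NOT the code of BasicURLNormalizer. */
final class BasicNormalizationSketch {

    /** Removes dot segments from the path and drops the scheme's default port. */
    static String normalize(String url) {
        // URI.normalize() removes "." segments and resolves ".." segments.
        URI uri = URI.create(url).normalize();
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return uri.toString(); // relative or opaque URI: nothing more to do
        }
        int port = uri.getPort();
        // Assumption: only these two scheme/port pairs count as defaults here.
        boolean isDefaultPort =
                ("http".equalsIgnoreCase(scheme) && port == 80)
             || ("https".equalsIgnoreCase(scheme) && port == 443);

        // Rebuild from raw components so percent-encoding is left untouched.
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host);
        if (port != -1 && !isDefaultPort) {
            sb.append(':').append(port);
        }
        sb.append(uri.getRawPath());
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints http://example.com/a/c.html
        System.out.println(normalize("http://example.com:80/a/./b/../c.html"));
    }
}
```

A non-default port such as 8443 is kept as written, so https://example.com:8443/x comes back unchanged.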
Uses of URLNormalizer in org.apache.nutch.net.urlnormalizer.host
Classes in org.apache.nutch.net.urlnormalizer.host that implement URLNormalizer
class HostURLNormalizer
URL normalizer for mapping hosts to their desired form.
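As a rough sketch of host mapping, the example below rewrites the host part of a URL using an in-memory lookup table. The class name HostMappingSketch and the Map-based table are assumptions for illustration only; the configuration file format that HostURLNormalizer reads is not shown on this page and is not modeled here.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration; NOT the code of HostURLNormalizer. */
final class HostMappingSketch {

    /**
     * Replaces the host of the URL with its mapped form, if the table has one.
     * hostMap keys are assumed to be lower-case host names.
     */
    static String normalize(String url, Map<String, String> hostMap) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (uri.getScheme() == null || host == null) {
            return url; // nothing to map
        }
        String target = hostMap.get(host.toLowerCase(Locale.ROOT));
        if (target == null) {
            return url; // host is not listed: leave the URL unchanged
        }
        // Rebuild from raw components so percent-encoding is left untouched.
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(target);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        sb.append(uri.getRawPath());
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of("example.org", "www.example.org");
        // prints http://www.example.org/page?id=1
        System.out.println(normalize("http://example.org/page?id=1", table));
    }
}
```

Hosts that are absent from the table pass through untouched, which keeps the mapping opt-in per host.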
Uses of URLNormalizer in org.apache.nutch.net.urlnormalizer.pass
Classes in org.apache.nutch.net.urlnormalizer.pass that implement URLNormalizer
class PassURLNormalizer
This URLNormalizer doesn't change URLs.
Uses of URLNormalizer in org.apache.nutch.net.urlnormalizer.querystring
Classes in org.apache.nutch.net.urlnormalizer.querystring that implement URLNormalizer
class QuerystringURLNormalizer
URL normalizer plugin for normalizing query strings by sorting query string parameters.
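To show why sorting removes duplicates, here is a minimal sketch that orders the `&`-separated parameters of a query string lexicographically. The class name QuerySortSketch is hypothetical, and plain string sorting of `name=value` pairs is an assumption; the actual plugin may order or split parameters differently.

```java
import java.util.Arrays;

/** Hypothetical illustration; NOT the code of QuerystringURLNormalizer. */
final class QuerySortSketch {

    /** Sorts the '&'-separated query parameters; everything else is kept as is. */
    static String normalize(String url) {
        // Split off the fragment first so a '?' inside it is not mistaken for a query.
        int hash = url.indexOf('#');
        String fragment = hash >= 0 ? url.substring(hash) : "";
        String base = hash >= 0 ? url.substring(0, hash) : url;

        int q = base.indexOf('?');
        if (q < 0 || q == base.length() - 1) {
            return url; // no query, or an empty one
        }
        String[] params = base.substring(q + 1).split("&");
        Arrays.sort(params); // natural String order on the whole "name=value" pair
        return base.substring(0, q + 1) + String.join("&", params) + fragment;
    }

    public static void main(String[] args) {
        // Both permutations print http://example.com/p?a=1&b=2&c=3
        System.out.println(normalize("http://example.com/p?b=2&a=1&c=3"));
        System.out.println(normalize("http://example.com/p?c=3&b=2&a=1"));
    }
}
```

Because every permutation of the same parameters maps to one string, a crawler comparing normalized URLs sees them as a single page.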
Uses of URLNormalizer in org.apache.nutch.net.urlnormalizer.regex
Classes in org.apache.nutch.net.urlnormalizer.regex that implement URLNormalizer
class RegexURLNormalizer
Allows users to do regex substitutions on all/any URLs that are encountered, which is useful for stripping session IDs from URLs.
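The session ID use case can be sketched with java.util.regex as an ordered list of pattern and substitution pairs. Everything named here is an assumption for illustration: the class RegexSubstitutionSketch, the nested Rule type, and the single `;jsessionid=` rule are hypothetical and do not reproduce the rules or rule file format that RegexURLNormalizer uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration; NOT the code of RegexURLNormalizer. */
final class RegexSubstitutionSketch {

    /** One substitution rule: a compiled pattern and its replacement text. */
    private static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
        }
    }

    private static final List<Rule> RULES = new ArrayList<>();
    static {
        // Hypothetical rule: drop a ";jsessionid=..." path parameter,
        // stopping before the query ('?') or fragment ('#').
        RULES.add(new Rule("(?i);jsessionid=[^?#]*", ""));
    }

    /** Applies every rule in order to the URL string. */
    static String normalize(String url) {
        String result = url;
        for (Rule rule : RULES) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints http://example.com/shop?item=5
        System.out.println(normalize("http://example.com/shop;jsessionid=ABC123?item=5"));
    }
}
```

Rules run in list order, so a later rule sees the output of an earlier one; URLs that match no rule are returned unchanged.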