Uses of Interface
org.apache.nutch.plugin.Pluggable
Packages that use Pluggable
Package: Description
org.apache.nutch.analysis.lang: Text document language identifier.
org.apache.nutch.collection: Subcollection is a subset of an index.
org.apache.nutch.indexer: Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor: An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.basic: A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed: Indexing filter to index metadata from RSS feeds.
org.apache.nutch.indexer.metadata: Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more: A more indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.staticfield: A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection: Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld: Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta: URL Meta Tag Indexing Plugin.
org.apache.nutch.indexwriter.dummy: Index writer plugin for debugging, writes pairs of <action, url> to a text file, action is one of "add", "update", or "delete".
org.apache.nutch.indexwriter.elastic: Index writer plugin for Elasticsearch.
org.apache.nutch.indexwriter.solr: Index writer plugin for Apache Solr.
org.apache.nutch.microformats.reltag: A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.net: Web-related interfaces: URL filters and normalizers.
org.apache.nutch.parse: The Parse interface and related classes.
org.apache.nutch.parse.ext: Parse wrapper to run an external command to do the parsing.
org.apache.nutch.parse.feed: Parse RSS feeds.
org.apache.nutch.parse.headings: Parse filter to extract headings (h1, h2, etc.) from DOM parse tree.
org.apache.nutch.parse.html: An HTML document parsing plugin.
org.apache.nutch.parse.js: Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
org.apache.nutch.parse.metatags: Parse filter to extract meta tags: keywords, description, etc.
org.apache.nutch.parse.swf: Parse Flash SWF files.
org.apache.nutch.parse.tika: Parse various document formats with help of Apache Tika.
org.apache.nutch.parse.zip: Parse ZIP files: embedded files are recursively passed to appropriate parsers.
org.apache.nutch.protocol: Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file: Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp: Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.http: Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api: Common API used by HTTP plugins (http, httpclient).
org.apache.nutch.scoring: The ScoringFilter interface.
org.apache.nutch.scoring.depth: Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link: Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.opic: Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.tld: Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta: URL Meta Tag Scoring Plugin.
org.apache.nutch.urlfilter.api: Generic URL filter library, abstracting away from regular expression implementations.
org.apache.nutch.urlfilter.automaton: URL filter plugin based on dk.brics.automaton Finite-State Automata for Java™.
org.apache.nutch.urlfilter.domain: URL filter plugin to include only URLs which match an element in a given list of domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.domainblacklist: URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.prefix: URL filter plugin to include only URLs which match one of a given list of URL prefixes.
org.apache.nutch.urlfilter.regex: URL filter plugin to include and/or exclude URLs matching Java regular expressions.
org.apache.nutch.urlfilter.suffix: URL filter plugin to either exclude or include only URLs which match one of the given (path) suffixes.
org.apache.nutch.urlfilter.validator: URL filter plugin that validates given URLs.
org.creativecommons.nutch: Sample plugins that parse and index Creative Commons metadata.
Uses of Pluggable in org.apache.nutch.analysis.lang
Classes in org.apache.nutch.analysis.lang that implement Pluggable Modifier and Type Class and Description class
HTMLLanguageParser
class
LanguageIndexingFilter
An IndexingFilter that adds a lang (language) field to the document.
Uses of Pluggable in org.apache.nutch.collection
Classes in org.apache.nutch.collection that implement Pluggable Modifier and Type Class and Description class
Subcollection
SubCollection represents a subset of an index; URL patterns define which pages (URLs) are part of the SubCollection.
Uses of Pluggable in org.apache.nutch.indexer
Subinterfaces of Pluggable in org.apache.nutch.indexer Modifier and Type Interface and Description interface
IndexingFilter
Extension point for indexing.
interface
IndexWriter
Uses of Pluggable in org.apache.nutch.indexer.anchor
Classes in org.apache.nutch.indexer.anchor that implement Pluggable Modifier and Type Class and Description class
AnchorIndexingFilter
Indexing filter that offers an option to either index all inbound anchor text for a document or deduplicate anchors.
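The two modes described above (index every inbound anchor, or deduplicate) can be illustrated with a minimal standalone sketch. It does not use the Nutch API: the class `AnchorDedupSketch`, the method `anchorsToIndex`, and the choice of case-insensitive comparison are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch.
final class AnchorDedupSketch {

    /**
     * Returns the anchors that would be indexed. With deduplicate=false every
     * inbound anchor is kept; with deduplicate=true only the first occurrence
     * of each anchor text is kept (compared case-insensitively, an assumption).
     */
    static List<String> anchorsToIndex(List<String> inbound, boolean deduplicate) {
        if (!deduplicate) {
            return new ArrayList<>(inbound);
        }
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String anchor : inbound) {
            // Set.add returns false when the lowercased text was already seen.
            if (seen.add(anchor.toLowerCase(Locale.ROOT))) {
                kept.add(anchor);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> inbound = List.of("Home", "home", "About");
        System.out.println(anchorsToIndex(inbound, false));
        System.out.println(anchorsToIndex(inbound, true));
    }
}
```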
Uses of Pluggable in org.apache.nutch.indexer.basic
Classes in org.apache.nutch.indexer.basic that implement Pluggable Modifier and Type Class and Description class
BasicIndexingFilter
Adds basic searchable fields to a document.
Uses of Pluggable in org.apache.nutch.indexer.feed
Classes in org.apache.nutch.indexer.feed that implement Pluggable Modifier and Type Class and Description class
FeedIndexingFilter
Uses of Pluggable in org.apache.nutch.indexer.metadata
Classes in org.apache.nutch.indexer.metadata that implement Pluggable Modifier and Type Class and Description class
MetadataIndexer
Indexer which can be configured to extract metadata from the crawldb, parse metadata or content metadata.
Uses of Pluggable in org.apache.nutch.indexer.more
Classes in org.apache.nutch.indexer.more that implement Pluggable Modifier and Type Class and Description class
MoreIndexingFilter
Add (or reset) a few metaData properties as respective fields (if they are available), so that they can be accurately used within the search index.
Uses of Pluggable in org.apache.nutch.indexer.staticfield
Classes in org.apache.nutch.indexer.staticfield that implement Pluggable Modifier and Type Class and Description class
StaticFieldIndexer
A simple plugin called at indexing that adds fields with static data.
Uses of Pluggable in org.apache.nutch.indexer.subcollection
Classes in org.apache.nutch.indexer.subcollection that implement Pluggable Modifier and Type Class and Description class
SubcollectionIndexingFilter
Uses of Pluggable in org.apache.nutch.indexer.tld
Classes in org.apache.nutch.indexer.tld that implement Pluggable Modifier and Type Class and Description class
TLDIndexingFilter
Adds the top-level domain extensions to the index.
Uses of Pluggable in org.apache.nutch.indexer.urlmeta
Classes in org.apache.nutch.indexer.urlmeta that implement Pluggable Modifier and Type Class and Description class
URLMetaIndexingFilter
This is part of the URL Meta plugin.
Uses of Pluggable in org.apache.nutch.indexwriter.dummy
Classes in org.apache.nutch.indexwriter.dummy that implement Pluggable Modifier and Type Class and Description class
DummyIndexWriter
DummyIndexWriter.
Uses of Pluggable in org.apache.nutch.indexwriter.elastic
Classes in org.apache.nutch.indexwriter.elastic that implement Pluggable Modifier and Type Class and Description class
ElasticIndexWriter
Uses of Pluggable in org.apache.nutch.indexwriter.solr
Classes in org.apache.nutch.indexwriter.solr that implement Pluggable Modifier and Type Class and Description class
SolrIndexWriter
Uses of Pluggable in org.apache.nutch.microformats.reltag
Classes in org.apache.nutch.microformats.reltag that implement Pluggable Modifier and Type Class and Description class
RelTagIndexingFilter
An IndexingFilter that adds tag field(s) to the document.
class
RelTagParser
Adds microformat rel-tags of document if found.
Uses of Pluggable in org.apache.nutch.net
Subinterfaces of Pluggable in org.apache.nutch.net Modifier and Type Interface and Description interface
URLFilter
Interface used to limit which URLs enter Nutch.
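As a rough illustration of how filters can limit which URLs are admitted, the sketch below chains several filters. It is not the Nutch API: `UrlFilterChainSketch`, `HTTP_ONLY`, and `NO_EXE` are hypothetical names, and the sketch assumes the convention that a filter returns the URL when it accepts it and `null` when it rejects it.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a chain of URL filters; not a Nutch class.
final class UrlFilterChainSketch {

    // Assumed convention: return the URL to accept it, null to reject it.
    static final UnaryOperator<String> HTTP_ONLY =
        u -> (u.startsWith("http://") || u.startsWith("https://")) ? u : null;

    static final UnaryOperator<String> NO_EXE =
        u -> u.endsWith(".exe") ? null : u;

    /** Applies each filter in order and stops at the first rejection. */
    static String applyAll(List<UnaryOperator<String>> filters, String url) {
        String current = url;
        for (UnaryOperator<String> filter : filters) {
            if (current == null) {
                return null;
            }
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> chain = List.of(HTTP_ONLY, NO_EXE);
        System.out.println(applyAll(chain, "http://example.org/index.html"));
        System.out.println(applyAll(chain, "http://example.org/setup.exe"));
    }
}
```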
Uses of Pluggable in org.apache.nutch.parse
Subinterfaces of Pluggable in org.apache.nutch.parse Modifier and Type Interface and Description interface
HtmlParseFilter
Extension point for DOM-based HTML parsers.
interface
Parser
A parser for content generated by a Protocol implementation.
Uses of Pluggable in org.apache.nutch.parse.ext
Classes in org.apache.nutch.parse.ext that implement Pluggable Modifier and Type Class and Description class
ExtParser
A wrapper that invokes an external command to do the real parsing job.
Uses of Pluggable in org.apache.nutch.parse.feed
Classes in org.apache.nutch.parse.feed that implement Pluggable Modifier and Type Class and Description class
FeedParser
Uses of Pluggable in org.apache.nutch.parse.headings
Classes in org.apache.nutch.parse.headings that implement Pluggable Modifier and Type Class and Description class
HeadingsParseFilter
HtmlParseFilter to retrieve h1 and h2 values from the DOM.
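Retrieving heading values from a DOM can be sketched with the JDK's own XML parser. This is only an illustration under stated assumptions: `HeadingsSketch` is a hypothetical class, the input is assumed to be well-formed XHTML, and nothing here reflects how the plugin itself obtains or walks its DOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch.
final class HeadingsSketch {

    /** Parses well-formed XHTML into a DOM; wraps checked exceptions. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Collects the trimmed text of every element with the given tag name. */
    static List<String> headings(Document doc, String tag) {
        List<String> values = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<html><body><h1>Title</h1><h2> Sub </h2><h2>Other</h2></body></html>");
        System.out.println(headings(doc, "h1"));
        System.out.println(headings(doc, "h2"));
    }
}
```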
Uses of Pluggable in org.apache.nutch.parse.html
Classes in org.apache.nutch.parse.html that implement Pluggable Modifier and Type Class and Description class
HtmlParser
Uses of Pluggable in org.apache.nutch.parse.js
Classes in org.apache.nutch.parse.js that implement Pluggable Modifier and Type Class and Description class
JSParseFilter
This class is a heuristic link extractor for JavaScript files and code snippets.
Uses of Pluggable in org.apache.nutch.parse.metatags
Classes in org.apache.nutch.parse.metatags that implement Pluggable Modifier and Type Class and Description class
MetaTagsParser
Parse HTML meta tags (keywords, description) and store them in the parse metadata so that they can be indexed with the index-metadata plugin with the prefix 'metatag.'.
Uses of Pluggable in org.apache.nutch.parse.swf
Classes in org.apache.nutch.parse.swf that implement Pluggable Modifier and Type Class and Description class
SWFParser
Parser for Flash SWF files.
Uses of Pluggable in org.apache.nutch.parse.tika
Classes in org.apache.nutch.parse.tika that implement Pluggable Modifier and Type Class and Description class
TikaParser
Wrapper for Tika parsers.
Uses of Pluggable in org.apache.nutch.parse.zip
Classes in org.apache.nutch.parse.zip that implement Pluggable Modifier and Type Class and Description class
ZipParser
ZipParser class based on MSPowerPointParser class by Stephan Strittmatter.
Uses of Pluggable in org.apache.nutch.protocol
Subinterfaces of Pluggable in org.apache.nutch.protocol Modifier and Type Interface and Description interface
Protocol
A retriever of url content.
Uses of Pluggable in org.apache.nutch.protocol.file
Classes in org.apache.nutch.protocol.file that implement Pluggable Modifier and Type Class and Description class
File
This class is a protocol plugin used for file: scheme.
Uses of Pluggable in org.apache.nutch.protocol.ftp
Classes in org.apache.nutch.protocol.ftp that implement Pluggable Modifier and Type Class and Description class
Ftp
This class is a protocol plugin used for ftp: scheme.
Uses of Pluggable in org.apache.nutch.protocol.http
Classes in org.apache.nutch.protocol.http that implement Pluggable Modifier and Type Class and Description class
Http
Uses of Pluggable in org.apache.nutch.protocol.http.api
Classes in org.apache.nutch.protocol.http.api that implement Pluggable Modifier and Type Class and Description class
HttpBase
Uses of Pluggable in org.apache.nutch.scoring
Subinterfaces of Pluggable in org.apache.nutch.scoring Modifier and Type Interface and Description interface
ScoringFilter
A contract defining behavior of scoring plugins.
Classes in org.apache.nutch.scoring that implement Pluggable Modifier and Type Class and Description class
AbstractScoringFilter
class
ScoringFilters
Creates and caches plugins that implement ScoringFilter.
Uses of Pluggable in org.apache.nutch.scoring.depth
Classes in org.apache.nutch.scoring.depth that implement Pluggable Modifier and Type Class and Description class
DepthScoringFilter
This scoring filter limits the number of hops from the initial seed urls.
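The idea of limiting hops from the seeds can be sketched as follows. This is a simplified model, not the plugin's code: `DepthLimitSketch`, `Link`, and `outlinks` are hypothetical names, and the sketch assumes that seeds start at depth 1 and that pages already at the maximum depth pass on no outlinks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of depth-limited crawling; not a Nutch class.
final class DepthLimitSketch {

    /** A discovered URL together with its hop count from the seeds. */
    static final class Link {
        final String url;
        final int depth;

        Link(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    /**
     * Returns the outlinks to follow from a page at parentDepth.
     * Assumption: a page at or beyond maxDepth contributes no outlinks;
     * otherwise each outlink is one hop deeper than its parent.
     */
    static List<Link> outlinks(int parentDepth, int maxDepth, List<String> targets) {
        List<Link> kept = new ArrayList<>();
        if (parentDepth >= maxDepth) {
            return kept;
        }
        for (String target : targets) {
            kept.add(new Link(target, parentDepth + 1));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> targets = List.of("http://example.org/a", "http://example.org/b");
        System.out.println(outlinks(1, 2, targets).size());
        System.out.println(outlinks(2, 2, targets).size());
    }
}
```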
Uses of Pluggable in org.apache.nutch.scoring.link
Classes in org.apache.nutch.scoring.link that implement Pluggable Modifier and Type Class and Description class
LinkAnalysisScoringFilter
Uses of Pluggable in org.apache.nutch.scoring.opic
Classes in org.apache.nutch.scoring.opic that implement Pluggable Modifier and Type Class and Description class
OPICScoringFilter
This plugin implements a variant of an Online Page Importance Computation (OPIC) score, described in this paper: Abiteboul, Serge and Preda, Mihai and Cobena, Gregory (2003), Adaptive On-Line Page Importance Computation.
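The core step of OPIC-style scoring can be sketched as below: when a page is visited, its current "cash" is recorded in its history and then split equally among its outlinks. This is a simplified reading of the general idea, not the plugin's variant: `OpicSketch` and `visit` are hypothetical names, and dropping the cash of a page without outlinks is a simplification chosen for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified OPIC-style bookkeeping; not the Nutch plugin.
final class OpicSketch {
    final Map<String, Double> cash = new HashMap<>();
    final Map<String, Double> history = new HashMap<>();

    /** Visits a page: records its cash in history, then splits it among outlinks. */
    void visit(String page, List<String> outlinks) {
        double current = cash.getOrDefault(page, 0.0);
        history.merge(page, current, Double::sum);
        cash.put(page, 0.0);
        if (outlinks.isEmpty()) {
            return; // simplification: cash of a dead-end page is dropped
        }
        double share = current / outlinks.size();
        for (String target : outlinks) {
            cash.merge(target, share, Double::sum);
        }
    }

    public static void main(String[] args) {
        OpicSketch opic = new OpicSketch();
        opic.cash.put("a", 1.0);
        opic.visit("a", List.of("b", "c"));
        System.out.println(opic.cash);
        System.out.println(opic.history);
    }
}
```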
Uses of Pluggable in org.apache.nutch.scoring.tld
Classes in org.apache.nutch.scoring.tld that implement Pluggable Modifier and Type Class and Description class
TLDScoringFilter
Scoring filter to boost tlds.
Uses of Pluggable in org.apache.nutch.scoring.urlmeta
Classes in org.apache.nutch.scoring.urlmeta that implement Pluggable Modifier and Type Class and Description class
URLMetaScoringFilter
For documentation:
Uses of Pluggable in org.apache.nutch.urlfilter.api
Classes in org.apache.nutch.urlfilter.api that implement Pluggable Modifier and Type Class and Description class
RegexURLFilterBase
Generic URL filter based on regular expressions.
Uses of Pluggable in org.apache.nutch.urlfilter.automaton
Classes in org.apache.nutch.urlfilter.automaton that implement Pluggable Modifier and Type Class and Description class
AutomatonURLFilter
RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java™.
Uses of Pluggable in org.apache.nutch.urlfilter.domain
Classes in org.apache.nutch.urlfilter.domain that implement Pluggable Modifier and Type Class and Description class
DomainURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and hostnames.
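A minimal sketch of matching a URL's host against a list of domain suffixes, domain names, and host names follows. It is not the plugin's implementation: `DomainFilterSketch` is a hypothetical class, the allowed entries are passed in as a set instead of being read from a file, and the host is matched by stripping leading labels one at a time, which is a simplification.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical domain-based include filter; not a Nutch class.
final class DomainFilterSketch {
    private final Set<String> allowed;

    DomainFilterSketch(Set<String> allowed) {
        this.allowed = Set.copyOf(allowed);
    }

    /** Returns the URL if its host, a parent domain, or its suffix is allowed; else null. */
    String filter(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null; // unparsable URLs are rejected in this sketch
        }
        if (host == null) {
            return null;
        }
        // Try www.example.org, then example.org, then org.
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (allowed.contains(candidate)) {
                return url;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        DomainFilterSketch filter = new DomainFilterSketch(Set.of("example.org", "net"));
        System.out.println(filter.filter("http://www.example.org/a"));
        System.out.println(filter.filter("http://example.com/"));
    }
}
```

An exclude-style filter could be sketched by inverting the result: return `null` on a match and the URL otherwise.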
Uses of Pluggable in org.apache.nutch.urlfilter.domainblacklist
Classes in org.apache.nutch.urlfilter.domainblacklist that implement Pluggable Modifier and Type Class and Description class
DomainBlacklistURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and hostnames.
Uses of Pluggable in org.apache.nutch.urlfilter.prefix
Classes in org.apache.nutch.urlfilter.prefix that implement Pluggable Modifier and Type Class and Description class
PrefixURLFilter
Filters URLs based on a file of URL prefixes.
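Prefix-based filtering can be sketched with a plain linear scan. The class `PrefixFilterSketch` is hypothetical and takes its prefixes from a list instead of a file; a linear scan is swapped in here for simplicity and says nothing about the matching structure the plugin itself uses.

```java
import java.util.List;

// Hypothetical prefix-based include filter; not a Nutch class.
final class PrefixFilterSketch {
    private final List<String> prefixes;

    PrefixFilterSketch(List<String> prefixes) {
        this.prefixes = List.copyOf(prefixes);
    }

    /** Returns the URL if it starts with any configured prefix; else null. */
    String filter(String url) {
        for (String prefix : prefixes) {
            if (url.startsWith(prefix)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PrefixFilterSketch filter =
            new PrefixFilterSketch(List.of("http://example.org/docs/"));
        System.out.println(filter.filter("http://example.org/docs/index.html"));
        System.out.println(filter.filter("http://example.org/private/"));
    }
}
```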
Uses of Pluggable in org.apache.nutch.urlfilter.regex
Classes in org.apache.nutch.urlfilter.regex that implement Pluggable Modifier and Type Class and Description class
RegexURLFilter
Filters URLs based on a file of regular expressions using the Java Regex implementation.
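Include/exclude rules based on `java.util.regex` can be sketched as follows. The class `RegexRuleSketch` is hypothetical, and the rule format is an assumption for illustration: each rule is a line starting with `+` (include) or `-` (exclude) followed by a regular expression, lines starting with `#` are comments, the first matching rule decides, and a URL matching no rule is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical regex rule filter; not a Nutch class.
final class RegexRuleSketch {

    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines of the assumed form "+regex" or "-regex". */
    RegexRuleSketch(List<String> lines) {
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty() || text.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = text.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(text.substring(1))));
        }
    }

    /** The first rule whose pattern is found in the URL decides; no match rejects. */
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.include ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RegexRuleSketch filter = new RegexRuleSketch(
            List.of("# skip images", "-\\.(gif|jpg)$", "+^https?://"));
        System.out.println(filter.filter("http://a.example/b.html"));
        System.out.println(filter.filter("http://a.example/b.gif"));
    }
}
```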
Uses of Pluggable in org.apache.nutch.urlfilter.suffix
Classes in org.apache.nutch.urlfilter.suffix that implement Pluggable Modifier and Type Class and Description class
SuffixURLFilter
Filters URLs based on a file of URL suffixes.
Uses of Pluggable in org.apache.nutch.urlfilter.validator
Classes in org.apache.nutch.urlfilter.validator that implement Pluggable Modifier and Type Class and Description class
UrlValidator
Validates URLs.
Uses of Pluggable in org.creativecommons.nutch
Classes in org.creativecommons.nutch that implement Pluggable Modifier and Type Class and Description class
CCIndexingFilter
Adds basic searchable fields to a document.
class
CCParseFilter
Adds metadata identifying the Creative Commons license used, if any.