Uses of Interface
org.apache.nutch.plugin.Pluggable
Packages that use Pluggable
Package
Description
org.apache.nutch.analysis.lang
Text document language identifier.
org.apache.nutch.collection
Subcollection is a subset of an index.
org.apache.nutch.indexer
Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor
An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.basic
A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed
Indexing filter to index metadata from RSS feeds.
org.apache.nutch.indexer.metadata
Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more
A more indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.staticfield
A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection
Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld
Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta
URL Meta Tag Indexing Plugin
org.apache.nutch.indexwriter.dummy
Index writer plugin for debugging, writes pairs of <action, url> to a text file, action is one of "add", "update", or "delete".
org.apache.nutch.indexwriter.elastic
Index writer plugin for Elasticsearch.
org.apache.nutch.indexwriter.solr
Index writer plugin for Apache Solr.
org.apache.nutch.microformats.reltag
A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.net
Web-related interfaces: URL filters and normalizers.
org.apache.nutch.parse
The Parse interface and related classes.
org.apache.nutch.parse.ext
Parse wrapper to run an external command to do the parsing.
org.apache.nutch.parse.feed
Parse RSS feeds.
org.apache.nutch.parse.headings
Parse filter to extract headings (h1, h2, etc.) from DOM parse tree.
org.apache.nutch.parse.html
An HTML document parsing plugin.
org.apache.nutch.parse.js
Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
org.apache.nutch.parse.metatags
Parse filter to extract meta tags: keywords, description, etc.
org.apache.nutch.parse.swf
Parse Flash SWF files.
org.apache.nutch.parse.tika
Parse various document formats with help of Apache Tika.
org.apache.nutch.parse.zip
Parse ZIP files: embedded files are recursively passed to appropriate parsers.
org.apache.nutch.protocol
Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file
Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp
Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.http
Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api
Common API used by HTTP plugins (http, httpclient)
org.apache.nutch.scoring
The ScoringFilter interface.
org.apache.nutch.scoring.depth
Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link
Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.opic
Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.tld
Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta
URL Meta Tag Scoring Plugin
org.apache.nutch.urlfilter.api
Generic URL filter library, abstracting away from regular expression implementations.
org.apache.nutch.urlfilter.automaton
URL filter plugin based on dk.brics.automaton Finite-State Automata for Java(TM).
org.apache.nutch.urlfilter.domain
URL filter plugin to include only URLs which match an element in a given list of domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.domainblacklist
URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names.
org.apache.nutch.urlfilter.prefix
URL filter plugin to include only URLs which match one of a given list of URL prefixes.
org.apache.nutch.urlfilter.regex
URL filter plugin to include and/or exclude URLs matching Java regular expressions.
org.apache.nutch.urlfilter.suffix
URL filter plugin to either exclude or include only URLs which match one of the given (path) suffixes.
org.apache.nutch.urlfilter.validator
URL filter plugin that validates given URLs.
org.creativecommons.nutch
Sample plugins that parse and index Creative Commons metadata.
Uses of Pluggable in org.apache.nutch.analysis.lang
Classes in org.apache.nutch.analysis.lang that implement Pluggable Modifier and Type Class and Description class HTMLLanguageParser
class LanguageIndexingFilter
An IndexingFilter that adds a lang (language) field to the document.
Uses of Pluggable in org.apache.nutch.collection
Classes in org.apache.nutch.collection that implement Pluggable Modifier and Type Class and Description class Subcollection
SubCollection represents a subset of an index; you can define URL patterns that indicate that a particular page (URL) is part of a SubCollection.
Uses of Pluggable in org.apache.nutch.indexer
Subinterfaces of Pluggable in org.apache.nutch.indexer Modifier and Type Interface and Description interface IndexingFilter
Extension point for indexing.
interface IndexWriter
Uses of Pluggable in org.apache.nutch.indexer.anchor
Classes in org.apache.nutch.indexer.anchor that implement Pluggable Modifier and Type Class and Description class AnchorIndexingFilter
Indexing filter that offers an option to either index all inbound anchor text for a document or deduplicate anchors.
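The choice between indexing every inbound anchor and deduplicating them can be pictured with the small sketch below. It is not the AnchorIndexingFilter source: the class name AnchorDedupSketch, the select method, and the case-insensitive comparison are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the Nutch AnchorIndexingFilter.
class AnchorDedupSketch {

    // Returns the anchors to index: all of them, or only the first
    // occurrence of each anchor text when deduplicate is true.
    static List<String> select(List<String> anchors, boolean deduplicate) {
        if (!deduplicate) {
            return new ArrayList<>(anchors);
        }
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String anchor : anchors) {
            // Assumption: anchors that differ only in case count as duplicates.
            if (seen.add(anchor.toLowerCase(Locale.ROOT))) {
                kept.add(anchor);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> anchors = List.of("Home", "home", "About");
        System.out.println(select(anchors, false));
        System.out.println(select(anchors, true));
    }
}
```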
Uses of Pluggable in org.apache.nutch.indexer.basic
Classes in org.apache.nutch.indexer.basic that implement Pluggable Modifier and Type Class and Description class BasicIndexingFilter
Adds basic searchable fields to a document.
Uses of Pluggable in org.apache.nutch.indexer.feed
Classes in org.apache.nutch.indexer.feed that implement Pluggable Modifier and Type Class and Description class FeedIndexingFilter
Uses of Pluggable in org.apache.nutch.indexer.metadata
Classes in org.apache.nutch.indexer.metadata that implement Pluggable Modifier and Type Class and Description class MetadataIndexer
Indexer which can be configured to extract metadata from the crawldb, parse metadata or content metadata.
Uses of Pluggable in org.apache.nutch.indexer.more
Classes in org.apache.nutch.indexer.more that implement Pluggable Modifier and Type Class and Description class MoreIndexingFilter
Add (or reset) a few metaData properties as respective fields (if they are available), so that they can be accurately used within the search index.
Uses of Pluggable in org.apache.nutch.indexer.staticfield
Classes in org.apache.nutch.indexer.staticfield that implement Pluggable Modifier and Type Class and Description class StaticFieldIndexer
A simple plugin called at indexing that adds fields with static data.
Uses of Pluggable in org.apache.nutch.indexer.subcollection
Classes in org.apache.nutch.indexer.subcollection that implement Pluggable Modifier and Type Class and Description class SubcollectionIndexingFilter
Uses of Pluggable in org.apache.nutch.indexer.tld
Classes in org.apache.nutch.indexer.tld that implement Pluggable Modifier and Type Class and Description class TLDIndexingFilter
Adds the top-level domain extensions to the index.
Uses of Pluggable in org.apache.nutch.indexer.urlmeta
Classes in org.apache.nutch.indexer.urlmeta that implement Pluggable Modifier and Type Class and Description class URLMetaIndexingFilter
This is part of the URL Meta plugin.
Uses of Pluggable in org.apache.nutch.indexwriter.dummy
Classes in org.apache.nutch.indexwriter.dummy that implement Pluggable Modifier and Type Class and Description class DummyIndexWriter
DummyIndexWriter.
Uses of Pluggable in org.apache.nutch.indexwriter.elastic
Classes in org.apache.nutch.indexwriter.elastic that implement Pluggable Modifier and Type Class and Description class ElasticIndexWriter
Uses of Pluggable in org.apache.nutch.indexwriter.solr
Classes in org.apache.nutch.indexwriter.solr that implement Pluggable Modifier and Type Class and Description class SolrIndexWriter
Uses of Pluggable in org.apache.nutch.microformats.reltag
Classes in org.apache.nutch.microformats.reltag that implement Pluggable Modifier and Type Class and Description class RelTagIndexingFilter
An IndexingFilter that adds tag field(s) to the document.
class RelTagParser
Adds microformat rel-tags of document if found.
Uses of Pluggable in org.apache.nutch.net
Subinterfaces of Pluggable in org.apache.nutch.net Modifier and Type Interface and Description interface URLFilter
Interface used to limit which URLs enter Nutch.
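As a rough picture of what "limiting which URLs enter Nutch" means, the sketch below chains several filters. UrlFilterSketch is a stand-in declared only for this example; it is not org.apache.nutch.net.URLFilter, which is also tied to the plugin and configuration machinery. The sketch assumes the convention that a filter returns the URL to accept it and null to reject it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names here are not part of Nutch.
class UrlFilterDemo {

    // Stand-in for the filter contract: return the URL to keep it, null to drop it.
    interface UrlFilterSketch {
        String filter(String urlString);
    }

    // Two example filters: keep http(s) URLs, then drop .exe downloads.
    static List<UrlFilterSketch> demoFilters() {
        List<UrlFilterSketch> filters = new ArrayList<>();
        filters.add(u -> u.startsWith("http") ? u : null);
        filters.add(u -> u.endsWith(".exe") ? null : u);
        return filters;
    }

    // Passes the URL through each filter in order and stops at the first rejection.
    static String applyAll(List<UrlFilterSketch> filters, String url) {
        String current = url;
        for (UrlFilterSketch f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<UrlFilterSketch> filters = demoFilters();
        System.out.println(applyAll(filters, "http://example.org/index.html"));
        System.out.println(applyAll(filters, "ftp://example.org/file"));
    }
}
```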
Uses of Pluggable in org.apache.nutch.parse
Subinterfaces of Pluggable in org.apache.nutch.parse Modifier and Type Interface and Description interface HtmlParseFilter
Extension point for DOM-based HTML parsers.
interface Parser
A parser for content generated by a Protocol implementation.
Uses of Pluggable in org.apache.nutch.parse.ext
Classes in org.apache.nutch.parse.ext that implement Pluggable Modifier and Type Class and Description class ExtParser
A wrapper that invokes an external command to do the real parsing job.
Uses of Pluggable in org.apache.nutch.parse.feed
Classes in org.apache.nutch.parse.feed that implement Pluggable Modifier and Type Class and Description class FeedParser
Uses of Pluggable in org.apache.nutch.parse.headings
Classes in org.apache.nutch.parse.headings that implement Pluggable Modifier and Type Class and Description class HeadingsParseFilter
HtmlParseFilter to retrieve h1 and h2 values from the DOM.
Uses of Pluggable in org.apache.nutch.parse.html
Classes in org.apache.nutch.parse.html that implement Pluggable Modifier and Type Class and Description class HtmlParser
Uses of Pluggable in org.apache.nutch.parse.js
Classes in org.apache.nutch.parse.js that implement Pluggable Modifier and Type Class and Description class JSParseFilter
This class is a heuristic link extractor for JavaScript files and code snippets.
Uses of Pluggable in org.apache.nutch.parse.metatags
Classes in org.apache.nutch.parse.metatags that implement Pluggable Modifier and Type Class and Description class MetaTagsParser
Parse HTML meta tags (keywords, description) and store them in the parse metadata so that they can be indexed with the index-metadata plugin with the prefix 'metatag.'.
Uses of Pluggable in org.apache.nutch.parse.swf
Classes in org.apache.nutch.parse.swf that implement Pluggable Modifier and Type Class and Description class SWFParser
Parser for Flash SWF files.
Uses of Pluggable in org.apache.nutch.parse.tika
Classes in org.apache.nutch.parse.tika that implement Pluggable Modifier and Type Class and Description class TikaParser
Wrapper for Tika parsers.
Uses of Pluggable in org.apache.nutch.parse.zip
Classes in org.apache.nutch.parse.zip that implement Pluggable Modifier and Type Class and Description class ZipParser
ZipParser class based on MSPowerPointParser class by Stephan Strittmatter.
Uses of Pluggable in org.apache.nutch.protocol
Subinterfaces of Pluggable in org.apache.nutch.protocol Modifier and Type Interface and Description interface Protocol
A retriever of url content.
Uses of Pluggable in org.apache.nutch.protocol.file
Classes in org.apache.nutch.protocol.file that implement Pluggable Modifier and Type Class and Description class File
This class is a protocol plugin used for file: scheme.
Uses of Pluggable in org.apache.nutch.protocol.ftp
Classes in org.apache.nutch.protocol.ftp that implement Pluggable Modifier and Type Class and Description class Ftp
This class is a protocol plugin used for ftp: scheme.
Uses of Pluggable in org.apache.nutch.protocol.http
Classes in org.apache.nutch.protocol.http that implement Pluggable Modifier and Type Class and Description class Http
Uses of Pluggable in org.apache.nutch.protocol.http.api
Classes in org.apache.nutch.protocol.http.api that implement Pluggable Modifier and Type Class and Description class HttpBase
Uses of Pluggable in org.apache.nutch.scoring
Subinterfaces of Pluggable in org.apache.nutch.scoring Modifier and Type Interface and Description interface ScoringFilter
A contract defining behavior of scoring plugins.
Classes in org.apache.nutch.scoring that implement Pluggable Modifier and Type Class and Description class AbstractScoringFilter
class ScoringFilters
Creates and caches ScoringFilter implementing plugins.
Uses of Pluggable in org.apache.nutch.scoring.depth
Classes in org.apache.nutch.scoring.depth that implement Pluggable Modifier and Type Class and Description class DepthScoringFilter
This scoring filter limits the number of hops from the initial seed urls.
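The idea of limiting hops from the seeds can be sketched as a breadth-first walk over a link graph. This is not the DepthScoringFilter implementation: the class DepthLimitSketch, the crawl method, and the choice to count seeds as depth 0 are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Nutch DepthScoringFilter.
class DepthLimitSketch {

    // Returns each reachable URL with its hop count, ignoring anything
    // more than maxDepth hops away from a seed (seeds are depth 0 here).
    static Map<String, Integer> crawl(Map<String, List<String>> links,
                                      List<String> seeds, int maxDepth) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String seed : seeds) {
            if (depth.putIfAbsent(seed, 0) == null) {
                queue.add(seed);
            }
        }
        while (!queue.isEmpty()) {
            String url = queue.remove();
            int d = depth.get(url);
            if (d >= maxDepth) {
                continue; // outlinks of this page would exceed the limit
            }
            for (String out : links.getOrDefault(url, Collections.emptyList())) {
                if (depth.putIfAbsent(out, d + 1) == null) {
                    queue.add(out);
                }
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new LinkedHashMap<>();
        links.put("a", List.of("b"));
        links.put("b", List.of("c"));
        links.put("c", List.of("d"));
        System.out.println(crawl(links, List.of("a"), 2));
    }
}
```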
Uses of Pluggable in org.apache.nutch.scoring.link
Classes in org.apache.nutch.scoring.link that implement Pluggable Modifier and Type Class and Description class LinkAnalysisScoringFilter
Uses of Pluggable in org.apache.nutch.scoring.opic
Classes in org.apache.nutch.scoring.opic that implement Pluggable Modifier and Type Class and Description class OPICScoringFilter
This plugin implements a variant of an Online Page Importance Computation (OPIC) score, described in this paper: Abiteboul, Serge and Preda, Mihai and Cobena, Gregory (2003), Adaptive On-Line Page Importance Computation.
Uses of Pluggable in org.apache.nutch.scoring.tld
Classes in org.apache.nutch.scoring.tld that implement Pluggable Modifier and Type Class and Description class TLDScoringFilter
Scoring filter to boost tlds.
Uses of Pluggable in org.apache.nutch.scoring.urlmeta
Classes in org.apache.nutch.scoring.urlmeta that implement Pluggable Modifier and Type Class and Description class URLMetaScoringFilter
For documentation:
Uses of Pluggable in org.apache.nutch.urlfilter.api
Classes in org.apache.nutch.urlfilter.api that implement Pluggable Modifier and Type Class and Description class RegexURLFilterBase
Generic URL filter based on regular expressions.
Uses of Pluggable in org.apache.nutch.urlfilter.automaton
Classes in org.apache.nutch.urlfilter.automaton that implement Pluggable Modifier and Type Class and Description class AutomatonURLFilter
RegexURLFilterBase implementation based on the dk.brics.automaton Finite-State Automata for Java(TM).
Uses of Pluggable in org.apache.nutch.urlfilter.domain
Classes in org.apache.nutch.urlfilter.domain that implement Pluggable Modifier and Type Class and Description class DomainURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and hostnames.
Uses of Pluggable in org.apache.nutch.urlfilter.domainblacklist
Classes in org.apache.nutch.urlfilter.domainblacklist that implement Pluggable Modifier and Type Class and Description class DomainBlacklistURLFilter
Filters URLs based on a file containing domain suffixes, domain names, and hostnames.
Uses of Pluggable in org.apache.nutch.urlfilter.prefix
Classes in org.apache.nutch.urlfilter.prefix that implement Pluggable Modifier and Type Class and Description class PrefixURLFilter
Filters URLs based on a file of URL prefixes.
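Prefix filtering can be pictured with the minimal sketch below. It takes the prefixes as an in-memory list rather than reading a file, and uses a plain linear scan; the class PrefixFilterSketch is invented for this example and makes no claim about how PrefixURLFilter stores or matches its prefixes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Nutch PrefixURLFilter.
class PrefixFilterSketch {
    private final List<String> prefixes;

    PrefixFilterSketch(List<String> prefixes) {
        this.prefixes = prefixes;
    }

    // Returns the URL if it starts with any configured prefix, otherwise null.
    String filter(String url) {
        for (String prefix : prefixes) {
            if (url.startsWith(prefix)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PrefixFilterSketch filter = new PrefixFilterSketch(
                Arrays.asList("http://example.org/", "https://example.org/"));
        System.out.println(filter.filter("https://example.org/docs/"));
        System.out.println(filter.filter("http://other.example.com/"));
    }
}
```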
Uses of Pluggable in org.apache.nutch.urlfilter.regex
Classes in org.apache.nutch.urlfilter.regex that implement Pluggable Modifier and Type Class and Description class RegexURLFilter
Filters URLs based on a file of regular expressions using the Java Regex implementation.
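A regular-expression rule list of this kind might be evaluated as in the sketch below. It assumes a rule format in which each line starts with + (accept) or - (reject) followed by a Java regex, lines starting with # are comments, the first matching rule decides, and a URL matching no rule is rejected. The class RegexFilterSketch and its in-memory rule list are illustrative stand-ins, not the RegexURLFilter code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Nutch RegexURLFilter.
class RegexFilterSketch {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses rule lines of the assumed form "+regex" or "-regex".
    RegexFilterSketch(List<String> lines) {
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty() || text.startsWith("#")) {
                continue; // blank line or comment
            }
            char sign = text.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(text.substring(1))));
        }
    }

    // The first rule whose regex is found in the URL decides; no match means reject.
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RegexFilterSketch filter = new RegexFilterSketch(Arrays.asList(
                "# skip images",
                "-\\.(gif|jpg)$",
                "+^https?://"));
        System.out.println(filter.filter("http://example.org/page.html"));
        System.out.println(filter.filter("http://example.org/logo.gif"));
    }
}
```

Rule order matters in this sketch: placing the reject rule for images before the general accept rule is what keeps image URLs out.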
Uses of Pluggable in org.apache.nutch.urlfilter.suffix
Classes in org.apache.nutch.urlfilter.suffix that implement Pluggable Modifier and Type Class and Description class SuffixURLFilter
Filters URLs based on a file of URL suffixes.
Uses of Pluggable in org.apache.nutch.urlfilter.validator
Classes in org.apache.nutch.urlfilter.validator that implement Pluggable Modifier and Type Class and Description class UrlValidator
Validates URLs.
Uses of Pluggable in org.creativecommons.nutch
Classes in org.creativecommons.nutch that implement Pluggable Modifier and Type Class and Description class CCIndexingFilter
Adds basic searchable fields to a document.
class CCParseFilter
Adds metadata identifying the Creative Commons license used, if any.
