Uses of Class
org.apache.nutch.indexer.IndexingException
Packages that use IndexingException:
- org.apache.nutch.analysis.lang: Text document language identifier.
- org.apache.nutch.indexer: Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
- org.apache.nutch.indexer.anchor: An indexing plugin for inbound anchor text.
- org.apache.nutch.indexer.basic: A basic indexing plugin that adds basic fields: url, host, title, content, etc.
- org.apache.nutch.indexer.feed: Indexing filter to index metadata from RSS feeds.
- org.apache.nutch.indexer.metadata: Indexing filter to add document metadata to the index.
- org.apache.nutch.indexer.more: A "more" indexing plugin that adds extra index fields: last modified date, MIME type, content length.
- org.apache.nutch.indexer.staticfield: A simple plugin called at indexing time that adds fields with static data.
- org.apache.nutch.indexer.subcollection: Indexing filter to assign documents to subcollections.
- org.apache.nutch.indexer.tld: Top Level Domain indexing plugin.
- org.apache.nutch.indexer.urlmeta: URL Meta Tag indexing plugin.
- org.apache.nutch.microformats.reltag: A microformats Rel-Tag Parser/Indexer/Querier plugin.
- org.creativecommons.nutch: Sample plugins that parse and index Creative Commons metadata.
Uses of IndexingException in org.apache.nutch.analysis.lang
Methods in org.apache.nutch.analysis.lang that throw IndexingException:
- NutchDocument LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
Uses of IndexingException in org.apache.nutch.indexer
Methods in org.apache.nutch.indexer that throw IndexingException:
- NutchDocument IndexingFilters.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  Run all defined filters.
- NutchDocument IndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  Adds fields or otherwise modifies the document that will be indexed for a parse.
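The filter contract above (each filter adds fields to or modifies the document, and IndexingFilters runs every configured filter in turn) can be sketched with simplified stand-in types. `Doc`, `Filter`, `FilterChain`, and `IndexingFailure` are hypothetical stand-ins, not Nutch's real `NutchDocument`, `IndexingFilter`, `IndexingFilters`, or `IndexingException`; the real `filter` method also receives `Parse`, `Text`, `CrawlDatum`, and `Inlinks`. The sketch assumes that a filter returning null discards the document and stops the chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.nutch.indexer.IndexingException.
class IndexingFailure extends Exception {
    IndexingFailure(String message) {
        super(message);
    }
}

// Hypothetical stand-in for NutchDocument: field name -> list of values.
class Doc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Simplified filter contract: return the (possibly modified) document,
// return null to discard it, or throw when indexing cannot proceed.
interface Filter {
    Doc filter(Doc doc, String url) throws IndexingFailure;
}

final class FilterChain {
    private final List<Filter> filters;

    FilterChain(List<Filter> filters) {
        this.filters = filters;
    }

    // Runs every filter in order; stops early if one discards the document.
    // Any IndexingFailure thrown by a filter propagates to the caller.
    Doc run(Doc doc, String url) throws IndexingFailure {
        for (Filter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }
}
```

A caller would build the chain once from its configured filters and call `run` per document, treating a null result as "skip this document" and an exception as a failure to index it.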
Uses of IndexingException in org.apache.nutch.indexer.anchor
Methods in org.apache.nutch.indexer.anchor that throw IndexingException:
- NutchDocument AnchorIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  The AnchorIndexingFilter filter object, which supports boolean configuration settings for deduplicating anchors.
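A minimal sketch of anchor deduplication, written as a plain helper over an array of anchor strings rather than against Nutch's `Inlinks` and `NutchDocument` types. The class `AnchorDedupSketch` is hypothetical, and the case-insensitive comparison is an assumption about how duplicates are detected; in Nutch the boolean switch is read from configuration (assumed here to be the `anchorIndexingFilter.deduplicate` property).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnchorDedupSketch {
    private AnchorDedupSketch() {}

    // Returns the anchor texts to index. When deduplicate is true, an anchor is
    // skipped if an earlier anchor had the same text ignoring case (assumption);
    // the first spelling seen is the one kept. When false, all anchors are kept.
    static List<String> anchorsToIndex(String[] anchors, boolean deduplicate) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String anchor : anchors) {
            if (!deduplicate || seen.add(anchor.toLowerCase(Locale.ROOT))) {
                out.add(anchor);
            }
        }
        return out;
    }
}
```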
Uses of IndexingException in org.apache.nutch.indexer.basic
Methods in org.apache.nutch.indexer.basic that throw IndexingException:
- NutchDocument BasicIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields.
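To illustrate what adding basic searchable fields involves, here is a sketch that derives the url, host, title, and content fields named in the package summary from plain strings. The class `BasicFieldsSketch` and its method are hypothetical; the real filter works on `Parse`, `CrawlDatum`, and `NutchDocument` and reads its own configuration settings, which this sketch replaces with a single hypothetical `maxTitleLength` parameter.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

final class BasicFieldsSketch {
    private BasicFieldsSketch() {}

    // Builds url, host, title, and content fields from plain strings.
    // maxTitleLength is a hypothetical stand-in for the filter's configuration
    // and is assumed to be non-negative.
    static Map<String, String> basicFields(String url, String title, String content, int maxTitleLength) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", url);
        // URI.create throws IllegalArgumentException for malformed URLs.
        String host = URI.create(url).getHost();
        if (host != null) {
            fields.put("host", host);
        }
        if (title != null && !title.isEmpty()) {
            // Truncate overly long titles instead of rejecting the document.
            fields.put("title", title.length() > maxTitleLength ? title.substring(0, maxTitleLength) : title);
        }
        fields.put("content", content == null ? "" : content);
        return fields;
    }
}
```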
Uses of IndexingException in org.apache.nutch.indexer.feed
Methods in org.apache.nutch.indexer.feed that throw IndexingException:
- NutchDocument FeedIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, and FEED) and sends them to the Indexer for indexing within the Nutch index.
Uses of IndexingException in org.apache.nutch.indexer.metadata
Methods in org.apache.nutch.indexer.metadata that throw IndexingException:
- NutchDocument MetadataIndexer.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
Uses of IndexingException in org.apache.nutch.indexer.more
Methods in org.apache.nutch.indexer.more that throw IndexingException:
- NutchDocument MoreIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
Uses of IndexingException in org.apache.nutch.indexer.staticfield
Methods in org.apache.nutch.indexer.staticfield that throw IndexingException:
- NutchDocument StaticFieldIndexer.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  The StaticFieldIndexer filter object, which adds fields according to the configuration settings.
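As a sketch of adding fields from a configuration setting, the helper below parses a configuration string into field names and values. The `name:value,name:value` syntax, with multiple values for one field separated by spaces, is an assumption modeled on the default syntax of the `index.static` property; check it against your Nutch version. The class `StaticFieldSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StaticFieldSketch {
    private StaticFieldSketch() {}

    // Parses "name:v1 v2,name2:v3" into {name=[v1, v2], name2=[v3]}.
    // Entries without a ':' or with an empty field name are skipped.
    static Map<String, List<String>> parse(String setting) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        if (setting == null || setting.trim().isEmpty()) {
            return fields;
        }
        for (String entry : setting.split(",")) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String name = entry.substring(0, colon).trim();
            if (name.isEmpty()) {
                continue;
            }
            List<String> values = fields.computeIfAbsent(name, k -> new ArrayList<>());
            // Space-separated values become multiple values of the same field.
            for (String v : entry.substring(colon + 1).trim().split("\\s+")) {
                if (!v.isEmpty()) {
                    values.add(v);
                }
            }
        }
        return fields;
    }
}
```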
Uses of IndexingException in org.apache.nutch.indexer.subcollection
Methods in org.apache.nutch.indexer.subcollection that throw IndexingException:
- NutchDocument SubcollectionIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
Uses of IndexingException in org.apache.nutch.indexer.tld
Methods in org.apache.nutch.indexer.tld that throw IndexingException:
- NutchDocument TLDIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text urlText, CrawlDatum datum, Inlinks inlinks)
Uses of IndexingException in org.apache.nutch.indexer.urlmeta
Methods in org.apache.nutch.indexer.urlmeta that throw IndexingException:
- NutchDocument URLMetaIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  Takes the meta tags listed in the "urlmeta.tags" property and looks for them inside the CrawlDatum object.
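The lookup of configured meta tags can be sketched with plain collections: a `Map` stands in for the CrawlDatum metadata, and the `urlmeta.tags` value is assumed to be a comma-separated list of tag names. Returning the matched tags as fields to add is an assumption about what happens after the lookup, and the class `UrlMetaSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UrlMetaSketch {
    private UrlMetaSketch() {}

    // For each tag named in the (assumed comma-separated) urlmeta.tags value,
    // copies the tag's value from the datum metadata into the result if present.
    // Tags missing from the metadata are silently skipped.
    static Map<String, String> lookup(String urlmetaTags, Map<String, String> datumMetadata) {
        Map<String, String> found = new LinkedHashMap<>();
        if (urlmetaTags == null) {
            return found;
        }
        for (String raw : urlmetaTags.split(",")) {
            String tag = raw.trim();
            if (tag.isEmpty()) {
                continue;
            }
            String value = datumMetadata.get(tag);
            if (value != null) {
                found.put(tag, value);
            }
        }
        return found;
    }
}
```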
Uses of IndexingException in org.apache.nutch.microformats.reltag
Methods in org.apache.nutch.microformats.reltag that throw IndexingException:
- NutchDocument RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
Uses of IndexingException in org.creativecommons.nutch
Methods in org.creativecommons.nutch that throw IndexingException:
- NutchDocument CCIndexingFilter.filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)