Uses of Class
org.apache.nutch.indexer.NutchDocument
Packages that use NutchDocument
Package    Description
org.apache.nutch.analysis.lang    Text document language identifier.
org.apache.nutch.indexer    Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index.
org.apache.nutch.indexer.anchor    An indexing plugin for inbound anchor text.
org.apache.nutch.indexer.basic    A basic indexing plugin, adds basic fields: url, host, title, content, etc.
org.apache.nutch.indexer.feed    Indexing filter to index metadata from RSS feeds.
org.apache.nutch.indexer.metadata    Indexing filter to add document metadata to the index.
org.apache.nutch.indexer.more    A more indexing plugin, adds "more" index fields: last modified date, MIME type, content length.
org.apache.nutch.indexer.staticfield    A simple plugin called at indexing that adds fields with static data.
org.apache.nutch.indexer.subcollection    Indexing filter to assign documents to subcollections.
org.apache.nutch.indexer.tld    Top Level Domain Indexing plugin.
org.apache.nutch.indexer.urlmeta    URL Meta Tag Indexing Plugin
org.apache.nutch.indexwriter.dummy    Index writer plugin for debugging; writes pairs of <action, url> to a text file, where action is one of "add", "update", or "delete".
org.apache.nutch.indexwriter.elastic    Index writer plugin for Elasticsearch.
org.apache.nutch.indexwriter.solr    Index writer plugin for Apache Solr.
org.apache.nutch.microformats.reltag    A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.scoring    The ScoringFilter interface.
org.apache.nutch.scoring.depth    Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link    Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.opic    Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.tld    Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta    URL Meta Tag Scoring Plugin
org.creativecommons.nutch    Sample plugins that parse and index Creative Commons metadata.
Uses of NutchDocument in org.apache.nutch.analysis.lang
Methods in org.apache.nutch.analysis.lang that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
LanguageIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Methods in org.apache.nutch.analysis.lang with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
LanguageIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Uses of NutchDocument in org.apache.nutch.indexer
Fields in org.apache.nutch.indexer declared as NutchDocument
Modifier and Type    Field and Description
NutchDocument
NutchIndexAction.doc
Methods in org.apache.nutch.indexer that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
IndexingFilters.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Run all defined filters.
NutchDocument
IndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Adds fields or otherwise modifies the document that will be indexed for a parse.
Methods in org.apache.nutch.indexer with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
IndexingFilters.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Run all defined filters.
NutchDocument
IndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Adds fields or otherwise modifies the document that will be indexed for a parse.
void
IndexWriters.update(NutchDocument doc)
void
IndexWriter.update(NutchDocument doc)
void
IndexWriters.write(NutchDocument doc)
void
IndexWriter.write(NutchDocument doc)
Constructors in org.apache.nutch.indexer with parameters of type NutchDocument
Constructor and Description
NutchIndexAction(NutchDocument doc,
byte action)
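The relationship between IndexingFilters.filter ("Run all defined filters") and IndexingFilter.filter ("Adds fields or otherwise modifies the document") can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: Doc and Filter are hypothetical stand-ins for NutchDocument and IndexingFilter, the filter method is reduced to a document and a URL string, and the rule that a null return drops the document is an assumption of this sketch rather than something this page states.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FilterChainSketch {

  /** Hypothetical stand-in for NutchDocument: field name mapped to its values. */
  static final class Doc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
  }

  /**
   * Hypothetical, simplified stand-in for IndexingFilter. The method listed on
   * this page also receives a Parse, a CrawlDatum, and Inlinks.
   */
  interface Filter {
    Doc filter(Doc doc, String url);
  }

  /**
   * Runs every filter in order, passing each filter's result to the next.
   * Assumption of this sketch: a null result means the document is dropped.
   */
  static Doc runAll(List<Filter> filters, Doc doc, String url) {
    for (Filter f : filters) {
      doc = f.filter(doc, url);
      if (doc == null) {
        return null; // stop early, nothing left to index
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    Filter addUrl = (d, u) -> { d.add("url", u); return d; };
    Filter addHost = (d, u) -> { d.add("host", URI.create(u).getHost()); return d; };
    Filter dropPrivate = (d, u) -> u.contains("/private/") ? null : d;
    List<Filter> chain = List.of(addUrl, addHost, dropPrivate);

    Doc kept = runAll(chain, new Doc(), "http://example.org/index.html");
    System.out.println(kept.fields);

    Doc dropped = runAll(chain, new Doc(), "http://example.org/private/x.html");
    System.out.println(dropped == null ? "dropped" : "kept");
  }
}
```

Each filter returns the document it was given (or a replacement), so plugins such as the ones listed below can be chained without knowing about one another.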
Uses of NutchDocument in org.apache.nutch.indexer.anchor
Methods in org.apache.nutch.indexer.anchor that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
AnchorIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors.
Methods in org.apache.nutch.indexer.anchor with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
AnchorIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors.
Uses of NutchDocument in org.apache.nutch.indexer.basic
Methods in org.apache.nutch.indexer.basic that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
BasicIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields.
Methods in org.apache.nutch.indexer.basic with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
BasicIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields.
Uses of NutchDocument in org.apache.nutch.indexer.feed
Methods in org.apache.nutch.indexer.feed that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
Methods in org.apache.nutch.indexer.feed with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
FeedIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
Uses of NutchDocument in org.apache.nutch.indexer.metadata
Methods in org.apache.nutch.indexer.metadata that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
MetadataIndexer.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Methods in org.apache.nutch.indexer.metadata with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
MetadataIndexer.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Uses of NutchDocument in org.apache.nutch.indexer.more
Methods in org.apache.nutch.indexer.more that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
MoreIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Methods in org.apache.nutch.indexer.more with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
MoreIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Uses of NutchDocument in org.apache.nutch.indexer.staticfield
Methods in org.apache.nutch.indexer.staticfield that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
StaticFieldIndexer.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
The StaticFieldIndexer filter object, which adds fields according to its configuration settings.
Methods in org.apache.nutch.indexer.staticfield with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
StaticFieldIndexer.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
The StaticFieldIndexer filter object, which adds fields according to its configuration settings.
Uses of NutchDocument in org.apache.nutch.indexer.subcollection
Methods in org.apache.nutch.indexer.subcollection that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
SubcollectionIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Methods in org.apache.nutch.indexer.subcollection with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
SubcollectionIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Uses of NutchDocument in org.apache.nutch.indexer.tld
Methods in org.apache.nutch.indexer.tld that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
TLDIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text urlText,
CrawlDatum datum,
Inlinks inlinks)
Methods in org.apache.nutch.indexer.tld with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
TLDIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text urlText,
CrawlDatum datum,
Inlinks inlinks)
Uses of NutchDocument in org.apache.nutch.indexer.urlmeta
Methods in org.apache.nutch.indexer.urlmeta that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
URLMetaIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object.
Methods in org.apache.nutch.indexer.urlmeta with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
URLMetaIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the CrawlDatum object.
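The lookup described for URLMetaIndexingFilter.filter can be sketched without any Nutch dependency. In the sketch below, the comma-separated format of the "urlmeta.tags" value and the plain Map standing in for the CrawlDatum metadata are assumptions made for illustration; UrlMetaSketch and collect are hypothetical names, not part of the Nutch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UrlMetaSketch {

  /**
   * Returns the configured tags that are actually present in the metadata.
   *
   * @param tagsProperty value of the "urlmeta.tags" property; assumed here to
   *                     be a comma-separated list of tag names
   * @param datumMeta    hypothetical stand-in for the CrawlDatum metadata
   */
  static Map<String, String> collect(String tagsProperty, Map<String, String> datumMeta) {
    Map<String, String> found = new LinkedHashMap<>();
    if (tagsProperty == null) {
      return found; // nothing configured, nothing to look up
    }
    for (String raw : tagsProperty.split(",")) {
      String tag = raw.trim();
      if (tag.isEmpty()) {
        continue;
      }
      String value = datumMeta.get(tag);
      if (value != null) {
        found.put(tag, value); // a real filter would add this as a document field
      }
    }
    return found;
  }

  public static void main(String[] args) {
    Map<String, String> meta = Map.of("author", "jane", "unlisted", "ignored");
    System.out.println(collect("section, author", meta));
  }
}
```

Only tags that are both configured and present on the datum are carried over; configured tags with no value are skipped silently in this sketch.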
Uses of NutchDocument in org.apache.nutch.indexwriter.dummy
Methods in org.apache.nutch.indexwriter.dummy with parameters of type NutchDocument
Modifier and Type    Method and Description
void
DummyIndexWriter.update(NutchDocument doc)
void
DummyIndexWriter.write(NutchDocument doc)
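The package description says this writer logs index actions ("add", "update", or "delete") to a text file for debugging. A minimal sketch of that idea follows. It is not the DummyIndexWriter source: ActionLogSketch is a hypothetical class, the tab-separated line format is an assumption, and a URL string stands in for the NutchDocument argument.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

final class ActionLogSketch {

  private final PrintWriter out;

  ActionLogSketch(Writer target) {
    this.out = new PrintWriter(target);
  }

  // Mirrors write(doc): a new document is recorded as an "add".
  void write(String url) {
    log("add", url);
  }

  // Mirrors update(doc).
  void update(String url) {
    log("update", url);
  }

  // Deletions are logged by URL as well.
  void delete(String url) {
    log("delete", url);
  }

  // One line per action; the tab separator is an assumption of this sketch.
  private void log(String action, String url) {
    out.print(action + "\t" + url + "\n");
    out.flush();
  }

  public static void main(String[] args) {
    StringWriter buffer = new StringWriter();
    ActionLogSketch log = new ActionLogSketch(buffer);
    log.write("http://example.org/a");
    log.update("http://example.org/a");
    log.delete("http://example.org/b");
    System.out.print(buffer);
  }
}
```

Writing to a Writer rather than directly to a file keeps the sketch easy to test; a FileWriter can be passed in to produce the text file.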
Uses of NutchDocument in org.apache.nutch.indexwriter.elastic
Methods in org.apache.nutch.indexwriter.elastic with parameters of type NutchDocument
Modifier and Type    Method and Description
void
ElasticIndexWriter.update(NutchDocument doc)
void
ElasticIndexWriter.write(NutchDocument doc)
Uses of NutchDocument in org.apache.nutch.indexwriter.solr
Methods in org.apache.nutch.indexwriter.solr with parameters of type NutchDocument
Modifier and Type    Method and Description
void
SolrIndexWriter.update(NutchDocument doc)
void
SolrIndexWriter.write(NutchDocument doc)
Uses of NutchDocument in org.apache.nutch.microformats.reltag
Methods in org.apache.nutch.microformats.reltag that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
RelTagIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Methods in org.apache.nutch.microformats.reltag with parameters of type NutchDocument
Modifier and Type    Method and Description
NutchDocument
RelTagIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Uses of NutchDocument in org.apache.nutch.scoring
Methods in org.apache.nutch.scoring with parameters of type NutchDocument
Modifier and Type    Method and Description
float
ScoringFilters.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
float
ScoringFilter.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
This method calculates a Lucene document boost.
float
AbstractScoringFilter.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Uses of NutchDocument in org.apache.nutch.scoring.depth
Methods in org.apache.nutch.scoring.depth with parameters of type NutchDocument
Modifier and Type    Method and Description
float
DepthScoringFilter.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Uses of NutchDocument in org.apache.nutch.scoring.link
Methods in org.apache.nutch.scoring.link with parameters of type NutchDocument
Modifier and Type    Method and Description
float
LinkAnalysisScoringFilter.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Uses of NutchDocument in org.apache.nutch.scoring.opic
Methods in org.apache.nutch.scoring.opic with parameters of type NutchDocument
Modifier and Type    Method and Description
float
OPICScoringFilter.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Dampen the boost value by scorePower.
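To make "dampen the boost value by scorePower" concrete, here is a small arithmetic sketch. It assumes the boost is the stored score raised to the power scorePower and then multiplied by the incoming initScore; this page states only that the boost is dampened, so the exact formula, the class name BoostDampeningSketch, and the plain float parameters (in place of CrawlDatum and the other arguments) are assumptions.

```java
final class BoostDampeningSketch {

  /**
   * Raises the stored score to scorePower and scales the result by initScore.
   * With 0 < scorePower < 1, large scores are compressed toward 1.
   */
  static float dampen(float dbScore, float scorePower, float initScore) {
    return (float) Math.pow(dbScore, scorePower) * initScore;
  }

  public static void main(String[] args) {
    // A score of 16 with scorePower 0.5 is reduced to its square root.
    System.out.println(dampen(16f, 0.5f, 1f)); // prints 4.0
    // scorePower 1.0 leaves the score unchanged before scaling by initScore.
    System.out.println(dampen(16f, 1f, 2f));   // prints 32.0
  }
}
```

A fractional exponent keeps the ordering of pages by score while narrowing the gap between very high and moderate scores.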
Uses of NutchDocument in org.apache.nutch.scoring.tld
Methods in org.apache.nutch.scoring.tld with parameters of type NutchDocument
Modifier and Type    Method and Description
float
TLDScoringFilter.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Uses of NutchDocument in org.apache.nutch.scoring.urlmeta
Methods in org.apache.nutch.scoring.urlmeta with parameters of type NutchDocument
Modifier and Type    Method and Description
float
URLMetaScoringFilter.indexerScore(org.apache.hadoop.io.Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore)
Boilerplate
Uses of NutchDocument in org.creativecommons.nutch
Methods in org.creativecommons.nutch that return NutchDocument
Modifier and Type    Method and Description
NutchDocument
CCIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)
Methods in org.creativecommons.nutch with parameters of type NutchDocument
Modifier and Type    Method and Description
void
CCIndexingFilter.addUrlFeatures(NutchDocument doc,
String urlString)
Add the features represented by a license URL.
NutchDocument
CCIndexingFilter.filter(NutchDocument doc,
Parse parse,
org.apache.hadoop.io.Text url,
CrawlDatum datum,
Inlinks inlinks)