Uses of Class
org.apache.nutch.parse.ParseData
Packages that use ParseData
Package: Description
org.apache.nutch.crawl: Crawl control code and tools to run the crawler.
org.apache.nutch.parse: The Parse interface and related classes.
org.apache.nutch.scoring: The ScoringFilter interface.
org.apache.nutch.scoring.depth: Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link: Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.opic: Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.tld: Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta: URL Meta Tag Scoring Plugin.
org.apache.nutch.segment: A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
Uses of ParseData in org.apache.nutch.crawl
Methods in org.apache.nutch.crawl with parameters of type ParseData
Modifier and Type, Method and Description
void
LinkDb.map(org.apache.hadoop.io.Text key,
ParseData parseData,
org.apache.hadoop.mapred.OutputCollector
Uses of ParseData in org.apache.nutch.parse
Methods in org.apache.nutch.parse that return ParseData
Modifier and Type, Method and Description
ParseData
ParseImpl.getData()
ParseData
Parse.getData()
Other data extracted from the page.
static ParseData
ParseData.read(DataInput in)
Methods in org.apache.nutch.parse with parameters of type ParseData
Modifier and Type, Method and Description
void
ParseResult.put(String key,
ParseText text,
ParseData data)
Store a result of parsing.
void
ParseResult.put(org.apache.hadoop.io.Text key,
ParseText text,
ParseData data)
Store a result of parsing.
Constructors in org.apache.nutch.parse with parameters of type ParseData
Constructor and Description
ParseImpl(ParseText text,
ParseData data)
ParseImpl(ParseText text,
ParseData data,
boolean isCanonical)
ParseImpl(String text,
ParseData data)
Uses of ParseData in org.apache.nutch.scoring
Methods in org.apache.nutch.scoring with parameters of type ParseData
Modifier and Type, Method and Description
CrawlDatum
ScoringFilters.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
CrawlDatum
ScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
Distribute score value from the current page to all its outlinked pages.
CrawlDatum
AbstractScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
Uses of ParseData in org.apache.nutch.scoring.depth
Methods in org.apache.nutch.scoring.depth with parameters of type ParseData
Modifier and Type, Method and Description
CrawlDatum
DepthScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
Uses of ParseData in org.apache.nutch.scoring.link
Methods in org.apache.nutch.scoring.link with parameters of type ParseData
Modifier and Type, Method and Description
CrawlDatum
LinkAnalysisScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
Uses of ParseData in org.apache.nutch.scoring.opic
Methods in org.apache.nutch.scoring.opic with parameters of type ParseData
Modifier and Type, Method and Description
CrawlDatum
OPICScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
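The description above amounts to an even split of the page's score across its outlinks. The following is a minimal sketch of that arithmetic in plain Java, without Nutch on the classpath. The class `OpicShareSketch`, its methods, the zero-outlink guard, and the choice to "apply" the share by adding it to a running score per URL are all illustrative assumptions; none of them are part of OPICScoringFilter, which works on ParseData and CrawlDatum objects rather than plain maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; these names do not exist in Nutch.
class OpicShareSketch {

    // Divide a page's score evenly among its outlinks.
    // Returning 0 for a page without outlinks is an assumption of this sketch.
    static float sharePerOutlink(float pageScore, int outlinkCount) {
        if (outlinkCount <= 0) {
            return 0.0f;
        }
        return pageScore / outlinkCount;
    }

    // "Apply" the share: here, add it to each target URL's current score (assumed behaviour).
    static Map<String, Float> distribute(float pageScore,
                                         List<String> outlinks,
                                         Map<String, Float> currentScores) {
        float share = sharePerOutlink(pageScore, outlinks.size());
        Map<String, Float> updated = new LinkedHashMap<>(currentScores);
        for (String url : outlinks) {
            // Insert the share for a new URL, or sum it with the existing score.
            updated.merge(url, share, Float::sum);
        }
        return updated;
    }

    public static void main(String[] args) {
        List<String> outlinks = List.of("http://a.example/", "http://b.example/");
        System.out.println(distribute(1.0f, outlinks, Map.of()));
    }
}
```

With a page score of 1.0 and two outlinks, each target receives a share of 0.5 in this sketch.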
Uses of ParseData in org.apache.nutch.scoring.tld
Methods in org.apache.nutch.scoring.tld with parameters of type ParseData
Modifier and Type, Method and Description
CrawlDatum
TLDScoringFilter.distributeScoreToOutlink(org.apache.hadoop.io.Text fromUrl,
org.apache.hadoop.io.Text toUrl,
ParseData parseData,
CrawlDatum target,
CrawlDatum adjust,
int allCount,
int validCount)
CrawlDatum
TLDScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
Uses of ParseData in org.apache.nutch.scoring.urlmeta
Methods in org.apache.nutch.scoring.urlmeta with parameters of type ParseData
Modifier and Type, Method and Description
CrawlDatum
URLMetaScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl,
ParseData parseData,
Collection
Takes the metatags listed in your "urlmeta.tags" property and looks for them inside the parseData object.
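The lookup described above can be pictured as filtering a metadata map by a configured list of tag names. The sketch below is plain Java with stand-in types: `UrlMetaLookupSketch` and `findTags` are hypothetical names, a `Map<String, String>` stands in for the metadata carried by parseData, and treating the "urlmeta.tags" value as a comma-separated list is an assumption of this example. What URLMetaScoringFilter does with the tags it finds is not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; these names do not exist in Nutch.
class UrlMetaLookupSketch {

    // Split the configured tag list (assumed comma-separated), then keep only
    // the tags that are actually present in the page's parse metadata.
    static Map<String, String> findTags(String urlmetaTags, Map<String, String> parseMeta) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String tag : urlmetaTags.split(",")) {
            String name = tag.trim();
            if (name.isEmpty()) {
                continue; // skip blanks such as those from an empty property value
            }
            String value = parseMeta.get(name);
            if (value != null) {
                found.put(name, value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> meta = Map.of("category", "news", "author", "jdoe");
        // "region" is configured but absent from the metadata, so it is dropped.
        System.out.println(findTags("category, region", meta)); // prints {category=news}
    }
}
```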
Uses of ParseData in org.apache.nutch.segment
Methods in org.apache.nutch.segment with parameters of type ParseData
Modifier and Type, Method and Description
boolean
SegmentMergeFilters.filter(org.apache.hadoop.io.Text key,
CrawlDatum generateData,
CrawlDatum fetchData,
CrawlDatum sigData,
Content content,
ParseData parseData,
ParseText parseText,
Collection
Iterates over all SegmentMergeFilter extensions; if any of them returns false, it returns false as well.
boolean
SegmentMergeFilter.filter(org.apache.hadoop.io.Text key,
CrawlDatum generateData,
CrawlDatum fetchData,
CrawlDatum sigData,
Content content,
ParseData parseData,
ParseText parseText,
Collection
The filtering method which gets all information being merged for a given key (URL).
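The relationship between SegmentMergeFilters and the individual SegmentMergeFilter extensions is a "reject if any filter rejects" chain. A minimal sketch of that pattern in plain Java follows. The interface `KeyFilter` is a hypothetical one-argument stand-in for SegmentMergeFilter.filter(...), which takes many more parameters, and stopping at the first rejection is an assumption of this sketch rather than something the description states.

```java
import java.util.List;

// Hypothetical stand-in for illustration only; these names do not exist in Nutch.
class MergeFilterChainSketch {

    // Simplified filter: the real method also receives CrawlDatum, Content,
    // ParseData, ParseText and more. Only the key (URL) is kept here.
    interface KeyFilter {
        boolean filter(String key);
    }

    // Returns false as soon as any filter returns false; true if all accept.
    static boolean filterAll(List<KeyFilter> filters, String key) {
        for (KeyFilter f : filters) {
            if (!f.filter(key)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<KeyFilter> filters = List.of(
                key -> key.startsWith("http"),
                key -> !key.endsWith(".tmp"));
        System.out.println(filterAll(filters, "http://example.org/page"));
        System.out.println(filterAll(filters, "http://example.org/file.tmp"));
    }
}
```

An empty filter list accepts every key in this sketch, since no filter is present to reject it.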