Uses of Class org.apache.nutch.parse.ParseData

Uses of ParseData in org.apache.nutch.crawl

Methods in org.apache.nutch.crawl with parameters of type ParseData

  • void LinkDb.map(org.apache.hadoop.io.Text key, ParseData parseData, org.apache.hadoop.mapred.OutputCollector output, org.apache.hadoop.mapred.Reporter reporter)

Uses of ParseData in org.apache.nutch.parse

Methods in org.apache.nutch.parse that return ParseData

  • ParseData ParseImpl.getData()
  • ParseData Parse.getData()
    Other data extracted from the page.
  • static ParseData ParseData.read(DataInput in)
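
The static `ParseData.read(DataInput in)` factory follows the Hadoop `Writable` idiom: create an empty instance, then fill its fields from a binary stream in the same order they were written. The sketch below shows that idiom only. `SimpleParseData`, its two fields, and the `roundTrip` helper are hypothetical; the real `ParseData` carries more state (status, metadata, outlinks with anchors) and its wire format is not reproduced here.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for ParseData; NOT Nutch's real class or format.
class SimpleParseData {
    String title = "";
    List<String> outlinks = new ArrayList<>();

    // Serialize fields in a fixed order (Writable-style write).
    void write(DataOutput out) throws IOException {
        out.writeUTF(title);
        out.writeInt(outlinks.size());
        for (String url : outlinks) {
            out.writeUTF(url);
        }
    }

    // Read fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        title = in.readUTF();
        int n = in.readInt();
        outlinks = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            outlinks.add(in.readUTF());
        }
    }

    // Static factory in the style of ParseData.read(DataInput).
    static SimpleParseData read(DataInput in) throws IOException {
        SimpleParseData data = new SimpleParseData();
        data.readFields(in);
        return data;
    }

    // Hypothetical helper: serialize to bytes and deserialize again.
    static SimpleParseData roundTrip(SimpleParseData data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            data.write(new DataOutputStream(bytes));
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleParseData original = new SimpleParseData();
        original.title = "Example";
        original.outlinks.add("http://example.com/a");
        SimpleParseData copy = roundTrip(original);
        System.out.println(copy.title + " " + copy.outlinks); // Example [http://example.com/a]
    }
}
```

Keeping `readFields` separate from the static factory lets a caller reuse one instance across many records, which is the usual reason `Writable` classes expose both.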

Methods in org.apache.nutch.parse with parameters of type ParseData

  • void ParseResult.put(String key, ParseText text, ParseData data)
    Store a result of parsing.
  • void ParseResult.put(org.apache.hadoop.io.Text key, ParseText text, ParseData data)
    Store a result of parsing.
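
Both `ParseResult.put` overloads store a parse outcome (text plus data) under a URL key, which allows one fetched resource to yield results for several URLs. The following is a minimal sketch of such a keyed store, assuming string URLs in place of `Text` and a single title string in place of `ParseData`. `SimpleParseResult` and its methods are hypothetical, and the replace-on-duplicate behavior is a property of this sketch, not a documented guarantee of Nutch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for ParseResult: String URLs replace Text keys,
// and a title string stands in for ParseData. Not Nutch's actual class.
class SimpleParseResult {
    // One parse outcome: extracted text plus the (simplified) parse data.
    static final class Entry {
        final String text;
        final String title;

        Entry(String text, String title) {
            this.text = text;
            this.title = title;
        }
    }

    // Insertion-ordered so results iterate in the order they were stored.
    private final Map<String, Entry> results = new LinkedHashMap<>();

    // Store a result of parsing under its URL; in this sketch a second put
    // for the same URL replaces the earlier entry.
    void put(String url, String text, String title) {
        results.put(url, new Entry(text, title));
    }

    Entry get(String url) {
        return results.get(url);
    }

    int size() {
        return results.size();
    }

    public static void main(String[] args) {
        // One fetched resource (e.g. a feed) may yield parses for several URLs.
        SimpleParseResult result = new SimpleParseResult();
        result.put("http://example.com/feed", "feed text", "Feed");
        result.put("http://example.com/item1", "item text", "Item 1");
        System.out.println(result.size()); // 2
    }
}
```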

Constructors in org.apache.nutch.parse with parameters of type ParseData

  • ParseImpl(ParseText text, ParseData data)
  • ParseImpl(ParseText text, ParseData data, boolean isCanonical)
  • ParseImpl(String text, ParseData data)

Uses of ParseData in org.apache.nutch.scoring

Methods in org.apache.nutch.scoring with parameters of type ParseData

  • CrawlDatum ScoringFilters.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
  • CrawlDatum ScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Distribute score value from the current page to all its outlinked pages.
  • CrawlDatum AbstractScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
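
`ScoringFilters` sits alongside the `ScoringFilter` interface and its implementations and, as the plural name suggests, runs the configured filters in turn. The sketch below shows one plausible chaining pattern, assuming each filter may modify the outlink targets and receives the `adjust` datum returned by the previous filter. `Datum` (for `CrawlDatum`), `OutlinkFilter`, and `ScoringChainSketch` are hypothetical simplifications; `ParseData` is omitted entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified model: Datum stands in for CrawlDatum, String URLs for Text.
class ScoringChainSketch {
    static final class Datum {
        float score;

        Datum(float score) {
            this.score = score;
        }
    }

    // Mirrors the shape of distributeScoreToOutlinks, minus the ParseData argument.
    interface OutlinkFilter {
        Datum distribute(String fromUrl, Map<String, Datum> targets, Datum adjust, int allCount);
    }

    private final List<OutlinkFilter> filters = new ArrayList<>();

    void add(OutlinkFilter f) {
        filters.add(f);
    }

    // Run every filter in order; each sees (and may replace) the adjust datum
    // returned by its predecessor. The final value is returned to the caller.
    Datum distribute(String fromUrl, Map<String, Datum> targets, Datum adjust, int allCount) {
        for (OutlinkFilter f : filters) {
            adjust = f.distribute(fromUrl, targets, adjust, allCount);
        }
        return adjust;
    }

    public static void main(String[] args) {
        ScoringChainSketch chain = new ScoringChainSketch();
        // Filter 1: give every outlink target a fixed bonus.
        chain.add((from, outs, adj, n) -> {
            outs.values().forEach(d -> d.score += 1.0f);
            return adj;
        });
        // Filter 2: create an adjust datum for the source page if none exists yet.
        chain.add((from, outs, adj, n) -> adj != null ? adj : new Datum(0.0f));

        Map<String, Datum> targets = new LinkedHashMap<>();
        targets.put("http://example.com/a", new Datum(0.0f));
        Datum result = chain.distribute("http://example.com/", targets, null, 1);
        System.out.println(targets.get("http://example.com/a").score + " " + (result != null)); // 1.0 true
    }
}
```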

Uses of ParseData in org.apache.nutch.scoring.depth

Methods in org.apache.nutch.scoring.depth with parameters of type ParseData

  • CrawlDatum DepthScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)

Uses of ParseData in org.apache.nutch.scoring.link

Methods in org.apache.nutch.scoring.link with parameters of type ParseData

  • CrawlDatum LinkAnalysisScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)

Uses of ParseData in org.apache.nutch.scoring.opic

Methods in org.apache.nutch.scoring.opic with parameters of type ParseData

  • CrawlDatum OPICScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply.
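
The OPIC description amounts to an even split: the page's score divided by `allCount` gives each outlink's share. A minimal sketch of just that arithmetic follows, assuming the page score has already been obtained (the real filter reads it via `Fetcher.SCORE_KEY`) and that each share is added to the target's current score. `OpicSplitSketch` and its map-of-floats representation are hypothetical, and any further settings the real filter applies are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the OPIC split only; not OPICScoringFilter itself.
class OpicSplitSketch {
    // Divide the page's score evenly over allCount outlinks and add each share
    // to the target's current score.
    static void distribute(float pageScore, Map<String, Float> targets, int allCount) {
        if (allCount <= 0) {
            return; // no outlinks: nothing to distribute
        }
        float share = pageScore / allCount;
        targets.replaceAll((url, score) -> score + share);
    }

    public static void main(String[] args) {
        Map<String, Float> targets = new LinkedHashMap<>();
        targets.put("http://example.com/a", 0.0f);
        targets.put("http://example.com/b", 0.5f);
        distribute(1.0f, targets, 2);
        System.out.println(targets); // {http://example.com/a=0.5, http://example.com/b=1.0}
    }
}
```

For example, a page scoring 1.0 with 2 outlinks passes 1.0 / 2 = 0.5 to each. Dividing by `allCount` rather than by `targets.size()` keeps each share fixed even when only some outlinks are passed in as targets.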

Uses of ParseData in org.apache.nutch.scoring.tld

Methods in org.apache.nutch.scoring.tld with parameters of type ParseData

  • CrawlDatum TLDScoringFilter.distributeScoreToOutlink(org.apache.hadoop.io.Text fromUrl, org.apache.hadoop.io.Text toUrl, ParseData parseData, CrawlDatum target, CrawlDatum adjust, int allCount, int validCount)
  • CrawlDatum TLDScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)

Uses of ParseData in org.apache.nutch.scoring.urlmeta

Methods in org.apache.nutch.scoring.urlmeta with parameters of type ParseData

  • CrawlDatum URLMetaScoringFilter.distributeScoreToOutlinks(org.apache.hadoop.io.Text fromUrl, ParseData parseData, Collection<Map.Entry<org.apache.hadoop.io.Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount)
    Takes the metatags listed in the "urlmeta.tags" property and looks for them inside the parseData object.
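
The URLMeta description stops at looking the configured tags up in `parseData`. A natural reading is that any values found are passed on to the outlinks, and the sketch below assumes exactly that: each tag present in the source page's metadata is copied into every outlink's metadata, and absent tags are skipped. `UrlMetaSketch`, the plain `String` maps standing in for `ParseData` and `CrawlDatum` metadata, and the tag names in `main` are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: String maps stand in for ParseData and CrawlDatum metadata.
class UrlMetaSketch {
    // For each configured tag (e.g. values listed in "urlmeta.tags"), copy its value
    // from the source page's metadata into every outlink's metadata.
    static void propagate(List<String> tags, Map<String, String> parseMeta,
                          Map<String, Map<String, String>> outlinkMeta) {
        for (String tag : tags) {
            String value = parseMeta.get(tag);
            if (value == null) {
                continue; // tag not present on the source page: nothing to pass on
            }
            for (Map<String, String> meta : outlinkMeta.values()) {
                meta.put(tag, value);
            }
        }
    }

    public static void main(String[] args) {
        List<String> tags = List.of("campaign", "owner"); // hypothetical tag names
        Map<String, String> parseMeta = Map.of("campaign", "spring");
        Map<String, Map<String, String>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://example.com/a", new HashMap<>());
        propagate(tags, parseMeta, outlinks);
        System.out.println(outlinks); // {http://example.com/a={campaign=spring}}
    }
}
```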

Uses of ParseData in org.apache.nutch.segment

Methods in org.apache.nutch.segment with parameters of type ParseData

  • boolean SegmentMergeFilters.filter(org.apache.hadoop.io.Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection linked)
    Iterates over all SegmentMergeFilter extensions and, if any of them returns false, returns false as well.
  • boolean SegmentMergeFilter.filter(org.apache.hadoop.io.Text key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection linked)
    The filtering method, which receives all information being merged for a given key (URL).
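
`SegmentMergeFilters.filter` combines its extensions with a logical AND: the entry survives only if no filter returns false. A minimal sketch of that rule follows, assuming (in this sketch) that `true` means "keep the entry in the merged segment" and that evaluation stops at the first rejection. `MergeEntry` collapses the many real arguments into a URL and its parse text; it, `MergeFilter`, and `MergeFilterSketch` are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: MergeEntry stands in for the key, CrawlDatums, Content,
// ParseData, ParseText, and linked collection that the real filter receives.
class MergeFilterSketch {
    static final class MergeEntry {
        final String url;
        final String parseText;

        MergeEntry(String url, String parseText) {
            this.url = url;
            this.parseText = parseText;
        }
    }

    interface MergeFilter {
        boolean filter(MergeEntry entry); // in this sketch, true = keep the entry
    }

    // Keep the entry only if every filter accepts it; stop at the first rejection.
    static boolean filterAll(List<MergeFilter> filters, MergeEntry entry) {
        for (MergeFilter f : filters) {
            if (!f.filter(entry)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MergeFilter notTmp = e -> !e.url.contains("/tmp/"); // drop a hypothetical /tmp/ path
        MergeFilter hasText = e -> !e.parseText.isEmpty();  // drop entries with no text
        List<MergeFilter> filters = List.of(notTmp, hasText);
        System.out.println(filterAll(filters, new MergeEntry("http://example.com/a", "text")));     // true
        System.out.println(filterAll(filters, new MergeEntry("http://example.com/tmp/x", "text"))); // false
    }
}
```

An empty filter list keeps every entry, which matches the "return false only if some filter returns false" wording.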

Copyright © 2014 The Apache Software Foundation