Uses of Package org.apache.nutch.protocol

Packages that use org.apache.nutch.protocol

Package
    Description
org.apache.nutch.analysis.lang
    Text document language identifier.
org.apache.nutch.crawl
    Crawl control code and tools to run the crawler.
org.apache.nutch.microformats.reltag
    A microformats Rel-Tag Parser/Indexer/Querier plugin.
org.apache.nutch.parse
    The Parse interface and related classes.
org.apache.nutch.parse.ext
    Parse wrapper to run an external command to do the parsing.
org.apache.nutch.parse.feed
    Parse RSS feeds.
org.apache.nutch.parse.headings
    Parse filter to extract headings (h1, h2, etc.) from DOM parse tree.
org.apache.nutch.parse.html
    An HTML document parsing plugin.
org.apache.nutch.parse.js
    Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets.
org.apache.nutch.parse.metatags
    Parse filter to extract meta tags: keywords, description, etc.
org.apache.nutch.parse.swf
    Parse Flash SWF files.
org.apache.nutch.parse.tika
    Parse various document formats with the help of Apache Tika.
org.apache.nutch.parse.zip
    Parse ZIP files: embedded files are recursively passed to appropriate parsers.
org.apache.nutch.protocol
    Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file
    Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp
    Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.http
    Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api
    Common API used by HTTP plugins (http, httpclient).
org.apache.nutch.scoring
    The ScoringFilter interface.
org.apache.nutch.scoring.depth
    Scoring filter to stop crawling at a configurable depth (number of "hops" from seed URLs).
org.apache.nutch.scoring.link
    Scoring filter used in conjunction with WebGraph.
org.apache.nutch.scoring.opic
    Scoring filter implementing a variant of the Online Page Importance Computation (OPIC) algorithm.
org.apache.nutch.scoring.tld
    Top Level Domain Scoring plugin.
org.apache.nutch.scoring.urlmeta
    URL Meta Tag Scoring Plugin.
org.apache.nutch.segment
    A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links.
org.apache.nutch.util
    Miscellaneous utility classes.
org.creativecommons.nutch
    Sample plugins that parse and index Creative Commons metadata.

Classes in org.apache.nutch.protocol used by org.apache.nutch.analysis.lang
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.crawl
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.microformats.reltag
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.ext
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.feed
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.headings
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.html
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.js
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.metatags
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.swf
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.tika
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.parse.zip
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.protocol
Class and Description
Content
Protocol
    A retriever of url content.
ProtocolException
ProtocolNotFound
ProtocolOutput
    Simple aggregate to pass from protocol plugins both content and protocol status.
ProtocolStatus

Classes in org.apache.nutch.protocol used by org.apache.nutch.protocol.file
Class and Description
Content
Protocol
    A retriever of url content.
ProtocolException
ProtocolOutput
    Simple aggregate to pass from protocol plugins both content and protocol status.

Classes in org.apache.nutch.protocol used by org.apache.nutch.protocol.ftp
Class and Description
Content
Protocol
    A retriever of url content.
ProtocolException
ProtocolOutput
    Simple aggregate to pass from protocol plugins both content and protocol status.
RobotRulesParser
    This class uses crawler-commons for handling the parsing of robots.txt files.

Classes in org.apache.nutch.protocol used by org.apache.nutch.protocol.http
Class and Description
Protocol
    A retriever of url content.
ProtocolException

Classes in org.apache.nutch.protocol used by org.apache.nutch.protocol.http.api
Class and Description
Protocol
    A retriever of url content.
ProtocolException
ProtocolOutput
    Simple aggregate to pass from protocol plugins both content and protocol status.
RobotRulesParser
    This class uses crawler-commons for handling the parsing of robots.txt files.

Classes in org.apache.nutch.protocol used by org.apache.nutch.scoring
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.scoring.depth
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.scoring.link
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.scoring.opic
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.scoring.tld
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.scoring.urlmeta
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.segment
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.apache.nutch.util
Class and Description
Content

Classes in org.apache.nutch.protocol used by org.creativecommons.nutch
Class and Description
Content