org.apache.nutch.protocol.ftp
Class FtpRobotRulesParser
- java.lang.Object
- org.apache.nutch.protocol.RobotRulesParser
- org.apache.nutch.protocol.ftp.FtpRobotRulesParser
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable
public class FtpRobotRulesParser extends RobotRulesParser
Parses robots rules for URLs that use the FTP protocol. It extends the generic RobotRulesParser
class and provides the FTP-specific logic for obtaining the robots file.
Field Summary
Fields (Modifier and Type, Field and Description):
static org.slf4j.Logger LOG
-
Fields inherited from class org.apache.nutch.protocol.RobotRulesParser
agentNames, CACHE, EMPTY_RULES, FORBID_ALL_RULES
Constructor Summary
Constructors (Constructor and Description):
FtpRobotRulesParser(org.apache.hadoop.conf.Configuration conf)
Method Summary
Methods (Modifier and Type, Method and Description):
crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol ftp, URL url)
For hosts whose robots rules are not yet cached, sends an FTP request to the host of the given URL,
fetches the robots file, parses the rules, and caches the resulting rules object to avoid repeating the work later.
-
Methods inherited from class org.apache.nutch.protocol.RobotRulesParser
getConf, getRobotRulesSet, main, parseRules, setConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
-
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
-
FtpRobotRulesParser
public FtpRobotRulesParser(org.apache.hadoop.conf.Configuration conf)
Method Detail
-
getRobotRulesSet
public crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol ftp, URL url)
For hosts whose robots rules are not yet cached, sends an FTP request to the host of the given URL,
fetches the robots file, parses the rules, and caches the resulting rules object to avoid repeating the work later.
- Specified by:
- <code>getRobotRulesSet</code> in class <code>RobotRulesParser</code>
- Parameters:
- <code>ftp</code> - The [<code>Protocol</code>](../../../../../org/apache/nutch/protocol/Protocol.html) object
- <code>url</code> - The URL whose host is checked for robots rules
- Returns:
- A <code>BaseRobotRules</code> object holding the parsed rules
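The lookup-then-fetch behaviour described for getRobotRulesSet can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Nutch implementation: the class RobotRulesCacheSketch, its Rules type, the "protocol:host" cache key, and the injected fetcher function are all hypothetical stand-ins, java.net.URI is used in place of URL, and the parser deliberately ignores User-agent groups and only collects Disallow prefixes.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration of "look up in cache, else fetch, parse, and cache".
// None of these names come from Nutch.
class RobotRulesCacheSketch {

    /** Simplified stand-in for a parsed rules object (assumption, not BaseRobotRules). */
    static final class Rules {
        private final List<String> disallowedPrefixes;

        Rules(List<String> disallowedPrefixes) {
            this.disallowedPrefixes = disallowedPrefixes;
        }

        /** A path is allowed unless it starts with one of the disallowed prefixes. */
        boolean isAllowed(String path) {
            for (String prefix : disallowedPrefixes) {
                if (path.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    private final Map<String, Rules> cache = new ConcurrentHashMap<>();
    // Stand-in for the FTP request: maps a robots file location to its text.
    private final Function<URI, String> fetcher;

    RobotRulesCacheSketch(Function<URI, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Assumed cache key: lower-cased scheme and host, so all URLs on one host share an entry. */
    static String cacheKey(URI uri) {
        return uri.getScheme().toLowerCase(Locale.ROOT) + ":"
                + uri.getHost().toLowerCase(Locale.ROOT);
    }

    /** Deliberately simplified parser: collects every non-empty Disallow value. */
    static Rules parse(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        for (String line : robotsTxt.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("disallow:")) {
                String value = trimmed.substring("disallow:".length()).trim();
                if (!value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return new Rules(prefixes);
    }

    /** Returns cached rules for the host, fetching and parsing the robots file on a miss. */
    Rules getRules(URI uri) {
        String key = cacheKey(uri);
        Rules rules = cache.get(key);
        if (rules == null) {
            String robotsTxt = fetcher.apply(uri.resolve("/robots.txt"));
            rules = parse(robotsTxt);
            cache.put(key, rules);
        }
        return rules;
    }

    public static void main(String[] args) {
        RobotRulesCacheSketch sketch = new RobotRulesCacheSketch(robotsUri -> {
            System.out.println("fetching " + robotsUri);
            return "User-agent: *\nDisallow: /private/\n";
        });
        Rules first = sketch.getRules(URI.create("ftp://example.org/pub/readme.txt"));
        Rules second = sketch.getRules(URI.create("ftp://example.org/private/data.bin"));
        System.out.println("same cached object: " + (first == second));
        System.out.println("private allowed: " + second.isAllowed("/private/data.bin"));
    }
}
```

Passing the fetcher in as a function keeps the sketch runnable without a network connection; in a real crawler that role would be played by the protocol object handed to getRobotRulesSet.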