
org.apache.nutch.protocol.ftp

Class FtpRobotRulesParser

All Implemented Interfaces: org.apache.hadoop.conf.Configurable

public class FtpRobotRulesParser
extends RobotRulesParser

This class parses robots rules for URLs that use the FTP protocol. It extends the generic RobotRulesParser class and contains the FTP-specific implementation for obtaining the robots file.

Field Summary

| Modifier and Type | Field and Description |
| --- | --- |
| <code>static org.slf4j.Logger</code> | <code>LOG</code> |


Fields inherited from class org.apache.nutch.protocol.RobotRulesParser

agentNames, CACHE, EMPTY_RULES, FORBID_ALL_RULES

Constructor Summary

| Constructor and Description |
| --- |
| <code>FtpRobotRulesParser(org.apache.hadoop.conf.Configuration conf)</code> |

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| <code>crawlercommons.robots.BaseRobotRules</code> | <code>getRobotRulesSet(Protocol ftp, URL url)</code> For hosts whose robots rules are not yet cached, sends an FTP request to the host of the given URL, fetches the robots file, parses the rules, and caches the rules object to avoid repeating the work later. |


Methods inherited from class org.apache.nutch.protocol.RobotRulesParser

getConf, getRobotRulesSet, main, parseRules, setConf


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


LOG

public static final org.slf4j.Logger LOG

Constructor Detail


FtpRobotRulesParser

public FtpRobotRulesParser(org.apache.hadoop.conf.Configuration conf)

Method Detail


getRobotRulesSet

public crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol ftp,
                                                    URL url)

For hosts whose robots rules are not yet cached, this method sends an FTP request to the host of the given URL, fetches the robots file, parses the rules, and caches the rules object to avoid repeating the work later.

  - Specified by: <code>getRobotRulesSet</code> in class <code>RobotRulesParser</code>
  - Parameters:
    - <code>ftp</code> - The [<code>Protocol</code>](../../../../../org/apache/nutch/protocol/Protocol.html) object
    - <code>url</code> - The URL to obtain robots rules for
  - Returns: A <code>BaseRobotRules</code> object holding the rules
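
The fetch-once-then-cache behaviour described for <code>getRobotRulesSet</code> can be illustrated with a minimal sketch that uses only the Java standard library. This is not the Nutch implementation: the class <code>RobotsCacheSketch</code>, its <code>fetchAndParse</code> function, and the "scheme:host" cache key are assumptions made for illustration, and <code>java.net.URI</code> stands in for <code>URL</code> to keep the example free of checked exceptions.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a per-host rules cache; not the Nutch implementation. */
final class RobotsCacheSketch<R> {
    // Parsed rules keyed by "scheme:host" (an assumed key format).
    private final Map<String, R> cache = new ConcurrentHashMap<>();
    // Assumed stand-in for "fetch the robots file over FTP and parse it".
    private final Function<URI, R> fetchAndParse;

    RobotsCacheSketch(Function<URI, R> fetchAndParse) {
        this.fetchAndParse = fetchAndParse;
    }

    /** Builds a case-insensitive cache key from the scheme and host. */
    static String cacheKey(URI uri) {
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("URI needs a scheme and host: " + uri);
        }
        return (uri.getScheme() + ":" + uri.getHost()).toLowerCase(Locale.ROOT);
    }

    /** Returns cached rules, fetching and parsing only on a cache miss. */
    R getRules(URI uri) {
        return cache.computeIfAbsent(cacheKey(uri), key -> fetchAndParse.apply(uri));
    }

    public static void main(String[] args) {
        // The lambda fakes the fetch; a real parser would contact the FTP host here.
        RobotsCacheSketch<String> sketch =
            new RobotsCacheSketch<>(uri -> "rules-for-" + uri.getHost());
        System.out.println(sketch.getRules(URI.create("ftp://example.org/pub/a.txt")));
    }
}
```

In this sketch, two URLs on the same host share one cache entry, so the second lookup does not trigger another fetch. How the real class handles fetch failures or cache expiry is not covered by this page and is not modelled here.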

Copyright © 2014 The Apache Software Foundation