
org.apache.nutch.protocol.httpclient

Class Http

    • All Implemented Interfaces:
    • org.apache.hadoop.conf.Configurable, Pluggable, Protocol

public class Http
extends HttpBase

This class is a protocol plugin that configures an HTTP client for the Basic, Digest and NTLM authentication schemes, for both web servers and proxy servers. It also handles HTTPS and keeps cookies for the duration of a single fetch session.

  • Author:
  • Susam Pal
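
To make the Basic scheme named in the class description concrete, here is a minimal sketch of how a Basic credential is encoded according to RFC 7617. It is an illustration only: the class `BasicAuthSketch` and the method `basicAuthHeader` are hypothetical names, not part of this plugin, which leaves scheme handling to its configured HTTP client.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch only; hypothetical names, not part of the Nutch plugin API.
class BasicAuthSketch {

    // Builds the header value for the Basic scheme:
    // "Basic " followed by base64("username:password").
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        byte[] bytes = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Example credentials taken from RFC 7617.
        System.out.println(basicAuthHeader("Aladdin", "open sesame"));
    }
}
```

The same value is sent in the `Authorization` header when a web server asks for credentials and in the `Proxy-Authorization` header when a proxy does. Digest and NTLM use challenge-response exchanges and cannot be reduced to a single encoding step like this.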

Field Summary

Fields

| Modifier and Type | Field and Description |
| --- | --- |
| <code>static org.slf4j.Logger</code> | <code>LOG</code> |

Fields inherited from class org.apache.nutch.protocol.http.api.HttpBase

accept, acceptLanguage, BUFFER_SIZE, maxContent, maxCrawlDelay, proxyHost, proxyPort, RESPONSE_TIME, responseTime, timeout, tlsPreferredCipherSuites, tlsPreferredProtocols, useHttp11, useProxy, userAgent

Fields inherited from interface org.apache.nutch.protocol.Protocol

CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID

Constructor Summary

Constructors

| Constructor and Description |
| --- |
| <code>Http()</code> Constructs this plugin. |

Method Summary

Methods

| Modifier and Type | Method and Description |
| --- | --- |
| <code>protected Response</code> | <code>getResponse(URL url, CrawlDatum datum, boolean redirect)</code> Fetches the url with a configured HTTP client and gets the response. |
| <code>static void</code> | <code>main(String[] args)</code> Main method. |
| <code>void</code> | <code>setConf(org.apache.hadoop.conf.Configuration conf)</code> Reads the configuration from the Nutch configuration files and sets the configuration. |

Methods inherited from class org.apache.nutch.protocol.http.api.HttpBase

getAccept, getAcceptLanguage, getConf, getMaxContent, getProtocolOutput, getProxyHost, getProxyPort, getRobotRules, getTimeout, getTlsPreferredCipherSuites, getTlsPreferredProtocols, getUseHttp11, getUserAgent, logConf, main, processDeflateEncoded, processGzipEncoded, useProxy

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

LOG

public static final org.slf4j.Logger LOG

Constructor Detail

Http

public Http()

Constructs this plugin.

Method Detail

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)

Reads the configuration from the Nutch configuration files and sets the configuration.

  - Specified by: 
  - <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code> 
  - Overrides: 
  - <code>setConf</code> in class <code>HttpBase</code> 
  - Parameters:
  - <code>conf</code> - Configuration       
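
The pattern behind `setConf` (store the configuration, then derive settings from it) can be sketched with plain JDK types. This is a stand-in, not the plugin's code: `java.util.Properties` replaces the Hadoop `Configuration`, the class name `ConfiguredClientSketch` is hypothetical, and the property keys and default values are assumptions chosen to resemble the inherited `timeout`, `proxyHost`, `proxyPort` and `useProxy` fields. Check the Nutch configuration files for the real keys and defaults.

```java
import java.util.Properties;

// Illustrative stand-in for the Configurable pattern; not Nutch code.
class ConfiguredClientSketch {
    private Properties conf;
    private int timeout;
    private String proxyHost;
    private int proxyPort;
    private boolean useProxy;

    // Stores the configuration and derives settings from it,
    // falling back to a default when a key is absent.
    // Keys and defaults below are assumptions for illustration.
    void setConf(Properties conf) {
        this.conf = conf;
        this.timeout = Integer.parseInt(conf.getProperty("http.timeout", "10000"));
        this.proxyHost = conf.getProperty("http.proxy.host", "");
        this.proxyPort = Integer.parseInt(conf.getProperty("http.proxy.port", "8080"));
        // A proxy is used only when a proxy host has been configured.
        this.useProxy = !this.proxyHost.isEmpty();
    }

    Properties getConf() { return conf; }
    int getTimeout() { return timeout; }
    String getProxyHost() { return proxyHost; }
    int getProxyPort() { return proxyPort; }
    boolean useProxy() { return useProxy; }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("http.proxy.host", "proxy.example.org");
        ConfiguredClientSketch client = new ConfiguredClientSketch();
        client.setConf(props);
        System.out.println(client.getProxyHost() + ":" + client.getProxyPort());
    }
}
```

Deriving every field inside `setConf` means that calling it again with a different configuration leaves no stale setting behind.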

main

public static void main(String[] args)
                 throws Exception

Main method.

  - Parameters:
  - <code>args</code> - Command line arguments 
  - Throws: 
  - <code>Exception</code>       

getResponse

protected Response getResponse(URL url,
                   CrawlDatum datum,
                   boolean redirect)
                        throws ProtocolException,
                               IOException

Fetches the url with a configured HTTP client and gets the response.

  - Specified by: 
  - <code>getResponse</code> in class <code>HttpBase</code> 
  - Parameters:
  - <code>url</code> - URL to be fetched
  - <code>datum</code> - Crawl data
  - <code>redirect</code> - Follow redirects if and only if true 
  - Returns:
  - HTTP response 
  - Throws: 
  - <code>ProtocolException</code> 
  - <code>IOException</code>      
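
The "if and only if" contract of the `redirect` parameter can be illustrated with the JDK's own `HttpURLConnection`, used here in place of the plugin's configured HTTP client. The class `RedirectFlagSketch` and the method `prepare` are hypothetical names. The sketch only prepares a connection object and performs no network I/O.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Illustrative sketch using the JDK client, not the plugin's HTTP client.
class RedirectFlagSketch {

    // Prepares (without connecting) a connection that follows redirects
    // if and only if `redirect` is true.
    static HttpURLConnection prepare(String url, boolean redirect) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(redirect);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://example.org/", false);
        // Reports whether this connection would follow a 3xx response.
        System.out.println(conn.getInstanceFollowRedirects());
    }
}
```

Passing the flag explicitly on every call lets the caller decide per request whether a 3xx response is followed by the client or returned as-is.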

Copyright © 2014 The Apache Software Foundation