org.apache.nutch.protocol.httpclient
Class Http
- java.lang.Object
- org.apache.nutch.protocol.http.api.HttpBase
- org.apache.nutch.protocol.httpclient.Http
public class Http extends HttpBase
This class is a protocol plugin that configures an HTTP client for the Basic, Digest, and NTLM authentication schemes, for both web servers and proxy servers. It also handles HTTPS and keeps cookies for the duration of a single fetch session.
- Author:
- Susam Pal
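To make the Basic scheme mentioned above concrete, the following minimal sketch shows how a Basic credential header value is formed according to RFC 7617. It is not this plugin's code, which delegates authentication to its underlying HTTP client. The class name <code>BasicAuthSketch</code>, the method <code>basicHeader</code>, and the credentials are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: shows the header value the Basic scheme sends. */
final class BasicAuthSketch {

    /** Builds "Basic " + base64(user ":" password), as described in RFC 7617. */
    static String basicHeader(String user, String password) {
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Hypothetical credentials, used only for demonstration.
        System.out.println("Authorization: " + basicHeader("crawler", "secret"));
    }
}
```

Basic sends the credentials in a reversible encoding, which is one reason it is usually combined with HTTPS. Digest and NTLM rely on challenge-response exchanges and are not shown here.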
Field Summary
Fields (Modifier and Type, Field and Description)
static org.slf4j.Logger LOG
Fields inherited from class org.apache.nutch.protocol.http.api.HttpBase
accept, acceptLanguage, BUFFER_SIZE, maxContent, maxCrawlDelay, proxyHost, proxyPort, RESPONSE_TIME, responseTime, timeout, tlsPreferredCipherSuites, tlsPreferredProtocols, useHttp11, useProxy, userAgent
Fields inherited from interface org.apache.nutch.protocol.Protocol
CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID
Constructor Summary
Constructors (Constructor and Description)
Http()
Constructs this plugin.
Method Summary
Methods (Modifier and Type, Method and Description)
protected Response getResponse(URL url, CrawlDatum datum, boolean redirect)
Fetches the url with a configured HTTP client and gets the response.
static void main(String[] args)
Main method.
void setConf(org.apache.hadoop.conf.Configuration conf)
Reads the configuration from the Nutch configuration files and sets the configuration.
Methods inherited from class org.apache.nutch.protocol.http.api.HttpBase
getAccept, getAcceptLanguage, getConf, getMaxContent, getProtocolOutput, getProxyHost, getProxyPort, getRobotRules, getTimeout, getTlsPreferredCipherSuites, getTlsPreferredProtocols, getUseHttp11, getUserAgent, logConf, main, processDeflateEncoded, processGzipEncoded, useProxy
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
Http
public Http()
Constructs this plugin.
Method Detail
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
Reads the configuration from the Nutch configuration files and sets the configuration.
- Specified by:
- <code>setConf</code> in interface <code>org.apache.hadoop.conf.Configurable</code>
- Overrides:
- <code>setConf</code> in class <code>HttpBase</code>
- Parameters:
- <code>conf</code> - Configuration
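As a rough illustration of what reading such a configuration involves, the sketch below fills a few settings that mirror the inherited fields <code>proxyHost</code>, <code>proxyPort</code>, <code>timeout</code>, and <code>useProxy</code>. It uses <code>java.util.Properties</code> as a stand-in for <code>org.apache.hadoop.conf.Configuration</code> so that it runs without Hadoop or Nutch on the classpath. The property keys, the default values, and the rule for deriving <code>useProxy</code> are assumptions made for this sketch, not a statement of what <code>setConf</code> does internally.

```java
import java.util.Properties;

/** Stand-in for configuration handling; not the plugin's implementation. */
final class FetchSettingsSketch {
    final String proxyHost;
    final int proxyPort;
    final int timeoutMs;
    final boolean useProxy;

    FetchSettingsSketch(Properties conf) {
        // Keys and defaults below are assumptions for this sketch.
        this.proxyHost = conf.getProperty("http.proxy.host", "");
        this.proxyPort = Integer.parseInt(conf.getProperty("http.proxy.port", "8080"));
        this.timeoutMs = Integer.parseInt(conf.getProperty("http.timeout", "10000"));
        // Assumed rule: a proxy is used only when a host is configured.
        this.useProxy = !proxyHost.isEmpty();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Hypothetical proxy host, used only for demonstration.
        conf.setProperty("http.proxy.host", "proxy.example.org");
        FetchSettingsSketch s = new FetchSettingsSketch(conf);
        System.out.println(s.useProxy + " " + s.proxyHost + ":" + s.proxyPort);
    }
}
```

Reading every value once, with an explicit default, keeps the fetch path free of repeated lookups and makes missing settings harmless.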
main
public static void main(String[] args) throws Exception
Main method.
- Parameters:
- <code>args</code> - Command line arguments
- Throws:
- <code>Exception</code>
getResponse
protected Response getResponse(URL url, CrawlDatum datum, boolean redirect) throws ProtocolException, IOException
Fetches the url with a configured HTTP client and gets the response.
- Specified by:
- <code>getResponse</code> in class <code>HttpBase</code>
- Parameters:
- <code>url</code> - URL to be fetched
- <code>datum</code> - Crawl data
- <code>redirect</code> - Follow redirects if and only if true
- Returns:
- HTTP response
- Throws:
- <code>ProtocolException</code>
- <code>IOException</code>
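The <code>redirect</code> parameter follows a common pattern: one boolean decides whether the client follows redirects itself or hands the 3xx response back to the caller. The sketch below shows the same pattern with the JDK's <code>java.net.HttpURLConnection</code>. It is an analogy only and does not use the HTTP client that this plugin configures. The class name <code>RedirectFlagSketch</code> and the example URL are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Analogy only: applies a follow-redirects flag to a JDK connection. */
final class RedirectFlagSketch {

    /** Prepares a connection object without sending any request. */
    static HttpURLConnection prepare(String url, boolean redirect) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Follow redirects if and only if the flag is true.
            conn.setInstanceFollowRedirects(redirect);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No network traffic happens here: the connection is never opened.
        HttpURLConnection conn = prepare("http://example.org/", false);
        System.out.println("follow redirects: " + conn.getInstanceFollowRedirects());
    }
}
```

With the flag set to false, a crawler sees the redirect response itself and can apply its own policy before fetching the target.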