[TOC]

org.apache.nutch.crawl

Class URLPartitioner

  - All Implemented Interfaces:
  - org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Partitioner<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>

public class URLPartitioner
extends Object
implements org.apache.hadoop.mapred.Partitioner<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>

Partition URLs by host, domain name or IP, depending on the value of the parameter 'partition.url.mode', which can be 'byHost', 'byDomain' or 'byIP'.
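To illustrate what each mode selects, here is a standalone sketch that uses only the JDK, not Nutch or Hadoop. The class `PartitionKeySketch` and its method `partitionKey` are hypothetical. The `byDomain` branch uses a naive "last two host labels" heuristic, which is an assumption (it is wrong for suffixes such as `co.uk`) and does not reproduce Nutch's own domain logic. The `byIP` branch performs a DNS lookup.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

/** Hypothetical sketch: derive the string that would be hashed under each mode. */
class PartitionKeySketch {

    /**
     * Returns the partition key for a URL under the given mode
     * ("byHost", "byDomain" or "byIP"), or the raw URL if no host can be parsed.
     */
    static String partitionKey(String url, String mode) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            host = null; // malformed URL
        }
        if (host == null) {
            return url; // fall back to the raw URL string
        }
        switch (mode) {
            case "byDomain":
                // Naive assumption: the domain is the last two labels of the host.
                String[] labels = host.split("\\.");
                if (labels.length <= 2) {
                    return host;
                }
                return labels[labels.length - 2] + "." + labels[labels.length - 1];
            case "byIP":
                try {
                    // DNS lookup; may be slow, and fails for unknown hosts.
                    return InetAddress.getByName(host).getHostAddress();
                } catch (UnknownHostException e) {
                    return host; // fall back to the host name
                }
            case "byHost":
            default:
                return host;
        }
    }

    public static void main(String[] args) {
        String url = "http://news.example.com/a.html";
        System.out.println(partitionKey(url, "byHost"));   // news.example.com
        System.out.println(partitionKey(url, "byDomain")); // example.com
    }
}
```

With `byDomain`, `news.example.com` and `www.example.com` share a key and therefore a partition; with `byHost` they may not.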

Field Summary

Fields

| Modifier and Type | Field and Description |
|---|---|
| static String | PARTITION_MODE_DOMAIN |
| static String | PARTITION_MODE_HOST |
| static String | PARTITION_MODE_IP |
| static String | PARTITION_MODE_KEY |

Constructor Summary

Constructors

| Constructor and Description |
|---|
| URLPartitioner() |

Method Summary

Methods

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | configure(org.apache.hadoop.mapred.JobConf job) |
| int | getPartition(org.apache.hadoop.io.Text key, org.apache.hadoop.io.Writable value, int numReduceTasks)<br>Hash by host, domain name or IP, according to 'partition.url.mode'. |


Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


PARTITION_MODE_KEY

public static final String PARTITION_MODE_KEY
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_KEY)       

PARTITION_MODE_HOST

public static final String PARTITION_MODE_HOST
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_HOST)       

PARTITION_MODE_DOMAIN

public static final String PARTITION_MODE_DOMAIN
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_DOMAIN)       

PARTITION_MODE_IP

public static final String PARTITION_MODE_IP
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_IP)       

Constructor Detail


URLPartitioner

public URLPartitioner()

Method Detail


configure

public void configure(org.apache.hadoop.mapred.JobConf job)
  - Specified by: 
  - <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>        
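A minimal sketch of the kind of mode handling `configure` could perform, assuming (this page does not confirm it) that an absent or unrecognised value falls back to `byHost`. A plain `Map` stands in for `JobConf`, so the class `PartitionModeConfigSketch`, its method `resolveMode` and the fallback rule are all hypothetical; the string values come from the class description.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of reading and validating 'partition.url.mode'. */
class PartitionModeConfigSketch {
    static final String PARTITION_MODE_KEY = "partition.url.mode";
    static final String PARTITION_MODE_HOST = "byHost";
    static final String PARTITION_MODE_DOMAIN = "byDomain";
    static final String PARTITION_MODE_IP = "byIP";

    private static final Set<String> VALID_MODES =
        Set.of(PARTITION_MODE_HOST, PARTITION_MODE_DOMAIN, PARTITION_MODE_IP);

    /** Returns the configured mode, or byHost (an assumed default) if it is missing or unknown. */
    static String resolveMode(Map<String, String> conf) {
        String mode = conf.getOrDefault(PARTITION_MODE_KEY, PARTITION_MODE_HOST);
        // Set.of(...).contains(null) throws, so check for null first.
        if (mode == null || !VALID_MODES.contains(mode)) {
            System.err.println("Unknown partition mode '" + mode + "', using " + PARTITION_MODE_HOST);
            return PARTITION_MODE_HOST;
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(resolveMode(Map.of(PARTITION_MODE_KEY, "byDomain"))); // byDomain
        System.out.println(resolveMode(Map.of()));                             // byHost
    }
}
```

Validating once at configuration time keeps the per-record partitioning path free of string checks against invalid input.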

close

public void close()

getPartition

public int getPartition(org.apache.hadoop.io.Text key,
               org.apache.hadoop.io.Writable value,
               int numReduceTasks)

Hash by host, domain name or IP, according to 'partition.url.mode'.

  - Specified by: 
  - <code>getPartition</code> in interface <code>org.apache.hadoop.mapred.Partitioner&lt;org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable&gt;</code>
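A partitioner of this kind typically finishes by mapping the key's hash code onto `[0, numReduceTasks)`. The sketch below applies the common Hadoop `HashPartitioner` idiom `(hash & Integer.MAX_VALUE) % numReduceTasks` to the URL's host (the `byHost` case) using only the JDK. It is a hypothetical standalone illustration, not the body of `getPartition`; the class name `HostHashPartitionSketch` is invented.

```java
import java.net.URI;

/** Hypothetical sketch: choose a partition from the hash of a URL's host. */
class HostHashPartitionSketch {

    /**
     * Returns a partition index in [0, numReduceTasks) derived from the URL's host,
     * so every URL on the same host lands in the same partition.
     */
    static int getPartition(String url, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            throw new IllegalArgumentException("numReduceTasks must be positive");
        }
        int hashCode = url.hashCode(); // fallback: hash the whole URL
        try {
            String host = URI.create(url).getHost();
            if (host != null) {
                hashCode = host.hashCode();
            }
        } catch (IllegalArgumentException e) {
            // malformed URL: keep the whole-URL hash
        }
        // Clear the sign bit so the modulo result is never negative.
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int a = getPartition("http://example.com/page1", 8);
        int b = getPartition("http://example.com/page2", 8);
        System.out.println(a == b); // true: same host, same partition
    }
}
```

Masking with `Integer.MAX_VALUE` rather than calling `Math.abs` avoids the edge case where `Math.abs(Integer.MIN_VALUE)` is still negative.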

Copyright © 2014 The Apache Software Foundation