org.apache.nutch.crawl
Class URLPartitioner
- java.lang.Object
- org.apache.nutch.crawl.URLPartitioner
- All Implemented Interfaces:
- org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Partitioner
public class URLPartitioner extends Object implements org.apache.hadoop.mapred.Partitioner<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>
Partitions URLs by host, domain name, or IP address, depending on the value of the 'partition.url.mode' parameter, which can be 'byHost', 'byDomain', or 'byIP'.
Field Summary
Fields (modifier and type, field)
- static String PARTITION_MODE_DOMAIN
- static String PARTITION_MODE_HOST
- static String PARTITION_MODE_IP
- static String PARTITION_MODE_KEY
Constructor Summary
Constructors
- URLPartitioner()
Method Summary
Methods (modifier and type, method, description)
- void close()
- void configure(org.apache.hadoop.mapred.JobConf job)
- int getPartition(org.apache.hadoop.io.Text key, org.apache.hadoop.io.Writable value, int numReduceTasks): Hashes the URL by host, domain name, or IP address, according to the configured partition mode.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
PARTITION_MODE_KEY
public static final String PARTITION_MODE_KEY
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_KEY)
PARTITION_MODE_HOST
public static final String PARTITION_MODE_HOST
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_HOST)
PARTITION_MODE_DOMAIN
public static final String PARTITION_MODE_DOMAIN
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_DOMAIN)
PARTITION_MODE_IP
public static final String PARTITION_MODE_IP
- See Also:
- [Constant Field Values](../../../../constant-values.html#org.apache.nutch.crawl.URLPartitioner.PARTITION_MODE_IP)
Constructor Detail
URLPartitioner
public URLPartitioner()
Method Detail
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
- <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>
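The page does not describe what configure does with the job settings. As a rough illustration only, the sketch below shows one way a partition mode could be resolved from configuration. It uses a plain `Map` instead of a Hadoop `JobConf`, the class name `PartitionModeSketch` and the method `resolveMode` are hypothetical, and the fallback to 'byHost' for a missing or unknown value is an assumption, not something this page states.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for mode resolution; not the Nutch implementation. */
class PartitionModeSketch {
    // Parameter name and values as given in the class description.
    static final String PARTITION_MODE_KEY = "partition.url.mode";
    static final String PARTITION_MODE_HOST = "byHost";
    static final String PARTITION_MODE_DOMAIN = "byDomain";
    static final String PARTITION_MODE_IP = "byIP";

    /**
     * Returns the configured mode when it is one of the three documented
     * values; otherwise falls back to byHost (assumed default).
     */
    static String resolveMode(Map<String, String> conf) {
        String mode = conf.get(PARTITION_MODE_KEY);
        if (PARTITION_MODE_HOST.equals(mode)
                || PARTITION_MODE_DOMAIN.equals(mode)
                || PARTITION_MODE_IP.equals(mode)) {
            return mode;
        }
        return PARTITION_MODE_HOST;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PARTITION_MODE_KEY, "byDomain");
        System.out.println(resolveMode(conf));
    }
}
```

In a real job the value would presumably be set on the `JobConf` passed to configure, under the 'partition.url.mode' key.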
close
public void close()
getPartition
public int getPartition(org.apache.hadoop.io.Text key, org.apache.hadoop.io.Writable value, int numReduceTasks)
Hashes the URL by host, domain name, or IP address, according to the configured partition mode.
- Specified by:
- <code>getPartition</code> in interface <code>org.apache.hadoop.mapred.Partitioner<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable></code>
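To make the idea of mode-dependent partitioning concrete, here is a minimal, self-contained sketch that avoids Hadoop types. It is not the Nutch implementation: the class `UrlPartitionSketch`, the helper `naiveDomain` (which simply keeps the last two host labels and ignores public suffixes such as co.uk), and the fallback to hashing the raw string for unparsable URLs are all assumptions made for illustration.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

/** Illustrative only: maps a URL to a partition index by host, domain, or IP. */
class UrlPartitionSketch {

    /** Naive domain extraction: keeps the last two labels of the host. */
    static String naiveDomain(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    /** Returns a partition index in [0, numReduceTasks). */
    static int partition(String url, String mode, int numReduceTasks) {
        String key = url; // fallback: hash the whole string
        try {
            String host = URI.create(url).getHost();
            if (host != null) {
                key = host;
                if ("byDomain".equals(mode)) {
                    key = naiveDomain(host);
                } else if ("byIP".equals(mode)) {
                    try {
                        // May trigger a DNS lookup for non-literal hosts.
                        key = InetAddress.getByName(host).getHostAddress();
                    } catch (UnknownHostException e) {
                        // Keep hashing by host when resolution fails.
                    }
                }
            }
        } catch (IllegalArgumentException e) {
            // Unparsable URL: keep the raw string as the key.
        }
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(partition("http://www.example.com/a", "byHost", 8));
        System.out.println(partition("http://news.example.com/b", "byDomain", 8));
    }
}
```

With this sketch, URLs that share a key (the same host in 'byHost' mode, or the same naive domain in 'byDomain' mode) always land in the same partition, which is the property a reducer-side politeness policy would rely on.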