org.apache.nutch.parse

Class ParseSegment

  • java.lang.Object
    • org.apache.hadoop.conf.Configured
      • org.apache.nutch.parse.ParseSegment
  • All Implemented Interfaces:
    • Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.WritableComparable<?>,Content,org.apache.hadoop.io.Text,ParseImpl>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>, org.apache.hadoop.util.Tool

public class ParseSegment
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.WritableComparable<?>,Content,org.apache.hadoop.io.Text,ParseImpl>, org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable>

Field Summary

| Modifier and Type | Field and Description |
|---|---|
| <code>static org.slf4j.Logger</code> | <code>LOG</code> |
| <code>static String</code> | <code>SKIP_TRUNCATED</code> |

Constructor Summary

| Constructor and Description |
|---|
| <code>ParseSegment()</code> |
| <code>ParseSegment(org.apache.hadoop.conf.Configuration conf)</code> |

Method Summary

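| Modifier and Type | Method and Description |
|---|---|
| <code>void</code> | <code>close()</code> |
| <code>void</code> | <code>configure(org.apache.hadoop.mapred.JobConf job)</code> |
| <code>static boolean</code> | <code>isTruncated(Content content)</code> Checks if the page's content is truncated. |
| <code>static void</code> | <code>main(String[] args)</code> |
| <code>void</code> | <code>map(org.apache.hadoop.io.WritableComparable&lt;?&gt; key, Content content, org.apache.hadoop.mapred.OutputCollector&lt;org.apache.hadoop.io.Text,ParseImpl&gt; output, org.apache.hadoop.mapred.Reporter reporter)</code> |
| <code>void</code> | <code>parse(org.apache.hadoop.fs.Path segment)</code> |
| <code>void</code> | <code>reduce(org.apache.hadoop.io.Text key, Iterator&lt;org.apache.hadoop.io.Writable&gt; values, org.apache.hadoop.mapred.OutputCollector&lt;org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable&gt; output, org.apache.hadoop.mapred.Reporter reporter)</code> |
| <code>int</code> | <code>run(String[] args)</code> |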

Methods inherited from class org.apache.hadoop.conf.Configured

getConf, setConf

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable

getConf, setConf

Field Detail

LOG

public static final org.slf4j.Logger LOG

SKIP_TRUNCATED

public static final String SKIP_TRUNCATED
  - See Also:
  - [Constant Field Values](../../../../constant-values.html#org.apache.nutch.parse.ParseSegment.SKIP_TRUNCATED)       

Constructor Detail

ParseSegment

public ParseSegment()

ParseSegment

public ParseSegment(org.apache.hadoop.conf.Configuration conf)

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
  - Specified by: 
  - <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>        

close

public void close()
  - Specified by: 
  - <code>close</code> in interface <code>Closeable</code> 
  - Specified by: 
  - <code>close</code> in interface <code>AutoCloseable</code>        

map

public void map(org.apache.hadoop.io.WritableComparable<?> key,
       Content content,
       org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,ParseImpl> output,
       org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
  - Specified by: 
  - <code>map</code> in interface <code>org.apache.hadoop.mapred.Mapper&lt;org.apache.hadoop.io.WritableComparable&lt;?&gt;,Content,org.apache.hadoop.io.Text,ParseImpl&gt;</code>
  - Throws: 
  - <code>IOException</code>       

isTruncated

public static boolean isTruncated(Content content)

Checks if the page's content is truncated.

  - Parameters:
  - <code>content</code> - the page content to check
  - Returns:
  - <code>true</code> if the page is truncated; <code>false</code> if it is not, or if truncation could not be determined.
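
The return contract above can be illustrated with a minimal, standalone sketch. It assumes the check compares the length declared in an HTTP `Content-Length` header with the number of bytes actually stored; this page does not document the criterion, so treat that as an assumption. The class `TruncationCheckSketch` and the method `looksTruncated` are hypothetical names that stand in for `isTruncated(Content)`, and they take plain values so the example runs without Nutch or Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for ParseSegment.isTruncated(Content); not Nutch code.
final class TruncationCheckSketch {

    /**
     * Returns true only when the declared length exceeds the stored length.
     * Assumption: truncation is detected via the Content-Length header.
     */
    static boolean looksTruncated(String contentLengthHeader, byte[] body) {
        if (contentLengthHeader == null || body == null) {
            return false; // nothing to compare: truncation cannot be determined
        }
        long declared;
        try {
            declared = Long.parseLong(contentLengthHeader.trim());
        } catch (NumberFormatException e) {
            return false; // unparsable header: truncation cannot be determined
        }
        // Fewer bytes stored than the server declared means the body was cut short.
        return declared > body.length;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8); // 5 bytes stored
        System.out.println(looksTruncated("10", body));  // true
        System.out.println(looksTruncated("5", body));   // false
        System.out.println(looksTruncated(null, body));  // false
        System.out.println(looksTruncated("abc", body)); // false
    }
}
```

Note that both "not truncated" and "cannot tell" map to <code>false</code>, so a caller that skips truncated pages would still parse a page whose header is missing or malformed.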

reduce

public void reduce(org.apache.hadoop.io.Text key,
          Iterator<org.apache.hadoop.io.Writable> values,
          org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable> output,
          org.apache.hadoop.mapred.Reporter reporter)
            throws IOException
  - Specified by: 
  - <code>reduce</code> in interface <code>org.apache.hadoop.mapred.Reducer&lt;org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Writable&gt;</code>
  - Throws: 
  - <code>IOException</code>       

parse

public void parse(org.apache.hadoop.fs.Path segment)
           throws IOException
  - Throws: 
  - <code>IOException</code>       

main

public static void main(String[] args)
                 throws Exception
  - Throws: 
  - <code>Exception</code>       

run

public int run(String[] args)
        throws Exception
  - Specified by: 
  - <code>run</code> in interface <code>org.apache.hadoop.util.Tool</code> 
  - Throws: 
  - <code>Exception</code>      

Copyright © 2014 The Apache Software Foundation