org.apache.nutch.segment

Class ContentAsTextInputFormat

  • java.lang.Object
    • org.apache.hadoop.mapred.FileInputFormat
      • org.apache.hadoop.mapred.SequenceFileInputFormat
        • org.apache.nutch.segment.ContentAsTextInputFormat

All Implemented Interfaces:
  • org.apache.hadoop.mapred.InputFormat

public class ContentAsTextInputFormat
extends org.apache.hadoop.mapred.SequenceFileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

An input format that reads Nutch Content objects and converts them to text, replacing newlines with spaces. This format is useful for processing Nutch Content objects with Hadoop Streaming in languages other than Java.
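The newline handling described above matters because Hadoop Streaming hands records to the external program one line at a time, so a newline inside the content would split a single record. The following is a minimal sketch of that conversion using only the Java standard library. The class name NewlineFlattenSketch and the method flatten are hypothetical, the UTF-8 decoding is an assumption, and this is not the Nutch implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the Nutch source.
class NewlineFlattenSketch {

    // Assumption: the raw content bytes are decoded as UTF-8 and every
    // '\n' is replaced by a single space so the record fits on one line.
    static String flatten(byte[] rawContent) {
        String text = new String(rawContent, StandardCharsets.UTF_8);
        return text.replace('\n', ' ');
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>Hello</body>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        // Prints: <html> <body>Hello</body> </html>
        System.out.println(flatten(page));
    }
}
```

A streaming mapper written in another language can then treat each input line as one complete document.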

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat

org.apache.hadoop.mapred.FileInputFormat.Counter

Field Summary

Fields inherited from class org.apache.hadoop.mapred.FileInputFormat

LOG

Constructor Summary

Constructors

Constructor and Description
  ContentAsTextInputFormat()

Method Summary

Methods

Modifier and Type
  org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Method and Description
  getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)

Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat

listStatus

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat

addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

ContentAsTextInputFormat

public ContentAsTextInputFormat()

Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
        org.apache.hadoop.mapred.JobConf job,
        org.apache.hadoop.mapred.Reporter reporter)
    throws IOException
  - Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
  - Overrides: getRecordReader in class org.apache.hadoop.mapred.SequenceFileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
  - Throws: IOException

Copyright © 2014 The Apache Software Foundation