org.apache.nutch.segment
Class ContentAsTextInputFormat
- java.lang.Object
- org.apache.hadoop.mapred.FileInputFormat
- org.apache.hadoop.mapred.SequenceFileInputFormat
- org.apache.nutch.segment.ContentAsTextInputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat
public class ContentAsTextInputFormat extends org.apache.hadoop.mapred.SequenceFileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
An input format that reads Nutch Content objects and converts them to text, replacing newlines with spaces so that each record fits on a single line. This format is useful for processing Nutch content in Hadoop Streaming jobs written in other languages.
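The following is a minimal, self-contained sketch of why the newline replacement matters for Hadoop Streaming. It does not use or reproduce the Nutch class itself. The class `ContentAsTextSketch` and its helpers `flatten`, `toStreamingLine`, and `parseStreamingLine` are hypothetical names introduced for illustration. The sketch assumes that content bytes are decoded as UTF-8, that only `\n` is replaced, and that the streaming mapper receives each record as a key and a value separated by a tab.

```java
import java.nio.charset.StandardCharsets;

class ContentAsTextSketch {

    // Hypothetical helper: turns newlines into spaces so that a whole
    // document fits on one line. Assumptions: UTF-8 decoding, and only
    // '\n' is replaced (a '\r' from CRLF input would remain).
    static String flatten(byte[] rawContent) {
        String text = new String(rawContent, StandardCharsets.UTF_8);
        return text.replace('\n', ' ');
    }

    // Assumed record shape seen by a streaming mapper: "key<TAB>value".
    static String toStreamingLine(String url, byte[] rawContent) {
        return url + "\t" + flatten(rawContent);
    }

    // What a streaming mapper would do on its side: split at the first tab.
    static String[] parseStreamingLine(String line) {
        int tab = line.indexOf('\t');
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        byte[] html = "<html>\n<body>Hello</body>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        String line = toStreamingLine("http://example.com/", html);
        String[] kv = parseStreamingLine(line);
        System.out.println(kv[0]); // http://example.com/
        System.out.println(kv[1]); // <html> <body>Hello</body> </html>
    }
}
```

Without the flattening step, a multi-line document would be split across several input lines, and a line-oriented mapper could no longer tell where one record ends and the next begins.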
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
org.apache.hadoop.mapred.FileInputFormat.Counter
Field Summary
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
Constructor Summary
Constructor and Description
ContentAsTextInputFormat()
Method Summary
Modifier and Type: org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Method and Description: getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat
listStatus
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
ContentAsTextInputFormat
public ContentAsTextInputFormat()
Method Detail
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws IOException
- Specified by:
- <code>getRecordReader</code> in interface <code>org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text></code>
- Overrides:
- <code>getRecordReader</code> in class <code>org.apache.hadoop.mapred.SequenceFileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text></code>
- Throws:
- <code>IOException</code>