Class ContentAsTextInputFormat

org.apache.nutch.segment

java.lang.Object
- org.apache.hadoop.mapred.FileInputFormat
  - org.apache.hadoop.mapred.SequenceFileInputFormat
    - org.apache.nutch.segment.ContentAsTextInputFormat

All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat

public class ContentAsTextInputFormat
extends org.apache.hadoop.mapred.SequenceFileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

An input format that reads Nutch Content objects and converts them to text, replacing newlines with spaces. This format is useful for processing Nutch content in Hadoop Streaming jobs written in languages other than Java.
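The newline replacement described above can be sketched in plain Java. This is a minimal illustration, not the Nutch source: the class name ContentFlattener and the method flatten are hypothetical, and both the UTF-8 decoding and the handling of carriage returns are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Nutch: flattens raw content bytes to one line of text. */
final class ContentFlattener {

    private ContentFlattener() {}

    /**
     * Decodes the bytes as UTF-8 (an assumption for this sketch) and replaces
     * every line break with a single space.
     */
    static String flatten(byte[] rawContent) {
        String text = new String(rawContent, StandardCharsets.UTF_8);
        // Handle "\r\n" first so a Windows line ending becomes one space, not two.
        return text.replace("\r\n", " ").replace('\n', ' ').replace('\r', ' ');
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>Hello</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        // prints: <html> <body>Hello</body> </html>
        System.out.println(flatten(page));
    }
}
```

Flattening matters for line-oriented consumers: a record that still contained line breaks would be read as several records by a program that processes its input one line at a time.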

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat

org.apache.hadoop.mapred.FileInputFormat.Counter

Field Summary

Fields inherited from class org.apache.hadoop.mapred.FileInputFormat

LOG

Constructor Summary

Constructor and Description
- ContentAsTextInputFormat()

Method Summary

- Modifier and Type: org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Method and Description: getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)

Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat

listStatus

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat

addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

ContentAsTextInputFormat

public ContentAsTextInputFormat()

Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> getRecordReader(
        org.apache.hadoop.mapred.InputSplit split,
        org.apache.hadoop.mapred.JobConf job,
        org.apache.hadoop.mapred.Reporter reporter)
    throws IOException

  - Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
  - Overrides: getRecordReader in class org.apache.hadoop.mapred.SequenceFileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
  - Throws: IOException
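
The reader returned by getRecordReader yields Text keys and Text values. Under Hadoop Streaming, such a pair normally reaches the mapper process as one line on standard input, with key and value separated by a tab. The sketch below shows how a consumer might split such a line. It assumes that default tab separator and assumes the key itself contains no tab; the class StreamingRecord and its parse method are hypothetical names, not part of Nutch or Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical consumer-side helper; not part of Nutch or Hadoop. */
final class StreamingRecord {

    private StreamingRecord() {}

    /**
     * Splits one input line into (key, value) at the first tab.
     * A line without a tab is treated as a key with an empty value.
     */
    static Map.Entry<String, String> parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleImmutableEntry<>(line, "");
        }
        // Split only at the first tab: the value may itself contain tabs.
        return new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            Map.Entry<String, String> record = parse(line);
            // Emit the key and the length of its flattened content, tab-separated.
            System.out.println(record.getKey() + "\t" + record.getValue().length());
        }
    }
}
```

Splitting at the first tab only is a deliberate choice here, since page content can contain tab characters while the key is expected not to.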
