org.apache.nutch.segment
Class SegmentReader
- java.lang.Object
- org.apache.hadoop.conf.Configured
- org.apache.nutch.segment.SegmentReader
- All Implemented Interfaces:
- Closeable, AutoCloseable, org.apache.hadoop.conf.Configurable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer
public class SegmentReader extends org.apache.hadoop.conf.Configured implements org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,NutchWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Dumps the content of a segment.
Nested Class Summary
Nested Classes
- static class SegmentReader.InputCompatMapper
- static class SegmentReader.SegmentReaderStats
- static class SegmentReader.TextOutputFormat: Implements a text output format.
Field Summary
Fields
- static org.slf4j.Logger LOG
Constructor Summary
Constructors
- SegmentReader()
- SegmentReader(org.apache.hadoop.conf.Configuration conf, boolean co, boolean fe, boolean ge, boolean pa, boolean pd, boolean pt)
Method Summary
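Methods
- void close()
- void configure(org.apache.hadoop.mapred.JobConf job)
- void dump(org.apache.hadoop.fs.Path segment, org.apache.hadoop.fs.Path output)
- void get(org.apache.hadoop.fs.Path segment, org.apache.hadoop.io.Text key, Writer writer, Map<String,List<org.apache.hadoop.io.Writable>> results)
- void getStats(org.apache.hadoop.fs.Path segment, SegmentReader.SegmentReaderStats stats)
- void list(List<org.apache.hadoop.fs.Path> dirs, Writer writer)
- static void main(String[] args)
- void reduce(org.apache.hadoop.io.Text key, Iterator<NutchWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> output, org.apache.hadoop.mapred.Reporter reporter)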
Methods inherited from class org.apache.hadoop.conf.Configured
- getConf, setConf
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
LOG
public static final org.slf4j.Logger LOG
Constructor Detail
SegmentReader
public SegmentReader()
SegmentReader
public SegmentReader(org.apache.hadoop.conf.Configuration conf, boolean co, boolean fe, boolean ge, boolean pa, boolean pd, boolean pt)
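The six boolean parameters are not described on this page. The sketch below assumes, based on the names of Nutch's segment subdirectories, that they select in order the `content`, `crawl_fetch`, `crawl_generate`, `crawl_parse`, `parse_data` and `parse_text` parts of a segment. `SegmentParts` and `enabledDirs` are hypothetical names used only for illustration; they are not part of Nutch, and the mapping should be checked against the Nutch source before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Nutch) that names the six constructor flags. */
class SegmentParts {
    // Assumed meaning of each flag, in constructor order: co, fe, ge, pa, pd, pt.
    static final String[] DIRS = {
        "content", "crawl_fetch", "crawl_generate",
        "crawl_parse", "parse_data", "parse_text"
    };

    /** Returns the segment subdirectories selected by the given flags. */
    static List<String> enabledDirs(boolean co, boolean fe, boolean ge,
                                    boolean pa, boolean pd, boolean pt) {
        boolean[] flags = {co, fe, ge, pa, pd, pt};
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                dirs.add(DIRS[i]);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Skip raw content and parse text; keep the four remaining parts.
        System.out.println(enabledDirs(false, true, true, true, true, false));
    }
}
```

Under the same assumption, and with Nutch and Hadoop on the classpath, the matching call would be `new SegmentReader(conf, false, true, true, true, true, false)`, followed by `dump(segment, output)` to write the selected parts.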
Method Detail
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
- <code>configure</code> in interface <code>org.apache.hadoop.mapred.JobConfigurable</code>
close
public void close()
- Specified by:
- <code>close</code> in interface <code>Closeable</code>
- Specified by:
- <code>close</code> in interface <code>AutoCloseable</code>
reduce
public void reduce(org.apache.hadoop.io.Text key, Iterator<NutchWritable> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> output, org.apache.hadoop.mapred.Reporter reporter) throws IOException
- Specified by:
- <code>reduce</code> in interface <code>org.apache.hadoop.mapred.Reducer<org.apache.hadoop.io.Text,NutchWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text></code>
- Throws:
- <code>IOException</code>
dump
public void dump(org.apache.hadoop.fs.Path segment, org.apache.hadoop.fs.Path output) throws IOException
- Throws:
- <code>IOException</code>
get
public void get(org.apache.hadoop.fs.Path segment, org.apache.hadoop.io.Text key, Writer writer, Map<String,List<org.apache.hadoop.io.Writable>> results) throws Exception
- Throws:
- <code>Exception</code>
list
public void list(List<org.apache.hadoop.fs.Path> dirs, Writer writer) throws Exception
- Throws:
- <code>Exception</code>
getStats
public void getStats(org.apache.hadoop.fs.Path segment, SegmentReader.SegmentReaderStats stats) throws Exception
- Throws:
- <code>Exception</code>
main
public static void main(String[] args) throws Exception
- Throws:
- <code>Exception</code>