org.apache.hadoop.mapred.lib
Class InputSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.InputSampler<K,V>
All Implemented Interfaces:
Configurable, Tool

public class InputSampler<K,V>
extends Object
implements Tool

Utility for collecting samples and writing a partition file for TotalOrderPartitioner.


Nested Class Summary
static class InputSampler.IntervalSampler<K,V>
          Sample from s splits at regular intervals.
static class InputSampler.RandomSampler<K,V>
          Sample from random points in the input.
static interface InputSampler.Sampler<K,V>
          Interface to sample using an InputFormat.
static class InputSampler.SplitSampler<K,V>
          Samples the first n records from s splits.
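
The SplitSampler and IntervalSampler strategies summarized above can be pictured with a small sketch that needs no Hadoop classes. Splits are modeled as in-memory lists. The names SamplerSketch, firstRecords, and everyNth are hypothetical. Two simplifications are assumed for illustration: n is treated as a total budget divided evenly across the sampled splits, and a fixed integer stride stands in for "regular intervals". The real samplers read records through an InputFormat and may choose splits and intervals differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
final class SamplerSketch {

    /** Takes the leading records of up to maxSplits splits, at most numSamples in total. */
    static <K> List<K> firstRecords(List<List<K>> splits, int numSamples, int maxSplits) {
        List<K> out = new ArrayList<>();
        int splitsToSample = Math.min(maxSplits, splits.size());
        if (splitsToSample <= 0) {
            return out; // nothing to read
        }
        // Assumed rule: share the sample budget evenly across the sampled splits.
        int perSplit = Math.max(0, numSamples / splitsToSample);
        for (int i = 0; i < splitsToSample; i++) {
            List<K> split = splits.get(i);
            int take = Math.min(perSplit, split.size());
            out.addAll(split.subList(0, take));
        }
        return out;
    }

    /** Keeps every stride-th record (starting with the first) of up to maxSplits splits. */
    static <K> List<K> everyNth(List<List<K>> splits, int stride, int maxSplits) {
        if (stride < 1) {
            throw new IllegalArgumentException("stride must be at least 1");
        }
        List<K> out = new ArrayList<>();
        int splitsToSample = Math.min(maxSplits, splits.size());
        for (int i = 0; i < splitsToSample; i++) {
            List<K> split = splits.get(i);
            for (int j = 0; j < split.size(); j += stride) {
                out.add(split.get(j));
            }
        }
        return out;
    }
}
```

With three splits of four records each, firstRecords with a budget of 4 and a limit of 2 splits takes two records from each of the first two splits. everyNth with a stride of 2 and the same limit keeps the first and third record of each of those two splits.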
 
Constructor Summary
InputSampler(JobConf conf)
           
 
Method Summary
 Configuration getConf()
          Return the configuration used by this object.
static void main(String[] args)
           
 int run(String[] args)
          Driver for InputSampler from the command line.
 void setConf(Configuration conf)
          Set the configuration to be used by this object.
static <K,V> void writePartitionFile(JobConf job, InputSampler.Sampler<K,V> sampler)
          Write a partition file for the given job, using the Sampler provided.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InputSampler

public InputSampler(JobConf conf)

Method Detail

getConf

public Configuration getConf()
Description copied from interface: Configurable
Return the configuration used by this object.

Specified by:
getConf in interface Configurable

setConf

public void setConf(Configuration conf)
Description copied from interface: Configurable
Set the configuration to be used by this object.

Specified by:
setConf in interface Configurable

writePartitionFile

public static <K,V> void writePartitionFile(JobConf job,
                                            InputSampler.Sampler<K,V> sampler)
                               throws IOException
Write a partition file for the given job, using the Sampler provided. The method queries the sampler for a sample key set, sorts the keys with the job's output key comparator, selects the key at each partition-boundary rank, and writes the selected keys to the path returned by TotalOrderPartitioner.getPartitionFile(org.apache.hadoop.mapred.JobConf).

Throws:
IOException
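
The sort-and-select step described above can be sketched in plain Java without any Hadoop dependency. The class and method names (PartitionKeySketch, selectSplitPoints) are hypothetical, and the evenly spaced rank formula is an assumption made for illustration. The actual implementation may round ranks differently or treat duplicate keys specially, and it writes the keys to a file rather than returning them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
final class PartitionKeySketch {

    /**
     * Sorts the sampled keys and returns numPartitions - 1 boundary keys
     * taken at evenly spaced ranks (an assumed spacing rule).
     */
    static <K> List<K> selectSplitPoints(List<K> samples,
                                         Comparator<? super K> comparator,
                                         int numPartitions) {
        if (samples.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException(
                "need at least one sample and one partition");
        }
        // Copy so the caller's list is left untouched, then sort by the comparator.
        List<K> sorted = new ArrayList<>(samples);
        sorted.sort(comparator);

        List<K> splitPoints = new ArrayList<>();
        int n = sorted.size();
        for (int i = 1; i < numPartitions; i++) {
            // Rank of the i-th boundary; always below n because i < numPartitions.
            int rank = (int) ((long) i * n / numPartitions);
            splitPoints.add(sorted.get(rank));
        }
        return splitPoints;
    }
}
```

For twelve sampled integers 0 through 11 and four partitions, this sketch returns the three boundary keys 3, 6, and 9. A single partition needs no boundaries, so the result is empty.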

run

public int run(String[] args)
        throws Exception
Driver for InputSampler from the command line. Configures a JobConf instance and calls writePartitionFile(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.lib.InputSampler.Sampler).

Specified by:
run in interface Tool
Parameters:
args - command-specific arguments.
Returns:
exit code.
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2009 The Apache Software Foundation