MapReduce – Running MapReduce in Windows file system – Debug MapReduce in Eclipse


The distributed nature of the Hadoop MapReduce framework makes debugging a little harder. Often we want to test our MR jobs on a small amount of data before deploying them. There are some good tutorials on configuring Hadoop development with Eclipse. The major concern is HDFS: because of the nature of the file system, it is hard to attach the debugger in a Windows environment. This is a little hack that makes Hadoop take its input from the Windows file system and run the MapReduce job locally. It is a faster and more flexible way of developing.

Let's extend LocalFileSystem and override it for our Windows file system:


package org.ananth.learning.fs;

import java.io.IOException;

import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class WindowsLocalFileSystem extends LocalFileSystem {

    public WindowsLocalFileSystem() {
        super();
    }

    /**
     * Creates the directory, then applies the permission separately so that
     * a permission failure cannot abort directory creation.
     */
    @Override
    public boolean mkdirs(final Path path, final FsPermission permission)
            throws IOException {
        final boolean result = super.mkdirs(path);
        this.setPermission(path, permission);
        return result;
    }

    /**
     * Tries to set the permission; on Windows this can fail, so the
     * IOException is logged and ignored instead of propagated.
     */
    @Override
    public void setPermission(final Path path, final FsPermission permission)
            throws IOException {
        try {
            super.setPermission(path, permission);
        } catch (final IOException e) {
            System.err.println("Cant help it, hence ignoring IOException setting persmission for path \"" + path
                    + "\": " + e.getMessage());
        }
    }
}
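
The override above treats the permission change as best effort: try it, then log and carry on when the platform refuses. The same idea can be sketched without Hadoop, using only java.nio.file. Everything in this sketch (the class name BestEffortPermissions, the helper names, the "rwx------" permission string) is a hypothetical illustration and not part of Hadoop's API; it only assumes that a file system without POSIX support, such as the default one on Windows, rejects the permission call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical stand-alone illustration; not a Hadoop class.
class BestEffortPermissions {

    /** Tries to apply POSIX permissions; returns false instead of throwing if the platform refuses. */
    static boolean trySetPermission(Path path, String perms) {
        try {
            Set<PosixFilePermission> set = PosixFilePermissions.fromString(perms);
            Files.setPosixFilePermissions(path, set);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Same policy as the Hadoop override: report the failure and keep going.
            System.err.println("Ignoring permission failure for " + path + ": " + e);
            return false;
        }
    }

    /** Creates the directory first, then applies permissions on a best-effort basis. */
    static boolean mkdirs(Path path, String perms) throws IOException {
        Files.createDirectories(path);
        trySetPermission(path, perms);
        return Files.isDirectory(path);
    }

    public static void main(String[] args) throws IOException {
        // Scratch location that mimics the shape of a job staging directory.
        Path dir = Files.createTempDirectory("staging").resolve("job/.staging");
        System.out.println("created: " + mkdirs(dir, "rwx------"));
    }
}
```

The directory creation still reports real failures through IOException; only the permission step is allowed to fail quietly, which is the behaviour the Hadoop subclass relies on.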

Then all you need to do in your driver class is:


package org.ananth.learning.mapper;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MutualfriendsDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        // Pass the command-line arguments through and use the job status as the exit code.
        System.exit(ToolRunner.run(new MutualfriendsDriver(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Use the local file system instead of HDFS.
        conf.set("fs.default.name", "file:///");
        // Run the job in-process with the local job runner.
        conf.set("mapred.job.tracker", "local");
        // Route file:// paths through the permission-tolerant file system.
        conf.set("fs.file.impl", "org.ananth.learning.fs.WindowsLocalFileSystem");
        conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization,"
                + "org.apache.hadoop.io.serializer.WritableSerialization");

        Job job = new Job(conf, "Your Job name");

        // Set your Mapper and Reducer for the job

        // Set your input and output classes

        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        // Report failure to the caller instead of always returning 0.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

The input and output paths should be located in the project root directory. Now you are all set and can run the MR job on your local Windows machine.
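
When you debug repeatedly, it helps to check these two directories before each run: Hadoop's FileOutputFormat normally refuses to start a job whose output directory already exists, so a leftover output folder from the previous run stops the next one. Below is a minimal sketch of such a check using only java.nio.file. The class name LocalRunPrep and its helpers are hypothetical, and the sketch assumes the working directory of your Eclipse run configuration is the project root (the Eclipse default).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of Hadoop.
class LocalRunPrep {

    /** Deletes a directory tree if present; returns true if something was removed. */
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    /** Checks that base/input exists and clears base/output left over from a previous run. */
    static void prepare(Path base) throws IOException {
        Path input = base.resolve("input");
        if (!Files.isDirectory(input)) {
            throw new IOException("Missing input directory: " + input.toAbsolutePath());
        }
        deleteRecursively(base.resolve("output"));
    }

    public static void main(String[] args) throws IOException {
        // Demo in a scratch directory that stands in for the project root.
        Path base = Files.createTempDirectory("mr-project");
        Files.createDirectories(base.resolve("input"));
        Files.createDirectories(base.resolve("output/_logs"));
        prepare(base);
        System.out.println("output exists after prepare: " + Files.exists(base.resolve("output")));
    }
}
```

In a real project you would call prepare with the project root (for example the current working directory) at the start of the driver's main method, before the job is submitted.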


4 thoughts on “MapReduce – Running MapReduce in Windows file system – Debug MapReduce in Eclipse”

  1. Hi – I tried to run this setup but not sure why I am getting issues. Putting my error log, Thanks in advance.
    16/04/09 15:33:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    Cant help it, hence ignoring IOException setting persmission for path “file:/tmp/hadoop-Siddharth/mapred/staging/Siddharth168173809/.staging”: Failed to set permissions of path: \tmp\hadoop-Siddharth\mapred\staging\Siddharth168173809\.staging to 0700
    16/04/09 15:33:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    Cant help it, hence ignoring IOException setting persmission for path “file:/tmp/hadoop-Siddharth/mapred/staging/Siddharth168173809/.staging/job_local168173809_0001”: Failed to set permissions of path: \tmp\hadoop-Siddharth\mapred\staging\Siddharth168173809\.staging\job_local168173809_0001 to 0700
    16/04/09 15:33:08 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    16/04/09 15:33:08 INFO input.FileInputFormat: Total input paths to process : 1
    16/04/09 15:33:08 WARN snappy.LoadSnappy: Snappy native library not loaded
    Cant help it, hence ignoring IOException setting persmission for path “file:/tmp/hadoop-Siddharth/mapred/staging/Siddharth168173809/.staging/job_local168173809_0001/job.split”: Failed to set permissions of path: \tmp\hadoop-Siddharth\mapred\staging\Siddharth168173809\.staging\job_local168173809_0001\job.split to 0644
    Cant help it, hence ignoring IOException setting persmission for path “file:/tmp/hadoop-Siddharth/mapred/staging/Siddharth168173809/.staging/job_local168173809_0001/job.splitmetainfo”: Failed to set permissions of path: \tmp\hadoop-Siddharth\mapred\staging\Siddharth168173809\.staging\job_local168173809_0001\job.splitmetainfo to 0644
    Cant help it, hence ignoring IOException setting persmission for path “file:/tmp/hadoop-Siddharth/mapred/staging/Siddharth168173809/.staging/job_local168173809_0001/job.xml”: Failed to set permissions of path: \tmp\hadoop-Siddharth\mapred\staging\Siddharth168173809\.staging\job_local168173809_0001\job.xml to 0644
    16/04/09 15:33:08 INFO mapred.JobClient: Running job: job_local168173809_0001
    16/04/09 15:33:08 INFO mapred.LocalJobRunner: Waiting for map tasks
    16/04/09 15:33:08 INFO mapred.LocalJobRunner: Starting task: attempt_local168173809_0001_m_000000_0
    16/04/09 15:33:09 INFO mapred.Task: Using ResourceCalculatorPlugin : null
    16/04/09 15:33:09 INFO mapred.MapTask: Processing split: file:/C:/Users/Siddharth/workspace/WordCount/input/test.txt:0+30
    16/04/09 15:33:09 INFO mapred.MapTask: io.sort.mb = 100
    16/04/09 15:33:09 INFO mapred.MapTask: data buffer = 79691776/99614720
    16/04/09 15:33:09 INFO mapred.MapTask: record buffer = 262144/327680
    16/04/09 15:33:09 INFO mapred.LocalJobRunner: Map task executor complete.
    16/04/09 15:33:09 WARN mapred.LocalJobRunner: job_local168173809_0001
    java.lang.Exception: java.lang.ClassCastException: interface javax.xml.soap.Text
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
    Caused by: java.lang.ClassCastException: interface javax.xml.soap.Text
    at java.lang.Class.asSubclass(Class.java:3404)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:964)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    16/04/09 15:33:09 INFO mapred.JobClient: map 0% reduce 0%
    16/04/09 15:33:09 INFO mapred.JobClient: Job complete: job_local168173809_0001
    16/04/09 15:33:09 INFO mapred.JobClient: Counters: 0
