
# Set up a single-node Hadoop 2.4 on Mac OS X 10.9.3
## install
* `brew install hadoop`

## Set up passphraseless ssh
* try `ssh localhost`; if it prompts for a password:
* `ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa`
* `cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys`
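
If `ssh localhost` still asks for a password, the usual causes on OS X are Remote Login being disabled or over-permissive key files; a quick check (plain ssh/OS X commands, nothing Hadoop-specific):

```sh
# enable Remote Login (same as System Preferences > Sharing > Remote Login)
sudo systemsetup -setremotelogin on

# ssh ignores keys with loose permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# should print the hostname without prompting for a password
ssh localhost hostname
```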

## Environment
* check `/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh` and make sure JAVA_HOME is set:
`export JAVA_HOME="$(/usr/libexec/java_home)"`
* `cd /usr/local/Cellar/hadoop/2.4.0`
* try `bin/hadoop version`
> $ bin/hadoop version
> Hadoop 2.4.0
> Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1583262
> Compiled by jenkins on 2014-03-31T08:29Z
> Compiled with protoc 2.5.0
> From source with checksum 375b2832a6641759c6eaf6e3e998147
> This command was run using /usr/local/Cellar/hadoop/2.4.0/libexec/share/hadoop/common/hadoop-common-2.4.0.jar

## try Standalone mode
* `cd /usr/local/Cellar/hadoop/2.4.0`
* `mkdir input`
* `cp libexec/etc/hadoop/*.xml input`
* `bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z]+'`
* `cat output/*`
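
If the standalone job ran cleanly, `cat output/*` typically prints a single match (the exact counts depend on which `*.xml` files ended up in `input`):
> 1	dfsadmin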

## try Pseudo-Distributed mode
* vi libexec/etc/hadoop/core-site.xml

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

* vi libexec/etc/hadoop/hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

### run MapReduce job locally
#### HDFS filesystem
* remove any data from a previous run: `rm -fr /tmp/hadoop-username; rm -fr /private/tmp/hadoop-username` (replace `username` with your login name)
* Format the filesystem:
`$ bin/hdfs namenode -format`
"INFO common.Storage: Storage directory /tmp/hadoop-username/dfs/name has been successfully formatted.""

#### start daemon
* Start NameNode daemon and DataNode daemon:
`$ sbin/start-dfs.sh`
Check that Java processes for org.apache.hadoop.hdfs.server.namenode.NameNode and org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode are running.
Check the logs with `ls -lstr libexec/logs/`
Check that the NameNode is listening on port 9000 (the `fs.defaultFS` RPC port; this is not a web UI).
* Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/

#### hdfs command
* Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/username
$ bin/hdfs dfs -mkdir /user/username/input
$ bin/hdfs dfs -ls /user/
$ jps
29398 Jps
25959 DataNode
25839 NameNode
26109 SecondaryNameNode
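
As an aside, the three `-mkdir` calls above can be collapsed into one, since `hdfs dfs -mkdir` accepts `-p` to create missing parent directories:

```sh
$ bin/hdfs dfs -mkdir -p /user/username/input
```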

#### run mapreduce
* Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put libexec/etc/hadoop input
* Run some of the examples provided:
$ bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
* Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
$ bin/hdfs dfs -cat output/*
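
Listing the output directory first shows what the job actually wrote; a successful run normally contains an empty `_SUCCESS` marker plus one `part-r-*` file per reducer:

```sh
$ bin/hdfs dfs -ls output
```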

#### stop hdfs
* Stop the NameNode and DataNode daemons:
`$ sbin/stop-dfs.sh`

### run MapReduce job on YARN
#### start hdfs
* `sbin/start-dfs.sh`
* `bin/hdfs dfs -rm -r output` (clean up the output of the previous run)
* `bin/hdfs dfs -rm -r input`

#### config yarn
* vi libexec/etc/hadoop/mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
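
Note: Hadoop 2.4 ships this file only as a template; if `mapred-site.xml` does not exist yet, create it from the template first:

```sh
$ cp libexec/etc/hadoop/mapred-site.xml.template libexec/etc/hadoop/mapred-site.xml
```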

* vi libexec/etc/hadoop/yarn-site.xml

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

#### Start ResourceManager daemon and NodeManager daemon:
* `sbin/start-yarn.sh`
* `jps`
99082 SecondaryNameNode
98803 NameNode
99215 Jps
97753 NodeManager
97649 ResourceManager
98929 DataNode
* Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/

#### run a mapreduce
* `bin/hdfs dfs -put libexec/etc/hadoop input`
* `bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'`
* `bin/hdfs dfs -cat /user/username/output/part-r-00000`
> 4 dfs.class
> 4 dfs.audit.logger
> 3 dfs.server.namenode.
> 2 dfs.audit.log.maxbackupindex
> 2 dfs.period
> 2 dfs.audit.log.maxfilesize
> 1 dfsmetrics.log
> 1 dfsadmin
> 1 dfs.servers
> 1 dfs.replication
> 1 dfs.file
> 1 dfs.data.dir
> 1 dfs.name.dir
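
#### stop yarn
* When finished, stop the YARN daemons (and HDFS, as above):
`$ sbin/stop-yarn.sh`
`$ sbin/stop-dfs.sh`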


[SingleCluster.html](http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html)
[YARN](http://www.ibm.com/developerworks/jp/bigdata/library/bd-yarn-intro/?cmp=dw&cpb=dwope&ct=dwrss&cr=dwrss&ccy=jp&csr=062714)