Set up a single-node Hadoop 2.4 on Mac OS X 10.9.3

install

  • brew install hadoop

Setup passphraseless ssh

  • try ssh localhost
  • $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  • $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
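
    The two commands above can be wrapped in a small idempotent script. This is only a sketch: it simulates the setup in a temporary directory instead of the real ~/.ssh (swap SSH_DIR for "$HOME/.ssh" to use it for real), and it generates an RSA key rather than the DSA key shown above, since newer OpenSSH releases deprecate DSA.

    ```shell
    # Sketch: idempotent passphraseless-ssh setup, simulated in a temp dir.
    # SSH_DIR is a stand-in for "$HOME/.ssh" (assumption for illustration).
    SSH_DIR="$(mktemp -d)/ssh"
    mkdir -p "$SSH_DIR"
    chmod 700 "$SSH_DIR"

    # Generate a key only if one does not already exist.
    if [ ! -f "$SSH_DIR/id_rsa" ]; then
      if command -v ssh-keygen >/dev/null 2>&1; then
        ssh-keygen -t rsa -N '' -f "$SSH_DIR/id_rsa" -q
      else
        # Fallback stub so the rest of the sketch still runs without OpenSSH.
        printf 'ssh-rsa AAAAB3FAKEKEY user@host\n' > "$SSH_DIR/id_rsa.pub"
        : > "$SSH_DIR/id_rsa"
      fi
    fi

    # Append the public key only if it is not already authorized.
    touch "$SSH_DIR/authorized_keys"
    grep -qF -f "$SSH_DIR/id_rsa.pub" "$SSH_DIR/authorized_keys" \
      || cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
    chmod 600 "$SSH_DIR/authorized_keys"
    ```

    Re-running the script is safe: the key is generated once and appended once, and authorized_keys keeps the 600 permissions sshd requires.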

Environment

  • check that /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/hadoop-env.sh contains export JAVA_HOME="$(/usr/libexec/java_home)"
  • cd /usr/local/Cellar/hadoop/2.4.0
  • try bin/hadoop
    $ bin/hadoop version
    Hadoop 2.4.0
    Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1583262
    Compiled by jenkins on 2014-03-31T08:29Z
    Compiled with protoc 2.5.0
    From source with checksum 375b2832a6641759c6eaf6e3e998147
    This command was run using /usr/local/Cellar/hadoop/2.4.0/libexec/share/hadoop/common/hadoop-common-2.4.0.jar
    

try Standalone mode

  • cd /usr/local/Cellar/hadoop/2.4.0
  • mkdir input
  • cp libexec/etc/hadoop/*.xml input
  • bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z]+'
  • cat output/*
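
    The example job above is essentially a distributed grep: it counts every string matching dfs[a-z]+ in the input files and reports the matches by frequency. A plain-shell sketch of the same computation, using a made-up sample file instead of the real Hadoop configs:

    ```shell
    # Simulate the MapReduce grep example with plain shell tools.
    # sample.xml is a fabricated stand-in for libexec/etc/hadoop/*.xml.
    work="$(mktemp -d)"
    cat > "$work/sample.xml" <<'EOF'
    <value>dfsadmin</value>
    <value>dfsadmin and dfsmetrics.log</value>
    EOF

    # "map": extract every match; "reduce": count occurrences, sort by frequency.
    grep -oE 'dfs[a-z]+' "$work"/*.xml | sort | uniq -c | sort -rn > "$work/output"
    cat "$work/output"
    ```

    For this sample the result is two lines: dfsadmin counted twice, dfsmetrics once, which mirrors the count-per-match lines the real job writes to output/.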

try Pseudo-Distributed mode

  • vi libexec/etc/hadoop/core-site.xml

    <configuration>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
    
  • vi libexec/etc/hadoop/hdfs-site.xml

    <configuration>
      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
    </configuration>
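
    Both edits can be scripted instead of done in vi. A sketch that writes the two fragments non-interactively; CONF_DIR here is a temporary directory, a stand-in for the real /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop:

    ```shell
    # Sketch: create core-site.xml and hdfs-site.xml from heredocs.
    # CONF_DIR stands in for libexec/etc/hadoop in the real install.
    CONF_DIR="$(mktemp -d)"

    cat > "$CONF_DIR/core-site.xml" <<'EOF'
    <configuration>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
    EOF

    cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
    <configuration>
      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
    </configuration>
    EOF
    ```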
    

run MapReduce job locally

hdfs file system

  • rm -fr /tmp/hadoop-username; rm -fr /private/tmp/hadoop-username
  • Format the filesystem:
    $ bin/hdfs namenode -format
    "INFO common.Storage: Storage directory /tmp/hadoop-username/dfs/name has been successfully formatted."

start daemon

  • Start NameNode daemon and DataNode daemon:
    $ sbin/start-dfs.sh
    Check for the Java processes org.apache.hadoop.hdfs.server.namenode.NameNode, org.apache.hadoop.hdfs.server.datanode.DataNode and org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.
    Check log with ls -lstr libexec/logs/
    Check that localhost:9000 (the NameNode RPC port, not a web page) is listening.
  • Browse the web interface for the NameNode; by default it is available at:
    NameNode - http://localhost:50070/

hdfs command

  • Make the HDFS directories required to execute MapReduce jobs:
    $ bin/hdfs dfs -mkdir /user
    $ bin/hdfs dfs -mkdir /user/username
    $ bin/hdfs dfs -mkdir /user/username/input
    $ bin/hdfs dfs -ls /user/
    $ jps
    29398 Jps
    25959 DataNode
    25839 NameNode
    26109 SecondaryNameNode
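
    The jps check can be automated. The sketch below parses jps-style output and reports any missing HDFS daemon; JPS_OUT is the captured sample from above (a live JVM is not assumed here), so replace it with "$(jps)" on a running node:

    ```shell
    # Sketch: verify the expected HDFS daemons appear in `jps` output.
    # JPS_OUT reproduces the sample above; use JPS_OUT="$(jps)" for real.
    JPS_OUT='29398 Jps
    25959 DataNode
    25839 NameNode
    26109 SecondaryNameNode'

    STATUS=ok
    for daemon in NameNode DataNode SecondaryNameNode; do
      # -w avoids SecondaryNameNode satisfying the NameNode check.
      if ! printf '%s\n' "$JPS_OUT" | grep -qw "$daemon"; then
        echo "missing: $daemon"
        STATUS=fail
      fi
    done
    echo "hdfs daemons: $STATUS"
    ```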

run mapreduce

  • Copy the input files into the distributed filesystem:
    $ bin/hdfs dfs -put libexec/etc/hadoop input
  • Run some of the examples provided:
    $ bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
  • Examine the output files:
    Copy the output files from the distributed filesystem to the local filesystem and examine them:
    $ bin/hdfs dfs -get output output
    $ cat output/*
    $ bin/hdfs dfs -cat output/*

stop hdfs

  • stop hdfs
    $ sbin/stop-dfs.sh

run MapReduce job on YARN

start hdfs

  • sbin/start-dfs.sh
  • bin/hdfs dfs -rm -r output
  • bin/hdfs dfs -rm -r input

config yarn

  • libexec/etc/hadoop/mapred-site.xml:

    <configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
    </configuration>
    
  • libexec/etc/hadoop/yarn-site.xml:

    <configuration>
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
    </configuration>
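
    As with the HDFS configs, both YARN-related files can be written non-interactively. CONF_DIR below is again a temporary stand-in for the real libexec/etc/hadoop directory:

    ```shell
    # Sketch: create mapred-site.xml and yarn-site.xml from heredocs.
    # CONF_DIR stands in for libexec/etc/hadoop in the real install.
    CONF_DIR="$(mktemp -d)"

    cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
    <configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
    </configuration>
    EOF

    cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
    <configuration>
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
    </configuration>
    EOF
    ```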
    

Start ResourceManager daemon and NodeManager daemon:

  • sbin/start-yarn.sh
  • jps
    99082 SecondaryNameNode
    98803 NameNode
    99215 Jps
    97753 NodeManager
    97649 ResourceManager
    98929 DataNode
  • Browse the web interface for the ResourceManager; by default it is available at:
    ResourceManager - http://localhost:8088/

run a mapreduce

  • bin/hdfs dfs -put libexec/etc/hadoop input
  • bin/hadoop jar libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
  • bin/hdfs dfs -cat /user/username/output/part-r-00000

    4 dfs.class
    4 dfs.audit.logger
    3 dfs.server.namenode.
    2 dfs.audit.log.maxbackupindex
    2 dfs.period
    2 dfs.audit.log.maxfilesize
    1 dfsmetrics.log
    1 dfsadmin
    1 dfs.servers
    1 dfs.replication
    1 dfs.file
    1 dfs.data.dir
    1 dfs.name.dir
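
    Each line of part-r-00000 is a count followed by the matched string, sorted by count descending. A quick sanity check totals the matches; OUT reproduces the sample output above:

    ```shell
    # Sketch: total the per-match counts in the grep job's output.
    # OUT reproduces the part-r-00000 sample above.
    OUT='4 dfs.class
    4 dfs.audit.logger
    3 dfs.server.namenode.
    2 dfs.audit.log.maxbackupindex
    2 dfs.period
    2 dfs.audit.log.maxfilesize
    1 dfsmetrics.log
    1 dfsadmin
    1 dfs.servers
    1 dfs.replication
    1 dfs.file
    1 dfs.data.dir
    1 dfs.name.dir'

    # Sum the first column (the counts).
    TOTAL=$(printf '%s\n' "$OUT" | awk '{ sum += $1 } END { print sum }')
    echo "total matches: $TOTAL"
    ```

    For this sample the counts add up to 24 occurrences of the pattern across the input configs.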
    

Reference: the Apache Hadoop SingleCluster.html guide (YARN on a single node).


