Getting Started with Hadoop

Author: 每天进步一丢儿丢儿 | Published 2019-10-18 17:34

Installing Hadoop

Download a Hadoop release and upload the archive to any directory on your Linux machine, for example /usr/local, then unpack it.

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
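A minimal sketch of the unpack step (assuming the archive was downloaded into /usr/local; the rename is optional and only makes the directory match the /usr/local/hadoop paths used below):

$ cd /usr/local
$ tar -xzf hadoop-3.2.1.tar.gz
$ mv hadoop-3.2.1 hadoop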

Run vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh and set the JAVA_HOME entry to the root of your Java installation.
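For example (the JDK path below is an assumption; substitute the root of your own Java installation):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64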


From the Hadoop installation directory, run bin/hadoop to print the usage documentation for the hadoop script:

Usage: hadoop [OPTIONS]  SUBCOMMAND [SUBCOMMAND OPTIONS]

or  hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]

where CLASSNAME is a user-provided Java class

OPTIONS is none or any of:

buildpaths                       attempt to add class files from build tree
--config dir                     Hadoop config directory
--debug                          turn on shell script debug mode
--help                           usage information
hostnames list[,of,host,names]   hosts to use in slave mode
hosts filename                   list of hosts to use in slave mode
loglevel level                   set the log4j level for this command
workers                          turn on worker mode

SUBCOMMAND is one of:


    Admin Commands:

daemonlog     get/set the log level for each daemon

    Client Commands:

archive       create a Hadoop archive
checknative   check native Hadoop and compression libraries availability
classpath     prints the class path needed to get the Hadoop jar and the required libraries
conftest      validate configuration XML files
credential    interact with credential providers
distch        distributed metadata changer
distcp        copy file or directories recursively
dtutil        operations related to delegation tokens
envvars       display computed Hadoop environment variables
fs            run a generic filesystem user client
gridmix       submit a mix of synthetic job, modeling a profiled from production load
jar <jar>     run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath       prints the java.library.path
kdiag         Diagnose Kerberos Problems
kerbname      show auth_to_local principal conversion
key           manage keys via the KeyProvider
rumenfolder   scale a rumen input trace
rumentrace    convert logs into a rumen trace
s3guard       manage metadata on S3
trace         view and modify Hadoop tracing settings
version       print the version

    Daemon Commands:

kms           run KMS, the Key Management Server

SUBCOMMAND may print help when invoked w/o parameters or with -h.
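For example, the version subcommand prints details of the release that was just unpacked; its first output line should read Hadoop 3.2.1:

$ bin/hadoop version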

How to run Hadoop

Standalone mode

By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is very useful for debugging.
The following example copies the unpacked configuration files to use as input, then finds and displays every entry matching the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*
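With the stock 3.2.1 configuration files, the job typically finds a single match, so the final cat prints something like "1 dfsadmin"; the exact output depends on the XML files you copied into input.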

Pseudo-distributed mode

Hadoop can also run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.

$ vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
$ vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following configuration (replication is set to 1 because a single node has only one DataNode):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Set up passwordless SSH login
First, check whether ssh login to localhost requires a password:

$ ssh localhost

If you cannot log in without a password, run the following commands:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
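After this, ssh localhost should open a shell without prompting for a password.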

To run a MapReduce job locally:

  1. Format a new distributed filesystem:
$ bin/hdfs namenode -format
  2. Start the NameNode and DataNode daemons:
$ sbin/start-dfs.sh

Running this command as root failed with an error (the original post showed a screenshot) saying that HDFS_NAMENODE_USER and the related user variables were not defined.

Fix: add the following settings in the blank section at the top of start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
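Alternatively (an equivalent approach, not from the original post), the same variables can be exported from etc/hadoop/hadoop-env.sh so that every control script picks them up:

# In /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# (assumes the daemons are run as root, matching the fix above)
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root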
  3. Visit http://ip:9870 in a browser. If the page cannot be reached, open port 9870 in the firewall (a sketch follows below).
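A sketch of opening the port, assuming a system with firewalld (adapt to whatever firewall you run):

$ firewall-cmd --permanent --add-port=9870/tcp
$ firewall-cmd --reload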
  4. Create the HDFS directories required to run MapReduce jobs:
  $ bin/hdfs dfs -mkdir /user
  $ bin/hdfs dfs -mkdir /user/<username>

If you see the following error when creating the directories:
mkdir: `hdfs://localhost:9000/home/hadoop': No such file or directory
change the command above to: $ bin/hdfs dfs -mkdir -p /user
The reason:
It is because the parent directories do not exist yet either. Try hdfs dfs -mkdir -p /user/Hadoop/twitter_data. The -p flag indicates that all nonexistent directories leading up to the given directory are to be created as well.

