Installing Hadoop
Download a Hadoop release, upload the release archive to any directory on the Linux machine (for example /usr/local), and extract it there.
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
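Then unpack the archive. A minimal sketch, assuming the archive was downloaded into /usr/local and renaming the unpacked directory to /usr/local/hadoop (the path used in the rest of this article):
$ cd /usr/local
$ tar -zxvf hadoop-3.2.1.tar.gz
$ mv hadoop-3.2.1 hadoop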
Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh with vim and set the JAVA_HOME entry to the root path of your Java installation.
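hadoop-env.sh ships with a commented-out JAVA_HOME line that can be uncommented and edited. The path below is only an example; use the actual root of your JDK installation:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk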
From the Hadoop installation directory, try the command bin/hadoop to view the usage documentation for the hadoop script. The documentation is as follows:
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
or hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
where CLASSNAME is a user-provided Java class
OPTIONS is none or any of:
buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--debug turn on shell script debug mode
--help usage information
hostnames list[,of,host,names] hosts to use in slave mode
hosts filename list of hosts to use in slave mode
loglevel level set the log4j level for this command
workers turn on worker mode
SUBCOMMAND is one of:
Admin Commands:
daemonlog get/set the log level for each daemon
Client Commands:
archive create a Hadoop archive
checknative check native Hadoop and compression libraries availability
classpath prints the class path needed to get the Hadoop jar and the required libraries
conftest validate configuration XML files
credential interact with credential providers
distch distributed metadata changer
distcp copy file or directories recursively
dtutil operations related to delegation tokens
envvars display computed Hadoop environment variables
fs run a generic filesystem user client
gridmix submit a mix of synthetic job, modeling a profiled from production load
jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath prints the java.library.path
kdiag Diagnose Kerberos Problems
kerbname show auth_to_local principal conversion
key manage keys via the KeyProvider
rumenfolder scale a rumen input trace
rumentrace convert logs into a rumen trace
s3guard manage metadata on S3
trace view and modify Hadoop tracing settings
version print the version
Daemon Commands:
kms run KMS, the Key Management Server
SUBCOMMAND may print help when invoked w/o parameters or with -h.
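For example, the version subcommand prints the release and build information; the first line of its output for this install should read "Hadoop 3.2.1":
$ bin/hadoop version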
Starting Hadoop
Standalone mode
By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is very useful for debugging.
The example below copies the XML files from the unpacked configuration directory to use as input, then finds and displays every match of the given regular expression. Output is written to the specified output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*
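The result pairs each matched string with its occurrence count; with the stock configuration files this is typically a single short line such as "1 dfsadmin" (the exact contents depend on the XML files shipped with your release).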
Pseudo-distributed mode
Hadoop can also run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.
$ vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following configuration:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
$ vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following configuration:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Setting up passwordless SSH login
First check whether SSH to localhost works without a password:
$ ssh localhost
If you cannot log in without being prompted for a password, run the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
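Afterwards, ssh localhost should log you in without prompting for a password; if it still prompts, also check that ~/.ssh itself is not group- or world-writable (mode 0700 is the usual setting).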
Run a MapReduce job locally as follows.
- Format a new distributed file system
$ bin/hdfs namenode -format
- Start the NameNode and DataNode daemons
$ sbin/start-dfs.sh
When executing this command as the root user, errors like the following appeared:
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Fix: add the following settings in the blank space at the top of the start-dfs.sh and stop-dfs.sh files:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
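To confirm the daemons started, jps (a tool shipped with the JDK) should list the NameNode, DataNode, and SecondaryNameNode processes, with output similar to the following (the PIDs will differ):
$ jps
11858 NameNode
12023 DataNode
12238 SecondaryNameNode
12469 Jps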
- Visit http://ip:9870 in a browser. If it cannot be reached, you need to open port 9870 in the firewall; see the example below.
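A minimal sketch for a system using firewalld (e.g. CentOS); distributions using ufw or plain iptables need the equivalent commands for those tools:
$ firewall-cmd --permanent --add-port=9870/tcp
$ firewall-cmd --reload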
- Create the HDFS directories required to run MapReduce jobs
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
If you get an error like the following when creating the directories:
mkdir: `hdfs://localhost:9000/home/hadoop': No such file or directory
you need to change the command above to: $ bin/hdfs dfs -mkdir -p /user
The reason is as follows:
It is because the parent directories do not exist yet either. Try hdfs dfs -mkdir -p /user/Hadoop/twitter_data. The -p flag indicates that all nonexistent directories leading up to the given directory are to be created as well.
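With the user directory in place, the standalone example from earlier can be re-run against HDFS. The commands below mirror the official single-node setup guide; input and output are relative paths under the current user's HDFS home directory:
$ bin/hdfs dfs -mkdir input
$ bin/hdfs dfs -put etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -cat output/*
When you are done, the daemons can be stopped with sbin/stop-dfs.sh.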