1. Official website
https://hadoop.apache.org/
2. Download
https://hadoop.apache.org/releases.html
3. Environment
- OS: CentOS 7.6
- Java: OpenJDK 1.8
4. Configure the system environment
- Edit the /etc/hosts file and add the lines below (if the hosts are already registered on a DNS server with A records, this step can be skipped); a one-step append is sketched after the list:
192.168.56.201 node1
192.168.56.202 node2
192.168.56.203 node3
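A minimal sketch for appending these entries on a node in one command (same IP/hostname pairs as above):
cat >> /etc/hosts <<'EOF'
192.168.56.201 node1
192.168.56.202 node2
192.168.56.203 node3
EOF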
- Set up passwordless SSH login between all machines. Use ssh-keygen to generate a public/private key pair (a non-interactive variant follows the sample output):
[root@node1 etc]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:vSs9kZ7yxb16okc8e40afXJkICUQRI4Ig0xP3sh2F7c root@node1
The key's randomart image is:
+---[RSA 2048]----+
| o..+ .o*o. . |
| o= = . = . o |
| * + o E . . |
| . . . . . . |
| S .o o|
| oo+o o |
| o.++o+oo|
| o =o+.+=.|
| +++o*. |
+----[SHA256]-----+
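To skip the interactive prompts, the same key can be generated in one shot; this assumes an empty passphrase and the default RSA key path:
$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa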
- Use ssh-copy-id to copy the public key to the other machines (a loop over all nodes follows the sample output):
[root@node1 etc]# ssh-copy-id node2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'node2 (192.168.56.202)' can't be established.
ECDSA key fingerprint is SHA256:S+fUIkc5tzfLcZ8OKjyo5Gj89fYbiM9Q/r+K6k9LZkQ.
ECDSA key fingerprint is MD5:52:96:7a:52:d1:e7:3b:99:d9:b7:a0:2a:87:71:78:f3.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'node2'"
and check to make sure that only the key(s) you wanted were added.
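The copy has to be done for every node, including node1 itself, because the start scripts also SSH to the local host. A small loop (the root password is typed once per node), followed by a quick check:
$ for h in node1 node2 node3; do ssh-copy-id root@$h; done
$ ssh node2 hostname    # should print "node2" without asking for a password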
- Install Java. With the yum repositories configured, a plain yum install is enough:
$ yum -y install java java-devel
- Configure JAVA_HOME in /etc/profile.d/java.sh:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
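The exact JAVA_HOME path depends on the OpenJDK build that yum installed; it can be looked up, and the profile script tested, with something like:
$ readlink -f /usr/bin/java    # resolves to the actual install under /usr/lib/jvm
$ source /etc/profile.d/java.sh
$ echo $JAVA_HOME
$ java -version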
5. Install and configure Hadoop
- Download Hadoop
$ cd /opt
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
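Optionally, verify the download before unpacking; Apache normally publishes a .sha512 file next to the tarball (compare the two hashes by eye):
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz.sha512
$ cat hadoop-3.2.0.tar.gz.sha512
$ sha512sum hadoop-3.2.0.tar.gz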
- Unpack Hadoop
$ tar xf hadoop-3.2.0.tar.gz
- Configure the Hadoop environment variables in /etc/profile.d/hadoop.sh:
export HADOOP_HOME=/opt/hadoop-3.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
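Daemons on the worker nodes are started over SSH in a non-login shell, which does not necessarily source /etc/profile, so it is safest to also set JAVA_HOME in Hadoop's own environment file (same path as in java.sh above):
$ echo "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64" >> /opt/hadoop-3.2.0/etc/hadoop/hadoop-env.sh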
- Reload the environment variables
$ source /etc/profile
- Create a data directory for Hadoop (on a production system, preferably a dedicated filesystem):
$ mkdir -p /data/hadoop
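node2 and node3 need the same directory; with passwordless SSH already in place it can be created remotely in one line:
$ for h in node2 node3; do ssh $h mkdir -p /data/hadoop; done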
- Edit $HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmp</value>
</property>
</configuration>
- Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml to configure the replication factor and the data storage paths:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>node1:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node2:50090</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
- Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml so that MapReduce jobs run on the YARN framework. The second property is new compared with earlier versions; without mapreduce.application.classpath, MapReduce jobs fail at runtime with:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/opt/hadoop-3.2.0/etc/hadoop,
/opt/hadoop-3.2.0/share/hadoop/common/*,
/opt/hadoop-3.2.0/share/hadoop/common/lib/*,
/opt/hadoop-3.2.0/share/hadoop/hdfs/*,
/opt/hadoop-3.2.0/share/hadoop/hdfs/lib/*,
/opt/hadoop-3.2.0/share/hadoop/mapreduce/*,
/opt/hadoop-3.2.0/share/hadoop/mapreduce/lib/*,
/opt/hadoop-3.2.0/share/hadoop/yarn/*,
/opt/hadoop-3.2.0/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
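The hadoop classpath command prints the full classpath this installation actually uses; if the MRAppMaster error still appears, a common alternative is to paste its (colon-separated) output into the value above:
$ hadoop classpath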
- Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml. At minimum, the NodeManagers must know where the ResourceManager runs (node1 here, the same host as the NameNode) and the MapReduce shuffle service must be enabled:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
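A quick way to catch XML typos across the four edited files (assuming xmllint from libxml2 is installed):
$ cd $HADOOP_HOME/etc/hadoop
$ xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml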
- Add all worker hostnames to $HADOOP_HOME/etc/hadoop/workers (listing node1 means the master also runs a DataNode and NodeManager); one way to write the file is shown after the list:
node1
node2
node3
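For example, written in one step (workers replaces the slaves file used by Hadoop 2):
cat > $HADOOP_HOME/etc/hadoop/workers <<'EOF'
node1
node2
node3
EOF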
- Copy the environment file /etc/profile.d/hadoop.sh and the /opt/hadoop-3.2.0 directory to node2 and node3:
$ scp /etc/profile.d/hadoop.sh node2:/etc/profile.d/hadoop.sh
$ scp /etc/profile.d/hadoop.sh node3:/etc/profile.d/hadoop.sh
$ scp -r /opt/hadoop-3.2.0 node2:/opt/
$ scp -r /opt/hadoop-3.2.0 node3:/opt/
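node2 and node3 also need Java and the JAVA_HOME profile script; assuming the same yum repositories are reachable there:
$ for h in node2 node3; do ssh $h yum -y install java java-devel; scp /etc/profile.d/java.sh $h:/etc/profile.d/; done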
- Format the NameNode (initializes the HDFS metadata directory):
$ hdfs namenode -format
- Start and stop the cluster:
$ $HADOOP_HOME/sbin/start-all.sh
$ $HADOOP_HOME/sbin/stop-all.sh
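A quick sanity check after starting: each node should show its daemons (jps ships with the java-devel package installed earlier), and all three DataNodes should register with the NameNode:
$ for h in node1 node2 node3; do echo "== $h =="; ssh $h jps; done
$ hdfs dfsadmin -report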
- Open the NameNode web UI in a browser:
http://192.168.56.201:50070
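To confirm that YARN and MapReduce work end to end, run the example jar that ships with the 3.2.0 tarball:
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 5 10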