==Introduction:== This article covers the fully distributed setup of a Hadoop/Spark development environment on CentOS 6.4, built from three virtual machines.
Version details
- OS: CentOS 6.4
- Hadoop: hadoop-2.6.0-cdh5.7.0
- Spark: spark-2.2.0-bin-2.6.0-cdh5.7.0
- Administrator: root; working user: hadoop (granted full privileges)
Network and IP configuration
# Configured in /etc/sysconfig/network (vim /etc/sysconfig/network) on every node
NETWORKING=yes
# each machine keeps only its own HOSTNAME line; hadoop000 is the Master node
HOSTNAME=hadoop000
HOSTNAME=slave1
HOSTNAME=slave2
# IP-to-hostname mapping in /etc/hosts (vim /etc/hosts), identical on all three nodes
10.23.14.217 hadoop000
10.23.15.161 slave1
10.23.12.24 slave2
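After editing both files, the new hostname can be applied without a reboot and name resolution checked; a minimal sketch (run the matching hostname command on each node):
hostname hadoop000          # apply immediately on the master; use slave1 / slave2 on the other nodes
ping -c 1 slave1            # from hadoop000, confirm /etc/hosts resolves the slaves
ping -c 1 slave2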
Configure passwordless SSH from the Master to the slaves
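The usual approach is to generate a key pair on the Master as the hadoop user and push its public key to each slave; a minimal sketch, assuming the hadoop user exists on all three nodes:
# on hadoop000, as the hadoop user
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa     # key pair with an empty passphrase
ssh-copy-id hadoop@slave1                    # append the public key to slave1's ~/.ssh/authorized_keys
ssh-copy-id hadoop@slave2
ssh hadoop@slave1 hostname                   # should print "slave1" without asking for a password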
Fully distributed Hadoop installation
- Download the hadoop-2.6.0-cdh5.7.0 package; I usually put it in /home/hadoop/software
- Extract the archive: tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C /home/hadoop/app (my usual layout)
- Configure the environment variables
# Hadoop environment variables
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
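These exports presumably live in ~/.bash_profile, as the Spark section below does; after reloading the profile, a quick sanity check:
source ~/.bash_profile
hadoop version              # should report Hadoop 2.6.0-cdh5.7.0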
- Create the key Hadoop directories
# referenced by the configuration files below
mkdir -p /home/hadoop/hadoop2.6.0/tmp
mkdir -p /home/hadoop/hadoop2.6.0/dfs/name
mkdir -p /home/hadoop/hadoop2.6.0/dfs/data
- Edit the Hadoop configuration files
#${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
#${HADOOP_HOME}/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop000:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop2.6.0/tmp</value>
</property>
</configuration>
#${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop2.6.0/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop2.6.0/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
#${HADOOP_HOME}/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop000:9001</value>
</property>
</configuration>
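Depending on the distribution, mapred-site.xml may only ship as a template; if the file is missing, it can be created from the template first (assuming the stock etc/hadoop layout):
cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml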
- Configure the slaves file
#${HADOOP_HOME}/etc/hadoop/slaves
slave1
slave2
- Copy the configured Hadoop directory to the other slaves
scp -r /home/hadoop/app/hadoop-2.6.0-cdh5.7.0 hadoop@slave1:/home/hadoop/app
scp -r /home/hadoop/app/hadoop-2.6.0-cdh5.7.0 hadoop@slave2:/home/hadoop/app
- Format HDFS and start the cluster (on the master)
${HADOOP_HOME}/bin/hdfs namenode -format
${HADOOP_HOME}/sbin/start-dfs.sh    # start HDFS
${HADOOP_HOME}/sbin/stop-dfs.sh     # stop HDFS
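A quick way to confirm HDFS is healthy (50070 is the default NameNode web UI port in Hadoop 2.x):
${HADOOP_HOME}/bin/hdfs dfsadmin -report    # should list 2 live datanodes (slave1, slave2)
# the NameNode web UI is normally at http://hadoop000:50070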
Fully distributed Spark installation
- Download the Spark package (site: http://archive.cloudera.com/cdh5/cdh/5/); it usually goes in /home/hadoop/software
- Extract the Spark archive: tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C /home/hadoop/app (my usual layout)
- Add the Spark environment variables
# Spark environment variables, in ~/.bash_profile
export SPARK_HOME=/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
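After reloading the profile, a quick check that the Spark binaries are on the PATH:
source ~/.bash_profile
spark-submit --version      # should report Spark 2.2.0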
- Edit the Spark runtime settings in spark-env.sh
#${SPARK_HOME}/conf/spark-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
export SPARK_MASTER_IP=10.23.14.217
export SPARK_MASTER_HOST=10.23.14.217
export SPARK_LOCAL_IP=10.23.14.217    # this node's own IP; change it on each slave (see below)
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_HOME=/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
- Set up the slaves file
#${SPARK_HOME}/conf/slaves
slave1
slave2
- Copy the configured Spark directory to the slave1 and slave2 nodes
scp -r /home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0 hadoop@slave1:/home/hadoop/app
scp -r /home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0 hadoop@slave2:/home/hadoop/app
- Set up ~/.bash_profile on both slaves as well, and remember to change SPARK_LOCAL_IP in spark-env.sh on each of them; a sketch follows below
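A minimal sketch of that per-slave adjustment, assuming the paths used above (run the matching sed on each slave, or edit spark-env.sh by hand):
# on hadoop000: push the profile to the slaves
scp ~/.bash_profile hadoop@slave1:~/
scp ~/.bash_profile hadoop@slave2:~/
# on slave1: point SPARK_LOCAL_IP at this node's own address (10.23.15.161)
sed -i 's|^export SPARK_LOCAL_IP=.*|export SPARK_LOCAL_IP=10.23.15.161|' /home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/conf/spark-env.sh
# on slave2: likewise, using 10.23.12.24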
- Then start the cluster (on the master)
${SPARK_HOME}/sbin/start-all.sh
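To check that the Spark cluster is up (8080 is the standalone Master's default web UI port, 7077 its default RPC port):
# the Master web UI at http://hadoop000:8080 should list both workers
spark-shell --master spark://hadoop000:7077     # interactive shell attached to the cluster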
Final result
- On the Master you should see:
(screenshot)
- On the two slaves you should see:
(screenshot)
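For reference, with the setup above jps would typically show something like the following (the exact list depends on which services are running):
# on hadoop000 (master): NameNode, SecondaryNameNode, Master
# on slave1 / slave2:    DataNode, Worker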








