- 1. Resource Planning
- 2. Installation Media
- 3. Environment Preparation
- 4. Cluster Configuration
- 5. Installing Hadoop
- Appendix
# 1. Resource Planning
| Component | bigdata-hk-node1 | bigdata-hk-node2 | bigdata-hk-node3 |
| --- | --- | --- | --- |
| OS | CentOS 7.6 | CentOS 7.6 | CentOS 7.6 |
| JDK | jdk1.8.0_221 | jdk1.8.0_221 | jdk1.8.0_221 |
| HDFS | NN/DN | DN | 2NN/DN |
| YARN | NM | RM/NM/JobHistoryServer | NM |
# 2. Installation Media
Version: hadoop-3.1.3.tar.gz
Download: http://archive.apache.org/dist/hadoop/core
# 3. Environment Preparation
- Install the virtual machines
  See: 《Vagrant安装CentOS-7.6》
- Install the JDK
- Set up passwordless SSH
  See: 《CentOS7.6-SSH免密》

Note: mind the port changes in Hadoop 3.x; for details see the appendix "Hadoop 3.x Port Changes".
# 4. Cluster Configuration
Already configured; see 《Vagrant安装CentOS-7.6》.
# 5. Installing Hadoop
Install on node bigdata-hk-node1 first, then distribute to the other nodes. (Unless stated otherwise, run the following as the `vagrant` user.)

```bash
cd /share
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
# Remove unneeded files from the distribution (optional)
rm -rf /opt/module/hadoop-3.1.3/share/doc
rm -rf /opt/module/hadoop-3.1.3/*/*.cmd
rm -rf /opt/module/hadoop-3.1.3/*/*/*.cmd
```
## 5.1. HDFS Single-Node Deployment
- **Create the required directories**

```bash
# Hadoop temp directory; referenced by ${hadoop.tmp.dir} in core-site.xml
mkdir -p /opt/module/hadoop-3.1.3/tmp
chmod -R a+w /opt/module/hadoop-3.1.3/tmp
```
- **HDFS configuration**

1. Configure hadoop-env.sh.

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/hadoop-env.sh
```

Edit the following line:

```bash
export JAVA_HOME=/opt/module/jdk1.8.0_221
```
2. Configure core-site.xml.

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml
```

Configuration:

```xml
<!-- Name of the default file system (the authority part of the URI specifies host, port, etc.; defaults to the local file system) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata-hk-node1:8020</value>
</property>
<!-- Hadoop temp directory; a server-side setting, so changing it requires a restart. The NameNode image/edits directories derive from it -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.3/tmp</value>
</property>
<!-- User name the web UI uses when accessing data -->
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>vagrant</value>
</property>
<!-- Disable permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<!-- Proxy-user settings for Hive compatibility -->
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.vagrant.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.vagrant.groups</name>
    <value>*</value>
</property>
```
3. Configure hdfs-site.xml.

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml
```

Configuration:

```xml
<!-- NN web UI address -->
<property>
    <name>dfs.namenode.http-address</name>
    <value>bigdata-hk-node1:9870</value>
</property>
<!-- Size of the NN worker thread pool handling concurrent DN heartbeats and concurrent client metadata operations -->
<property>
    <name>dfs.namenode.handler.count</name>
    <value>21</value>
</property>
<!-- 2NN web UI address -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>bigdata-hk-node3:9868</value>
</property>
<!-- HDFS replication factor. Default is 3: local node + same-rack node + off-rack node. Should not exceed the number of DataNodes; 3 is the recommended default (1 here for the single-node setup) -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<!-- Disable HDFS permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<!-- Enable WebHDFS for Hive compatibility -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
```
- **Format the NameNode**

Formatting HDFS initializes the image and edit-log files managed by the NameNode.

```bash
# Clean logs and tmp
rm -rf /opt/module/hadoop-3.1.3/logs/*
rm -rf /opt/module/hadoop-3.1.3/tmp/*
# Format HDFS
cd /opt/module/hadoop-3.1.3/
bin/hdfs namenode -format
```

The following message indicates a successful format:

```
INFO common.Storage: Storage directory /opt/module/hadoop-3.1.3/tmp/dfs/name has been successfully formatted.
```
- **Start HDFS on a single node**

**Whichever host is configured in core-site.xml (`bigdata-hk-node1`) is the one that runs the NameNode (NN).**

```bash
cd /opt/module/hadoop-3.1.3/
# Start the NameNode (NN)
# sbin/hadoop-daemon.sh start namenode   # deprecated (hadoop-2.x)
bin/hdfs --daemon start namenode
# Start the DataNode (DN)
# sbin/hadoop-daemon.sh start datanode   # deprecated (hadoop-2.x)
bin/hdfs --daemon start datanode
```

After startup, run `jps`; if you see the following processes, the NameNode host is basically OK.

```bash
jps
**** DataNode
**** NameNode
```

Web UI check (NN): http://bigdata-hk-node1:9870
- **Stop HDFS on a single node**

```bash
cd /opt/module/hadoop-3.1.3/
# Stop the NameNode (NN)
# sbin/hadoop-daemon.sh stop namenode   # deprecated (hadoop-2.x)
bin/hdfs --daemon stop namenode
# Stop the DataNode (DN)
# sbin/hadoop-daemon.sh stop datanode   # deprecated (hadoop-2.x)
bin/hdfs --daemon stop datanode
```
## 5.2. YARN Single-Node Deployment
- **YARN configuration**

1. Configure mapred-env.sh.

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/mapred-env.sh
```

Add the following (the file is entirely commented out by default):

```bash
export JAVA_HOME=/opt/module/jdk1.8.0_221
```
2. Configure mapred-site.xml.

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/mapred-site.xml
```

Configuration:

```xml
<!-- MapReduce runtime framework: yarn or yarn-tez -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- JobHistory IPC and web UI addresses for log viewing -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>bigdata-hk-node2:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>bigdata-hk-node2:19888</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<!-- HDFS path for logs of jobs still running -->
<property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/jobhistory/done_intermediate</value>
</property>
<!-- HDFS path for logs of finished jobs -->
<property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/jobhistory/done</value>
</property>
<property>
    <name>mapreduce.application.classpath</name>
    <value>
        /opt/module/hadoop-3.1.3/etc/hadoop,
        /opt/module/hadoop-3.1.3/share/hadoop/common/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/common/*,
        /opt/module/hadoop-3.1.3/share/hadoop/hdfs,
        /opt/module/hadoop-3.1.3/share/hadoop/hdfs/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/hdfs/*,
        /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/*,
        /opt/module/hadoop-3.1.3/share/hadoop/yarn,
        /opt/module/hadoop-3.1.3/share/hadoop/yarn/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/yarn/*
    </value>
</property>
```
Note: the command `hadoop classpath` prints the Hadoop classpath; setting it in ${mapreduce.application.classpath} fixes the error "Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster".

3. Configure yarn-env.sh.

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/yarn-env.sh
```

Add the following (e.g. at the end of the file; `G` jumps there in vi):

```bash
export JAVA_HOME=/opt/module/jdk1.8.0_221
```
4. Configure yarn-site.xml.

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml
```

Configuration:

```xml
<!-- RM host -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>bigdata-hk-node2</value>
</property>
<!-- Auxiliary service run on the NM to speed up shuffle: mapreduce_shuffle or spark_shuffle -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Environment variables inherited by containers -->
<property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Without this the ApplicationMaster entry point is unreachable; default: ${yarn.resourcemanager.hostname}:8088 -->
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>bigdata-hk-node2:8088</value>
</property>
<!-- Enable log aggregation: logs are aggregated to HDFS and served to the web UI -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Log server address -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://bigdata-hk-node2:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
<!-- Whether a thread checks each task's virtual memory use and kills tasks that exceed their allocation; default true -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.application.classpath</name>
    <value>
        /opt/module/hadoop-3.1.3/etc/hadoop,
        /opt/module/hadoop-3.1.3/share/hadoop/common/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/common/*,
        /opt/module/hadoop-3.1.3/share/hadoop/hdfs,
        /opt/module/hadoop-3.1.3/share/hadoop/hdfs/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/hdfs/*,
        /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/*,
        /opt/module/hadoop-3.1.3/share/hadoop/yarn,
        /opt/module/hadoop-3.1.3/share/hadoop/yarn/lib/*,
        /opt/module/hadoop-3.1.3/share/hadoop/yarn/*
    </value>
</property>
```
Note: the command `hadoop classpath` prints the Hadoop classpath; setting it in ${yarn.application.classpath} fixes the error "Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster".
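The log-retention value configured above, 604800, is simply seven days expressed in seconds:

```shell
# yarn.log-aggregation.retain-seconds for a 7-day retention window:
echo $((7 * 24 * 60 * 60))   # prints 604800
```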
- **Start YARN on a single node**

**Whichever host is configured in yarn-site.xml (`bigdata-hk-node2`) is the one that runs the ResourceManager (RM).** If YARN is to run on a different node than the NN, distribute Hadoop to that node first:

```bash
# As the vagrant user
scp -r /opt/module/hadoop-3.1.3 vagrant@bigdata-hk-node2:/opt/module/
```

Log in to bigdata-hk-node2 and start YARN:

```bash
cd /opt/module/hadoop-3.1.3/
# Start the ResourceManager (RM)
# sbin/yarn-daemon.sh start resourcemanager   # deprecated (hadoop-2.x)
bin/yarn --daemon start resourcemanager
# Start the NodeManager (NM)
# sbin/yarn-daemon.sh start nodemanager   # deprecated (hadoop-2.x)
bin/yarn --daemon start nodemanager
```
Run `jps`; if you see the following processes, the ResourceManager host is basically OK.

```bash
jps
**** ResourceManager
**** NodeManager
```

Web UI check (YARN): http://bigdata-hk-node2:8088
- **Stop YARN on a single node**

```bash
cd /opt/module/hadoop-3.1.3/
# Stop the ResourceManager (RM)
# sbin/yarn-daemon.sh stop resourcemanager   # deprecated (hadoop-2.x)
bin/yarn --daemon stop resourcemanager
# Stop the NodeManager (NM)
# sbin/yarn-daemon.sh stop nodemanager   # deprecated (hadoop-2.x)
bin/yarn --daemon stop nodemanager
```
## 5.3. HDFS Cluster Deployment
**Before the first cluster start, clear the tmp directory on every node and re-format the NameNode (NN).**

```bash
# Clean logs and tmp
rm -rf /opt/module/hadoop-3.1.3/logs/*
rm -rf /opt/module/hadoop-3.1.3/tmp/*
# Re-format HDFS
cd /opt/module/hadoop-3.1.3/
bin/hdfs namenode -format
```

- **Configure the worker nodes**

```bash
vi /opt/module/hadoop-3.1.3/etc/hadoop/workers
```

Configuration:

```
bigdata-hk-node1
bigdata-hk-node2
bigdata-hk-node3
```

Note: no line in this file may contain a space, and the file must not end with a blank line.
- **Set environment variables**

```bash
sudo vi /etc/profile.d/bigdata_env.sh  # append at the end of the file (G, i.e. shift+g, jumps there in vi)
```

Configuration:

```bash
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
```
- **Distribute Hadoop**

```bash
# As the vagrant user
scp -r /opt/module/hadoop-3.1.3 vagrant@bigdata-hk-node2:/opt/module/
scp -r /opt/module/hadoop-3.1.3 vagrant@bigdata-hk-node3:/opt/module/
# As the root user
sudo scp /etc/profile.d/bigdata_env.sh root@bigdata-hk-node2:/etc/profile.d/
sudo scp /etc/profile.d/bigdata_env.sh root@bigdata-hk-node3:/etc/profile.d/
```

**Note: if the Hadoop temp directory (`tmp`) is not inside the Hadoop tree, it must still be created and made writable manually on the worker nodes.**

- **Activate the environment variables**

```bash
# On every node, re-source the profile or log in again
source /etc/profile
```
- **Start HDFS on the cluster**

**Whichever host is configured in core-site.xml (`bigdata-hk-node1`) is the one that runs the NameNode (NN).**

```bash
cd /opt/module/hadoop-3.1.3/
sbin/start-dfs.sh
```

After startup, run `jps` on each node; if you see the following processes, the NameNode side is basically OK.

```bash
# bigdata-hk-node1
jps
**** DataNode
**** NameNode
# bigdata-hk-node2
jps
**** DataNode
# bigdata-hk-node3
jps
**** DataNode
**** SecondaryNameNode
```

Web UI check: http://bigdata-hk-node1:9870
- **Stop HDFS on the cluster**

```bash
cd /opt/module/hadoop-3.1.3/
sbin/stop-dfs.sh
```
## 5.4. YARN Cluster Deployment
- **Start YARN on the cluster**

**Whichever host is configured in yarn-site.xml (`bigdata-hk-node2`) is the one that runs the ResourceManager (RM).**

```bash
cd /opt/module/hadoop-3.1.3/
# Start YARN
sbin/start-yarn.sh
# Start the history server; default log path: /tmp/hadoop-yarn/staging/history
# Logs are stored on HDFS, so the history server needs both HDFS and YARN running
# sbin/mr-jobhistory-daemon.sh start historyserver   # deprecated (hadoop-2.x)
bin/mapred --daemon start historyserver
```
Run `jps`; if you see the following processes, the ResourceManager side is basically OK.

```bash
# bigdata-hk-node1
jps
**** NodeManager
# bigdata-hk-node2
jps
**** ResourceManager
**** NodeManager
**** JobHistoryServer
# bigdata-hk-node3
jps
**** NodeManager
```

Web UI check (YARN): http://bigdata-hk-node2:8088
Web UI check (JobHistoryServer): http://bigdata-hk-node2:19888/jobhistory
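With HDFS, YARN, and the history server all up, a small example job makes a quick end-to-end check. The pi estimator ships in the Hadoop examples jar; the guard below is just so the sketch degrades gracefully when run off-cluster:

```shell
# Smoke test (sketch): run the bundled pi estimator on the cluster.
HADOOP_HOME=/opt/module/hadoop-3.1.3
JAR="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar"
if [ -f "$JAR" ]; then
  # 2 map tasks x 10 samples each; the finished job should then show up
  # in the JobHistoryServer UI at http://bigdata-hk-node2:19888/jobhistory
  "$HADOOP_HOME/bin/hadoop" jar "$JAR" pi 2 10
else
  echo "examples jar not found; run this on a cluster node"
fi
```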
- **Stop YARN on the cluster**

```bash
cd /opt/module/hadoop-3.1.3
# Stop the history server
# sbin/mr-jobhistory-daemon.sh stop historyserver   # deprecated (hadoop-2.x)
bin/mapred --daemon stop historyserver
# Stop YARN
sbin/stop-yarn.sh
```
# Appendix
## Appendix: Hadoop 3.x Port Changes
| Category | Purpose | Hadoop 2.x | Hadoop 3.x |
| --- | --- | --- | --- |
| NN ports | fs.defaultFS | 8020/9000 | 8020/9000/9820 |
| NN ports | NN HTTP UI | 50070 | 9870 |
| NN ports | NN HTTPS UI | 50470 | 9871 |
| 2NN ports | 2NN HTTP | 50091 | 9869 |
| 2NN ports | 2NN HTTP UI | 50090 | 9868 |
| DN ports | DN IPC | 50020 | 9867 |
| DN ports | DN | 50010 | 9866 |
| DN ports | DN HTTP UI (dfs.datanode.http.address) | 50075 | 9864 |
| DN ports | DN HTTPS UI (dfs.datanode.https.address) | 50475 | 9865 |
| YARN ports | YARN UI | 8088 | 8088 |
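When migrating scripts or bookmarks from a Hadoop 2.x cluster, the web-UI rows of the table above can be captured as a small lookup. A POSIX-shell sketch (the function name `port3` is just illustrative):

```shell
# Map a Hadoop 2.x web UI port to its 3.x equivalent (web-UI rows of the table above).
port3() {
  case "$1" in
    50070) echo 9870 ;;   # NN HTTP UI
    50090) echo 9868 ;;   # 2NN HTTP UI
    50075) echo 9864 ;;   # DN HTTP UI
    8088)  echo 8088 ;;   # YARN UI (unchanged)
    *)     echo "$1" ;;   # pass unknown ports through unchanged
  esac
}
port3 50070   # prints 9870
```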