
Installing Hadoop on Ubuntu (Single-Node Mode)
Source: oschina.net | Author: 贱圣
Contents: installation preparation (JDK 6), configuring SSH, configuring hadoop-env.sh and conf/hdfs-site.xml, and formatting the HDFS file system via the NameNode.
I've recently started learning Hadoop, and I'm recording the process here.
Hadoop is a framework written in Java that runs on clusters of physical machines, drawing on GFS and the MapReduce programming model. Hadoop's HDFS is a highly fault-tolerant distributed file system designed to run on low-cost hardware; it provides high data throughput and is well suited to applications that handle very large data sets.
Next, let's get ready to install Hadoop. The operating system I'm using is Ubuntu 12.10 Server, and the Hadoop version is 1.2.0.
Preparation: installing JDK 6
Hadoop needs JDK 1.5 or later to run; JDK 6 is currently the recommended version.
$ sudo apt-get update
$ sudo apt-get install openjdk-6-jdk
After installation, the files are placed under /usr/lib/jvm/java-6-openjdk-amd64.
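A quick check (not part of the original post) confirms the JDK is installed and on the PATH:
$ java -version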
Configuring SSH
Hadoop uses SSH to manage its nodes. Even on a single machine, we need to configure SSH so that the user running Hadoop can log in to localhost.
First, generate an SSH key for the user that will run Hadoop:
$ ssh-keygen -t rsa -P ""
Then allow logging in to the local machine with the newly generated key:
$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
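To verify that key-based login works (and to accept the host key on first use), you can simply try it; if no SSH server is installed yet, add openssh-server first. These two commands are a standard check, not something shown in the original post:
$ sudo apt-get install openssh-server
$ ssh localhost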
That completes the preparation; now we can install Hadoop.
Download version 1.2.0 from the Hadoop website, unpack it, and move the files under /usr/local/:
$ tar -zxvf hadoop-1.2.0.tar.gz
$ mv hadoop-1.2.0 hadoop
$ cp -r hadoop/ /usr/local/
Next, set up a few environment variables by adding the following to ~/.bashrc:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Save the file and log in again, and the environment variables will take effect. The next step is the Hadoop configuration itself, starting with a look at the structure of HDFS (the diagram from the original post is not included here).
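The post breaks off at this point; judging by its table of contents, the remaining steps are configuring hadoop-env.sh and conf/hdfs-site.xml and then formatting HDFS via the NameNode. A minimal sketch of those steps for a Hadoop 1.2.0 single-node setup; the property values shown (localhost:9000, replication 1) are common defaults and are assumptions rather than content from the original:
# Point Hadoop at the JDK
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64' >> /usr/local/hadoop/conf/hadoop-env.sh

# Minimal conf/core-site.xml: the default filesystem URI
cat > /usr/local/hadoop/conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# Minimal conf/hdfs-site.xml: a single node only needs one replica
cat > /usr/local/hadoop/conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# Format the HDFS file system through the NameNode, then start the daemons
/usr/local/hadoop/bin/hadoop namenode -format
/usr/local/hadoop/bin/start-all.sh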
Deploying Big Data Components on Kubernetes, Part 2: One-Click Hadoop Cluster Deployment
Something I forgot to mention in Part 1: a big advantage of deploying a big-data container platform with Kubernetes is that you hardly have to worry about networking between containers, whether they sit on the same physical server or on different ones. As long as you expose the ports the containers need from one another and map the service names between them, you're done.
This post walks through deploying a Hadoop 2.7.3 cluster, without HA or federation for now. Docker is best suited to packaging stateless, single-process applications, so using it to deploy a distributed Hadoop cluster is fairly involved. The main questions to work through:
1. How many images need to be built?
2. Which configuration is generic and can be baked directly into the images?
3. How are configuration values that do change substituted automatically and flexibly when a container starts?
4. Which service ports need to be exposed?
5. Which directories need to be persisted?
6. How is the NameNode format performed, and how do we avoid formatting it repeatedly?
7. Which scripts start which services?
8. Which environment variables need to be set in advance?
9. How are the dependencies and startup order between Pods controlled? And so on.
The physical topology and Hadoop cluster roles are as follows:
10.0.8.182 - Docker image building, K8s yaml editing, kubectl client
10.10.4.57 - private image registry
10.10.4.56 - NameNode, DataNode, ResourceManager, NodeManager
10.10.4.57 - DataNode, NodeManager
10.10.4.60 - DataNode, NodeManager
1. First build a base image containing Ubuntu 16.04 LTS, jdk1.8.0_111, common tools (net-tools, iputils, vim, wget), passwordless SSH started at boot, a clock synced with the host, and no firewall. Push this image to the HARBOR private registry so the Hadoop images built later can reference it. The build process is omitted here; if you want to use the image, download it directly from the link:
2. Since Hadoop 2.x includes YARN, two images are built: a master image containing the NameNode and ResourceManager daemons, and a worker image containing the DataNode and NodeManager daemons.
Create a new hadoop2.7.3 directory with the following structure (see the sketch after this list):
Download hadoop-2.7.3.tar.gz from the official site;
Take the configuration out of its etc/hadoop directory, put it into the conf folder, modify core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml, and empty the slaves file;
docker-entrypoint-namenode.sh is renamed to docker-entrypoint.sh when building the master image and starts the corresponding daemons;
docker-entrypoint-datanode.sh is renamed to docker-entrypoint.sh when building the worker image and starts the corresponding daemons;
Dockerfile defines environment variables, declares ports, copies and substitutes files, and so on.
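A rough picture of the build context, reconstructed from the files listed above (the original post shows the tree as an image, so the names here are a best guess):
hadoop2.7.3/
├── Dockerfile
├── hadoop-2.7.3.tar.gz
├── conf/            (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, ...)
├── docker-entrypoint-namenode.sh
└── docker-entrypoint-datanode.sh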
The following directories need to be persisted, which is reflected in the yaml files later on:
core-site.xml: hadoop.tmp.dir
hdfs-site.xml: dfs.namenode.name.dir
hdfs-site.xml: dfs.datanode.data.dir
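As a quick sanity check before building the images, you can confirm the conf templates actually carry those properties; this grep is only an illustration, not something from the original post:
$ grep -n "hadoop.tmp.dir" conf/core-site.xml
$ grep -n -e "dfs.namenode.name.dir" -e "dfs.datanode.data.dir" conf/hdfs-site.xml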
Which config files are generic, which ones need their contents substituted dynamically, under what conditions the NameNode runs a format, and which services the master and worker images each start, are all covered by the scripts in the linked repository:
Build the images and push them to the HARBOR private registry so that K8s can use them when orchestrating containers:
hadoop2.7.3$ mv docker-entrypoint-namenode.sh docker-entrypoint.sh
hadoop2.7.3$ sudo docker build -t hadoop-2.7.3-namenode-resourcemanager:0.0.1 .
hadoop2.7.3$ sudo docker tag hadoop-2.7.3-namenode-resourcemanager:0.0.1 registry.k8s./bigdata/hadoop-2.7.3-namenode-resourcemanager:0.0.1
hadoop2.7.3$ sudo docker push registry.k8s./bigdata/hadoop-2.7.3-namenode-resourcemanager:0.0.1
hadoop2.7.3$ mv docker-entrypoint-datanode.sh docker-entrypoint.sh
hadoop2.7.3$ sudo docker build -t hadoop-2.7.3-datanode-nodemanager:0.0.1 .
hadoop2.7.3$ sudo docker tag hadoop-2.7.3-datanode-nodemanager:0.0.1 registry.k8s./bigdata/hadoop-2.7.3-datanode-nodemanager:0.0.1
hadoop2.7.3$ sudo docker push registry.k8s./bigdata/hadoop-2.7.3-datanode-nodemanager:0.0.1
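To double-check the images before or after pushing, a generic Docker command works (not shown in the original post):
hadoop2.7.3$ sudo docker images | grep hadoop-2.7.3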
The two images are now in the private registry:
3. With the steps above, the master and worker images are ready. Next, create a new hadoop folder for the yaml files and run the K8s orchestration from there:
Because the containers need to talk to each other and also serve external clients, we still orchestrate with Deployments: each Pod runs one container, the Pods expose the relevant ports to one another, and every Pod can serve outside traffic. The Pod running the master image listens on its own ports on 0.0.0.0, while the Pods running the worker image use the master Pod's service name as the destination hostname. For the details of persistence and of dynamically substituting the hostname, see the yaml files:
Start the Pods in order (this sequence could be turned into a script for one-click execution):
hadoop$ kubectl create -f hadoop-namenode-resourcemanager.yaml --validate=false
hadoop$ kubectl create -f hadoop-datanode-nodemanager01.yaml --validate=false
hadoop$ kubectl create -f hadoop-datanode-nodemanager02.yaml --validate=false
hadoop$ kubectl create -f hadoop-datanode-nodemanager05.yaml --validate=false
If the master Pod's service name ever changes, all you need to do is update the corresponding value in the worker Pods' ConfigMaps, for example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: hdp-2-cm
data:
  hostname: "hdp-1-svc"
You can see the four Pods scheduled onto the designated physical servers:
And the ports each Pod exposes externally:
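The original post shows these as dashboard screenshots; from the kubectl client the same information comes from standard commands like:
hadoop$ kubectl get pods -o wide
hadoop$ kubectl get svc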
Using the ksp-1 host's IP plus the NodePort that hdp-1--9x6b9 exposes for the HDFS web port (50070/TCP), you can open the HDFS web UI and see the DataNodes:
Using the ksp-1 host's IP plus the NodePort that hdp-1--9x6b9 exposes for the YARN web port (8088/TCP), you can open the YARN web UI and see the NodeManagers:
Addendum: a quick check of the cluster deployed above showed that basic HDFS operations work, but running a MapReduce wordcount example fails with an error:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount file:///hadoop-2.7.3/NOTICE.txt file:///hadoop-2.7.3/output2
17/07/03 05:32:10 INFO client.RMProxy: Connecting to ResourceManager at hdp-1-svc/12.0.112.23:8032
17/07/03 05:32:11 INFO input.FileInputFormat: Total input paths to process : 1
17/07/03 05:32:12 INFO mapreduce.JobSubmitter: number of splits:1
17/07/03 05:32:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_6_0006
17/07/03 05:32:12 INFO impl.YarnClientImpl: Submitted application application_6_0006
17/07/03 05:32:12 INFO mapreduce.Job: The url to track the job: http://hdp-1--9x6b9:8088/proxy/application_6_0006/
17/07/03 05:32:12 INFO mapreduce.Job: Running job: job_6_0006
17/07/03 05:32:18 INFO mapreduce.Job: Job job_6_0006 running in uber mode : false
17/07/03 05:32:18 INFO mapreduce.Job:
map 0% reduce 0%
17/07/03 05:32:19 INFO mapreduce.Job: Task Id : attempt_6_0006_m_, Status : FAILED
Container launch failed for container_6_002 : java.lang.IllegalArgumentException: java.net.UnknownHostException: hdp-2--g0njg
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:356)
at org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:238)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:266)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:409)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:375)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hdp-2--g0njg
... 12 more
The cause of the error: when a Pod is created, its container gets a randomly generated hostname. So far we have relied on service names for communication between services, but MapReduce uses hostnames, and each container's /etc/hosts only maps its own hostname and IP; it knows nothing about the other containers. There is also no clean way to add the other containers' hostname-to-IP mappings to /etc/hosts dynamically, so once an ApplicationMaster is allocated, the client's requests cannot reach it.
Fixing this takes some work: modify the Dockerfile and docker-entrypoint.sh of both the master and worker images, rebuild the images, adjust the yaml, and then start everything the same way as before. The approach is to switch to a headless Service: stop using a cluster IP, let the Pods talk to each other directly over the containers' own IPs, and set each Pod's container hostname in the yaml to the same value as its svc-name. The drawback is that the cluster is no longer directly reachable from outside (say, an external user who wants to open the HDFS web UI); you would need to run an Nginx container as a DNS proxy so that external clients can still reach the cluster's IPs and ports. Please look up a solution for that yourself; it is not covered here.
The relevant changes are listed below; for how to rebuild the cluster after making them, refer to the steps earlier in this post. Also, I only modified two of the yaml files; the yaml for the other nodes can be changed the same way. Dockerfile:
FROM registry.k8s./bigdata/ubuntu16.04_jdk1.8.0_111:0.0.2
MAINTAINER Wang Liang
ARG DISTRO_NAME=hadoop-2.7.3
ARG DISTRO_NAME_DIR=/hadoop-2.7.3
ENV HADOOP_HOME=$DISTRO_NAME_DIR
ENV HADOOP_PREFIX=$DISTRO_NAME_DIR
ENV HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
ENV YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
ENV HADOOP_TMP_DIR=$HADOOP_HOME/tmp
ENV HADOOP_DFS_DIR=$HADOOP_HOME/dfs
ENV HADOOP_DFS_NAME_DIR=$HADOOP_DFS_DIR/name
ENV HADOOP_DFS_DATA_DIR=$HADOOP_DFS_DIR/data
ENV HADOOP_LOGS=$HADOOP_HOME/logs
ENV Master=localhost
ENV USER=root
# Hdfs ports
# Mapred ports
#Yarn ports
EXPOSE 32 42 8088
#Other ports
ADD hadoop-2.7.3.tar.gz /
WORKDIR $DISTRO_NAME_DIR
#ENV ZOO_USER=zookeeper \
#    ZOO_CONF_DIR=/conf \
#    ZOO_DATA_DIR=/data \
#    ZOO_UI_DIR=/zkui \
#    ZOO_DATA_LOG_DIR=/datalog \
#    ZOO_PORT=2181 \
#    ZOO_TICK_TIME=2000 \
#    ZOO_INIT_LIMIT=5 \
#    ZOO_SYNC_LIMIT=2
# Add a user and make dirs
#RUN set -x \
#    && adduser -D "$ZOO_USER" \
#    && mkdir -p "$ZOO_DATA_LOG_DIR" "$ZOO_DATA_DIR" "$ZOO_CONF_DIR" \
#    && chown "$ZOO_USER:$ZOO_USER" "$ZOO_DATA_LOG_DIR" "$ZOO_DATA_DIR" "$ZOO_CONF_DIR"
#ARG DISTRO_NAME=hadoop-2.7.3
RUN rm -r -f $HADOOP_CONF_DIR
RUN mkdir -p "$HADOOP_TMP_DIR" "$HADOOP_DFS_NAME_DIR" "$HADOOP_DFS_DATA_DIR" "$HADOOP_LOGS"
ADD conf $HADOOP_CONF_DIR
COPY conf /conf_tmp
ENV PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Download Apache Zookeeper, verify its PGP signature, untar and clean up
#RUN set -x \
#    && tar -xzf "$DISTRO_NAME.tar.gz"
#WORKDIR $DISTRO_NAME
#VOLUME ["$ZOO_DATA_DIR", "$ZOO_DATA_LOG_DIR"]
#EXPOSE $ZOO_PORT
#ENV PATH=$PATH:/$DISTRO_NAME/bin:$ZOO_UI_DIR \
#    ZOOCFGDIR=$ZOO_CONF_DIR
COPY docker-entrypoint.sh /
ENTRYPOINT ["/docker-entrypoint.sh"]
docker-entrypoint-namenode.sh:
#!/bin/bash
source /etc/environment
source ~/.bashrc
source /etc/profile
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Substitute the master hostname into the config templates copied in by the Dockerfile
sed "s/HOSTNAME/"$Master"/g" /conf_tmp/core-site.xml > $HADOOP_CONF_DIR/core-site.xml
sed "s/HOSTNAME/"$Master"/g" /conf_tmp/mapred-site.xml > $HADOOP_CONF_DIR/mapred-site.xml
sed "s/HOSTNAME/"$Master"/g" /conf_tmp/yarn-site.xml > $HADOOP_CONF_DIR/yarn-site.xml
# Format HDFS only if the name directory is still empty, to avoid repeated formats
if [ "`ls -A $HADOOP_DFS_NAME_DIR`" = "" ]; then
    echo "$DIRECTORY is indeed empty"
    $HADOOP_PREFIX/bin/hdfs namenode -format
else
    echo "$DIRECTORY is not empty"
fi
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
#$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
#$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver
#exec nohup /etc/init.d/ssh start &
/etc/init.d/ssh start
#exec /bin/bash
# Keep the container running in the foreground
while true; do sleep 1000; done
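The sed lines above assume the conf templates under /conf_tmp carry a literal HOSTNAME placeholder. A tiny illustration of the substitution; the sample value is hypothetical, not taken from the original templates:
# Hypothetical line in /conf_tmp/core-site.xml: <value>hdfs://HOSTNAME:9000</value>
Master=hdp-1-svc
echo "<value>hdfs://HOSTNAME:9000</value>" | sed "s/HOSTNAME/$Master/g"
# prints: <value>hdfs://hdp-1-svc:9000</value>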
docker-entrypoint-datanode.sh:
#!/bin/bash
source /etc/environment
source ~/.bashrc
source /etc/profile
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Substitute the master hostname into the config templates copied in by the Dockerfile
sed "s/HOSTNAME/"$Master"/g" /conf_tmp/core-site.xml > $HADOOP_CONF_DIR/core-site.xml
sed "s/HOSTNAME/"$Master"/g" /conf_tmp/mapred-site.xml > $HADOOP_CONF_DIR/mapred-site.xml
sed "s/HOSTNAME/"$Master"/g" /conf_tmp/yarn-site.xml > $HADOOP_CONF_DIR/yarn-site.xml
# The NameNode-only steps stay commented out in the worker image
#if [ "`ls -A $HADOOP_DFS_NAME_DIR`" = "" ]; then
#    echo "$DIRECTORY is indeed empty"
#    $HADOOP_PREFIX/bin/hdfs namenode -format
#else
#    echo "$DIRECTORY is not empty"
#fi
#$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode
#$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
#$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver
#exec nohup /etc/init.d/ssh start &
/etc/init.d/ssh start
#exec /bin/bash
# Keep the container running in the foreground
while true; do sleep 1000; done
hadoop-namenode-resourcemanager.yaml :
apiVersion: v1
kind: Service
metadata:
  name: hdp-1-svc
  labels:
    app: hdp-1-svc
spec:
  # Note: clusterIP must be None here to get a headless service
  clusterIP: None
  ports:
  - port: 9000
    name: hdfs
  - port: 50070
    name: hdfsweb
  - port: 19888
    name: jobhistory
  - port: 8088
    name: yarn
  - port: 50010
    name: hdfs2
  - port: 50020
    name: hdfs3
  - port: 50075
    name: hdfs5
  - port: 50090
    name: hdfs6
  - port: 10020
    name: mapred2
  - port: 8030
    name: yarn1
  - port: 8031
    name: yarn2
  - port: 8032
    name: yarn3
  - port: 8033
    name: yarn4
  - port: 8040
    name: yarn5
  - port: 8042
    name: yarn6
  - port: 49707
    name: other1
  - port: 2122
    name: other2
  - port: 31010
    name: hdfs7
  - port: 8020
    name: hdfs8
  selector:
    app: hdp-1
  type: NodePort
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hdp-1-cm
data:
  master: "0.0.0.0"
  hostname: "hdp-1-svc"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hdp-1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: hdp-1
    spec:
      # Set the container's hostname at startup; keep it identical to the svc-name
      hostname: hdp-1-svc
      nodeSelector:
        # node label omitted in the source
      containers:
      - name: myhadoop-nn-rm
        imagePullPolicy: Always
        image: registry.k8s./bigdata/hadoop-2.7.3-namenode-resourcemanager:0.0.1
        securityContext:
          privileged: true
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
        ports:
        - containerPort: 9000
          name: hdfs
        - containerPort: 50010
          name: hdfs2
        - containerPort: 50020
          name: hdfs3
        - containerPort: 50070
          name: hdfsweb
        - containerPort: 50075
          name: hdfs5
        - containerPort: 50090
          name: hdfs6
        - containerPort: 19888
          name: jobhistory
        - containerPort: 10020
          name: mapred2
        - containerPort: 8030
          name: yarn1
        - containerPort: 8031
          name: yarn2
        - containerPort: 8032
          name: yarn3
        - containerPort: 8033
          name: yarn4
        - containerPort: 8040
          name: yarn5
        - containerPort: 8042
          name: yarn6
        - containerPort: 8088
          name: yarn
        - containerPort: 49707
          name: other1
        - containerPort: 2122
          name: other2
        - containerPort: 31010
          name: hdfs7
        - containerPort: 8020
          name: hdfs8
        env:
        - name: Master
          valueFrom:
            configMapKeyRef:
              name: hdp-1-cm
              key: master
        - name: HOSTNAME
          valueFrom:
            configMapKeyRef:
              name: hdp-1-cm
              key: hostname
        readinessProbe:
          exec:
            command:
            - "zkok.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - "zkok.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: name
          mountPath: /hadoop-2.7.3/dfs/name
        - name: data
          mountPath: /hadoop-2.7.3/dfs/data
        - name: tmp
          mountPath: /hadoop-2.7.3/tmp
        - name: logs
          mountPath: /hadoop-2.7.3/logs
      volumes:
      - name: name
        hostPath:
          path: /home/data/bjrddata/hadoop/name021
      - name: data
        hostPath:
          path: /home/data/bjrddata/hadoop/data021
      - name: tmp
        hostPath:
          path: /home/data/bjrddata/hadoop/tmp021
      - name: logs
        hostPath:
          path: /home/data/bjrddata/hadoop/logs021
hadoop-datanode-nodemanager01.yaml :
apiVersion: v1
kind: Service
metadata:
  name: hdp-2-svc
  labels:
    app: hdp-2-svc
spec:
  clusterIP: None
  ports:
  - port: 9000
    name: hdfs
  - port: 50070
    name: hdfsweb
  - port: 19888
    name: jobhistory
  - port: 8088
    name: yarn
  - port: 50010
    name: hdfs2
  - port: 50020
    name: hdfs3
  - port: 50075
    name: hdfs5
  - port: 50090
    name: hdfs6
  - port: 10020
    name: mapred2
  - port: 8030
    name: yarn1
  - port: 8031
    name: yarn2
  - port: 8032
    name: yarn3
  - port: 8033
    name: yarn4
  - port: 8040
    name: yarn5
  - port: 8042
    name: yarn6
  - port: 49707
    name: other1
  - port: 2122
    name: other2
  - port: 31010
    name: hdfs7
  - port: 8020
    name: hdfs8
  selector:
    app: hdp-2
  type: NodePort
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hdp-2-cm
data:
  master: "hdp-1-svc"
  hostname: "hdp-2-svc"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hdp-2
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: hdp-2
    spec:
      hostname: hdp-2-svc
      nodeSelector:
        # node label omitted in the source
      containers:
      - name: myhadoop-dn-nm
        imagePullPolicy: Always
        image: registry.k8s./bigdata/hadoop-2.7.3-datanode-nodemanager:0.0.1
        securityContext:
          privileged: true
        resources:
          requests:
            memory: "2Gi"
            cpu: "500m"
        ports:
        - containerPort: 9000
          name: hdfs
        - containerPort: 50010
          name: hdfs2
        - containerPort: 50020
          name: hdfs3
        - containerPort: 50070
          name: hdfsweb
        - containerPort: 50075
          name: hdfs5
        - containerPort: 50090
          name: hdfs6
        - containerPort: 19888
          name: jobhistory
        - containerPort: 10020
          name: mapred2
        - containerPort: 8030
          name: yarn1
        - containerPort: 8031
          name: yarn2
        - containerPort: 8032
          name: yarn3
        - containerPort: 8033
          name: yarn4
        - containerPort: 8040
          name: yarn5
        - containerPort: 8042
          name: yarn6
        - containerPort: 8088
          name: yarn
        - containerPort: 49707
          name: other1
        - containerPort: 2122
          name: other2
        - containerPort: 31010
          name: hdfs7
        - containerPort: 8020
          name: hdfs8
        env:
        - name: Master
          valueFrom:
            configMapKeyRef:
              name: hdp-2-cm
              key: master
        - name: HOSTNAME
          valueFrom:
            configMapKeyRef:
              name: hdp-2-cm
              key: hostname
        readinessProbe:
          exec:
            command:
            - "zkok.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - "zkok.sh"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: name
          mountPath: /hadoop-2.7.3/dfs/name
        - name: data
          mountPath: /hadoop-2.7.3/dfs/data
        - name: tmp
          mountPath: /hadoop-2.7.3/tmp
        - name: logs
          mountPath: /hadoop-2.7.3/logs
      volumes:
      - name: name
        hostPath:
          path: /home/data/bjrddata/hadoop/name022
      - name: data
        hostPath:
          path: /home/data/bjrddata/hadoop/data022
      - name: tmp
        hostPath:
          path: /home/data/bjrddata/hadoop/tmp022
      - name: logs
        hostPath:
          path: /home/data/bjrddata/hadoop/logs022
Redeploy the cluster (note that the Cluster IP is now gone):
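The original shows this step as screenshots; with standard kubectl the redeploy and the check would look roughly like this (the delete/create sequence here is an assumption, not taken from the post):
hadoop$ kubectl delete -f hadoop-namenode-resourcemanager.yaml
hadoop$ kubectl create -f hadoop-namenode-resourcemanager.yaml --validate=false
hadoop$ kubectl get svc    # CLUSTER-IP now reads None for hdp-1-svc, hdp-2-svc, ...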
The wordcount run now succeeds:
root@hdp-5-svc:/hadoop-2.7.3# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount hdfs://hdp-1-svc:9000/test/NOTICE.txt hdfs://hdp-1-svc:9000/test/output01
17/07/03 09:53:57 INFO client.RMProxy: Connecting to ResourceManager at hdp-1-svc/192.168.25.14:8032
17/07/03 09:53:57 INFO input.FileInputFormat: Total input paths to process : 1
17/07/03 09:53:57 INFO mapreduce.JobSubmitter: number of splits:1
17/07/03 09:53:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_0_0001
17/07/03 09:53:58 INFO impl.YarnClientImpl: Submitted application application_0_0001
17/07/03 09:53:58 INFO mapreduce.Job: The url to track the job: http://hdp-1-svc:8088/proxy/application_0_0001/
17/07/03 09:53:58 INFO mapreduce.Job: Running job: job_0_0001
17/07/03 09:54:04 INFO mapreduce.Job: Job job_0_0001 running in uber mode : false
17/07/03 09:54:04 INFO mapreduce.Job:
map 0% reduce 0%
17/07/03 09:54:09 INFO mapreduce.Job:
map 100% reduce 0%
17/07/03 09:54:14 INFO mapreduce.Job:
map 100% reduce 100%
17/07/03 09:54:15 INFO mapreduce.Job: Job job_0_0001 completed successfully
17/07/03 09:54:15 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=11392
FILE: Number of bytes written=261045
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=15080
HDFS: Number of bytes written=8969
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2404
Total time spent by all reduces in occupied slots (ms)=2690
Total time spent by all map tasks (ms)=2404
Total time spent by all reduce tasks (ms)=2690
Total vcore-milliseconds taken by all map tasks=2404
Total vcore-milliseconds taken by all reduce tasks=2690
Total megabyte-milliseconds taken by all map tasks=2461696
Total megabyte-milliseconds taken by all reduce tasks=2754560
Map-Reduce Framework
Map input records=437
Map output records=1682
Map output bytes=20803
Map output materialized bytes=11392
Input split bytes=102
Combine input records=1682
Combine output records=614
Reduce input groups=614
Reduce shuffle bytes=11392
Reduce input records=614
Reduce output records=614
Spilled Records=1228
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=173
CPU time spent (ms)=1800
Physical memory (bytes) snapshot=
Virtual memory (bytes) snapshot=
Total committed heap usage (bytes)=
Shuffle Errors
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=14978
File Output Format Counters
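To inspect the result, read the output directory back from HDFS; the part-file name below is the usual MapReduce default rather than something shown in the post:
root@hdp-5-svc:/hadoop-2.7.3# hadoop fs -cat hdfs://hdp-1-svc:9000/test/output01/part-r-00000 | head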
