A Guide to Hadoop Secure Mode (Kerberos) on a Single Node
Pseudo-Distributed, in a Debian 13 VM
My plan to study Hadoop configuration in secure mode, following the standard Hadoop practice of running each service under its own dedicated user, has finally come to fruition. Although this uses only a single node, understanding the basics first is enough before moving on to a multi-node cluster. Here I document the steps along the way.
I ran this experiment in a Debian 13 VM.
Package Requirements
openssl-1.1
: This package is obsolete and no longer shipped in Debian, so you will need to build it from source; see Steps to Build the OpenSSL 1.1 Package from Source. Arch Linux provides this package in its main repository.
pdsh
: A parallel remote shell utility that runs commands concurrently on many hosts over SSH, rsh, or other communication modules.

Java SE 8
: Download and extract into /usr/local or any other directory, as long as its path is exported.
Service Principals
See Kerberos principals for Hadoop Daemons.
| User | Hadoop Component | Service Principal | Keytab File | Option |
|---|---|---|---|---|
| hdfs | | hdfs@REALM | /etc/security/keytab/hdfs.service.keytab | -randkey |
| hdfs | NameNode | nn/_HOST@REALM | /etc/security/keytab/nn.service.keytab | -randkey |
| hdfs | Secondary NameNode | sn/_HOST@REALM | /etc/security/keytab/sn.service.keytab | -randkey |
| hdfs | DataNode | dn/_HOST@REALM | /etc/security/keytab/dn.service.keytab | -randkey |
| yarn | | yarn@REALM | /etc/security/keytab/yarn.service.keytab | -randkey |
| yarn | ResourceManager | rm/_HOST@REALM | /etc/security/keytab/rm.service.keytab | -randkey |
| yarn | NodeManager | nm/_HOST@REALM | /etc/security/keytab/nm.service.keytab | -randkey |
| mapred | | mapred@REALM | /etc/security/keytab/mapred.service.keytab | -randkey |
| mapred | JobHistoryServer | jhs/_HOST@REALM | /etc/security/keytab/jhs.service.keytab | -randkey |
| user | | user@REALM | $HOME/keytab | -norandkey |
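The table above can be realised with kadmin.local on the KDC host. The commands below are only a sketch, assuming the CLUSTER.VM realm and the single.cluster.vm host used throughout this guide; the remaining principals follow the same pattern.
bash
# Create service principals with random keys (run on the KDC as root)
sudo kadmin.local -q "addprinc -randkey hdfs@CLUSTER.VM"
sudo kadmin.local -q "addprinc -randkey nn/single.cluster.vm@CLUSTER.VM"
sudo kadmin.local -q "addprinc -randkey sn/single.cluster.vm@CLUSTER.VM"
sudo kadmin.local -q "addprinc -randkey dn/single.cluster.vm@CLUSTER.VM"
# Export each key into its keytab file, e.g. for the NameNode:
sudo kadmin.local -q "ktadd -k /etc/security/keytab/nn.service.keytab nn/single.cluster.vm@CLUSTER.VM"
# Once the service accounts below exist, restrict access to each keytab:
sudo chown hdfs:hadoop /etc/security/keytab/nn.service.keytab
sudo chmod 400 /etc/security/keytab/nn.service.keytab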
- Create the users:
bash
sudo groupadd -g 2000 hadoop
sudo useradd -m -U -G sudo,hadoop -s /usr/bin/zsh hdfs
sudo useradd -m -U -G sudo,hadoop -s /usr/bin/zsh yarn
sudo useradd -m -U -G sudo,hadoop -s /usr/bin/zsh mapred
Local Setup
Kerberos Directories and Files
/etc
├── krb5.conf
└── krb5.keytab

/etc/krb5kdc
├── kadm5.acl
└── kdc.conf

/etc/security/keytab
└── see the service principal table above

/var/lib/krb5kdc
├── .k5.CLUSTER.VM
├── principal
├── principal.kadm5
├── principal.kadm5.lock
└── principal.ok
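The contents of these files are not reproduced here. Purely as a point of reference, a minimal /etc/krb5.conf for this single-node setup might look like the sketch below, assuming single.cluster.vm acts as both KDC and admin server for the CLUSTER.VM realm; adjust to your own realm and hostnames.
bash
sudo tee /etc/krb5.conf > /dev/null <<'EOF'
[libdefaults]
    default_realm = CLUSTER.VM
    dns_lookup_realm = false
    dns_lookup_kdc = false
    rdns = false

[realms]
    CLUSTER.VM = {
        kdc = single.cluster.vm
        admin_server = single.cluster.vm
    }

[domain_realm]
    .cluster.vm = CLUSTER.VM
    single.cluster.vm = CLUSTER.VM
EOF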
Hadoop Directories and Files
/etc/hadoop/keystore
/etc/profile.d/hadoop.sh
#!/bin/bash
# ==================== JAVA CONFIGURATION ====================
export JAVA_HOME="/usr/local/jdk1.8.0_461"
export PATH="$JAVA_HOME/bin:$PATH"

# ==================== HADOOP CORE CONFIGURATION ====================
export HADOOP_HOME="/opt/hadoop-3.4.1"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_YARN_HOME="$HADOOP_HOME"

# ==================== HADOOP CLASSPATH & LIBRARIES ====================
export HADOOP_CLASSPATH="$HADOOP_HOME/share/hadoop/tools/lib/*:$HADOOP_CONF_DIR/*"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH"

# ==================== HADOOP OPTIONS (summary) ====================
export HADOOP_HEAPSIZE="1000"
export HADOOP_OPTS="\
-Djava.library.path=$HADOOP_HOME/lib/native \
-Djava.awt.headless=true \
-XX:+UseContainerSupport \
-XX:ErrorFile=/var/log/hadoop/hs_err_pid%p.log \
-Xmx${HADOOP_HEAPSIZE}m \
-server \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 $HADOOP_OPTS"
#-Djava.security.krb5.conf=/etc/krb5.conf \
#-Djava.security.krb5.kdc=single.cluster.vm \
#-Djava.security.krb5.realm=CLUSTER.VM \
#-Dsun.security.krb5.debug=true \
#-Dsun.security.spnego.debug=true"
#export HADOOP_JAAS_DEBUG=true

# ==================== PATH CONFIGURATION ====================
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"

# ==================== TMP DIRECTORIES ====================
export HADOOP_LOG_DIR="/var/log/hadoop"

# ==================== LOGGING CONFIGURATION ====================
export HADOOP_ROOT_LOGGER="INFO,console"

# Kerberos config (keeps things consistent without relying on -Djava.security.krb5.conf)
export KRB5_CONFIG=/etc/krb5.conf

# IPv4 is more stable for Hadoop (fewer IPv6 issues in RPC)
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"

# DNS → avoid FQDN vs shortname issues in Kerberos
export HADOOP_SECURITY_DNS_INTERFACE=enp1s0 # change to match your interface
# or
# export HADOOP_SECURITY_DNS_NAMESERVER=192.168.0.1

# === JAVA 9+ COMPATIBILITY (if using Java 11+) ===
# export HADOOP_OPTS="$HADOOP_OPTS --add-opens=java.base/java.lang=ALL-UNNAMED"
# export HADOOP_OPTS="$HADOOP_OPTS --add-opens=java.base/java.lang.reflect=ALL-UNNAMED"
# export HADOOP_OPTS="$HADOOP_OPTS --add-opens=java.base/java.io=ALL-UNNAMED"
# export HADOOP_OPTS="$HADOOP_OPTS --add-opens=java.base/java.net=ALL-UNNAMED"
# export HADOOP_OPTS="$HADOOP_OPTS --add-opens=java.base/java.nio=ALL-UNNAMED"
# export HADOOP_OPTS="$HADOOP_OPTS --add-opens=java.base/java.util=ALL-UNNAMED"

# ==================== VERIFICATION (optional) ====================
# Uncomment to debug environment variables
# echo "Hadoop environment loaded successfully"
# echo "JAVA_HOME: $JAVA_HOME"
# echo "HADOOP_HOME: $HADOOP_HOME"
/opt/hadoop-3.4.1
/tmp
/var/lib/hadoop-hdfs
/var/log/hadoop
ACL Setup
bash
sudo chown root:yarn $HADOOP_HOME/bin/container-executor
sudo chmod 6050 $HADOOP_HOME/bin/container-executor
sudo setfacl -bR $HADOOP_HOME # clear any old ACLs
# $HADOOP_HOME/etc/hadoop
sudo chown -R root:root $HADOOP_CONF_DIR
# Base mode (no world write/execute)
sudo find $HADOOP_CONF_DIR -type d -exec chmod 750 {} \;
sudo find $HADOOP_CONF_DIR -type f -exec chmod 640 {} \;
# Add ACLs for the service accounts
sudo setfacl -m u:hdfs:rx $HADOOP_CONF_DIR
sudo setfacl -m u:yarn:rx $HADOOP_CONF_DIR
sudo setfacl -m u:mapred:rx $HADOOP_CONF_DIR
sudo setfacl -m u:<user>:rx $HADOOP_CONF_DIR
# Files inside it
sudo setfacl -R -m u:hdfs:r $HADOOP_CONF_DIR/*
sudo setfacl -R -m u:yarn:r $HADOOP_CONF_DIR/*
sudo setfacl -R -m u:mapred:r $HADOOP_CONF_DIR/*
sudo setfacl -R -m u:<user>:r $HADOOP_CONF_DIR/*
# NameNode & DataNode
sudo mkdir -p /var/lib/hadoop-hdfs
sudo chown -R hdfs:hadoop /var/lib/hadoop-hdfs
# Logs
sudo mkdir -p /var/log/hadoop
sudo setfacl -m u:hdfs:rwx /var/log/hadoop
sudo setfacl -m u:yarn:rwx /var/log/hadoop
sudo setfacl -m u:mapred:rwx /var/log/hadoop
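Before starting any daemon it is worth confirming the effective permissions; a quick check along these lines (paths as configured above) should suffice:
bash
# Ownership and setuid/setgid bits on the container-executor binary
ls -l $HADOOP_HOME/bin/container-executor
# Effective ACLs on the configuration and log directories
getfacl $HADOOP_CONF_DIR
getfacl /var/log/hadoop
# Confirm a service account can actually read its configuration
sudo -u hdfs cat $HADOOP_CONF_DIR/core-site.xml > /dev/null && echo "hdfs can read core-site.xml"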
Preparing the TLS certificate and private key with ‘keytool’ (part of the Java package)
Directory tree (excluding the scripts):
/etc/hadoop/keystore
├── ca-bundle.pem
├── ca.csr
├── ca.jks
├── ca.pem
├── ca-trust.pem
├── root.jks
├── root.pem
├── root-trust.pem
├── server.csr
├── server.jks
├── server.pem
└── truststore.jks
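The generation script itself is not reproduced here. Purely as an illustration, a simplified self-signed variant (skipping the root/intermediate CA chain listed above) could be built with keytool roughly as follows, reusing the change-it password that appears in ssl-server.xml below:
bash
cd /etc/hadoop/keystore
# Server key pair and self-signed certificate (EC, matching the ECDSA signature seen later)
keytool -genkeypair -alias server -keyalg EC -keysize 256 -validity 365 \
  -dname "CN=single.cluster.vm, OU=Server, O=Hadoop-Server, L=xx, ST=xx, C=MY" \
  -keystore server.jks -storepass change-it -keypass change-it
# Export the certificate and import it into the truststore
keytool -exportcert -alias server -keystore server.jks -storepass change-it \
  -rfc -file server.pem
keytool -importcert -alias server -file server.pem -keystore truststore.jks \
  -storepass change-it -noprompt
# Inspect the result
keytool -list -v -keystore server.jks -storepass change-it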
ssl-server.xml
$HADOOP_CONF_DIR/ssl-server.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<!-- Truststore (mengandungi CA yang dipercayai) -->
<property>
<name>ssl.server.truststore.location</name>
<value>/etc/hadoop/keystore/truststore.jks</value>
<description>Truststore to be used by NN and DN. Must be specified.</description>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value>change-it</value>
<description>Optional. Default value is "".</description>
</property>
<property>
<name>ssl.server.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".</description>
</property>
<property>
<name>ssl.server.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).</description>
</property>
<!-- Keystore (mengandungi sijil pelayan + kunci peribadi) -->
<property>
<name>ssl.server.keystore.location</name>
<value>/etc/hadoop/keystore/server.jks</value>
<description>Keystore to be used by NN and DN. Must be specified.</description>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value>change-it</value>
<description>Must be specified.</description>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value>change-it</value>
<description>Must be specified.</description>
</property>
<property>
<name>ssl.server.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".</description>
</property>
<property>
<name>ssl.server.include.cipher.list</name>
<value>
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,
TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
</value>
</property>
<property>
<name>ssl.server.exclude.cipher.list</name>
<value>
.*_WITH_AES_.*_CBC_.*,
TLS_RSA_WITH_AES_128_GCM_SHA256,
TLS_RSA_WITH_AES_256_GCM_SHA384
</value>
<description>Optional. The weak security cipher suites that you want excluded.</description>
</property>
<!-- Protocol versions yang selamat -->
<property>
<name>ssl.server.exclude.protocol.list</name>
<value>SSLv2,SSLv3,TLSv1,TLSv1.1</value>
</property>
<property>
<name>ssl.server.include.protocol.list</name>
<value>TLSv1.2,TLSv1.3</value>
</property>
</configuration>
ssl-client.xml
$HADOOP_CONF_DIR/ssl-client.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
<property>
<name>ssl.client.truststore.location</name>
<value>/etc/hadoop/keystore/truststore.jks</value>
<description>Truststore to be used by clients like distcp. Must be
specified.</description>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value>change-it</value>
<description>Optional. Default value is "".</description>
</property>
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
</configuration>
Main Configuration
core-site.xml
$HADOOP_CONF_DIR/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://single.cluster.vm:9820</value>
</property>
<!-- Authentication type -->
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<!-- Authorisation on -->
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>
<!-- NameNode / SecondaryNameNode / DataNode → hdfs -->
RULE:[2:$1/$2@$0]([nsd]n/.*@CLUSTER\.VM)s/.*/hdfs/
<!-- ResourceManager / NodeManager → yarn -->
RULE:[2:$1/$2@$0]([rn]m/.*@CLUSTER\.VM)s/.*/yarn/
<!-- JobHistoryServer → mapred -->
RULE:[2:$1/$2@$0](jhs/.*@CLUSTER\.VM)s/.*/mapred/
<!-- Fallback -->
DEFAULT
</value>
</property>
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
</property>
<property>
<name>hadoop.proxyuser.superuser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.superuser.groups</name>
<value>*</value>
</property>
</configuration>
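With core-site.xml in place, the auth_to_local rules can be spot-checked using hadoop kerbname (used again further below); each service principal should map to its operating-system account:
bash
hadoop kerbname nn/single.cluster.vm@CLUSTER.VM    # expected: hdfs
hadoop kerbname rm/single.cluster.vm@CLUSTER.VM    # expected: yarn
hadoop kerbname jhs/single.cluster.vm@CLUSTER.VM   # expected: mapred
hadoop kerbname raihan@CLUSTER.VM                  # expected: raihan (DEFAULT rule)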
hadoop-env.sh
Add the two lines below:
$HADOOP_CONF_DIR/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_461
export HADOOP_LOG_DIR=/var/log/hadoop
workers
$HADOOP_CONF_DIR/workers
single.cluster.vm
container-executor.cfg
$HADOOP_CONF_DIR/container-executor.cfg
yarn.nodemanager.linux-container-executor.group=yarn #configured value of yarn.nodemanager.linux-container-executor.group
banned.users=hdfs,mapred,bin #comma-separated list of users who cannot run applications
min.user.id=1000 #Prevent other super-users
allowed.system.users=yarn #comma-separated list of system users who CAN run applications
feature.tc.enabled=false
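container-executor runs setuid root and refuses to launch containers if this file is not owned by root or is writable by others. A sketch of the check, assuming the binary location and permissions set earlier (the --checksetup flag only validates the setup and starts nothing):
bash
# The cfg file must be root-owned and not writable by group/others
ls -l $HADOOP_CONF_DIR/container-executor.cfg
# Built-in self-check of the binary's permissions and configuration
sudo -u yarn $HADOOP_HOME/bin/container-executor --checksetup && echo "container-executor setup OK"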
HDFS
hdfs-site.xml
$HADOOP_CONF_DIR/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>nn/_HOST@CLUSTER.VM</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/security/keytab/nn.service.keytab</value>
</property>
<property>
<name>dfs.namenode.rpc-address</name>
<value>single.cluster.vm:9820</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>sn/_HOST@CLUSTER.VM</value>
</property>
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>/etc/security/keytab/sn.service.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>dn/_HOST@CLUSTER.VM</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/etc/security/keytab/dn.service.keytab</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/var/lib/hadoop-hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.https-address</name>
<value>single.cluster.vm:9871</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>single.cluster.vm:9869</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/var/lib/hadoop-hdfs/datanode</value>
</property>
<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>single.cluster.vm:9864</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>single.cluster.vm:9865</value>
</property>
<property>
<name>dfs.data.transfer.protection</name>
<value>authentication</value>
</property>
<property>
<name>dfs.https.server.keystore.resource</name>
<value>ssl-server.xml</value>
</property>
</configuration>
If a keytab is corrupted or invalid, delete the keytab file along with its service principal in the KDC, then regenerate both.
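Before regenerating, the keytab can be inspected in place; klist shows the principals, key version numbers (kvno), and encryption types stored in the file, which is usually enough to spot a stale or mismatched key:
bash
sudo klist -kte /etc/security/keytab/nn.service.keytab
# Compare the kvno stored in the keytab with the one the KDC currently holds
sudo kadmin.local -q "getprinc nn/single.cluster.vm@CLUSTER.VM" | grep -i vno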
kinit and ssh:
bash
sudo su - hdfs
kinit -kt /etc/security/keytab/hdfs.service.keytab hdfs
ssh -v hdfs # (the -v option shows the GSSAPI authentication exchange)
Check service name resolution with the following command:
bash
hadoop kerbname nn/single.cluster.vm@CLUSTER.VM
Name: nn/single.cluster.vm@CLUSTER.VM to hdfs
Use hadoop kdiag to check the integrity of the keytab. Expected output:
...
== Log in user ==
UGI instance = hdfs@CLUSTER.VM (auth:KERBEROS)
Has kerberos credentials: true
Authentication method: KERBEROS
Real Authentication method: KERBEROS
...
Check the state of the configuration. Sample output (with log4j.logger.org.apache.hadoop.security=DEBUG set):
bash
hadoop kdiag --nofail --resource core-site.xml --resource hdfs-site.xml \
  --resource yarn-site.xml --resource mapred-site.xml \
  --keytab /etc/security/keytab/hdfs.service.keytab \
  --verifyshortname --principal hdfs@CLUSTER.VM
[DateTime] INFO security.KDiag: Loading resource core-site.xml
[DateTime] INFO security.KDiag: Loading resource hdfs-site.xml
[DateTime] INFO security.KDiag: Loading resource yarn-site.xml
[DateTime] INFO security.KDiag: Loading resource mapred-site.xml
...
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17.
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=single.cluster.vm UDP:88, timeout=30000, number of retries =3, #bytes=221
>>> KDCCommunication: kdc=single.cluster.vm UDP:88, timeout=30000,Attempt =1, #bytes=221
>>> KrbKdcReq send: #bytes read=737
>>> KdcAccessibility: remove single.cluster.vm
Looking for keys for: hdfs@CLUSTER.VM
Added key: 17version: 2
Added key: 18version: 2
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply hdfs
2025-08-27 19:54:38,830 DEBUG security.UserGroupInformation: Hadoop login
2025-08-27 19:54:38,831 DEBUG security.UserGroupInformation: hadoop login commit
2025-08-27 19:54:38,831 DEBUG security.UserGroupInformation: Using existing subject: [hdfs@CLUSTER.VM, hdfs@CLUSTER.VM]
Format the NameNode and start the daemons:
bash
hdfs namenode -format
start-dfs.sh
jps
Test with OpenSSL
bash
echo "=== Testing with OpenSSL ==="
openssl s_client -connect single.cluster.vm:9871 \
-CAfile /etc/hadoop/keystore/ca-bundle.pem \
-showcerts \
-brief
=== Testing with OpenSSL ===
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384
Peer certificate: CN = single.cluster.vm, OU = Server, O = Hadoop-Server, L = xx, ST = xx, C = MY
Hash used: SHA256
Signature type: ECDSA
Verification: OK
Server Temp Key: ECDH, P-256, 256 bits
Test with keytool
bash
echo "=== Testing with Keytool ==="
keytool -printcert -sslserver single.cluster.vm:9871 \
-keystore /etc/hadoop/keystore/truststore.jks
Directories and ACLs in HDFS
/mr-history/tmp
/mr-history/done
/tmp/logs
/user/history
/user/<username>, e.g. raihan
Create the required directories and set the ACLs:
bash
# Directories for the JobHistory server (MRv2)
hdfs dfs -mkdir -p /mr-history/{done,tmp}
hdfs dfs -chown -R mapred:hadoop /mr-history
# tmp → "sticky" directory: every user can write, but no one can delete other users' files
hdfs dfs -chmod 1777 /mr-history/tmp
# done → only owner, group, and ACL entries are allowed
hdfs dfs -chmod 750 /mr-history/done
# ACLs for /mr-history
# `default` → ensure all new files/subdirs remain accessible to mapred
hdfs dfs -setfacl -m default:user:mapred:rwx /mr-history/tmp
hdfs dfs -setfacl -m default:user:mapred:rwx /mr-history/done
# Per-user JobHistory directory (MRv2)
hdfs dfs -mkdir -p /user/history
hdfs dfs -chown mapred:hadoop /user/history
hdfs dfs -chmod 1777 /user/history
hdfs dfs -setfacl -m default:user:mapred:rwx /user/history
# Home directory for user raihan
hdfs dfs -mkdir -p /user/raihan
hdfs dfs -chown raihan:supergroup /user/raihan
# Allow the yarn user to read raihan's home directory
hdfs dfs -setfacl -m user:yarn:r-x /user/raihan
# YARN logs directory (used for application logs)
hdfs dfs -mkdir -p /tmp/logs
hdfs dfs -chown yarn:hadoop /tmp/logs
hdfs dfs -chmod 1777 /tmp/logs
# Allow user `raihan` to write to /tmp
hdfs dfs -setfacl -m user:raihan:rwx /tmp
# Verify the resulting configuration
echo "==> ACL check:"
for d in /mr-history/tmp /mr-history/done /user/history /user/raihan /tmp /tmp/logs; do
echo "-- $d --"
hdfs dfs -getfacl $d
done
YARN
yarn-site.xml
$HADOOP_CONF_DIR/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>single.cluster.vm</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>0.0.0.0:8090</value>
</property>
<property>
<name>yarn.resourcemanager.principal</name>
<value>rm/_HOST@CLUSTER.VM</value>
</property>
<property>
<name>yarn.resourcemanager.keytab</name>
<value>/etc/security/keytab/rm.service.keytab</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.nodemanager.hostname</name>
<value>single.cluster.vm</value>
</property>
<property>
<name>yarn.nodemanager.webapp.https.address</name>
<value>0.0.0.0:8044</value>
</property>
<property>
<name>yarn.nodemanager.principal</name>
<value>nm/_HOST@CLUSTER.VM</value>
</property>
<property>
<name>yarn.nodemanager.keytab</name>
<value>/etc/security/keytab/nm.service.keytab</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/hadoop-yarn/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/hadoop-yarn/nm-log-dir</value>
</property>
<property>
<name>yarn.nodemanager.shuffle-server-heap-size-mb</name>
<value>512</value> <!-- Increase from default 256MB -->
</property>
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.secure-mode.pool-user-count</name>
<value>100</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3072</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>3072</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
</configuration>
kinit and ssh:
bash
sudo su - yarn
kinit -kt /etc/security/keytab/yarn.service.keytab yarn
ssh -v yarn
Verify the configuration with hadoop kdiag. This command tests the integrity of the keytab file, the validity of the principal, and the Kerberos authentication method currently in use. Then start the daemons:
bash
start-yarn.sh
jps
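Once the ResourceManager and NodeManager are up, a quick check that the NodeManager has registered (and that YARN RPC works through Kerberos) can be run as the yarn user:
bash
yarn node -list -all
# Expected: a single RUNNING entry for single.cluster.vm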
MAPRED
mapred-site.xml
$HADOOP_CONF_DIR/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_CONF_DIR/*</value>
</property>
<!-- JobHistory configuration -->
<property>
<name>mapreduce.jobhistory.principal</name>
<value>jhs/_HOST@CLUSTER.VM</value>
</property>
<property>
<name>mapreduce.jobhistory.keytab</name>
<value>/etc/security/keytab/jhs.service.keytab</value>
</property>
<!-- JobHistory Server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>single.cluster.vm:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.https.address</name>
<value>single.cluster.vm:19890</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx384m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx384m</value>
</property>
<property>
<name>mapreduce.shuffle.service.enabled</name>
<value>true</value>
</property>
<!-- Increase max fetch failures (default: 10) -->
<property>
<name>mapreduce.reduce.shuffle.max-fetch-failures</name>
<value>50</value>
</property>
<!-- Increase connection timeout (default: 180000ms = 3min) -->
<property>
<name>mapreduce.reduce.shuffle.connect.timeout</name>
<value>300000</value> <!-- 5 minutes -->
</property>
<!-- Increase read timeout -->
<property>
<name>mapreduce.reduce.shuffle.read.timeout</name>
<value>300000</value> <!-- 5 minutes -->
</property>
<!-- Retry interval between fetch attempts -->
<property>
<name>mapreduce.reduce.shuffle.retry-interval-ms</name>
<value>5000</value> <!-- 5 seconds -->
</property>
<!-- Increase shuffle memory buffer -->
<property>
<name>mapreduce.reduce.shuffle.input.buffer.percent</name>
<value>0.25</value> <!-- Default: 0.70, reduce if memory constrained -->
</property>
<property>
<name>mapreduce.reduce.shuffle.memory.limit.percent</name>
<value>0.15</value> <!-- Default: 0.25 -->
</property>
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
</configuration>
kinit and ssh:
bash
sudo su - mapred
kinit -kt /etc/security/keytab/mapred.service.keytab mapred
ssh -v mapred
Verify the configuration with hadoop kdiag. Then start the daemon:
bash
mapred --daemon start historyserver
jps
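A simple TLS-level probe against the HTTPS address from mapred-site.xml confirms that the JobHistory server is listening; the exact HTTP status may vary depending on web authentication settings:
bash
curl --cacert /etc/hadoop/keystore/ca-bundle.pem \
     -s -o /dev/null -w 'HTTP %{http_code}\n' https://single.cluster.vm:19890/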
Running a Job as a Regular User
kinit and ssh:
bash
kinit
ssh -v debianvm
Verify the configuration with hadoop kdiag. Check the running services:
bash
sudo jps
1374455 Jps
1363223 SecondaryNameNode
1372610 JobHistoryServer
1371358 NodeManager
1362943 DataNode
1371115 ResourceManager
1362795 NameNode
Quick job:
bash
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar pi 2 10
Number of Maps  = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
...
2025-08-22 21:20:35,309 INFO mapreduce.Job:  map 100% reduce 100%
2025-08-22 21:20:36,330 INFO mapreduce.Job: Job job_1755868376900_0003 completed successfully
2025-08-22 21:20:36,478 INFO mapreduce.Job: Counters: 54
...
Job Finished in 23.129 seconds
Estimated value of Pi is 3.80000000000000000000
After the job completes, log in as the hdfs user
and add a recursive default ACL so the yarn user can write
to raihan's log-aggregation directory below:
bash
hdfs dfs -setfacl -R -m default:user:yarn:rwx /tmp/logs/raihan
Examining the logs as YARN
bash
# To get the application list:
yarn application -appStates FINISHED -list
# To view the log (save it as a file for full view)
yarn logs -applicationId <application_ID> > output.log 2>&1