In part 1 of this tutorial we discussed the key terminology used in Kerberos authentication. We demonstrated how to set up and configure a KDC server that issues tickets to authenticate users, and how to install and configure a Kerberos client. The client can authenticate from the same machine running the KDC server or from a remote machine; you just need to point the client at the correct KDC server. In the previous article we also demonstrated how to change SSH and Hadoop configurations to enable Kerberos authentication. This tutorial will not review those concepts, so if you are not comfortable with them please refer to part 1. Here we will focus on using Kerberos to authenticate users and services.
Each user that needs to access Hadoop requires their own Kerberos principal, so you will need to create as many principals as there are users. Principals are created with the kadmin utility; you specify the name of the principal and the realm it will be created in. Before creating any principals, create an operating system user account for each user and add it to the hadoop group. The commands below create user accounts for hdfs, mapred, yarn and learner (note that learner, the name we will use for our principal, is a valid Ubuntu user account).
sudo adduser hdfs
sudo adduser hdfs hadoop
sudo adduser mapred
sudo adduser mapred hadoop
sudo adduser yarn
sudo adduser yarn hadoop
sudo adduser learner
sudo adduser learner hadoop
To create a principal named learner in the LOCALHOST realm (replace this with the realm you created in part 1 of the tutorial) use the command below.
sudo kadmin.local addprinc learner@LOCALHOST
Using the construct above you can create all the principals that are required. Let's also create a principal for the hdfs service (the commands that follow are run from within the kadmin.local shell).
addprinc hdfs@LOCALHOST
We can also create a principal for the yarn service.
addprinc yarn@LOCALHOST
Also create a principal for mapred
addprinc mapred@LOCALHOST
Create a principal for the HTTP service
addprinc HTTP@LOCALHOST
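Each of the addprinc commands above follows the same pattern, so the full set can be generated with a short loop. The sketch below only prints the kadmin commands so you can paste them into the kadmin.local shell, and assumes the LOCALHOST realm from part 1:

```shell
# Print one addprinc command per principal; paste the output into the
# kadmin.local shell (or pipe it to: sudo kadmin.local).
REALM="LOCALHOST"   # replace with the realm you created in part 1
for p in learner hdfs yarn mapred HTTP; do
  echo "addprinc ${p}@${REALM}"
done
```

In a real setup each addprinc will prompt for a password for the new principal.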
In Kerberos terminology, Hadoop services such as yarn and hdfs are referred to as service principals. For each service principal you create encrypted Kerberos keys referred to as keytabs. Keytabs are required for passwordless communication and authentication, in a similar way to SSH keys, and are distributed to every node in the Hadoop cluster. Each keytab is tied to a specific fully qualified domain name (FQDN), so each cluster node needs a keytab for every service principal. A keytab contains Kerberos principals and their encrypted keys, which is why access to keytab files must be secured: anyone who can read a keytab gains the rights and privileges of the principals it contains.
To create keytabs you use the kadmin utility, so all keytab creation commands are run from its shell. You specify the name of the file that will store the keytab and the principal or principals it will contain. To create a keytab for the hdfs and HTTP principals use the command below.
sudo kadmin.local xst -norandkey -k hdfs.keytab hdfs@LOCALHOST HTTP@LOCALHOST
We also need to create a keytab containing the mapred and HTTP principals. The command below will create a keytab file named mapred.keytab containing the two principals.
xst -norandkey -k mapred.keytab mapred@LOCALHOST HTTP@LOCALHOST
Create a keytab named yarn.keytab that will contain the yarn and HTTP principals.
xst -norandkey -k yarn.keytab yarn@LOCALHOST HTTP@LOCALHOST
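The three xst commands share one shape, so they too can be generated mechanically. A minimal sketch that just prints them (assuming the LOCALHOST realm; each keytab also bundles the HTTP principal):

```shell
# Print the keytab-creation commands for the three service principals;
# paste the output into the kadmin.local shell.
REALM="LOCALHOST"   # replace with your realm
for svc in hdfs mapred yarn; do
  echo "xst -norandkey -k ${svc}.keytab ${svc}@${REALM} HTTP@${REALM}"
done
```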
After the keytab files have been created we can inspect them using the klist command to check that they were created correctly.
klist -e -k -t hdfs.keytab
Once our keytab files have been created we deploy them by moving them to the /etc/hadoop/conf directory. The deployment of keytab files must be done on every node in the Hadoop cluster. When you are using MRv1 as your execution engine you need to deploy the hdfs and mapred keytabs; the command below does that. When copying to a remote server use a secure method such as scp.
sudo mv hdfs.keytab mapred.keytab /etc/hadoop/conf/
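On a multi-node cluster the keytabs have to reach every host. The sketch below only previews the scp commands; node1, node2 and node3 are hypothetical host names, so replace them with your own before removing the echo:

```shell
# Preview the secure-copy commands for distributing the keytabs to the
# other cluster nodes; remove the echo to actually run them.
for node in node1 node2 node3; do
  echo scp /etc/hadoop/conf/hdfs.keytab /etc/hadoop/conf/mapred.keytab "${node}:/etc/hadoop/conf/"
done
```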
When you are using yarn as your execution engine you need to deploy hdfs and yarn keytabs.
sudo mv hdfs.keytab yarn.keytab /etc/hadoop/conf/
After the keytabs have been deployed we make them readable only by their respective users by assigning ownership to the correct user. This improves security because anyone who can read a keytab gains all the privileges of that principal. The commands below change file ownership and permissions.
sudo chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab
sudo chown mapred:hadoop /etc/hadoop/conf/mapred.keytab
sudo chown yarn:hadoop /etc/hadoop/conf/yarn.keytab
sudo chmod 400 /etc/hadoop/conf/*.keytab
To map a Kerberos principal to a specific operating system user, a rule specified in the auth_to_local setting of the krb5.conf configuration file is used. The default behaviour is to take the first component of the principal name as the operating system user, provided the principal belongs to the default realm specified in krb5.conf. For example, the principal hdfs@LOCALHOST is mapped to the hdfs operating system user if LOCALHOST has been specified as the default realm.
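The default rule simply strips the realm from the principal name. As a sanity check, the same transformation can be sketched in plain shell:

```shell
# Default auth_to_local behaviour: drop the @REALM suffix to obtain the
# local user (applied only when the realm matches the default realm).
principal="hdfs@LOCALHOST"
local_user="${principal%%@*}"   # strip everything from the first '@'
echo "$local_user"              # prints: hdfs
```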
To authenticate via Kerberos without human interaction you use the kinit command to request tickets. You need to specify the keytab and the principal requesting an access ticket. The construct for making such requests is shown below.
kinit -k -t hdfs.keytab hdfs
This tutorial reviewed the concepts covered in part 1 that are needed here. We demonstrated creating Kerberos principals and their operating system users, creating keytab files that allow Kerberos authentication without a password, deploying the keytabs, and mapping a Kerberos principal to an operating system user to facilitate authentication.