Enabling Kerberos on a Google Cloud Dataproc cluster turns on Hadoop Secure Mode, which provides user authentication, isolation, and encryption inside the cluster and is a key building block for multi-tenant clusters. In this guide, we’ll walk you through setting up Kerberos on a Dataproc cluster step by step.
Prerequisites
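Before you begin, ensure you have:
- A Google Cloud project with the Dataproc and Cloud KMS APIs enabled.
- The gcloud CLI (including gsutil) installed and authenticated against that project.
- A Cloud Storage bucket for the encrypted password file (this guide uses gs://dp-run-bucket).
Replace YOUR_PROJECT_ID in the commands below with your own project ID.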
Step 1: Create a Service Account
Start by creating a service account for the cluster VMs to run as:
gcloud iam service-accounts create dp-svc-runner \
--description="Dataproc service account" \
--display-name="Dataproc SA"
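Later commands refer to this account by its email address, which follows the pattern NAME@PROJECT_ID.iam.gserviceaccount.com for user-managed service accounts. The sketch below builds that address once so it can be reused; sa_email is a hypothetical helper name, not part of gcloud.

```shell
# Build the email address of a user-managed service account.
# sa_email is a hypothetical helper name used only in this guide.
sa_email() {
  name=$1      # service account ID, e.g. dp-svc-runner
  project=$2   # Google Cloud project ID
  printf '%s@%s.iam.gserviceaccount.com\n' "$name" "$project"
}

# Example: compute the address once and reuse it in later commands.
SA_EMAIL=$(sa_email dp-svc-runner YOUR_PROJECT_ID)
echo "$SA_EMAIL"
```

Running the block prints dp-svc-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com, the same address used in Steps 2 and 3.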
Step 2: Create a KMS Key
If you don’t have one already, create a key ring:
gcloud kms keyrings create ahmed-keyring \
--location us-east1
Next, create a key within the key ring:
gcloud kms keys create dataproc-ahmed-key \
--location us-east1 \
--keyring ahmed-keyring \
--purpose encryption
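The cluster configuration in Step 5 refers to this key by its full resource name, projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY. The sketch below assembles that name from its parts; kms_key_uri is a hypothetical helper, and for an existing key `gcloud kms keys describe dataproc-ahmed-key --location=us-east1 --keyring=ahmed-keyring --format="value(name)"` should print the same string.

```shell
# Assemble the full Cloud KMS key resource name that Dataproc expects.
# kms_key_uri is a hypothetical helper name used only in this guide.
kms_key_uri() {
  project=$1
  location=$2
  keyring=$3
  key=$4
  printf 'projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s\n' \
    "$project" "$location" "$keyring" "$key"
}

KMS_KEY_URI=$(kms_key_uri YOUR_PROJECT_ID us-east1 ahmed-keyring dataproc-ahmed-key)
echo "$KMS_KEY_URI"
```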
Grant the Cloud KMS CryptoKey Decrypter role to the service account so the cluster VMs can decrypt the password file you create in Step 4:
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member serviceAccount:dp-svc-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com \
--role roles/cloudkms.cryptoKeyDecrypter
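A project-level binding lets the service account decrypt with every key in the project. If you prefer least privilege, you can grant the role on this one key with `gcloud kms keys add-iam-policy-binding` instead. The wrapper below is a sketch that assumes the key ring, key, and service account names used in this guide; grant_key_decrypter is a hypothetical function name.

```shell
# Grant the Decrypter role on a single key rather than on the whole project.
# grant_key_decrypter is a hypothetical helper; resource names match this guide.
grant_key_decrypter() {
  project=$1
  gcloud kms keys add-iam-policy-binding dataproc-ahmed-key \
    --location=us-east1 \
    --keyring=ahmed-keyring \
    --member="serviceAccount:dp-svc-runner@${project}.iam.gserviceaccount.com" \
    --role=roles/cloudkms.cryptoKeyDecrypter
}

# Usage: grant_key_decrypter YOUR_PROJECT_ID
```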
Step 3: Grant the dataproc.worker Role
Grant the Dataproc Worker role (roles/dataproc.worker) to the service account. A custom service account that cluster VMs run as needs this role:
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:dp-svc-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/dataproc.worker"
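To confirm that the roles from Steps 2 and 3 are bound to the service account at the project level, you can flatten the project IAM policy, keep only this member's bindings, and look for a given role. The sketch below assumes the service account name used in this guide; has_role is a hypothetical helper.

```shell
# Succeed if the guide's service account holds ROLE at the project level.
# has_role is a hypothetical helper name used only in this guide.
has_role() {
  project=$1
  role=$2
  member="serviceAccount:dp-svc-runner@${project}.iam.gserviceaccount.com"
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:${member}" \
    --format="value(bindings.role)" | grep -qxF -- "$role"
}

# Usage:
#   has_role YOUR_PROJECT_ID roles/dataproc.worker && echo "worker role OK"
#   has_role YOUR_PROJECT_ID roles/cloudkms.cryptoKeyDecrypter && echo "decrypter role OK"
```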
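Step 4: Create the Kerberos Root Principal Password
Encrypt the Kerberos root principal password with the KMS key to create the password file. The account running this command needs permission to encrypt with the key, for example the Cloud KMS CryptoKey Encrypter role (roles/cloudkms.cryptoKeyEncrypter). Replace the sample password with your own: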
echo "my-strong-and-complex-password" | \
gcloud kms encrypt \
--location=us-east1 \
--keyring=ahmed-keyring \
--key=dataproc-ahmed-key \
--plaintext-file=- \
--ciphertext-file=kerberos-root-principal-password.encrypted
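Typing the password directly into an echo command leaves it in your shell history. The sketch below reads it from the terminal with echo disabled and pipes it to the same `gcloud kms encrypt` command. It assumes a POSIX shell and the KMS names used in this guide; encrypt_root_password is a hypothetical function name.

```shell
# Prompt for the root principal password without echoing it, then encrypt it.
# encrypt_root_password is a hypothetical helper; KMS names match this guide.
encrypt_root_password() {
  if [ -t 0 ]; then
    printf 'Kerberos root principal password: ' >&2
    stty -echo            # hide typed characters on an interactive terminal
  fi
  IFS= read -r pw
  if [ -t 0 ]; then
    stty echo             # restore normal terminal echo
    printf '\n' >&2
  fi
  if [ -z "$pw" ]; then
    echo "error: empty password" >&2
    return 1
  fi
  # printf '%s\n' reproduces the trailing newline that echo adds in the command above.
  printf '%s\n' "$pw" | gcloud kms encrypt \
    --location=us-east1 \
    --keyring=ahmed-keyring \
    --key=dataproc-ahmed-key \
    --plaintext-file=- \
    --ciphertext-file=kerberos-root-principal-password.encrypted
  status=$?
  unset pw
  return "$status"
}

# Usage: encrypt_root_password
```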
Copy the kerberos-root-principal-password.encrypted file to a Cloud Storage bucket:
gsutil cp kerberos-root-principal-password.encrypted gs://dp-run-bucket/kerberos-rt-file/
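Before creating the cluster, you can check that the file landed where Step 5 expects it. `gsutil -q stat` exits with status 0 only when the object exists, so it works as a simple guard. The sketch assumes the bucket path used in this guide; require_password_file is a hypothetical helper.

```shell
# Fail early if the encrypted password file is not where the cluster expects it.
# require_password_file is a hypothetical helper; the default URI matches this guide.
require_password_file() {
  uri=${1:-gs://dp-run-bucket/kerberos-rt-file/kerberos-root-principal-password.encrypted}
  if gsutil -q stat "$uri"; then
    echo "found $uri"
  else
    echo "error: $uri not found" >&2
    return 1
  fi
}

# Usage: require_password_file
```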
Step 5: Create the Cluster
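Now, it’s time to create your Dataproc cluster using Terraform or the command line. The cluster must run as the service account from Step 1; otherwise its VMs use the Compute Engine default service account, which does not hold the roles granted above. Here’s a snippet of Terraform configuration: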
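resource "google_dataproc_cluster" "simplecluster" {
  name    = "simplecluster"
  region  = "us-east1"
  project = "YOUR_PROJECT_ID"

  cluster_config {
    # Run the cluster VMs as the service account that holds the KMS and Dataproc roles.
    gce_cluster_config {
      service_account        = "dp-svc-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com"
      service_account_scopes = ["cloud-platform"]
    }

    # Other configuration settings...

    security_config {
      kerberos_config {
        enable_kerberos             = true
        kms_key_uri                 = "projects/YOUR_PROJECT_ID/locations/us-east1/keyRings/ahmed-keyring/cryptoKeys/dataproc-ahmed-key"
        root_principal_password_uri = "gs://dp-run-bucket/kerberos-rt-file/kerberos-root-principal-password.encrypted"
      }
    }
  }
}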
Or use the command line:
gcloud dataproc clusters create cluster-name \
--region=us-east1 \
--service-account=dp-svc-runner@YOUR_PROJECT_ID.iam.gserviceaccount.com \
--kerberos-root-principal-password-uri=gs://dp-run-bucket/kerberos-rt-file/kerberos-root-principal-password.encrypted \
--kerberos-kms-key=projects/YOUR_PROJECT_ID/locations/us-east1/keyRings/ahmed-keyring/cryptoKeys/dataproc-ahmed-key
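If you create clusters from scripts, deriving the service account email, KMS key name, and password URI from a few variables keeps them consistent across commands. This is a sketch under stated assumptions: the variable names and the create_kerberized_cluster function are hypothetical, the defaults are the names used in this guide, and the KMS key is assumed to live in the cluster's region.

```shell
# Hypothetical settings; override them in the environment to match your setup.
PROJECT_ID=${PROJECT_ID:-YOUR_PROJECT_ID}
REGION=${REGION:-us-east1}
KEYRING=${KEYRING:-ahmed-keyring}
KEY=${KEY:-dataproc-ahmed-key}
BUCKET=${BUCKET:-dp-run-bucket}

# Create a Kerberized cluster from the settings above.
# Assumes the KMS key is in the same location as the cluster, as in this guide.
create_kerberized_cluster() {
  cluster_name=$1
  sa="dp-svc-runner@${PROJECT_ID}.iam.gserviceaccount.com"
  kms_key="projects/${PROJECT_ID}/locations/${REGION}/keyRings/${KEYRING}/cryptoKeys/${KEY}"
  password_uri="gs://${BUCKET}/kerberos-rt-file/kerberos-root-principal-password.encrypted"

  gcloud dataproc clusters create "$cluster_name" \
    --project="$PROJECT_ID" \
    --region="$REGION" \
    --service-account="$sa" \
    --kerberos-root-principal-password-uri="$password_uri" \
    --kerberos-kms-key="$kms_key"
}

# Usage: create_kerberized_cluster cluster-name
```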
Congratulations! You’ve successfully set up Kerberos on your Dataproc cluster, enhancing its security and authentication capabilities.
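To check that the cluster was actually created with Kerberos enabled, you can read the Kerberos flag from its configuration with `gcloud dataproc clusters describe`. The sketch below assumes the field path config.securityConfig.kerberosConfig.enableKerberos from the Dataproc API's cluster resource; kerberos_enabled is a hypothetical helper.

```shell
# Succeed if the cluster's Kerberos flag is set in its security configuration.
# kerberos_enabled is a hypothetical helper; the field path follows the Dataproc API.
kerberos_enabled() {
  cluster_name=$1
  region=$2
  flag=$(gcloud dataproc clusters describe "$cluster_name" \
    --region="$region" \
    --format="value(config.securityConfig.kerberosConfig.enableKerberos)")
  # Accept either capitalization of a true boolean in the formatted output.
  case "$flag" in
    True|true) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: kerberos_enabled cluster-name us-east1 && echo "Kerberos is enabled"
```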
Remember to configure other aspects of your cluster as needed and explore further documentation to optimize your Dataproc cluster for your specific use cases.