This cookbook can currently be used to set up a Cloudera Manager server (management server) backed by a MySQL or PostgreSQL database. The longer-term goal (for now a wishlist) is broader: an automated deployment of a Cloudera Hadoop cluster using Chef, Python and the Cloudera API, so that a cluster for a development, test, preproduction or production environment can be created at the click of a button.
Hadoop is an ecosystem of open source components that fundamentally changes the way enterprises store, process, and analyze data. Unlike traditional systems, Hadoop enables multiple types of analytic workloads to run on the same data, at the same time, at massive scale on industry-standard hardware. CDH, Cloudera’s open source platform, is the most popular distribution of Hadoop and related projects in the world.
Github - Cookbook Location
This cookbook can be used to set up a Cloudera Hadoop cluster. There are two phases:
- First, preconfiguration, which needs to be run on all nodes (edge/admin/master/worker nodes).
- Second, Cloudera Manager configuration, which needs to be run on the admin node that will be used to set up the cluster.
Currently the cookbook sets up all nodes and Cloudera Manager with the required database in MySQL or PostgreSQL, as required (MySQL is installed by default).
Phase 1 - Preparing Nodes for Cloudera Hadoop Environment.
Phase 1 is run with include_recipe 'cdhmgr-chef-setup::pre_config' and includes:
- selinux updates.
- iptables to be disabled.
- Create users; currently we create sysadmin on all the nodes (the cmadmin user will be created in the Cloudera setup cookbook).
- Update the sudoers list.
- sysctl update.
- /etc/hosts file update, based on the nodes in the cluster (currently we are using sample IPs).
- Security setup, using krb5 and sssd configuration (currently commented out, as we do not have an AD setup yet).
Details about Phase 1.
Setting selinux to permissive.
##
# Setting up selinux to permissive mode
##
include_recipe 'cdhmgr-chef-setup::pre_config_selinux_update'
Disable iptables.
##
# Iptables update - currently we are disabling
##
include_recipe 'cdhmgr-chef-setup::pre_config_iptables_setup'
Configuring security: setting up krb5.conf and sssd for LDAP logins (disabled by default). Update the attributes as required to use this.
##
# Setting up krb5 and sssd
##
#include_recipe 'cdhmgr-chef-setup::pre_config_security_setup'
Creating the sysadmin user on all nodes, with passwordless access to all nodes and sudo permissions.
##
# Creating a user `sysadmin` with sudo permission.
##
include_recipe 'cdhmgr-chef-setup::pre_config_user_setup'
Setting up sysctl for swappiness and basic settings for Hadoop.
##
# Setting up `sysctl`
##
include_recipe 'cdhmgr-chef-setup::pre_config_sysctl_config'
Setting up disable_transparent_hugepage_defrag and disable_transparent_hugepage_defrag_hugepage_enables in the rc.local file. Installing rpcbind and telnet.
##
# Setting up hadoop related config for each server.
##
include_recipe 'cdhmgr-chef-setup::pre_config_hadoop_commons_config'
Setting up the /etc/security/limits.d/hadoop-cluster.conf file for limits.
##
# Setting ulimit for the cluster
##
include_recipe 'cdhmgr-chef-setup::pre_config_hadoop_ulimits'
Setting up the connector / client based on the database selection.
##
# Mysql Connector for the setup.
##
include_recipe 'cdhmgr-chef-setup::pre_config_database_connectors'
Finally, setting up the /etc/hosts file based on the role information.
##
# Setting up `hostfile_setup`
##
include_recipe 'cdhmgr-chef-setup::pre_config_hostfile_setup'
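How the recipe turns role attributes into host entries is not shown in this README. As a rough sketch only, assuming the `etc_hosts_entries` hash shape used in the sample roles further down, the mapping to `/etc/hosts` lines could look like the following plain Ruby. The helper name `render_hosts_lines` is hypothetical and is not part of the cookbook.

```ruby
# Hypothetical helper (not from the cookbook): turns an
# etc_hosts_entries-style hash into /etc/hosts lines.
def render_hosts_lines(entries)
  entries.map do |ip, info|
    # The FQDN comes first, followed by any short aliases.
    names = [info["hostname"], *info.fetch("aliases", [])].join(" ")
    comment = info["comment"] ? " # #{info["comment"]}" : ""
    "#{ip} #{names}#{comment}"
  end
end

# Two of the sample entries used in the roles in this README.
sample_entries = {
  "192.168.13.4" => { "hostname" => "ahmedservera001.ahmed.com", "aliases" => ["ahmedservera001"], "comment" => "Admin 01 - Cloudera Hadoop Node" },
  "192.168.13.9" => { "hostname" => "ahmedserverw001.ahmed.com", "aliases" => ["ahmedserverw001"], "comment" => "Worker 01 - Cloudera Hadoop Node" }
}

puts render_hosts_lines(sample_entries)
```

Each key of the hash is an IP address, so one entry yields exactly one line; the actual recipe may render the file differently (for example through a template).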
Phase 2 - Cloudera Manager Installation
Phase 2 is run with include_recipe 'cdhmgr-chef-setup::cloudera_manager'.
Setting up Cloudera Manager with mysql or postgresql (defaults to mysql).
Databases created. [NOTE: Check passwords in the cookbook]
------------------------------------------------------------------------------
Services                            dbname      user      password
------------------------------------------------------------------------------
Cloudera Manager Database           cmdb        cmadmin   cmadmin_password
Activity Monitor                    amon        amon      amon_password
Reports Manager                     rman        rman      rman_password
Hive Metastore Server               metastore   hive      hive_password
Sentry Server                       sentry      sentry    sentry_password
Cloudera Navigator Audit Server     nav         nav       nav_password
Cloudera Navigator Metadata Server  navms       navms     navms_password
Hue Database                        hue         hue       hue_password
Oozie                               oozie       oozie     oozie_password
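To make the table concrete, here is a minimal sketch of the kind of SQL that creates one database and one user per service. It is plain Ruby that only prints statements. The statement form is an assumption (MySQL 5.x style `GRANT ... IDENTIFIED BY`, which does not work on MySQL 8 or PostgreSQL), and `mysql_statements` is a hypothetical helper, not the cookbook's recipe.

```ruby
# Rows copied from the table above: [service, dbname, user, password].
DATABASES = [
  ["Cloudera Manager Database", "cmdb", "cmadmin", "cmadmin_password"],
  ["Activity Monitor", "amon", "amon", "amon_password"],
  ["Reports Manager", "rman", "rman", "rman_password"],
  ["Hive Metastore Server", "metastore", "hive", "hive_password"],
  ["Sentry Server", "sentry", "sentry", "sentry_password"],
  ["Cloudera Navigator Audit Server", "nav", "nav", "nav_password"],
  ["Cloudera Navigator Metadata Server", "navms", "navms", "navms_password"],
  ["Hue Database", "hue", "hue", "hue_password"],
  ["Oozie", "oozie", "oozie", "oozie_password"]
].freeze

# Hypothetical helper: builds MySQL 5.x style statements (assumed syntax;
# the host wildcard '%' and the utf8 charset are illustrative choices).
def mysql_statements(dbname, user, password)
  [
    "CREATE DATABASE #{dbname} DEFAULT CHARACTER SET utf8;",
    "GRANT ALL ON #{dbname}.* TO '#{user}'@'%' IDENTIFIED BY '#{password}';"
  ]
end

DATABASES.each do |service, dbname, user, password|
  puts "-- #{service}"
  puts mysql_statements(dbname, user, password)
end
```

The passwords shown are the placeholders from the table; change them in the cookbook attributes before using this outside a test environment.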
Setting up database (mysql / postgres) based on role information.
#
# Setting up mysql database for cloudera manager
#
include_recipe 'cdhmgr-chef-setup::cloudera_manager_database_setup'
Setting up Cloudera Manager
#
# Installing Cloudera Manager
#
include_recipe 'cdhmgr-chef-setup::cloudera_manager_setup'
Setting up the Python API packages, which will be used later when deploying the cluster using the Cloudera API.
#
# Setting up python api requirements
#
include_recipe 'cdhmgr-chef-setup::cloudera_manager_python_api'
What’s cooking in the cookbook?
This cookbook will be used on the Hadoop cluster nodes before we start setting them up with Cloudera. We will have a specific cookbook for each of the nodes based on roles.
Example: below are the different types of nodes, each of which will have specific roles assigned (roles will be created once the cookbooks are ready).
- Edge nodes. (Install default recipe pre_config)
- Management nodes. (Install default recipe pre_config)
- Admin nodes. (Install recipes pre_config and cloudera_manager)
- Slave nodes (datanodes). (Install default recipe pre_config)
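The node-type-to-recipe mapping in the list above can be summarised as a small lookup. This is only an illustrative sketch in plain Ruby: the symbols `:edge`, `:management`, `:admin` and `:slave` and the helper `run_list_for` are hypothetical names, while the recipe strings are the ones used in the roles in this README.

```ruby
PRE_CONFIG = "recipe[cdhmgr-chef-setup::pre_config]"
CLOUDERA_MANAGER = "recipe[cdhmgr-chef-setup::cloudera_manager]"

# Hypothetical helper: returns the run_list entries for a node type.
def run_list_for(node_type)
  case node_type
  when :admin
    # Admin nodes get the preconfiguration plus Cloudera Manager.
    [PRE_CONFIG, CLOUDERA_MANAGER]
  when :edge, :management, :slave
    # Every other node type only gets the preconfiguration.
    [PRE_CONFIG]
  else
    raise ArgumentError, "unknown node type: #{node_type.inspect}"
  end
end

%i[edge management admin slave].each do |type|
  puts "#{type}: #{run_list_for(type).join(', ')}"
end
```

Only the admin node differs, which is why two roles are enough to cover all four node types.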
Sample Role based on the setup.
NOTE: If the nodes have an internet connection (possibly through a proxy), then only the host information needs to be set; the rest of the role config below can safely be ignored.
Basic Roles - which can be used to set up the environment
The only difference between the basic roles below is the run_list assigned to each.
NON-ADMIN has run_list "recipe[cdhmgr-chef-setup::pre_config]"
ADMIN has run_list "recipe[cdhmgr-chef-setup::pre_config]", "recipe[cdhmgr-chef-setup::cloudera_manager]"
NON-ADMIN Nodes.
name "non_admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]"
#
# Default attributes to set up the `/etc/hosts` file
#
default_attributes "cdhmgr-chef-setup" => {
  "etc_hosts_entries" => {
    "192.168.13.2" => { "hostname" => "ahmedservere001.ahmed.com", "aliases" => [ "ahmedservere001" ], "comment" => "Edge 01 - Cloudera Hadoop Node" },
    "192.168.13.3" => { "hostname" => "ahmedservere002.ahmed.com", "aliases" => [ "ahmedservere002" ], "comment" => "Edge 02 - Cloudera Hadoop Node" },
    "192.168.13.4" => { "hostname" => "ahmedservera001.ahmed.com", "aliases" => [ "ahmedservera001" ], "comment" => "Admin 01 - Cloudera Hadoop Node" },
    "192.168.13.6" => { "hostname" => "ahmedserverm001.ahmed.com", "aliases" => [ "ahmedserverm001" ], "comment" => "Master 01 - Cloudera Hadoop Node" },
    "192.168.13.7" => { "hostname" => "ahmedserverm002.ahmed.com", "aliases" => [ "ahmedserverm002" ], "comment" => "Master 02 - Cloudera Hadoop Node" },
    "192.168.13.9" => { "hostname" => "ahmedserverw001.ahmed.com", "aliases" => [ "ahmedserverw001" ], "comment" => "Worker 01 - Cloudera Hadoop Node" },
    "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com", "aliases" => [ "ahmedserverw002" ], "comment" => "Worker 02 - Cloudera Hadoop Node" }
  }
}
#
# Setting up override attributes
# (the hash must start on the same line as `override_attributes`)
#
override_attributes "cdhmgr-chef-setup" => {
  "cloudera_mgr_services" => {
    "server_host_fqdn_for_agent" => 'admin-node-server.ahmed.com'
  }
}
ADMIN Role.
name "admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]", "recipe[cdhmgr-chef-setup::cloudera_manager]"
#
# Default attributes to set up the `/etc/hosts` file
#
default_attributes "cdhmgr-chef-setup" => {
  "etc_hosts_entries" => {
    "192.168.13.2" => { "hostname" => "ahmedservere001.ahmed.com", "aliases" => [ "ahmedservere001" ], "comment" => "Edge 01 - Cloudera Hadoop Node" },
    "192.168.13.3" => { "hostname" => "ahmedservere002.ahmed.com", "aliases" => [ "ahmedservere002" ], "comment" => "Edge 02 - Cloudera Hadoop Node" },
    "192.168.13.4" => { "hostname" => "ahmedservera001.ahmed.com", "aliases" => [ "ahmedservera001" ], "comment" => "Admin 01 - Cloudera Hadoop Node" },
    "192.168.13.6" => { "hostname" => "ahmedserverm001.ahmed.com", "aliases" => [ "ahmedserverm001" ], "comment" => "Master 01 - Cloudera Hadoop Node" },
    "192.168.13.7" => { "hostname" => "ahmedserverm002.ahmed.com", "aliases" => [ "ahmedserverm002" ], "comment" => "Master 02 - Cloudera Hadoop Node" },
    "192.168.13.9" => { "hostname" => "ahmedserverw001.ahmed.com", "aliases" => [ "ahmedserverw001" ], "comment" => "Worker 01 - Cloudera Hadoop Node" },
    "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com", "aliases" => [ "ahmedserverw002" ], "comment" => "Worker 02 - Cloudera Hadoop Node" }
  }
}
#
# Setting up override attributes
# (the hash must start on the same line as `override_attributes`)
#
override_attributes "cdhmgr-chef-setup" => {
  "cloudera_mgr_services" => {
    "server_host_fqdn_for_agent" => 'admin-node-server.ahmed.com'
  }
}
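In Chef, role attributes set with override_attributes take precedence over those set with default_attributes. The sketch below imitates that with a plain recursive hash merge so the effect of the role above is easier to see. It is a deliberate simplification: Chef's real precedence model has more levels, `merge_attributes` is a hypothetical helper, and the default values shown are illustrative assumptions rather than values read from the cookbook.

```ruby
# Simplified stand-in for attribute precedence: values from `override`
# replace values from `default`; nested hashes are merged key by key.
def merge_attributes(default, override)
  default.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      merge_attributes(old_value, new_value)
    else
      new_value
    end
  end
end

# Illustrative defaults (assumed, not read from the cookbook).
defaults = {
  "cdhmgr-chef-setup" => {
    "database_to_install" => "mysql",
    "cloudera_mgr_services" => { "server_host_fqdn_for_agent" => "localhost" }
  }
}

# Override taken from the basic roles above.
overrides = {
  "cdhmgr-chef-setup" => {
    "cloudera_mgr_services" => { "server_host_fqdn_for_agent" => "admin-node-server.ahmed.com" }
  }
}

merged = merge_attributes(defaults, overrides)
puts merged["cdhmgr-chef-setup"]["cloudera_mgr_services"]["server_host_fqdn_for_agent"]
puts merged["cdhmgr-chef-setup"]["database_to_install"]
```

Keys that the role does not override, such as the database choice here, keep their default value.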
Custom Role - Adding more granular changes to repo and attributes.
Role for a NON-ADMIN node; here we are setting up nodes for the Cloudera Hadoop environment.
name "non_admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]"
#
# Default attributes to set up the `/etc/hosts` file
#
default_attributes "cdhmgr-chef-setup" => {
  "etc_hosts_entries" => {
    "192.168.13.2" => { "hostname" => "ahmedservere001.ahmed.com", "aliases" => [ "ahmedservere001" ], "comment" => "Edge 01 - Cloudera Hadoop Node" },
    "192.168.13.3" => { "hostname" => "ahmedservere002.ahmed.com", "aliases" => [ "ahmedservere002" ], "comment" => "Edge 02 - Cloudera Hadoop Node" },
    "192.168.13.4" => { "hostname" => "ahmedservera001.ahmed.com", "aliases" => [ "ahmedservera001" ], "comment" => "Admin 01 - Cloudera Hadoop Node" },
    "192.168.13.6" => { "hostname" => "ahmedserverm001.ahmed.com", "aliases" => [ "ahmedserverm001" ], "comment" => "Master 01 - Cloudera Hadoop Node" },
    "192.168.13.7" => { "hostname" => "ahmedserverm002.ahmed.com", "aliases" => [ "ahmedserverm002" ], "comment" => "Master 02 - Cloudera Hadoop Node" },
    "192.168.13.9" => { "hostname" => "ahmedserverw001.ahmed.com", "aliases" => [ "ahmedserverw001" ], "comment" => "Worker 01 - Cloudera Hadoop Node" },
    "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com", "aliases" => [ "ahmedserverw002" ], "comment" => "Worker 02 - Cloudera Hadoop Node" }
  }
}
#
# Setting up override attributes
# (the hash must start on the same line as `override_attributes`)
#
override_attributes "mysql_connector" => {
  "j" => {
    "url" => "http://repo.ahmed.com/mysql-connector/Connector-J/mysql-connector-java-5.1.40.tar.gz"
  }
},
"authorization" => {
  "sudo" => {
    "groups" => [ 'cmadmin' ]
  }
}
Role for an ADMIN node.
name "admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]", "recipe[cdhmgr-chef-setup::cloudera_manager]"
#
# Default attributes to set up the `/etc/hosts` file
#
default_attributes "cdhmgr-chef-setup" => {
  "etc_hosts_entries" => {
    "192.168.13.2" => { "hostname" => "ahmedservere001.ahmed.com", "aliases" => [ "ahmedservere001" ], "comment" => "Edge 01 - Cloudera Hadoop Node" },
    "192.168.13.3" => { "hostname" => "ahmedservere002.ahmed.com", "aliases" => [ "ahmedservere002" ], "comment" => "Edge 02 - Cloudera Hadoop Node" },
    "192.168.13.4" => { "hostname" => "ahmedservera001.ahmed.com", "aliases" => [ "ahmedservera001" ], "comment" => "Admin 01 - Cloudera Hadoop Node" },
    "192.168.13.6" => { "hostname" => "ahmedserverm001.ahmed.com", "aliases" => [ "ahmedserverm001" ], "comment" => "Master 01 - Cloudera Hadoop Node" },
    "192.168.13.7" => { "hostname" => "ahmedserverm002.ahmed.com", "aliases" => [ "ahmedserverm002" ], "comment" => "Master 02 - Cloudera Hadoop Node" },
    "192.168.13.9" => { "hostname" => "ahmedserverw001.ahmed.com", "aliases" => [ "ahmedserverw001" ], "comment" => "Worker 01 - Cloudera Hadoop Node" },
    "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com", "aliases" => [ "ahmedserverw002" ], "comment" => "Worker 02 - Cloudera Hadoop Node" }
  }
}
#
# Setting up override attributes
# (the hash must start on the same line as `override_attributes`)
#
override_attributes "cdhmgr-chef-setup" => {
  "cloudera_mgr_services" => {
    "common_packages" => [ 'oracle-j2sdk1.7', 'cloudera-manager-daemons', 'cloudera-manager-agent', 'openldap-clients' ],
    "common_packages_versions" => [ '1.7.0+update67-1', '5.12.0-1.cm5120.p0.120.el6', '5.12.0-1.cm5120.p0.120.el6', '2.4.40-16.el6' ],
    "server_packages" => [ 'cloudera-manager-server', 'openldap-clients' ],
    "server_packages_versions" => [ '5.12.0-1.cm5120.p0.120.el6', '2.4.40-16.el6' ],
    "server_host_fqdn_for_agent" => 'localhost'
  },
  "database_to_install" => 'mysql'
},
"mysql_connector" => {
  "j" => {
    "url" => "http://repo.ahmed.com/mysql-connector/Connector-J/mysql-connector-java-5.1.40.tar.gz"
  }
},
"authorization" => {
  "sudo" => {
    "groups" => [ 'cmadmin' ]
  }
},
"yum_repository" => {
  "cm" => {
    "description" => "Packages for Cloudera Manager, Version 5, on RedHat or CentOS 6 x86_64",
    "baseurl" => "http://repo.ahmed.com/cm5/redhat/6/x86_64/cm/5.12.0/",
    "gpgkey" => "http://repo.ahmed.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera"
  }
}
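The package lists and version lists in the ADMIN role appear to be parallel arrays, with each version belonging to the package at the same position. Assuming that reading is right, the following plain Ruby sketch pairs them up and fails early if the two lists get out of step. `pair_packages` is a hypothetical helper, not part of the cookbook.

```ruby
# Values copied from the ADMIN role above.
common_packages = ['oracle-j2sdk1.7', 'cloudera-manager-daemons', 'cloudera-manager-agent', 'openldap-clients']
common_packages_versions = ['1.7.0+update67-1', '5.12.0-1.cm5120.p0.120.el6', '5.12.0-1.cm5120.p0.120.el6', '2.4.40-16.el6']

# Hypothetical helper: pairs each package with the version at the same index
# (assumes the two arrays are matched by position).
def pair_packages(names, versions)
  unless names.length == versions.length
    raise ArgumentError, "#{names.length} packages but #{versions.length} versions"
  end
  names.zip(versions).to_h
end

pair_packages(common_packages, common_packages_versions).each do |name, version|
  puts "#{name} = #{version}"
end
```

A check like this is useful when overriding only one of the two lists in a role, since a missing or extra version would otherwise be assigned to the wrong package.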
Helpful Links
A few links that help with Cloudera installations.