This cookbook can currently be used to set up a Cloudera Manager Server (management server) backed by a MySQL or PostgreSQL database. The intended use of this cookbook (more of a wishlist for now) is broader.

The goal is automated deployment of a Cloudera Hadoop cluster using Chef, Python, and the Cloudera API, so that a cluster for a development, test, preproduction, or production environment can be created at the click of a button.

Hadoop is an ecosystem of open source components that fundamentally changes the way enterprises store, process, and analyze data. Unlike traditional systems, Hadoop enables multiple types of analytic workloads to run on the same data, at the same time, at massive scale on industry-standard hardware. CDH, Cloudera’s open source platform, is the most popular distribution of Hadoop and related projects in the world.

GitHub - Cookbook Location

This cookbook can be used to set up a Cloudera Hadoop cluster. It works in two phases:

  • first, preconfiguration, which needs to be run on all nodes (edge/admin/master/worker nodes)
  • second, Cloudera Manager configuration, which needs to be run on the admin node that will be used to set up the cluster.

Currently the cookbook sets up all nodes and Cloudera Manager with the required database in MySQL or PostgreSQL, as required. [MySQL is installed as the database by default.]

Phase 1 - Preparing Nodes for the Cloudera Hadoop Environment

Phase 1 is pulled in with include_recipe 'cdhmgr-chef-setup::pre_config' and covers:

  1. SELinux update (set to permissive).
  2. Disable iptables.
  3. Create users. Currently only sysadmin is created on all the nodes. (The cmadmin user is created in the Cloudera setup cookbook.)
  4. Update the sudoers list.
  5. sysctl update.
  6. /etc/hosts file update, based on the nodes in the cluster. (Currently we are using sample IPs.)
  7. Security setup, using krb5 and sssd configuration. [Currently commented out, as we do not have an AD setup yet.]

Details about Phase 1.

Setting selinux to permissive.

##
# Setting up selinux to permissive mode
##
include_recipe 'cdhmgr-chef-setup::pre_config_selinux_update'
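What the recipe does internally is not shown in this document. As a rough sketch, switching SELinux to permissive usually comes down to rewriting the `SELINUX=` line in `/etc/selinux/config`. The helper name `selinux_permissive` is hypothetical, and it works on a string so it can be tried without touching a real system.

```ruby
# Hypothetical helper (not part of the cookbook): returns the text of
# /etc/selinux/config with the SELINUX= setting forced to "permissive".
# All other lines, including SELINUXTYPE=, are left untouched.
def selinux_permissive(config_text)
  config_text.gsub(/^SELINUX=\w+$/, 'SELINUX=permissive')
end

sample = <<~CONF
  # This file controls the state of SELinux on the system.
  SELINUX=enforcing
  SELINUXTYPE=targeted
CONF

puts selinux_permissive(sample)
```

A change to the config file applies at the next boot; on a running system `setenforce 0` switches to permissive mode immediately.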

Disable iptables.

##
# Iptables update - currently we are disabling
##
include_recipe 'cdhmgr-chef-setup::pre_config_iptables_setup'

Configuring security: setting up krb5.conf and sssd for LDAP logins (disabled by default). Update the attributes as required to use this.

##
# Setting up krb5 and sssd
##
#include_recipe 'cdhmgr-chef-setup::pre_config_security_setup'

Creating the sysadmin user on all nodes, with passwordless access to all nodes and sudo permissions.

##
# Creating a user `sysadmin` with sudo permission.
##
include_recipe 'cdhmgr-chef-setup::pre_config_user_setup'

Setting up sysctl for swappiness and basic settings for Hadoop.

##
# Setting up `sysctl`
##
include_recipe 'cdhmgr-chef-setup::pre_config_sysctl_config'
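The actual keys and values come from the cookbook's attributes, which are not listed here. The sketch below only shows the general shape of the step: a hash of kernel settings rendered as `key = value` lines for a sysctl configuration file. The constant `SYSCTL_SETTINGS`, the value `1` for `vm.swappiness`, and the helper `render_sysctl` are assumptions for illustration.

```ruby
# Assumed settings for illustration only; the cookbook's real values
# live in its attributes and may differ.
SYSCTL_SETTINGS = {
  'vm.swappiness' => 1
}.freeze

# Renders "key = value" lines in the format used by /etc/sysctl.conf.
def render_sysctl(settings)
  settings.map { |key, value| "#{key} = #{value}" }.join("\n") + "\n"
end

puts render_sysctl(SYSCTL_SETTINGS)
```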

Disabling transparent hugepages and transparent hugepage defrag in the rc.local file, and installing rpcbind and telnet.

##
# Setting up hadoop related config for each server.
##

include_recipe 'cdhmgr-chef-setup::pre_config_hadoop_commons_config'
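One way to picture the rc.local part of this step is as appending two `echo never` lines only when they are missing, so that repeated Chef runs do not duplicate them. This is a sketch, not the cookbook's code: the helper `ensure_thp_disabled` is hypothetical, and the sysfs directory is an assumption (RHEL/CentOS 6 kernels expose it as `/sys/kernel/mm/redhat_transparent_hugepage`).

```ruby
# Assumed sysfs location; RHEL/CentOS 6 kernels use
# /sys/kernel/mm/redhat_transparent_hugepage instead.
THP_DIR = '/sys/kernel/mm/transparent_hugepage'

THP_LINES = [
  "echo never > #{THP_DIR}/enabled",
  "echo never > #{THP_DIR}/defrag"
].freeze

# Hypothetical helper: appends any missing THP lines to the rc.local text.
# Lines that are already present are not added again.
def ensure_thp_disabled(rc_local_text)
  lines = rc_local_text.lines.map(&:chomp)
  missing = THP_LINES.reject { |line| lines.include?(line) }
  (lines + missing).join("\n") + "\n"
end

once = ensure_thp_disabled("#!/bin/sh\ntouch /var/lock/subsys/local\n")
twice = ensure_thp_disabled(once)
puts once
puts "idempotent: #{once == twice}"
```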

Setting /etc/security/limits.d/hadoop-cluster.conf file for limits.

##
# Setting ulimit for the cluster
##
include_recipe 'cdhmgr-chef-setup::pre_config_hadoop_ulimits'

Setting up the database connector / client based on the selected database.

##
# Mysql Connector for the setup.
##
include_recipe 'cdhmgr-chef-setup::pre_config_database_connectors'

Finally setting up /etc/hosts file based on the role information.

##
# Setting up `hostfile_setup`
##
include_recipe 'cdhmgr-chef-setup::pre_config_hostfile_setup'
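The host entries come from the `etc_hosts_entries` attribute used in the sample roles later in this document. How the recipe turns them into file content is not shown; the sketch below is one plausible rendering, with a hypothetical helper `render_hosts` and an assumed tab-separated layout.

```ruby
# Two entries copied from the sample role attributes in this document.
ETC_HOSTS_ENTRIES = {
  '192.168.13.2' => { 'hostname' => 'ahmedservere001.ahmed.com',
                      'aliases' => ['ahmedservere001'],
                      'comment' => 'Edge 01   - Cloudera Hadoop Node' },
  '192.168.13.4' => { 'hostname' => 'ahmedservera001.ahmed.com',
                      'aliases' => ['ahmedservera001'],
                      'comment' => 'Admin 01  - Cloudera Hadoop Node' }
}.freeze

# Hypothetical helper: one /etc/hosts line per IP, in the form
# "<ip>\t<fqdn> <aliases...>\t# <comment>".
def render_hosts(entries)
  entries.map do |ip, entry|
    names = [entry['hostname'], *entry['aliases']].join(' ')
    "#{ip}\t#{names}\t# #{entry['comment']}"
  end.join("\n") + "\n"
end

puts render_hosts(ETC_HOSTS_ENTRIES)
```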

Phase 2 - Cloudera Manager Installation

Phase 2 is pulled in with include_recipe 'cdhmgr-chef-setup::cloudera_manager'. It sets up Cloudera Manager with MySQL or PostgreSQL [defaults to MySQL].

Databases created. [NOTE: Check passwords in the cookbook]

------------------------------------------------------------------------------
Service                             Database    User        Password
------------------------------------------------------------------------------
Cloudera Manager Database           cmdb        cmadmin     cmadmin_password
Activity Monitor                    amon        amon        amon_password
Reports Manager                     rman        rman        rman_password
Hive Metastore Server               metastore   hive        hive_password
Sentry Server                       sentry      sentry      sentry_password
Cloudera Navigator Audit Server     nav         nav         nav_password
Cloudera Navigator Metadata Server  navms       navms       navms_password
Hue Database                        hue         hue         hue_password
Oozie                               oozie       oozie       oozie_password
------------------------------------------------------------------------------
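To make the table above concrete, the sketch below turns each row into the kind of statements a MySQL setup would need. It is an illustration, not the cookbook's implementation: the helper `mysql_statements`, the `utf8` character set, and the `'%'` host wildcard are assumptions, and the `GRANT ... IDENTIFIED BY` form only works on MySQL 5.x (MySQL 8 requires a separate `CREATE USER`).

```ruby
# Rows from the table above: service => [dbname, user, password].
DATABASES = {
  'Cloudera Manager Database'          => %w[cmdb cmadmin cmadmin_password],
  'Activity Monitor'                   => %w[amon amon amon_password],
  'Reports Manager'                    => %w[rman rman rman_password],
  'Hive Metastore Server'              => %w[metastore hive hive_password],
  'Sentry Server'                      => %w[sentry sentry sentry_password],
  'Cloudera Navigator Audit Server'    => %w[nav nav nav_password],
  'Cloudera Navigator Metadata Server' => %w[navms navms navms_password],
  'Hue Database'                       => %w[hue hue hue_password],
  'Oozie'                              => %w[oozie oozie oozie_password]
}.freeze

# Hypothetical helper: MySQL 5.x-style statements for one database.
def mysql_statements(dbname, user, password)
  [
    "CREATE DATABASE IF NOT EXISTS #{dbname} DEFAULT CHARACTER SET utf8;",
    "GRANT ALL ON #{dbname}.* TO '#{user}'@'%' IDENTIFIED BY '#{password}';"
  ]
end

DATABASES.each do |service, (dbname, user, password)|
  puts "-- #{service}"
  puts mysql_statements(dbname, user, password)
end
```

The passwords shown are the sample values from the table; change them in the cookbook before using this anywhere real.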

Setting up database (mysql / postgres) based on role information.

#
# Setting up mysql database for cloudera manager
#
include_recipe 'cdhmgr-chef-setup::cloudera_manager_database_setup'

Setting up Cloudera Manager

#
# Installing Cloudera Manager
#
include_recipe 'cdhmgr-chef-setup::cloudera_manager_setup'

Setting up the Python API packages, which are used later when deploying the cluster through the Cloudera API.

#
# Setting up python api requirements
#
include_recipe 'cdhmgr-chef-setup::cloudera_manager_python_api'
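Once Cloudera Manager is running, a quick way to confirm that its REST API is reachable is to request the `/api/version` endpoint. The sketch below (in Ruby, to stay in the cookbook's language) only builds the request and does not send it. The helper `cm_api_request` is hypothetical, the host name is the sample FQDN from the roles in this document, and the port `7180` and `admin`/`admin` credentials are Cloudera Manager's usual defaults, assumed here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a GET request for a
# Cloudera Manager API path, using HTTP basic authentication.
def cm_api_request(host, path, user:, password:, port: 7180)
  uri = URI::HTTP.build(host: host, port: port, path: "/api/#{path}")
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  [uri, request]
end

uri, request = cm_api_request('admin-node-server.ahmed.com', 'version',
                              user: 'admin', password: 'admin')
puts uri

# To actually send it against a live server:
#   Net::HTTP.start(uri.host, uri.port) { |http| puts http.request(request).body }
```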

What’s cooking in the cookbook?

This cookbook is applied to the Hadoop cluster nodes before we start setting the cluster up with Cloudera. There will be a specific cookbook for each of the nodes, based on roles.

Example: below are the different types of nodes, each of which will have a specific role assigned (roles will be created once the cookbooks are ready).

  1. Edge nodes. (Install default recipe pre_config)
  2. Management nodes. (Install default recipe pre_config)
  3. Admin nodes. (Install recipe pre_config and cloudera_manager)
  4. Worker nodes (datanodes). (Install default recipe pre_config)

Sample roles based on the setup.

NOTE: If the nodes have an internet connection (possibly through a proxy), only the host information needs to be set; the rest of the role config below can safely be ignored.

Basic Roles - which can be used to set up the environment

The only difference between the basic roles below is the run_list assigned to each.

  • NON-ADMIN has run_list "recipe[cdhmgr-chef-setup::pre_config]"
  • ADMIN has run_list "recipe[cdhmgr-chef-setup::pre_config]", "recipe[cdhmgr-chef-setup::cloudera_manager]"
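The rule behind the two run lists can be stated in a few lines of plain Ruby: every node gets `pre_config`, and only the admin node additionally gets `cloudera_manager`. The helper `run_list_for` and the node type symbols are hypothetical names used for this sketch.

```ruby
PRE_CONFIG = 'recipe[cdhmgr-chef-setup::pre_config]'
CLOUDERA_MANAGER = 'recipe[cdhmgr-chef-setup::cloudera_manager]'

# Hypothetical helper: node_type is a symbol such as :edge, :management,
# :admin, or :worker. Only :admin also runs the Cloudera Manager recipe.
def run_list_for(node_type)
  node_type == :admin ? [PRE_CONFIG, CLOUDERA_MANAGER] : [PRE_CONFIG]
end

%i[edge management admin worker].each do |type|
  puts "#{type}: #{run_list_for(type).join(', ')}"
end
```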

NON-ADMIN Role.

name "non_admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]"

#
# Default attributes to setup `/etc/hosts file`
#
default_attributes "cdhmgr-chef-setup" => {
    "etc_hosts_entries" => {
        "192.168.13.2"  => { "hostname" => "ahmedservere001.ahmed.com" , "aliases" => [ "ahmedservere001" ] , "comment" => "Edge 01   - Cloudera Hadoop Node" },
        "192.168.13.3"  => { "hostname" => "ahmedservere002.ahmed.com" , "aliases" => [ "ahmedservere002" ] , "comment" => "Edge 02   - Cloudera Hadoop Node" },
        "192.168.13.4"  => { "hostname" => "ahmedservera001.ahmed.com" , "aliases" => [ "ahmedservera001" ] , "comment" => "Admin 01  - Cloudera Hadoop Node" },
        "192.168.13.6"  => { "hostname" => "ahmedserverm001.ahmed.com" , "aliases" => [ "ahmedserverm001" ] , "comment" => "Master 01 - Cloudera Hadoop Node" },
        "192.168.13.7"  => { "hostname" => "ahmedserverm002.ahmed.com" , "aliases" => [ "ahmedserverm002" ] , "comment" => "Master 02 - Cloudera Hadoop Node" },
        "192.168.13.9"  => { "hostname" => "ahmedserverw001.ahmed.com" , "aliases" => [ "ahmedserverw001" ] , "comment" => "Worker 01 - Cloudera Hadoop Node" },
        "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com" , "aliases" => [ "ahmedserverw002" ] , "comment" => "Worker 02 - Cloudera Hadoop Node" }
    }
}

#
# Setting up override attributes
#

override_attributes(
    "cdhmgr-chef-setup" => {
        "cloudera_mgr_services" => {
            "server_host_fqdn_for_agent" => 'admin-node-server.ahmed.com'
        }
    }
)

ADMIN Role.

name "admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]", "recipe[cdhmgr-chef-setup::cloudera_manager]"


#
# Default attributes to setup `/etc/hosts file`
#
default_attributes(
    "cdhmgr-chef-setup" => {
        "etc_hosts_entries" => {
            "192.168.13.2"  => { "hostname" => "ahmedservere001.ahmed.com" , "aliases" => [ "ahmedservere001" ] , "comment" => "Edge 01   - Cloudera Hadoop Node" },
            "192.168.13.3"  => { "hostname" => "ahmedservere002.ahmed.com" , "aliases" => [ "ahmedservere002" ] , "comment" => "Edge 02   - Cloudera Hadoop Node" },
            "192.168.13.4"  => { "hostname" => "ahmedservera001.ahmed.com" , "aliases" => [ "ahmedservera001" ] , "comment" => "Admin 01  - Cloudera Hadoop Node" },
            "192.168.13.6"  => { "hostname" => "ahmedserverm001.ahmed.com" , "aliases" => [ "ahmedserverm001" ] , "comment" => "Master 01 - Cloudera Hadoop Node" },
            "192.168.13.7"  => { "hostname" => "ahmedserverm002.ahmed.com" , "aliases" => [ "ahmedserverm002" ] , "comment" => "Master 02 - Cloudera Hadoop Node" },
            "192.168.13.9"  => { "hostname" => "ahmedserverw001.ahmed.com" , "aliases" => [ "ahmedserverw001" ] , "comment" => "Worker 01 - Cloudera Hadoop Node" },
            "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com" , "aliases" => [ "ahmedserverw002" ] , "comment" => "Worker 02 - Cloudera Hadoop Node" }
        }
    }
)


#
# Setting up override attributes
#

override_attributes(
    "cdhmgr-chef-setup" => {
        "cloudera_mgr_services" => {
            "server_host_fqdn_for_agent" => 'admin-node-server.ahmed.com'
        }
    }
)
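The roles use both `default_attributes` and `override_attributes`. As a simplified model of Chef's attribute precedence (the real rules have more levels), override values win over defaults, and nested hashes are merged key by key. The sketch below demonstrates that with plain hashes; `deep_merge` is a hypothetical helper, and the cookbook default values shown are assumed for illustration.

```ruby
# Hypothetical helper: recursively merges two attribute hashes.
# Where both sides hold a hash, the merge recurses; otherwise the
# value from `winner` replaces the value from `base`.
def deep_merge(base, winner)
  base.merge(winner) do |_key, base_value, winner_value|
    if base_value.is_a?(Hash) && winner_value.is_a?(Hash)
      deep_merge(base_value, winner_value)
    else
      winner_value
    end
  end
end

# Assumed cookbook defaults, for illustration only.
cookbook_defaults = {
  'cdhmgr-chef-setup' => {
    'cloudera_mgr_services' => { 'server_host_fqdn_for_agent' => 'localhost' },
    'database_to_install' => 'mysql'
  }
}

# The override from the basic roles in this document.
role_overrides = {
  'cdhmgr-chef-setup' => {
    'cloudera_mgr_services' => { 'server_host_fqdn_for_agent' => 'admin-node-server.ahmed.com' }
  }
}

effective = deep_merge(cookbook_defaults, role_overrides)
puts effective['cdhmgr-chef-setup']['cloudera_mgr_services']['server_host_fqdn_for_agent']
puts effective['cdhmgr-chef-setup']['database_to_install']
```

The override replaces only the agent's server FQDN; `database_to_install` keeps its default because the role does not mention it.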

Custom Role - Adding more granular changes to repo and attributes.

Role for a NON-ADMIN node. Here we set up nodes for the Cloudera Hadoop environment.

name "non_admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]"

#
# Default attributes to setup `/etc/hosts file`
#
default_attributes(
    "cdhmgr-chef-setup" => {
        "etc_hosts_entries" => {
            "192.168.13.2"  => { "hostname" => "ahmedservere001.ahmed.com" , "aliases" => [ "ahmedservere001" ] , "comment" => "Edge 01   - Cloudera Hadoop Node" },
            "192.168.13.3"  => { "hostname" => "ahmedservere002.ahmed.com" , "aliases" => [ "ahmedservere002" ] , "comment" => "Edge 02   - Cloudera Hadoop Node" },
            "192.168.13.4"  => { "hostname" => "ahmedservera001.ahmed.com" , "aliases" => [ "ahmedservera001" ] , "comment" => "Admin 01  - Cloudera Hadoop Node" },
            "192.168.13.6"  => { "hostname" => "ahmedserverm001.ahmed.com" , "aliases" => [ "ahmedserverm001" ] , "comment" => "Master 01 - Cloudera Hadoop Node" },
            "192.168.13.7"  => { "hostname" => "ahmedserverm002.ahmed.com" , "aliases" => [ "ahmedserverm002" ] , "comment" => "Master 02 - Cloudera Hadoop Node" },
            "192.168.13.9"  => { "hostname" => "ahmedserverw001.ahmed.com" , "aliases" => [ "ahmedserverw001" ] , "comment" => "Worker 01 - Cloudera Hadoop Node" },
            "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com" , "aliases" => [ "ahmedserverw002" ] , "comment" => "Worker 02 - Cloudera Hadoop Node" }
        }
    }
)

#
# Setting up override attributes
#

override_attributes(
    "mysql_connector" => {
        "j" => {
            "url" => "http://repo.ahmed.com/mysql-connector/Connector-J/mysql-connector-java-5.1.40.tar.gz"
        }
    },

    "authorization" => {
        "sudo" => {
            "groups" => [ 'cmadmin' ]
        }
    }
)

Role for an ADMIN node.

name "admin_roles_cdh_hadoop_cluster"
description "Pre Configuration for Cloudera Hadoop Cluster - This will be for all nodes in the cluster"
run_list "recipe[cdhmgr-chef-setup::pre_config]", "recipe[cdhmgr-chef-setup::cloudera_manager]"

#
# Default attributes to setup `/etc/hosts file`
#
default_attributes(
    "cdhmgr-chef-setup" => {
        "etc_hosts_entries" => {
            "192.168.13.2"  => { "hostname" => "ahmedservere001.ahmed.com" , "aliases" => [ "ahmedservere001" ] , "comment" => "Edge 01   - Cloudera Hadoop Node" },
            "192.168.13.3"  => { "hostname" => "ahmedservere002.ahmed.com" , "aliases" => [ "ahmedservere002" ] , "comment" => "Edge 02   - Cloudera Hadoop Node" },
            "192.168.13.4"  => { "hostname" => "ahmedservera001.ahmed.com" , "aliases" => [ "ahmedservera001" ] , "comment" => "Admin 01  - Cloudera Hadoop Node" },
            "192.168.13.6"  => { "hostname" => "ahmedserverm001.ahmed.com" , "aliases" => [ "ahmedserverm001" ] , "comment" => "Master 01 - Cloudera Hadoop Node" },
            "192.168.13.7"  => { "hostname" => "ahmedserverm002.ahmed.com" , "aliases" => [ "ahmedserverm002" ] , "comment" => "Master 02 - Cloudera Hadoop Node" },
            "192.168.13.9"  => { "hostname" => "ahmedserverw001.ahmed.com" , "aliases" => [ "ahmedserverw001" ] , "comment" => "Worker 01 - Cloudera Hadoop Node" },
            "192.168.13.10" => { "hostname" => "ahmedserverw002.ahmed.com" , "aliases" => [ "ahmedserverw002" ] , "comment" => "Worker 02 - Cloudera Hadoop Node" }
        }
    }
)

#
# Setting up override attributes
#

override_attributes(
    "cdhmgr-chef-setup" => {
        "cloudera_mgr_services" => {
            "common_packages" => [ 'oracle-j2sdk1.7', 'cloudera-manager-daemons', 'cloudera-manager-agent', 'openldap-clients' ],
            "common_packages_versions" => [ '1.7.0+update67-1', '5.12.0-1.cm5120.p0.120.el6', '5.12.0-1.cm5120.p0.120.el6', '2.4.40-16.el6' ],
            "server_packages" => [ 'cloudera-manager-server', 'openldap-clients' ],
            "server_packages_versions" => [ '5.12.0-1.cm5120.p0.120.el6' , '2.4.40-16.el6' ],
            "server_host_fqdn_for_agent" => 'localhost'
        },

        "database_to_install" => 'mysql'
    },

    "mysql_connector" => {
        "j" => {
            "url" => "http://repo.ahmed.com/mysql-connector/Connector-J/mysql-connector-java-5.1.40.tar.gz"
        }
    },

    "authorization" => {
        "sudo" => {
            "groups" => [ 'cmadmin' ]
        }
    },

    "yum_repository" => {
        "cm" => {
            "description" => "Packages for Cloudera Manager, Version 5, on RedHat or CentOS 6 x86_64",
            "baseurl" => "http://repo.ahmed.com/cm5/redhat/6/x86_64/cm/5.12.0/",
            "gpgkey" => "http://repo.ahmed.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera"
        }
    }
)
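The package lists and their version lists in the ADMIN role are parallel arrays, so each version must sit at the same index as its package. A small check like the one below can catch a mismatch before a Chef run. The helper `pinned_packages` is hypothetical, and joining name and version with a hyphen is an assumed convention for display, not something the cookbook is known to do.

```ruby
# Hypothetical helper: pairs each package with its pinned version and
# raises if the two lists have different lengths.
def pinned_packages(names, versions)
  unless names.length == versions.length
    raise ArgumentError, "#{names.length} packages but #{versions.length} versions"
  end

  names.zip(versions).map { |name, version| "#{name}-#{version}" }
end

# Values copied from the ADMIN role's override attributes.
server_packages = ['cloudera-manager-server', 'openldap-clients']
server_packages_versions = ['5.12.0-1.cm5120.p0.120.el6', '2.4.40-16.el6']

puts pinned_packages(server_packages, server_packages_versions)
```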

A few links that help with Cloudera installations.