puppet cloudera module 3.0.0

This is a major release of my Puppet module to deploy Cloudera Manager. The major change is that razorsedge/cloudera now supports the latest releases of its dependent modules. The module had been lagging behind because it needed to support Puppet Enterprise 3.0.1 installations, and those installations only recently upgraded.

Notable changes are:

https://forge.puppetlabs.com/razorsedge/cloudera
https://github.com/razorsedge/puppet-cloudera

Let me know if you have any feedback!

puppet cloudera module 2.0.2

This is a minor bugfix release of my Puppet module to deploy Cloudera Manager. When I released the module, I had assumed that the testing I did for the C5 beta2 would be 100% valid for C5 GA. It turns out that Cloudera shipped a newer version of the Oracle JDK 7, and a symlink that the module creates on RedHat and SUSE (/usr/java/default) was pointing at the wrong location. Upgrading to razorsedge/cloudera 2.0.2 will fix the issue.

Lesson learned: Test, test, and test some more.

Thanks to yuzi-co for reporting the problem.

https://forge.puppetlabs.com/razorsedge/cloudera

https://github.com/razorsedge/puppet-cloudera

Let me know if you have any feedback!

puppet cloudera module 2.0.1

This is a major release of my Puppet module to deploy Cloudera Manager. The major change is that razorsedge/cloudera now supports Cloudera’s latest release, Cloudera Enterprise 5, which adds support for Cloudera Manager 5 and Cloudera’s Distribution Including Apache Hadoop (CDH) 5. Additionally, this module and its deployment via Puppet Enterprise 3.2 have been tested, validated, and certified by Cloudera to work with Cloudera Enterprise 5.

This module is certified on Cloudera 5.

Other changes are:

  • All interaction with the cloudera module can now be done through the main ::cloudera class, including installation of the CM server. This means you can simply toggle the options in ::cloudera to have full functionality of the module.
  • Official operating system support for Debian 7.
  • Installation of Oracle JDK 7.
  • Recommended tuning of the vm.swappiness kernel parameter.
  • Installation of native LZO libraries when the parameter install_lzo => true is selected, even when installing via parcels.
  • Conversion of the README.md file to the Puppet Labs recommended README.markdown formatting.  This has dramatically improved the presentation of the things one needs to know about the module in order to quickly become productive.
  • Taking advantage of the new module metadata to add compatibility information to the module page on the Puppet Forge.
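
A minimal sketch of what driving everything through the main class might look like on the node that hosts the CM server. Only install_lzo is named in the list above. The cm_server_host and install_cmserver parameters and the hostname are assumptions made for illustration, so check the module README for the exact names before using them.

```puppet
# Sketch only: declare the main class on the node that will run the CM server.
class { '::cloudera':
  cm_server_host   => 'cm.example.com',  # assumed parameter; placeholder hostname
  install_cmserver => true,              # assumed toggle for installing the CM server
  install_lzo      => true,              # parameter named in the list above
}
```

On agent-only nodes the same declaration would presumably drop the CM server toggle and keep the rest.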

If you have not seen the previous changes in version 1.0.1, here is a recap:

  • Allow for use of an external Java module. Not everyone will want to stick with the older version of the Oracle JDK that Cloudera ships in their software repositories. If you have a module that provides the Oracle JDK and sets $JAVA_HOME in the environment, then just set install_java => false in Class['cloudera'] and make sure the JDK is installed before calling Class['cloudera'].
  • Integrated installation of the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files. Set the parameter install_jce => true in Class['cloudera'].
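
The external Java case could be sketched roughly as follows. The my_java class is hypothetical and stands in for whatever module installs the Oracle JDK and sets $JAVA_HOME. The install_java parameter comes from the recap above, and the require metaparameter is standard Puppet ordering.

```puppet
# 'my_java' is a hypothetical class that installs the Oracle JDK and exports $JAVA_HOME.
class { '::my_java': }

class { '::cloudera':
  install_java => false,                  # do not install the JDK from Cloudera's repositories
  require      => Class['::my_java'],     # make sure the external JDK is applied first
}
```

The install_jce => true parameter is passed to the class in the same way.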

Deprecation Warnings

  • The class parameters and variables yumserver and yumpath have been renamed to reposerver and repopath respectively. This makes the name more generic as it applies to APT and Zypprepo as well as YUM package repositories.
  • The use_gplextras parameter has been renamed to install_lzo.

Note that this module does not support upgrading from CDH4 to CDH5 packages, including Impala, Search, and GPL Extras.

https://forge.puppetlabs.com/razorsedge/cloudera

https://github.com/razorsedge/puppet-cloudera

Let me know if you have any feedback!

Doing DevOps with Cloudera Manager

It looks like my work with Puppet has been picked up by the Cloudera Blog. James Ruddy blogged about utilizing my razorsedge/cloudera module in his article Deploy Cloudera Manager with Puppet. Cloudera provided a link to his article under the umbrella of automating the deployment of Cloudera Manager itself.

I think it is awesome that other folks are utilizing code I have worked on. I mainly write this stuff to scratch an itch, and I am happy when other people get some use out of it.

Video of my Hadoop Summit 2012 Presentation

It looks like the video of my presentation at the 2012 Hadoop Summit made it online. It is too bad the Summit website does not link to it. This is a link to my presentation as well.

I will be speaking at Hadoop Summit 2012

After an initial “on-hold” status, it looks like I will get to speak at the Hadoop Summit 2012 in San Jose, CA. This will be my first time speaking at a conference and I am really looking forward to it. Below is the abstract I submitted:

Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)

Everyone hears about large clusters with thousands of machines and petabytes of storage, yet not everyone starts their first Hadoop deployment with dozens of cabinets of equipment. What do you do when you don't have quite as large a deployment? What decisions should you make now and which should you postpone for later? This session is for SysAdmins who have not yet or just recently jumped into the Hadoop fray. You will be presented with the knowledge gained from two years of operational experience at a (currently) small Hadoop site. We will discuss things that are initially important for a small (10-100 node) cluster and what happens when you outgrow your first deployment.