Things to Come From the Cloudera/Hortonworks Merger

Now that the two Hadoop distribution giants have merged, it is time to call out what will happen to their overlapping software offerings. The following are my predictions:

Ambari is out – replaced by Cloudera Manager.
This is a no-brainer for anyone who has used both tools. People can rant and rave about open source and freedom all they want, but Cloudera Manager is light-years ahead of Ambari in functionality and features. For starters, Ambari can only deploy a single cluster, while CM can deploy multiple clusters. And the two features I personally use the most in my job as a consultant are nowhere to be found in Ambari: Host/Role layout and a non-default Configuration view.

Tez is out – replaced by Spark.
Cloudera has already declared that Spark has replaced MapReduce. There is little reason for Tez to remain as a Hive execution engine when Spark does the same things and can also be used for general computation outside of Hive.

Hive LLAP is out – replaced by Impala.
Similar to Tez, there is no reason to keep interactive query performance tools for Hive around when Impala was designed to do just that. Remember: Hive is for batch and Impala is for exploration.

What do you think? Leave your thoughts in the comments.

Failed Disk Replacement with Navigator Encrypt

Hardware fails. Especially hard disks. Your Hadoop cluster will be operating with less capacity until that failed disk is replaced, and full disk encryption adds to the replacement trouble. Here is how to do it without bringing down the entire machine (assuming, of course, that your disk is hot-swappable).

Assumptions:

  • Cloudera Hadoop and/or Cloudera Kafka environment.
  • Cloudera Manager is in use.
  • Cloudera Navigator Encrypt is in use.
  • Physical hardware that allows a data disk to be hot-swapped without powering down the entire machine. Otherwise, you can pretty much skip steps 2 and 4.
  • We are replacing a data disk and not an OS disk.
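Whatever the exact replacement procedure, it helps to know which configured data directories (for example, the HDFS DataNode or Kafka log directories assigned in Cloudera Manager) sit on the failed disk. The sketch below is a hypothetical helper, not part of Navigator Encrypt or Cloudera Manager: given /proc/mounts-style text and a device name, it attributes each directory to the longest mount point that contains it. The device names (such as /dev/mapper/crypt-sdc) and directory paths are made up for the example, and the parser does not decode the octal escapes real mount tables use for spaces.

```python
# Hypothetical helper: which configured data directories live on a device?
# Simplified: expects /proc/mounts-style lines ("device mountpoint fstype ...")
# and does not decode the octal escapes real mount tables use for spaces.


def parse_mounts(text: str) -> dict:
    """Map each mount point to the device mounted there."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device, mount_point = fields[0], fields[1]
            mounts[mount_point] = device
    return mounts


def mount_point_for(path: str, mount_points) -> str:
    """Return the longest mount point that contains path (defaults to '/')."""
    best = "/"
    for mp in mount_points:
        prefix = mp.rstrip("/") + "/"
        # Compare on a path boundary so /data/1 does not claim /data/10.
        if (path == mp or path.startswith(prefix)) and len(mp) > len(best):
            best = mp
    return best


def dirs_on_device(data_dirs, mounts: dict, device: str) -> list:
    """List the data directories whose mount point is backed by device."""
    return [d for d in data_dirs
            if mounts.get(mount_point_for(d, mounts)) == device]


# Made-up example data; on a real host you would read /proc/mounts instead.
SAMPLE_MOUNTS = """\
/dev/sda2 / xfs rw 0 0
/dev/mapper/crypt-sdb /data/1 xfs rw 0 0
/dev/mapper/crypt-sdc /data/2 xfs rw 0 0
"""
DATA_DIRS = ["/data/1/dfs/dn", "/data/2/dfs/dn", "/data/10/dfs/dn"]

if __name__ == "__main__":
    mounts = parse_mounts(SAMPLE_MOUNTS)
    print(dirs_on_device(DATA_DIRS, mounts, "/dev/mapper/crypt-sdc"))
```

Run as a script, it prints `['/data/2/dfs/dn']`. The path-boundary comparison matters: /data/10/dfs/dn is attributed to the root filesystem rather than to /data/1.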


strict_variables and the RazorsEdge Puppet Modules

Over the past month I have been adding much-needed support for running Puppet with strict_variables = true to all of the RazorsEdge Puppet modules. Thanks to coreone, I finally had a solution that did not require tearing out the legacy global variable support. As painful as continued support for global variables has become, I am still committed to keeping it around.
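For context: with strict_variables = true, Puppet raises an evaluation error when code references a variable that was never set, which is exactly what optional legacy top-scope (global) variable lookups tend to do. The Python model below is an illustration only; it is not Puppet code and not the approach used in the RazorsEdge modules, and the names Scope, lookup_if_defined, resolve_param, and legacy_server are hypothetical. It shows why an existence-checked lookup lets a module keep a legacy global as an optional fallback without tripping strict mode.

```python
# A Python model of strict_variables, for illustration only (not Puppet code).
# Hypothetical names throughout; this is not the RazorsEdge implementation.


class UndefinedVariableError(Exception):
    """Raised when strict mode reads a variable that was never set."""


class Scope:
    def __init__(self, variables: dict, strict: bool):
        self.variables = dict(variables)
        self.strict = strict

    def lookup(self, name: str):
        """Direct reference, like $::name: an error in strict mode if unset."""
        if name in self.variables:
            return self.variables[name]
        if self.strict:
            raise UndefinedVariableError(f"Unknown variable: '{name}'")
        return None  # stands in for Puppet's undef

    def lookup_if_defined(self, name: str):
        """Existence-checked lookup: returns None instead of raising."""
        return self.variables.get(name)


def resolve_param(explicit, scope: Scope, legacy_global: str, default):
    """Explicit parameter wins, then an optional legacy global, then the default."""
    if explicit is not None:
        return explicit
    legacy = scope.lookup_if_defined(legacy_global)
    if legacy is not None:
        return legacy
    return default


if __name__ == "__main__":
    strict_empty = Scope({}, strict=True)
    strict_legacy = Scope({"legacy_server": "cm.example.com"}, strict=True)
    print(resolve_param(None, strict_empty, "legacy_server", "localhost"))
    print(resolve_param(None, strict_legacy, "legacy_server", "localhost"))
    try:
        strict_empty.lookup("legacy_server")
    except UndefinedVariableError as err:
        print("strict lookup failed:", err)
```

The direct lookup fails under strict mode when the global is absent, while resolve_param falls back cleanly to the default in that case and still honors the global when a legacy site sets it.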

I also managed to get the RSpec testing Ruby gem dependencies configured so that the modules can still be tested on Ruby 1.8.7, 1.9.3, and 2.x as well as Puppet 2.7, 3.x, and 4.x. Travis CI is also testing Ruby 2.4 and Puppet 5.x for all of the modules. As of now, only two modules are not passing the Puppet 5 RSpec tests, and I hope to get those sorted soon.

https://forge.puppetlabs.com/razorsedge/certmaster
https://forge.puppetlabs.com/razorsedge/cloudera
https://forge.puppetlabs.com/razorsedge/func
https://forge.puppetlabs.com/razorsedge/hp_mcp
https://forge.puppetlabs.com/razorsedge/hp_spp
https://forge.puppetlabs.com/razorsedge/lsb
https://forge.puppetlabs.com/razorsedge/network
https://forge.puppetlabs.com/razorsedge/openlldp
https://forge.puppetlabs.com/razorsedge/openvmtools
https://forge.puppetlabs.com/razorsedge/razorsedge
https://forge.puppetlabs.com/razorsedge/snmp
https://forge.puppetlabs.com/razorsedge/tor
https://forge.puppetlabs.com/razorsedge/vmwaretools

Let me know if you have any feedback!

Hue Load Balancer TLS Errors

If you are configuring the Hue load balancer with Apache httpd 2.4 and TLS certificates, you may run into errors. The httpd proxy checks the certificates of the target systems, and if they do not pass some basic consistency checks, the proxied connection fails. This can happen if you are using self-signed certificates or a private certificate authority. The subject of the target certificate may be incorrect (i.e., the CommonName or CN in the cert may be wrong), or the subjectAlternativeName (SAN) may not match the subject.

Error messages in the Hue httpd logs in /var/log/hue-httpd/error_log may include:

  • AH01084: pass request body failed to
  • AH00898: Error during SSL Handshake with remote server returned by

Disabling the target system certificate checks is a temporary workaround. Add the following lines to the Hue load balancer httpd.conf:

SSLProxyCheckPeerCN off
SSLProxyCheckPeerName off

If you are using Cloudera Manager to configure Hue High Availability, add the above lines to the Hue Load Balancer Advanced Configuration Snippet (Safety Valve) for httpd.conf.

Hue Load Balancer Advanced Configuration Snippet (Safety Valve) for httpd.conf dialog box in Cloudera Manager

Ideally, you would also fix the TLS certificates so that they pass the httpd certificate checks, but this workaround will buy you time to get your certificate requests regenerated and signed.

High availability and load balancing of Hue has been available since Hue version 3.9. The above error has been seen in CDH 5.10.1 on RHEL 7.3 with httpd 2.4.

Update:

June 27, 2017
It looks like Cloudera is seeing this issue in CDH 5.11.0.

How To Rebuild Cloudera’s Spark

As a follow-up to the post How to upgrade Spark on CDH5.5, I will show you how to get a build environment up and running on a CentOS 7 virtual machine using Vagrant and VirtualBox. This allows for a quick build or rebuild of Cloudera’s version of Apache Spark from https://github.com/cloudera/spark.

Why?

You may want to rebuild Cloudera’s Spark if you want to add functionality that was not compiled in by default. The Thriftserver and SparkR are two things that Cloudera does not ship (or support), so if you are looking for these things, these instructions will help.

Using a disposable virtual machine will allow for a repeatable build and will keep your workstation computing environment clean of all the bits that may get installed.
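As a rough sketch of what such a build boils down to once the VM is up, the helper below assembles a Maven command that turns on the Thrift server and SparkR. It assumes the `build/mvn` wrapper and the `-Pyarn`, `-Phive`, `-Phive-thriftserver`, and `-Psparkr` profiles documented for upstream Apache Spark; a given branch of Cloudera's fork may need different profiles or a `-Dhadoop.version` value matching your CDH release, and the SparkR profile needs R installed in the VM. The function names are hypothetical, and this is not necessarily the exact command used in the full post.

```python
# Hypothetical helper that assembles (and optionally runs) a Maven build of a
# Spark source tree. Profile names follow upstream Apache Spark's build docs
# and are assumptions for any particular branch of Cloudera's fork.
import shlex
import subprocess
from typing import List, Optional


def spark_build_command(hadoop_version: Optional[str] = None,
                        thriftserver: bool = True,
                        sparkr: bool = True,
                        skip_tests: bool = True) -> List[str]:
    """Build the argument list for ./build/mvn with the requested features."""
    cmd = ["./build/mvn", "-Pyarn", "-Phive"]
    if thriftserver:
        cmd.append("-Phive-thriftserver")  # the Thrift JDBC/ODBC server
    if sparkr:
        cmd.append("-Psparkr")  # needs R installed on the build machine
    if hadoop_version:
        cmd.append(f"-Dhadoop.version={hadoop_version}")
    if skip_tests:
        cmd.append("-DskipTests")
    cmd += ["clean", "package"]
    return cmd


def run_build(source_dir: str, **options) -> None:
    """Run the build inside a source checkout (e.g. of cloudera/spark)."""
    cmd = spark_build_command(**options)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, cwd=source_dir, check=True)


if __name__ == "__main__":
    print(shlex.join(spark_build_command()))
```

Keeping the command as a list (rather than a shell string) avoids quoting problems when it is handed to subprocess, and makes it easy to toggle features per build.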


puppet cloudera module 3.0.0

This is a major release of my Puppet module to deploy Cloudera Manager. The major change is that razorsedge/cloudera now supports the latest releases of dependent modules. razorsedge/cloudera was lagging behind due to the need to support Puppet Enterprise 3.0.1 installations, and only recently did those installations finally upgrade.

Notable changes are:

https://forge.puppetlabs.com/razorsedge/cloudera
https://github.com/razorsedge/puppet-cloudera

Let me know if you have any feedback!

puppet cloudera module 2.0.2

This is a minor bugfix release of my Puppet module to deploy Cloudera Manager. When I released the module, I had assumed that the testing I did for the C5 beta2 would be 100% valid for C5 GA. It turns out that Cloudera shipped a newer release of Oracle JDK 7, and a symlink that the module creates on Red Hat and SUSE (/usr/java/default) was pointing at the wrong location. Upgrading to razorsedge/cloudera 2.0.2 fixes the issue.
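If you want to spot this kind of problem on a host, a small check like the following compares where /usr/java/default resolves with the newest JDK 7 directory under /usr/java. This is a sketch, not part of razorsedge/cloudera: the function names are hypothetical, and it assumes JDK directories are named like `jdk1.7.0_<update>` (optionally followed by a vendor suffix), which may not match every package.

```python
# Sketch: does /usr/java/default resolve to the newest installed JDK?
# Assumes directory names like "jdk1.7.0_<update>", optionally with a suffix.
import os
import re

_JDK_RE = re.compile(r"^jdk1\.(\d+)\.(\d+)_(\d+)")


def newest_jdk(names, major: int = 7):
    """Pick the highest jdk1.<major>.<minor>_<update> name, comparing numerically."""
    best, best_key = None, None
    for name in names:
        m = _JDK_RE.match(name)
        if not m or int(m.group(1)) != major:
            continue  # skips "default", "latest", other major versions
        key = (int(m.group(2)), int(m.group(3)))
        if best_key is None or key > best_key:
            best, best_key = name, key
    return best


def default_link_is_current(java_root: str = "/usr/java",
                            link_name: str = "default",
                            major: int = 7) -> bool:
    """True if java_root/link_name is a symlink resolving to the newest JDK."""
    newest = newest_jdk(os.listdir(java_root), major)
    link = os.path.join(java_root, link_name)
    if newest is None or not os.path.islink(link):
        return False
    # realpath follows the whole symlink chain (e.g. default -> latest -> jdk).
    return os.path.realpath(link) == os.path.realpath(os.path.join(java_root, newest))


if __name__ == "__main__":
    if os.path.isdir("/usr/java"):
        print("default is current:", default_link_is_current())
```

Comparing update numbers as integers matters: a plain string sort would rank `jdk1.7.0_9` above `jdk1.7.0_67`.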

Lesson learned: Test, test, and test some more.

Thanks to yuzi-co for reporting the problem.

https://forge.puppetlabs.com/razorsedge/cloudera

https://github.com/razorsedge/puppet-cloudera

Let me know if you have any feedback!