AWS CloudShell and Terraform

Amazon has a new service that can make using HashiCorp Terraform even easier. From the AWS service page:

AWS CloudShell is a browser-based shell that makes it easy to securely manage, explore, and interact with your AWS resources. CloudShell is pre-authenticated with your console credentials. Common development and operations tools are pre-installed, so no local installation or configuration is required.

Unfortunately, Terraform is not installed by default. But we can fix that very easily!

# grab a release from https://releases.hashicorp.com/terraform/ (the version below is only an example)
TF_VERSION=0.14.5
curl -sLO "https://releases.hashicorp.com/terraform/${TF_VERSION}/terraform_${TF_VERSION}_linux_amd64.zip"
unzip "terraform_${TF_VERSION}_linux_amd64.zip"
mkdir -p ~/bin
mv terraform ~/bin

Or, using my favorite, tfenv:

git clone https://github.com/tfutils/tfenv.git ~/.tfenv
mkdir -p ~/bin
ln -s ~/.tfenv/bin/* ~/bin/
tfenv install

And that is it! You do not need an Access Key / Secret Access Key, as CloudShell runs as your AWS Management Console user/role. Also, beware of installing anything outside of $HOME, as it will not persist across sessions.
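Since only $HOME persists, it is worth confirming that ~/bin is actually on your PATH. I believe CloudShell's default profile picks it up, but this sketch adds it if it is missing:

```shell
# ensure ~/bin exists and is on PATH, both now and in future sessions
mkdir -p "$HOME/bin"
case ":$PATH:" in
  *":$HOME/bin:"*) echo "~/bin already on PATH" ;;
  *) export PATH="$HOME/bin:$PATH"
     # persist the change for future CloudShell sessions
     echo 'export PATH="$HOME/bin:$PATH"' >> "$HOME/.bashrc" ;;
esac
```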

My New FTTH Installation

Following up from my previous post, I finally got my new “InternetHog” Internet connection from Citynet the other day and it sure is nice.

The outside installers ran an aerial drop to the house the week prior to the inside installers arriving to finish the work.  The subscriber drop cable was brought in from two poles up the line where the fibre tap was located.  It was run to the power mast and then down to where a new junction box would be installed.  When the inside installers arrived, they worked on getting the junction box installed, indoor fibre run to the box, and spliced to the outside fibre.

Citynet Junction Box

Citynet junction box on the electrical conduit. Yellow “wire” is the indoor fibre.

Citynet Junction Box

Inside view of the junction box showing the coiled fibre and the splice between the outdoor line (black/white/blue) and the indoor fibre (yellow).

The inside installer performed what is called a fusion splice which entails stripping away the outer protective layers of the fibre, cutting (or cleaving) the glass, and then fusing the two ends together.  There was a fancy little electronic, tripod-mounted splicer that took care of all the difficult parts of lining the two ends up just right and melting them together.  Then a metal bar and a sleeve were fitted over the splice to strengthen it and protect it and set into the light-blueish holder in the middle of the junction box.

Citynet Junction Box

Close up of the junction box. The outdoor line holds two fiberglass strength members along with a clear tube containing the blue single-mode fibre. This is then fusion-spliced to the yellow indoor fibre.

Once the subscriber drop was spliced to the indoor fibre, it was connected to my new optical network terminal (ONT).  Citynet provided an Adtran 411 Micro GPON Indoor ONT.  This is similar to a DOCSIS cable modem or DSL modem.  It translates the electrical ethernet signals coming from my router into light pulses and sends them out onto the fibre network.  You can find all sorts of pictures and facts at the unrelated Sonic Adtran 411 support page.  The ONT was all I received, since I opted for the Internet-only service.  I understand that if you order TV and phone, you get another (or a different) device that will break out those services.  I also did not receive an ONT battery backup as I already have my own uninterruptible power supply (UPS).

Citynet Adtran 411 ONT

Citynet’s Adtran 411 optical network terminal.

I am testing out the full speed 1000/1000 Prime package.  Below are my speed test results.  I can’t really complain with a 1 millisecond ping, 929.64 Mbps download, and 938.16 Mbps upload speed. 🙂

Citynet 1000/1000 Speed Test Results

Speed test result to the local Citynet SPEEDTEST server.

Swap and Hadoop

TL;DR: Turn off swap completely. A properly designed and tuned Hadoop system will not need it.

So there you are, minding your Hadoop cluster, when alerts start to come in: The hosts are swapping! Oh, the horror! The end is nigh! Why do we even have this horrible swap space?

But all is not lost. Read on for a short history lesson and a mind-boggling revelation. Read more of this post
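Checking where a host currently stands takes only a moment. These are read-only commands; actually removing swap requires root, as sketched in the comments:

```shell
# show active swap devices; no output means swap is already off
swapon --show
# swap size from the kernel's own accounting
awk '/^SwapTotal/ {print "swap total: " $2 " kB"}' /proc/meminfo
# the kernel's eagerness to swap (0-100); Hadoop hosts often set this to 1
cat /proc/sys/vm/swappiness
# to disable swap entirely (as root):
#   swapoff -a                               # turn it off now
#   sed -i '/\sswap\s/ s/^/#/' /etc/fstab    # keep it off after a reboot
```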

I’m Getting FTTH (Yay)

I have finally decided to bite the bullet and get Fibre to the Home (FTTH) installed at the house.  Citynet installed fibre to the poles around my community last year and is offering gigabit Internet access speeds.  I would have signed up as soon as it was available, but I had to upgrade my router to something capable of pushing gigabit ethernet.  That project is finally completed (and it improved my Spectrum speeds quite a bit as well).

I have been a DOCSIS cable subscriber for the last 20 years across three states.  Long ago, I worked for Adelphia Communications where I supported pre-DOCSIS and DOCSIS 1.0 deployments, learning all about how CMTSs and RF worked.  Now I will get to learn a thing or two about how fibre deployments operate.  ONTs and GPON and the like are all new concepts and it is all very intriguing to me.

Slides and Video from My Talk at PDC 2019

I got a chance to speak to Hadoop folks at this year's Pune Data Conference held in Pune, India.

My talk is titled Admins: Smoke Test Your Hadoop Cluster! This is the abstract:

Software smoke testing is a preliminary level of testing. It makes certain that all of the primary components of a system are functioning correctly. For example, when installing a new secured Hadoop cluster, running a series of quick tests to make sure that things like HDFS and MapReduce are operational can save a lot of headache before enabling Kerberos. Smoke tests can also save you time and embarrassment by making sure that things work before you turn the cluster over to your customer.

In this talk, Michael Arnold will explain the utility of testing Hadoop components after cluster builds and software upgrades. Michael will present code examples that you can use to confirm functionality of Spark, Kudu, HBase, Kafka, MapReduce, etc on your cluster.
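The sort of checks described above can be wrapped in a tiny pass/fail harness. This is only a sketch; the Hadoop commands in the comments are placeholders, and the exact jar paths vary by distribution:

```shell
# minimal smoke-test harness: run each check, report PASS/FAIL, count failures
failures=0
run_check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
    failures=$((failures + 1))
  fi
}
# hypothetical Hadoop checks; substitute commands for your own cluster, e.g.:
#   run_check "HDFS write"   hdfs dfs -put /etc/hosts /tmp/smoke/hosts
#   run_check "HDFS read"    hdfs dfs -cat /tmp/smoke/hosts
#   run_check "MapReduce pi" yarn jar /path/to/hadoop-examples.jar pi 2 10
run_check "shell sanity" true
echo "failures: $failures"
```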

This is the link to the slide presentation and video.

Things to Come From the Cloudera/Hortonworks Merger

Now that the two Hadoop distribution giants have merged, it is time to call out what will happen to their overlapping software offerings. The following are my predictions:

Ambari is out – replaced by Cloudera Manager.
This is a no-brainer for anyone who has used the two tools. People can rant and rave about open source and freedom all they want, but Cloudera Manager is light-years ahead of Ambari in terms of functionality and features. I mean, Ambari can only deploy a single cluster. CM can deploy multiple clusters. And the two features I personally use the most in my job as a consultant are nowhere to be found in Ambari: Host/Role layout and a non-default Configuration view.

Tez is out – replaced by Spark.
Cloudera has already declared that Spark has replaced MapReduce. There is little reason for Tez to remain as a Hive execution engine when Spark does the same things and can also be used for general computation outside of Hive.

Hive LLAP is out – replaced by Impala.
Similar to Tez, there is no reason to keep interactive query performance tools for Hive around when Impala was designed to do just that. Remember: Hive is for batch and Impala is for exploration.

What do you think? Leave your thoughts in the comments.

Hadoop Cluster Sizes

A few years ago, I presented Hadoop Operations: Starting Out Small / So Your Cluster Isn’t Yahoo-sized (yet) at a conference. It included a definition of Hadoop cluster sizes. I am posting those words here to ease future references to that definition.

Question: What is a tiny/small/medium/large [Hadoop] cluster?

Answer:
  • Tiny: 1-9 nodes
  • Small: 10-99 nodes
  • Medium: 100-999 nodes
  • Large: 1000+ nodes
  • Yahoo-sized: 4000 nodes

Self-Signed CA … Whaaat?

<Begin documentation rant…>

Can we all please just stop this “Self-signed CA” nonsense?

Every single root certificate authority on the planet (and all known dimensions) is, by definition… *self-signed*.

What you might want to say instead is “Public CA” vs “Private CA”.
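The distinction is easy to demonstrate: any root certificate, public or private, has an identical issuer and subject. A quick sketch with openssl, generating a throwaway private root CA (the file names and subject are made up for the example):

```shell
# create a throwaway private root CA; like every root cert, it is self-signed
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=Example Private Root CA" \
  -keyout example-ca.key -out example-ca.crt
# the defining property of a root certificate: issuer == subject
openssl x509 -in example-ca.crt -noout -issuer -subject
```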

<End documentation rant.>

Thanks Apple. (Not Really)

Thanks Apple, for making recent products that don’t do what I expect them to do.

For the need to buy a bunch of dongles to get all my existing peripherals to work with your laptop.

For one of said dongles (the Apple USB-C Digital AV Multiport Adapter) being unable to pass through enough power to charge my laptop, or to pass data at all.

Use the USB-C port of this adapter for charging your Mac, not for data transfer or video.


This port delivers a maximum of 60W power, suitable for MacBook models and 13-inch MacBook Pro models. For the best charging performance on 15-inch MacBook Pro models, connect the power supply directly to your Mac, not through the adapter.

For forcing me to buy a dongle to use my headphones and charge my Apple phone at the same time.

For not providing the same port to plug said headphones into both the phone and the laptop.  I mean, make up your minds.  Is it Lightning or not?  (And don't tell me Bluetooth is the future.  I guarantee you it is not for me.)  Thank you to Belkin for providing a solution.

For making the touchpad on your laptop so big that I lose my finger resting points and invariably palm click or double touch to the point of frustration.

For thinking it's a good idea to reuse a connector plug format to push different protocols.  Is it Thunderbolt?  Or is it Mini DisplayPort?  Is it Thunderbolt 3?  Or is it USB-C?  Is that cable certified for the faster speeds?  Does it have the fancy logo?

I held on to my iPhone 5S for as long as I could, but in the end it just became too slow for my needs.  I held on to my 2015 MacBook Pro for as long as it let me, but it died a sad death last week due to battery expansion and loss of boot disk.

I want to remain an Apple hardware fan (partially because PC/Linux leaves so much to be desired) but it is getting harder every year to remain happy.

Failed Disk Replacement with Navigator Encrypt

Hardware fails.  Especially hard disks.  Your Hadoop cluster will be operating with reduced capacity until that failed disk is replaced.  Using full-disk encryption adds to the replacement trouble.  Here is how to do it without bringing down the entire machine (assuming, of course, that your disk is hot-swappable).  This procedure assumes the following:


  • Cloudera Hadoop and/or Cloudera Kafka environment.
  • Cloudera Manager is in use.
  • Cloudera Navigator Encrypt is in use.
  • Physical hardware that will allow for a data disk to be hot swapped without powering down the entire machine. Otherwise you can pretty much skip steps 2 and 4.
  • We are replacing a data disk and not an OS disk.

Read more of this post