
DIY – Getting started with Hadoop and Virtual Machines

In my recent presentation on PolyBase in SQL Server 2016, I demonstrated how to get started using Hadoop with SQL Server 2016. In this post I will show you everything you need to do to run the demonstration yourself. The first step is to create a virtual machine so that you can run a Linux instance for Hadoop. Because installing a virtual machine can be intimidating, this post explains what you need to do and how to fix a problem you may run into when running one.

Creating Your Own Virtual Machine

Previously, spinning up a virtual machine meant purchasing software. No more, as there is now an open source application. In the example shown here the Linux operating system will be installed, but you can put any operating system you want on your virtual machine, provided of course you have a license for it. If you don’t feel comfortable installing non-released versions of code like SQL Server 2016 on your PC, a virtual machine is a great way to test it out. You will need to provide your own operating system, but there are trial versions you can use for limited periods of time as well.

The virtual machine software used here is Oracle VM VirtualBox, which is free and open source. You can download it here. This software is needed prior to installing the Hortonworks Sandbox. Hortonworks is not the only version of Hadoop available; Cloudera has a Hadoop VM too, which you can download as well. Personally I am not a fan of the Cloudera Manager, which is why I prefer Hortonworks, but either will work with PolyBase.

Troubleshooting Why the Virtual Machine Won’t Run

If you are using a Windows operating system, chances are it may be running Hyper-V. Hyper-V is Microsoft’s virtualization platform. To use Hyper-V, you need a virtual machine image saved in the VHD format so you can load it in Hyper-V. Hortonworks provides two versions of their sandbox, one in the VMware format and one in the VirtualBox format. For this reason, Hyper-V is not an option as a virtual machine for Hortonworks. And while Hyper-V is enabled, you can’t run the sandbox in another virtual machine application such as VirtualBox.

The boot configuration data store could not be opened. Access is denied

The error message received when Hyper-V is running is pretty cryptic. What is “The boot configuration data store could not be opened. Access is denied” supposed to mean anyway? Wander around the internets trying to find an answer to this question and you will find a lot of <sarcasm>wonderful advice</sarcasm> which states that the only thing to do is to replace your computer entirely as the BIOS won’t support virtual machines, even after you check the BIOS and find out that yes, it does support virtualization. Don’t be deterred. The error can be fixed without new hardware.

To resolve this error, Hyper-V needs to be turned completely off. In case you were wondering, stopping the Hyper-V services won’t fix this. Instead, Hyper-V must be disabled at the command prompt, and then a reboot is required. Make sure you run the command prompt as administrator, because the command won’t run if you don’t. To run the command prompt as administrator, search for CMD. When the result comes back, right-click on the cmd selection and select Run as administrator. To turn off Hyper-V, run this command:

bcdedit /set hypervisorlaunchtype off

Exit out of the command prompt. Remember, the error won’t be resolved until after rebooting. After the reboot, the boot configuration message won’t appear when running the VM, and you can successfully get started running the Hortonworks Sandbox.
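If you want to confirm that the setting took before you reboot, one option is to read it back from bcdedit. Below is a minimal Python sketch that was not part of the original demonstration. The function names are made up for illustration, and it assumes that the output of bcdedit /enum {current} lists the setting on a line starting with hypervisorlaunchtype (the line is absent if the value was never set). Like the command above, it has to be run from an administrator prompt on Windows.

```python
import subprocess
from typing import Optional


def parse_hypervisor_launch_type(bcdedit_output: str) -> Optional[str]:
    """Return the hypervisorlaunchtype value in lowercase, or None if not listed.

    Assumes the "name<whitespace>value" layout that bcdedit /enum prints.
    """
    for line in bcdedit_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "hypervisorlaunchtype":
            return parts[1].lower()
    return None


def read_hypervisor_launch_type() -> Optional[str]:
    """Run bcdedit for the current boot entry and parse the setting.

    Windows only, and it needs an elevated (administrator) prompt,
    otherwise bcdedit reports that access is denied.
    """
    result = subprocess.run(
        ["bcdedit", "/enum", "{current}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_hypervisor_launch_type(result.stdout)
```

Calling read_hypervisor_launch_type() after running the command above should return off. When you want Hyper-V back, running bcdedit /set hypervisorlaunchtype auto and rebooting again restores it.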

TL;DR – Links

If you just want to get started running a VM and PolyBase, here are the links needed to make that happen.

Oracle VM VirtualBox

Hortonworks Sandbox

Cloudera Quickstart

SQL Server 2016 RC1

Microsoft Server 2016 Preview

Once the VM is running with Hadoop, install SQL Server 2016 so that you can follow along with my next post, where I talk about how to use PolyBase. If you want to be notified when that post is available, please subscribe to my blog and you will find out automatically.

Yours Always

Ginger Grant

Data aficionado et SQL Raconteur