Thank you so much to everyone who was able to attend my webinar: http://pragmaticworks.com/Training/FreeTraining/ViewWebinar/WebinarID/676 . (If you weren't able to attend, you can always click on the link for a recording.)
It's always hard to talk about Hadoop because the subject is so broad; there were a lot of things I had to leave out, so it's fortunate that I have this blog to cover the topics I couldn't get to. I thought I would take this time to respond to the questions I received.
Presentation Q & A
Do you need to learn Java in order to develop with Hadoop?
No. If you wish to develop for Hadoop in the cloud with HDInsight, you have the option of developing with .NET. If you are working in a Linux environment, which is where a lot of Hadoop development is being done, you will need to learn Java.
Do you know of any courses or sessions available where you can learn about Big Data or Hadoop?
My friend Josh Luedeman is going to be teaching an online class on Big Data next year. If you don't want to wait that long, I recommend checking out a code camp in your area, such as Desert Code Camp, where they are offering courses in Azure, or a SQL Saturday, especially the BI editions.
How do you recommend a person with a BI background in SQL get started in learning Hadoop and where can I get the VMs?
The two ways I recommend for a person with a BI background to get involved with Hadoop are either through a Hortonworks VM or in Microsoft's Azure cloud with HDInsight. Hortonworks provides a VM, and Microsoft's environment is hosted in their cloud. As the company that Microsoft partnered with to develop its Hadoop offerings, Hortonworks has very good documentation targeted at people who come from a Microsoft BI stack background. If you choose to go with HDInsight, there is a lot of really good documentation and video training available as well.
How do you compare Hadoop with the PDW?
While Hadoop and Microsoft's PDW, which is now called APS, were both designed to handle big data, the approaches are wildly different. Microsoft built the APS to handle the larger data requirements of people who have structured data, mostly housed in SQL Server. Hadoop was developed in an open source environment to handle unstructured data.
How can I transfer data into HD Insight?
This is a great question, which I promise to devote an entire blog post to very soon. I'll give you the Reader's Digest version here. There are a number of ways you can transfer data into HDInsight. The first step is to transfer the data into the Azure cloud, which you can do via SSIS with a minor modification of the process I blogged about earlier here. The other methods you could use to transfer data are secured FTP or PowerShell. With PowerShell you will need to call the REST API, which is the same one you use to provision an HDInsight cluster. There is also a UI within HDInsight that you can use to transfer data. I've sketched the PowerShell approach below.
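To give a flavor of the PowerShell route, here is a minimal sketch of copying a local file into the Azure Blob Storage container that backs an HDInsight cluster. It assumes the Azure PowerShell module is installed and that Add-AzureAccount has already been run; the storage account, container, and file names are placeholders for illustration, not real values.

# Placeholders: substitute your own storage account, container, and file
$storageAccount = "mystorageaccount"
$container      = "mycontainer"        # the container your HDInsight cluster uses
$localFile      = "C:\Data\sales.csv"

# Build a storage context from the account's primary key
$key     = (Get-AzureStorageKey -StorageAccountName $storageAccount).Primary
$context = New-AzureStorageContext -StorageAccountName $storageAccount -StorageAccountKey $key

# Upload the file as a blob; the cluster can then reference it as
# wasb://mycontainer@mystorageaccount.blob.core.windows.net/data/sales.csv
Set-AzureStorageBlobContent -File $localFile -Container $container -Blob "data/sales.csv" -Context $context

Once the file lands in blob storage it is immediately visible to the cluster, since HDInsight uses Azure Blob Storage rather than local HDFS as its default file system.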
I really appreciate the interest in the webinar.
Yours Always
Ginger Grant
Data aficionado et SQL Raconteur