What is HDInsight in Azure?
Big Data has exploded over the last few years into a large practice that is driving new business and digital transformation. Very large companies have used Big Data concepts to build predictive analytics and services from the data they have collected. One tool for storing large amounts of data and running analytics on it is Microsoft HDInsight in Azure.
What exactly makes up HDInsight?
It is important to understand that HDInsight is not a single application but a collection of applications that each handle a different part of a Big Data workload. Big Data itself is not just storing large amounts of data either; it also means getting the data into a Big Data store easily, storing it correctly, and using computing resources to run analysis against it. HDInsight brings all of this together in one package.
How does Big Data work then?
As mentioned before, Big Data starts with data coming into your data store and doesn’t really stop. It is a continuous cycle of receiving new data, storing it, and consuming it to gain insights that support educated business decisions. There are 5 steps in Big Data:
- Providing a data set
- Storing data
- Reducing data
- Streaming data
- Analyzing data
HDInsight has different applications to perform these steps.
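To make the five steps concrete, here is a toy, single-machine sketch in plain Python. It does not use HDInsight or any Azure API. The function names, the record shape, and the sample lines are all invented for illustration, and on a real cluster each step would be handled by a dedicated application rather than a local function.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (action, user); a made-up record shape


def provide_data() -> List[str]:
    """Step 1: provide a data set (hard-coded sample lines here)."""
    return [
        "login alice", "purchase bob", "login bob",
        "purchase alice", "purchase alice", "logout bob",
    ]


def store(lines: Iterable[str]) -> List[Record]:
    """Step 2: store the data in a structured form (an in-memory list)."""
    records = []
    for line in lines:
        action, user = line.split(" ", 1)
        records.append((action, user))
    return records


def reduce_data(records: Iterable[Record]) -> Counter:
    """Step 3: reduce many records to a small summary (count per action)."""
    return Counter(action for action, _user in records)


def stream(records: Iterable[Record]) -> Iterator[Record]:
    """Step 4: stream records one at a time instead of all at once."""
    for record in records:
        yield record


def analyze(counts: Counter) -> str:
    """Step 5: analyze the summary (find the most frequent action)."""
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    stored = store(provide_data())
    # The stored records are streamed into the reduce step.
    counts = reduce_data(stream(stored))
    print(counts)
    print("Most frequent action:", analyze(counts))
```

In this sketch the streamed records feed the reduce step directly, which is one possible way to chain the steps; the order in which a real workload combines them depends on the applications chosen.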
Let’s take a look at HDInsight in Azure
In the Azure portal, HDInsight is just another Azure application in the cloud that can be spun up without installation. In just a few minutes plus spin-up time, you can create multiple processing nodes, set up a data platform, configure storage and storage accounts, and set up security. On-premises, this could take months to complete.
Clicking on HDInsight takes you to the setup page for your HDInsight cluster. Each field marked with a red asterisk is required. Azure checks the cluster name against all other cluster names within Azure to prevent duplicates.
The HDInsight cluster type setting offers several options, depending on your Big Data needs.
The Storage Account setup page requires you to select the type of storage you want to use, and includes a sub page for setting up the cluster processing nodes (ensure you are using the Custom (size, settings, apps) setup type). Note: the more processing nodes you select, the higher your Azure bill will be, and the cost can climb quickly as nodes are added.
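A rough way to reason about the bill is to assume each node is charged a fixed rate per hour, so the total is the sum over all nodes. The sketch below uses that assumption; the function name and the hourly rates are hypothetical placeholders, not real Azure prices, so check the current HDInsight pricing page for actual figures.

```python
def estimate_hourly_cost(
    head_nodes: int,
    worker_nodes: int,
    head_rate: float,
    worker_rate: float,
) -> float:
    """Estimate cluster cost per hour, assuming a flat per-node hourly rate.

    All rates are placeholders supplied by the caller; this is not an
    Azure pricing API.
    """
    if head_nodes < 0 or worker_nodes < 0:
        raise ValueError("node counts must be non-negative")
    if head_rate < 0 or worker_rate < 0:
        raise ValueError("rates must be non-negative")
    return head_nodes * head_rate + worker_nodes * worker_rate


if __name__ == "__main__":
    # Hypothetical rates: 0.50 per head node hour, 0.25 per worker node hour.
    small = estimate_hourly_cost(2, 4, head_rate=0.50, worker_rate=0.25)
    large = estimate_hourly_cost(2, 8, head_rate=0.50, worker_rate=0.25)
    print(f"4 workers: {small:.2f} per hour")
    print(f"8 workers: {large:.2f} per hour")
```

Under this assumption, doubling the worker count doubles the worker portion of the bill, and the charge keeps accruing for every hour the cluster is running.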
The head node is the primary node; it runs the services that control the processing nodes. The worker nodes perform the data processing.
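The division of labor between the two roles can be pictured with a small local analogy. The sketch below is not how HDInsight is implemented: it uses threads on one machine in place of worker nodes, and the names `head_node` and `worker_task` are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def worker_task(chunk: List[int]) -> int:
    """Work done by one worker: process its own chunk (here, sum it)."""
    return sum(chunk)


def head_node(data: List[int], worker_count: int) -> int:
    """Coordinate the work: split the data, hand chunks to workers, combine."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # Give every worker an interleaved slice of the data.
    chunks = [data[i::worker_count] for i in range(worker_count)]
    # Each thread stands in for one worker node in this analogy.
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        partial_results = list(pool.map(worker_task, chunks))
    # The coordinator combines the partial results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    total = head_node(list(range(1, 101)), worker_count=4)
    print("Total:", total)
```

The point of the analogy is only that the coordinating role does no heavy processing itself; adding workers spreads the same data over more processors.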
After setup, you can spin up your cluster, which takes approximately 10 to 20 minutes to complete. You are then ready to start leveraging your first Big Data solution.