The Data Massagist by Pablo Junco

Innovate faster by migrating from Hadoop to Azure Databricks

March 15, 2021 · 6 min read
Databricks Migration
This content is mirrored from LinkedIn and may contain formatting inconsistencies.

Created on 2021-03-15 22:54

Published on 2021-03-15 23:28

Organizations are continually investing in Big Data, Analytics, and Artificial Intelligence (AI) to find new ways to reduce costs and accelerate decision-making. However, organizations that have centered their entire data and AI strategy on on-premises architectures spend 50% of their time on plumbing activities with no business value, such as upgrades, patching systems, and maintenance.

After watching the session by Tony Gilbert (VP of Sales at Databricks) at the last Microsoft Ignite, I decided to write about Hadoop and the need to migrate in order to innovate faster.

What is Hadoop?

Apache Hadoop is open-source software for reliable, scalable, distributed computing that can handle large datasets. The most popular and prevalent Hadoop distributions that exist today are Cloudera, MapR, and Hortonworks. 

In the case of Microsoft, Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. Organizations can use the most popular open-source frameworks such as Hadoop, Apache Spark, Hive, LLAP, Kafka, Storm, R, and more.

8 Reasons to migrate your on-premises Hadoop

Many organizations running Hadoop in their own data centers are considering a modernization project because they have not realized the full value of Hadoop and still have to deal with issues such as:

  • Administrative complexity
  • Inability to scale their fixed infrastructure cost-effectively
  • Lack of a shared collaborative environment for data engineers, data scientists, and developers

IT leaders constantly face the dilemma of whether to renew or migrate their Hadoop deployments. According to my team, they cite at least three of the following eight reasons when deciding to move their on-premises Hadoop to the cloud:

  1. Total cost of ownership (TCO)
  2. License expiration
  3. End of support
  4. Performance and auto-scaling
  5. Better virtual machine types
  6. High availability and disaster recovery
  7. Compliance
  8. The need to accelerate the organization's innovation agenda 

For example, when an organization starts ingesting much more data and wants to implement new use cases with it, it needs the flexibility to go beyond its current clusters. Because compute and storage are coupled on the data nodes in an on-premises Hadoop architecture, the IT team cannot simply add compute against the same data to work faster. Instead, it has to buy more hardware and scale out the entire cluster. A similar problem arises when provisioning for peak capacity, because IT has to provide a cluster large enough for the peak.

Cloud providers such as Microsoft can help organizations separate compute from storage, which lets businesses drive down costs and react faster to changing demands.
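The scaling constraint described above can be made concrete with a small sketch. The node shape and workload figures below are hypothetical, chosen only for illustration; they are not taken from any sizing guide. When storage and compute live on the same nodes, the cluster must be sized for whichever resource needs more nodes, and it stays that size. With compute separated from storage, only the number of cores needed at a given moment drives the cluster size.

```python
import math

# Hypothetical node shape, for illustration only.
TB_PER_NODE = 48
CORES_PER_NODE = 32


def coupled_nodes(storage_tb: float, cores: int) -> int:
    """Nodes needed when each node bundles storage and compute."""
    for_storage = math.ceil(storage_tb / TB_PER_NODE)
    for_compute = math.ceil(cores / CORES_PER_NODE)
    return max(for_storage, for_compute)


def compute_only_nodes(cores: int) -> int:
    """Compute nodes needed when storage is a separate service."""
    return math.ceil(cores / CORES_PER_NODE)


# Assumed workload: 200 TB of data, 160 cores on a normal day, 640 at peak.
always_on_coupled = coupled_nodes(200, 640)   # fixed cluster sized for the peak
baseline_decoupled = compute_only_nodes(160)  # what runs outside the peak
peak_decoupled = compute_only_nodes(640)      # what runs during the peak

print(always_on_coupled, baseline_decoupled, peak_decoupled)
```

With these assumed numbers, the coupled cluster keeps 20 nodes running at all times, while separated compute runs 5 nodes at baseline and grows to 20 only during the peak.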

Which technology should IT leaders explore for their migration?

We can consider Azure HDInsight for a rapid migration to the cloud for various use cases using Hive and Spark without code changes. On the other hand, if we want to modernize a deployment with mostly Spark workloads, the recommendation is Azure Databricks.

But what is Azure Databricks? Azure Databricks is a fast, easy, and collaborative Apache Spark-based service that simplifies building big data and AI solutions. It's an analytics platform optimized for Azure and available for organizations as a first-party managed service.

What makes Azure Databricks unique and differentiated in the market is the joint engineering partnership between Databricks and Microsoft, which brings together the best of Databricks and Azure. What Microsoft's customers can use today is the result of millions of dollars invested by both companies in research and development, in building integrations, and in making the service enterprise-secure and compliant. This is something that GCP and AWS cannot offer their customers.

Azure Databricks features out-of-the-box Azure Active Directory integration, native data connectors, integrated billing, and compliance certifications (e.g., ISO 27001, ISO 27018, HIPAA, and SOC 2 Type 2). For example, developers can use the Azure Active Directory Authentication Library (ADAL) to acquire Azure Active Directory (Azure AD) access tokens programmatically.

Azure Databricks also integrates with Azure Synapse to bring analytics, business intelligence (BI), and data science together in Microsoft's Modern Data Warehouse solution architecture. The high-performance connector between Azure Databricks and Azure Synapse enables fast data transfer between the services, including support for streaming data.
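As a rough illustration of the connector, the sketch below writes a Spark DataFrame to a Synapse table from a Databricks notebook. To the best of my knowledge, the format name `com.databricks.spark.sqldw` and the options `url`, `dbTable`, `tempDir`, and `forwardSparkAzureStorageCredentials` match the connector documentation. The helper names, the JDBC URL, the table name, and the staging path are hypothetical placeholders, and storage and database authentication setup is left out.

```python
def synapse_options(jdbc_url: str, table: str, temp_dir: str) -> dict:
    """Build the option map for the Databricks Synapse connector."""
    return {
        "url": jdbc_url,        # JDBC connection string of the Synapse SQL pool
        "dbTable": table,       # target table in Synapse
        "tempDir": temp_dir,    # staging folder in Azure storage
        # Assumption: the notebook session already holds the storage key.
        "forwardSparkAzureStorageCredentials": "true",
    }


def write_to_synapse(df, jdbc_url: str, table: str, temp_dir: str,
                     mode: str = "append") -> None:
    """Write a Spark DataFrame to Synapse; `df` is a pyspark DataFrame."""
    (df.write
       .format("com.databricks.spark.sqldw")
       .options(**synapse_options(jdbc_url, table, temp_dir))
       .mode(mode)
       .save())


# Hypothetical usage inside a notebook (all values are placeholders):
# write_to_synapse(
#     sales_df,
#     "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>",
#     "dbo.sales",
#     "abfss://<container>@<account>.dfs.core.windows.net/tmp",
# )
```

The staging folder matters because the connector moves data through Azure storage instead of pushing rows over JDBC one by one, which is where the fast transfer comes from.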

Incentives for on-premises customers to migrate to Azure Databricks 

If you are a Microsoft customer, you can save up to 52% when migrating your on-premises Hadoop to Azure Databricks.

  • You can get up to 37% savings over pay-as-you-go Databricks Unit (DBU) prices when your organization pre-purchases them as Databricks Commit Units (DBCUs) for either one or three years. Your organization will have access to support directly from the Azure portal without additional contracts with Databricks or your partner of choice.
  • If your organization is considering a migration, you should know that Microsoft is offering an extra 25% discount for a three-year pre-purchase plan larger than 150,000 DBCUs and a 15% discount for a one-year pre-purchase plan larger than 100,000 DBCUs. The offer is valid until June 2021. 
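One way to read the 52% figure is that the two discounts compound rather than add. This is an assumption on my part, not something stated in the offer, but the arithmetic is consistent with it: a 37% pre-purchase saving followed by an extra 25% discount leaves 0.63 × 0.75 = 0.4725 of the pay-as-you-go price, a saving of just over 52%. A minimal sketch:

```python
def combined_saving(*discounts: float) -> float:
    """Total saving when discounts are applied one after another."""
    remaining = 1.0
    for d in discounts:
        remaining *= (1.0 - d)
    return 1.0 - remaining


# Three-year plan: up to 37% pre-purchase saving, then the extra 25% offer.
three_year = combined_saving(0.37, 0.25)
print(f"{three_year:.2%}")
```

Simply adding the two percentages would give 62%, which does not match the headline number, so the compounded reading seems the more plausible one.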

Closing

Thousands of organizations worldwide, including Daimler, GSK, Starbucks, Credit Suisse, City of Spokane, Komatsu Mining, HSBC, and AstraZeneca, rely on Azure Databricks for massive-scale data engineering, collaborative data science, full-lifecycle machine learning, and business analytics. Together, Databricks and Microsoft enable joint customers to build their analytics on Azure with the best of both.

Learn more about migration to Azure Databricks and the offer by watching this on-demand session by Arsalan Tavakoli (SVP of Field Engineering, Databricks) and Priya Vijayarajendran (VP, Data & AI, Microsoft). In the video, Brian White (Manager - Analytics at Komatsu Mining Corp) explains how they are using Azure Databricks to create predictive models for preventative maintenance and efficient mining.

I also invite you to learn more about how the City of Spokane reduced its ELT/ETL total cost of ownership (TCO) by 50% while improving data quality with DQLabs and Azure Databricks.

Are you interested? Contact your Account Executive, Account Technology Strategist (ATS), or Data & AI Specialist to learn more and see if your organization qualifies for a free migration evaluation. The evaluation includes an assessment of current tools, systems, and processes, plus a two-day workshop to identify value drivers, prioritize use cases, and define the future state architecture.
