BigData Journey: 2019

Are we missing ACID transactions in big data technologies?

We don't have to: Databricks, founded by the original creators of Apache Spark™, has recently open-sourced yet another exciting technology: Delta Lake.

What is Delta Lake?

Data lakes typically have multiple data pipelines reading and writing data concurrently, and because they lack transactions, data engineers have to go through a tedious process to ensure data integrity. Delta Lake brings ACID transactions to your data lakes. It provides serializability, the strongest isolation level.
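To make the idea of transactional writes more concrete, here is a toy sketch in plain Python of an ordered commit log with optimistic concurrency: each writer tries to claim the next numbered log file, and only one writer can win a given number. This only illustrates the general technique and is not Delta Lake's actual implementation; the function names and the shape of the `actions` records are hypothetical.

```python
import json
import os


def latest_version(log_dir):
    """Return the highest committed version number, or -1 for an empty log."""
    versions = [
        int(name.split(".")[0])
        for name in os.listdir(log_dir)
        if name.endswith(".json")
    ]
    return max(versions, default=-1)


def try_commit(log_dir, version, actions):
    """Try to publish `actions` as commit number `version`.

    Mode "x" creates the file only if it does not exist yet, so at most one
    writer can claim a given version. Simplification: a real system would
    also make sure readers never observe a half-written file.
    """
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        with open(path, "x") as f:
            json.dump(actions, f)
        return True
    except FileExistsError:
        return False


def commit(log_dir, actions, max_retries=10):
    """Optimistic commit: on conflict, re-read the log and try the next version.

    Simplification: a real system would also check whether the commit that
    won the race logically conflicts with this one before retrying.
    """
    for _ in range(max_retries):
        version = latest_version(log_dir) + 1
        if try_commit(log_dir, version, actions):
            return version
    raise RuntimeError("too many concurrent commits")
```

Readers that replay the log files in version order always see a consistent sequence of complete commits, which is the property the paragraph above describes.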

Delta Lake is a storage layer that brings reliability to data lakes and provides:

• ACID transactions

• Scalable metadata handling

• Unified streaming and batch data processing

Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
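As a rough illustration of what compatibility with the Spark APIs looks like, the sketch below uses the standard DataFrame reader and writer with the `delta` format for both batch and streaming access to the same table. It assumes `spark` is a SparkSession that has already been configured with the Delta Lake package; the helper function names and any path passed in are hypothetical.

```python
def write_batch(spark, path):
    """Write a small DataFrame to `path` as a Delta table (batch write)."""
    df = spark.range(0, 5)  # single-column DataFrame with ids 0..4
    df.write.format("delta").mode("overwrite").save(path)


def read_batch(spark, path):
    """Read the Delta table at `path` as a regular batch DataFrame."""
    return spark.read.format("delta").load(path)


def read_stream(spark, path):
    """Read the same Delta table at `path` as a streaming DataFrame."""
    return spark.readStream.format("delta").load(path)
```

The only change from ordinary Parquet-based code is the format name, and the same table path serves both the batch reader and the streaming reader.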

Which existing data lakes are compatible with it?

An existing data lake can be HDFS, Azure storage (Azure Data Lake Storage, or Blob Storage accessed through WASB), Amazon S3, etc.

Full DML support is coming soon: Delta Lake will support standard DML, including UPDATE, DELETE, and MERGE INTO, giving developers more control over their big datasets.
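For readers less familiar with MERGE INTO, the plain-Python sketch below illustrates the upsert semantics such a statement commonly has when it combines an update for matched rows with an insert for unmatched ones. It is not Delta Lake's API; the function name `merge_into` and the row layout are hypothetical and only meant to show the behaviour.

```python
def merge_into(target, source, key):
    """Upsert `source` rows into `target` rows, matching rows on `key`.

    Rows are plain dicts. The inputs are left unmodified and a new list
    of rows is returned.
    """
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)  # WHEN MATCHED THEN UPDATE
        else:
            merged[row[key]] = dict(row)  # WHEN NOT MATCHED THEN INSERT
    return list(merged.values())
```

On a real table the value of transactional DML is that the whole merge either becomes visible to readers at once or not at all.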

One more thing: Apache Spark™ is forecast to grow at a 67% CAGR from 2019 to 2022.

Wednesday, 10 July 2019

Making Apache Spark™ Better with Delta Lake