Data Lake Vs Data Warehouse
Data is a key business asset for any organisation, and it must be kept secure and accessible to business users whenever it is required.
Two approaches to storing data for business insights are currently very popular: the data warehouse and the data lake. This post compares them on a handful of technical characteristics.
The first is the Data Warehouse, a highly structured data store that requires a significant amount of discovery, planning, data modeling, and development work before the data becomes available for analysis by business users.
The second is the Data Lake, a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed. In other words, a Data Lake is a more organic store of data, kept without regard for the perceived value or structure of that data.
Data Warehouses compared to Data Lakes: depending on its business requirements, a typical organization will often need both a data warehouse and a data lake, because they serve different needs and use cases.
| Characteristics | Data Warehouse | Data Lake |
|---|---|---|
| Type of data stored | Structured data (most often in columns and rows in a relational database) from transactional systems, operational databases, and line-of-business applications | Any type of data structure in any format, including structured, semi-structured, and unstructured data from IoT devices, websites, mobile apps, social media, and corporate applications |
| Best way to ingest data | Batch processes | Streaming, micro-batch, or batch processes |
| Schema | Designed prior to the DW implementation (schema-on-write) | Defined at the time of analysis (schema-on-read) |
| Typical load pattern | ETL (Extract, Transform, then Load) | ELT (Extract, Load, then Transform at the time the data is needed) |
| Price/Performance | Fastest query results using higher-cost storage | Query results getting faster using low-cost storage |
| Data quality | Highly curated data that serves as the central version of the truth | Any data, which may or may not be curated (i.e. raw data) |
| Users | Business analysts | Data scientists, data developers, and business analysts (using curated data) |
| Analytics pattern | Determine structure, acquire data, then analyze it; iterate back to change the structure as needed. Typical uses: batch reporting, BI, and visualizations. | Acquire data, analyze it, then iterate to determine its final structured form. Typical uses: machine learning, predictive analytics, data discovery, and profiling. |
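The difference between schema-on-write and schema-on-read can be made concrete with a small sketch. It is illustrative only: the event records, the `sales` table, and the function names are assumptions invented for this example, and an in-memory SQLite database and a plain list of JSON strings merely stand in for a real warehouse and a real lake.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from an app or a device.
RAW_EVENTS = [
    '{"customer": "ana", "amount": 12.5, "device": "mobile"}',
    '{"customer": "raj", "amount": "7"}',
    '{"customer": "li", "clicks": 3}',
]


def load_warehouse(raw_lines):
    """Schema-on-write: each record must fit the table before it is stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rejected = []
    for line in raw_lines:
        record = json.loads(line)
        try:
            row = (record["customer"], float(record["amount"]))
        except (KeyError, TypeError, ValueError):
            # The record does not match the predefined schema.
            rejected.append(line)
            continue
        conn.execute("INSERT INTO sales VALUES (?, ?)", row)
    conn.commit()
    return conn, rejected


def load_lake(raw_lines):
    """Schema-on-read: keep every record untouched, in its native format."""
    return list(raw_lines)


def total_amount_from_lake(lake):
    """Impose a structure only now, at the time of analysis."""
    total = 0.0
    for line in lake:
        record = json.loads(line)
        if "amount" in record:
            total += float(record["amount"])
    return total


warehouse, rejected = load_warehouse(RAW_EVENTS)
lake = load_lake(RAW_EVENTS)
```

In this sketch the warehouse ends up holding only the records that fit the schema designed up front, while the lake keeps all of them and leaves the interpretation to whoever reads the data later.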
In contrast to a data warehouse, which is loaded only with data that has been selected and modeled in advance, the default expectation for a data lake is to acquire all of the data and retain all of it.
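One way to see why retaining everything matters is to compare the two load patterns side by side. The following is a minimal sketch under stated assumptions: the order records, the field names, and the helper functions are all hypothetical, and plain Python lists stand in for the actual storage systems.

```python
# Hypothetical source records; the field names are illustrative only.
SOURCE = [
    {"order_id": 1, "amount": "12.50", "channel": "web"},
    {"order_id": 2, "amount": "7.00", "channel": "mobile"},
    {"order_id": 3, "amount": "3.25", "channel": "web"},
]


def transform(record):
    """Shape a raw record into the modeled form the reports were designed for."""
    return {"order_id": record["order_id"], "amount": float(record["amount"])}


def etl(source):
    """Extract, Transform, then Load: only the modeled rows are stored."""
    return [transform(record) for record in source]


def elt(source):
    """Extract and Load as-is: every raw record is copied unchanged."""
    return [dict(record) for record in source]


def revenue_by_channel(raw_records):
    """A transform written later, when a new question comes up."""
    totals = {}
    for record in raw_records:
        channel = record["channel"]
        totals[channel] = totals.get(channel, 0.0) + float(record["amount"])
    return totals


warehouse_rows = etl(SOURCE)
lake_records = elt(SOURCE)
```

Here the ETL output no longer carries the `channel` field, so a later question about revenue per channel can only be answered from the raw records that the ELT path retained.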