Data Warehouse

Data Warehouse


The term Data Warehouse was introduced by Bill Inmon in 1990. According to Inmon, A data warehouse (OLAP) is a database, which is kept separate from the organization's operational databases (OLTP) and having subject oriented integrated time-variant, and non-volatile collection of data. The primary goal of a data warehouse is to provide a generalized and consolidated data in multi-dimensional view to enabling business users to make better decisions.
In simple words, A Data warehouse contains the data from all/many segments of the business and more organized specifically to "facilitate Reporting and Analysis.

Important facts about Data warehouse:
  1. A data warehouse (OLAP) as a collection of data-marts and each data-mart consists of one to many (OLTP) databases where the database is specific to a specific problem set.
  2. Data Warehouse provides a high performance for reporting and analytical queries which is used by executives to organize, understand and use their data to take strategic decisions.
  3. Dimensional Modeling techniques are used for the Data Warehouse design.
  4. Data is not updated frequently in the data warehouse and possesses consolidated historical data.
Data Warehouse Features
As we know that Data warehousing has a clear set of objectives such as data persistence, single easy to navigate data model, fast query performance, etc. and have the following key features -
Subject-Oriented - A data warehouse is subject oriented because it provides information around a subject (product, customers, suppliers, sales, revenue, and inventory etc) rather than the organization's ongoing operations. A data warehouse focuses on modeling and analysis of data for decision making.
Integrated - A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.
Time-Variant - The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from the historical point of view. The data warehouse clearly must account for changes in the source system.

Non-Volatile - Once data is in the data warehouse, it will never be changed means the previous data is not erased when new data is added to it. A data warehouse is kept separate from the operational database and therefore frequent changes in the operational database are not reflected in the data warehouse. 

Why Put Separate Data Warehouse?
There are following reasons to put separate data warehouse from the operational databases-
To maintain the high performance for both systems:
  • Operational Database is tuned for OLTP activities such as access methods, indexing, concurrency control, recovery
  • Data Warehouse is tuned for OLAP activities such as complex OLAP queries, multidimensional view, and consolidation.
Having different functions and different data:
  • Missing data: Business decision support requires historical data which operational Databases do not typically maintain the missing data.
  • Data consolidation: Data Warehouse requires consolidation (aggregation, summarization) of data from heterogeneous sources.
  • Data quality: different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.
To know more on OLTP vs OLAP

OLAP is a powerful analysis tool for forecasting, statistical computations, aggregations and involves more than just the multidimensional display of information. OLAP tools also must be able to extract and summarise the requested data according to the needs of an end user, and there are two approaches for this data extraction that need to be discussed.
Needs of Data Warehouses?
A typical organization generates lots of data during their day to day business activities. To take the remarkable decision to get success in the business, information assets (data) are seriously valuable to any enterprise, and because of this, these assets must be properly stored and readily accessible when they are needed. This information usually contains historical data which derived from transaction data and can include data from other sources. After storing the data in the data warehouse, it is ready to focus on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.

In this way, it provides a simple and concise view on particular subject issues by excluding data that are not useful in the decision support process. They are designed to accommodate ad hoc queries and updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques.
We need to clean and process our operational data which comes from various source systems before putting it into the warehouse and business users directly access data derived from several source systems through the data warehouse.  


No comments:

Post a Comment