DW- Architecture and Multidimensional Model
We know that data warehousing is a collection of methods, techniques, and tools which is used to support knowledge workers such as senior managers, directors, managers, and business analysts to conduct data analyses that help with performing decision-making processes and improving information resources.
Fundamentally, the data warehouse is the secured place for the historical data where data is never deleted and data updates are normally carried out when data warehouses are offline.
Data Warehouse (with a Staging Area and Data Marts) - Three-Layer Architecture
As per the enterprises/business requirements, you can easily customize your data warehouse architecture according to the different business group. For example, if you want to create a data warehouse for an ERP system (where purchasing, sales, and inventories are separated) then you can do this by adding different data marts, which are systems designed for a particular line (business modules) of business.
The main advantages of the three-layer architecture is that –
- It creates a common reference data model for a whole enterprise.
- At the same time, it sharply separates the problems of source data extraction and integration from those of data warehouse population.
- Identically, in some cases, the staging layer is also directly used to better accomplish some operational tasks, such as producing daily reports that cannot be satisfactorily prepared using the corporate applications or generating data flows to feed external processes periodically so as to benefit from cleaning and integration.
- This approach assumes that although a data warehouse is available, it is unable to solve all the queries formulated.
- This means that users may be interested in directly accessing source data from aggregate data (drill-through).
- Data Marts are dependent on the data warehouse and metadata but they are independent to provide information for a particular line of business.
Multidimensional databases have generated much research and market interest because they are fundamental for many decision-making support applications, such as data warehouse systems. Before building a data warehouse, you need to have defined the business requirements and also agreed upon the scope of your application, and created a conceptual design where you need to translate your requirements into a system deliverable. To do so, you have to choose the ideal data warehouse architecture to fulfill the logical and physical requirement by considering the following things -
- The specific data content
- Relationships within and between groups of data
- The system environment supporting your data warehouse
- The data transformations required
- The frequency with which data is refreshed
The multidimensional model is the base of any data warehouse and begins with the observation that the factors affecting decision-making processes are enterprise-specific facts, such as sales, shipments, hospital admissions, surgeries, and so on.
The concept of dimension gave life to the broadly used metaphor of cubes to represent multidimensional data. According to this metaphor, events are associated with cube cells and cube edges stand for analysis dimensions. If more than three dimensions exist, the cube is called a hypercube.
Dimensions of the cube are the equivalent of entities in a database. For example, the sales in a store chain can be represented in a three-dimensional space whose dimensions are products, stores, and dates. A data cube allows data to be modeled and viewed in multiple dimensions.
Dimension tables - A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time.
Fact tables -A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation.
Requirements of Fact Tables
You must define a fact table for each star schema or snowflake schema. From a modeling standpoint, the primary key of the fact table is usually a composite key that is made up of all of its foreign keys.
Conceptual Modeling of Data Warehousing
In the relational implementation, the dimensional designs are mapped to a relational set of tables such as dimensions & measures.
Star schema: It supports to de-normalization data concept. A star schema model can be depicted as a simple star: a central table contains fact data and multiple tables radiate out from it, connected by the primary and foreign keys of the database.
In a star schema implementation, Warehouse Builder stores the dimension data in a single table or view for all the dimension levels.
Snowflake schema: It supports to normalization data concept. The snowflake schema represents a dimensional model which is also composed of a central fact table and a set of constituent dimension tables which are further normalized into sub-dimension tables.
In a snowflake schema implementation, Warehouse Builder uses more than one table or view to store the dimension data. Separate database tables or views store data pertaining to each level in the dimension.
Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation.
No comments:
Post a Comment