Saturday, September 20, 2014

The Kimball Approach

The Kimball Approach

The Dimensional Data Model

1) Starts with tables
     a) Facts
     b) Dimensions

2) Facts contain metrics

3) Dimensions contain attributes 
  -> May contain repeating groups

4) Does not adhere to normalization theory

5) User accessible

=================================================

The Kimball Data Bus

1) Data is moved to staging area
  -> Data is scrubbed and made consistent

2) From Staging Data Marts are created
 
3) Data Marts are based on a single process
 
4) Sum of the data marts can constitute an Enterprise Data Warehouse

5) Conformed dimensions are the key to success

=================================================

The Kimball Design Approach

1)  Select business process

2)  Declare the grain

3)  Choose dimensions

4)  Identify facts (metrics)

=================================================

Kimball’s Philosophy

1) Make data easily accessible

2) Present the organization’s information consistently

3) Be adaptive and resilient to change

4) Protect information

5) Service as the foundation for improved decision making.

=================================================



The Inmon Model

The Inmon Model

1) Consists of all databases and information systems in an organization…..
     The CIF (Corporate Information Factory)

2) Defines overall database environment as:
    Operational
    Atomic data warehouse
    Departmental
    Individual
  
The Warehouse is part of the bigger whole (CIF)

================================================

Inmon Modeling

Three levels of data modeling

1) ERD (Entity Relationship Diagram)
Refines entities, attributes and relationships

2) Mid-Level model (*DIS*)
  
   i)    Data Item Sets
   ii)   Data sets by department  
   iii)  Four constructs
         a) Primary data groupings
         b) Secondary data groupings
         c) Connectors
         d) “Type of” data

3) Physical data model

Optimize for performance (de-normalize)

================================================





Inmon v/s Kimball Approach

Kimball vs. Inmon

Inmon:

Subject-Oriented
Integrated
Non-Volatile
Time-Variant
Top-Down
Integration Achieved via an Assumed Enterprise Data Model
Characterizes Data marts as Aggregates

Kimball

Business-Process-Oriented
Bottom-Up and Evolutionary
Stresses Dimensional Model, Not E-R
Integration Achieved via Conformed Dimensions
Star Schemas Enforce Query Semantics


Inmon
Kimball
Overall approach
Top-down
Bottom-up
Architectural structure
Enterprise-wide DW
feeds departmental DBs
Data marts model a
business process;
enterprise is achieved
with conformed dimensions
Complexity of method
Quite complex
Fairly simple
Data orientation
Subject or data driven
Process oriented
Tools
Traditional (ERDs and
DIS)
Dimensional modeling;
departs from traditional
relational modeling
End user accessibility
Low
High
Timeframe
Continuous & Discrete
Slowly Changing
Methods
Timestamps
Dimension keys

Kimball Slowly Changing Dimension Management

Define data management via versioning

Type I
 Change record as required
 No History

 Type II
 Manage all changes
 History is recorded

 Type III
 Some history is parallel
 Limit to defined history


Inmon Continuous & Discrete Dimension Management

Define data management via dates in your data

Continuous time
 When is a record active
 Start and end dates

Discrete time
 A point in time
 Snapshot

The Comparison(Philosophy)



















DWH

1) A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.

2) It usually contains historical data derived from transaction data, but it can include data from other sources.

3)  It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

When it comes to designing a data warehouse for your business, the two most commonly discussed methods are the approaches introduced by Bill Inmon and Ralph Kimball.

================================================================

Kimball, in 1997, stated that 
 "...the data warehouse is nothing more than the union of all the data marts"

 Kimball indicates a bottom-up data warehousing methodology in which individual data marts providing thin views into the organizational data could be created and later combined into a larger all-encompassing 
data warehouse.

Inmon responded in 1998 by saying, 
 "You can catch all the minnows in the ocean and stack them together and they still do not make a whale" 

 This indicates the opposing view that the data warehouse should be designed from top-down to include all corporate data. In this methodology, data marts are created only after the complete data warehouse has been created.

================================================================

Debates on which one is better and more effective have been on for years. But a clear cut answer has never been arrived upon, as both philosophies have their own advantages and differentiating factors, and enterprises continue to use either of these.

================================================================
 Bill Inmon, has formally defined a data warehouse in the following terms:

Subject-oriented
 The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;

Time-variant
 The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;

 Non-volatile
 Data in the database is never over-written or deleted - once committed, the data is static, read-only, but retained for future reporting

 Integrated
 The database contains data from most or all of an organization's operational applications, and that this data is made consistent

Bill Inmon's paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

================================================================

 Ralph Kimball, a leading proponent of the dimensional approach to building data warehouses, provides a succinct definition for a data warehouse:
 “A copy of transaction data specifically structured for query and analysis.“

Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

================================================================


There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouse systems in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence they originated as a data mart. Only when more data marts are built later do they evolve into a data warehouse.