Saturday, September 20, 2014

DWH

1) A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.

2) It usually contains historical data derived from transaction data, but it can include data from other sources.

3)  It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

When it comes to designing a data warehouse for your business, the two most commonly discussed methods are the approaches introduced by Bill Inmon and Ralph Kimball.

================================================================

Kimball, in 1997, stated that 
 "...the data warehouse is nothing more than the union of all the data marts"

 Kimball indicates a bottom-up data warehousing methodology in which individual data marts providing thin views into the organizational data could be created and later combined into a larger all-encompassing 
data warehouse.

Inmon responded in 1998 by saying, 
 "You can catch all the minnows in the ocean and stack them together and they still do not make a whale" 

 This indicates the opposing view that the data warehouse should be designed from top-down to include all corporate data. In this methodology, data marts are created only after the complete data warehouse has been created.

================================================================

Debates on which one is better and more effective have been on for years. But a clear cut answer has never been arrived upon, as both philosophies have their own advantages and differentiating factors, and enterprises continue to use either of these.

================================================================
 Bill Inmon, has formally defined a data warehouse in the following terms:

Subject-oriented
 The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;

Time-variant
 The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;

 Non-volatile
 Data in the database is never over-written or deleted - once committed, the data is static, read-only, but retained for future reporting

 Integrated
 The database contains data from most or all of an organization's operational applications, and that this data is made consistent

Bill Inmon's paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

================================================================

 Ralph Kimball, a leading proponent of the dimensional approach to building data warehouses, provides a succinct definition for a data warehouse:
 “A copy of transaction data specifically structured for query and analysis.“

Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

================================================================


There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouse systems in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence they originated as a data mart. Only when more data marts are built later do they evolve into a data warehouse.

No comments:

Post a Comment