Data Warehouse Concept

Data warehouse is mutidimensional data structure where Dimention is a category of information. For example, “Customer”, “Date”, and “Product” are all dimensions that could be applied meaningfully to a sales receipt. The primary function of dimensions is to filter, group and label.  The schema for a dimensional model contains a central fact table and multiple dimension tables. Fact table contains the measure typically numeric value to the lowest level of granularity of dimensions.

A dimensional model may produce a star schema or a snowflake schema.

If you compare with OLTP, it has few indexes and joins and data is denormalized.

Data warehouses are designed to accommodate ad hoc queries. A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques.

The end users of a data warehouse do not directly update the data warehouse. Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.

A typical data warehouse query scans thousands or millions of rows. Data warehouses usually store many months or years of data. This is to support historical analysis.

Leave a Reply