A dimension tables plays a critical role in data warehousing and has strong association with facts of the business process. It consists of descriptive attributes typically text fields which describe facts but are complete and have discreet values. Facts are the metrics that business houses use for making decisions. Dimensions are generally descriptions of facts and some are dimensions which say how to summarize facts to help analysts. The key functions of dimensions are to filter, group, and aggregate. Dimensions are hierarchical in nature and get classified to low level to explain facts in detail. Example: schedule can be broken to daily, weekly, monthly, quarterly, etc.
Primary aim of a dimension tables will be to create conformed dimensions that can be shared though out the data warehouse of the enterprise and join on to multiple fact tables which may be of various business processes.
Confirmed dimensions are critical to any enterprise since they create consistency in query outputs. Since they are consistent they can be linked up to various facts to represent cross functional impacts and benefits. Accidentally due to continuous practice and attention to data warehousing much of the dimensions remain reusable and reduce time to develop and deliver.
Dimensions change slowly and for varied reasons mostly driven by business changes and some are due to administrative reasons. These changes to dimensions are not scheduled or time based. This is called SCD (slowly changing dimensions).
They are classified in 6 types and managed accordingly.
Type 0 is a very passive and in some cases it stores historical data in dimension tables.
Type 1 overwrites old data and does not retain historical data.
The Type 2 method creates multiple records of a given natural key in the table with new surrogate keys. Hence type 2 facilitates historical data preservation.
Type 3 approach adds additional columns to the table instead of adding new entries like Type 2. Hence it has limitations on historical information.
Type 4 stores historical data (not all or all) in different tables and current data in a separate.
Type 6 follows a hybrid approach of type 1, type 2, and type 3. Different SCD’s may be useful on different columns of the same table.
Relationships with facts
Dimension table are generally small in size compared to facts. The primary key is always a surrogate key which is a unique number but not any operational data. Surrogate keys can be an overhead but they also bring in benefits which make it worthy of a standard practice. The use of surrogate key brings advantages such as performance boost while performing joins, protects from administrative changes to identifiers, put together dissimilar sources, and track changes to values of a dimension. All fact tables must maintain referential integrity hence the primary key (surrogate key) of the dimensions will act as foreign key in associated fact tables.
The dimension table should also have a natural key which identifies the dimension uniquely but does not need to be an integer.
Example: PRODUCT_ID in a product table which can take different forms but will remain unique for a product.