Sensible Surrogate Keys

Vijay Sankaran - May 2, 2003

Business Intelligence Strategy

I'd like to acknowledge that I got the inspiration for this article from one of my ETL developers. Through the early stages of my current project's cycle, I battled through thick and thin to get a surrogate key for a certain large field, the VIN (Vehicle Identification Number), on our database.

The field was 17 bytes long, uniquely identified a dimensional entity, and has some business meaning embedded into it. However, if the natural key was utilized (as opposed to the surrogate key), this 17 byte value would need to be on each record of all the fact tables, which would encompass nearly 1 billion rows or an additional 17GB plus indexes and overhead.

Some of the space bigots on my project opposed the natural keys and the ETL developers wanted to keep the natural key because it eliminated the dimensional lookup step from the process. They could not decide amongst themselves, so I did--"use surrogate keys."

Like the Ten Commandments, I've always been taught that using surrogate keys in a data warehouse is one of the golden rules of data warehousing. In Ralph Kimball's book The Data Warehouse Lifecycle Toolkit, he states "All data warehouse keys must be meaningless surrogate keys. You must not use the original production keys." This statement is highlighted by a light bulb and separated from the rest of the text. It sounds pretty definitive …

Please log in or sign up below to read the rest of the article.

Sensible Surrogate Keys

Continue reading...

Sponsors

Vendor Events

Guessing is not a strategy: How to build decision velocity with AI and real-time data

Newsletters