Project Management

Sensible Surrogate Keys

linkedin twitter facebook print Request to reuse this   Business Intelligence   Strategy  

I'd like to acknowledge that I got the inspiration for this article from one of my ETL developers. Through the early stages of my current project's cycle, I battled through thick and thin to get a surrogate key for a certain large field, the VIN (Vehicle Identification Number), on our database.

 

The field was 17 bytes long, uniquely identified a dimensional entity, and has some business meaning embedded into it. However, if the natural key was utilized (as opposed to the surrogate key), this 17 byte value would need to be on each record of all the fact tables, which would encompass nearly 1 billion rows or an additional 17GB plus indexes and overhead.

 

Some of the space bigots on my project opposed the natural keys and the ETL developers wanted to keep the natural key because it eliminated the dimensional lookup step from the process. They could not decide amongst themselves, so I did--"use surrogate keys."

 

Like the Ten Commandments, I've always been taught that using surrogate keys in a data warehouse is one of the golden rules of data warehousing. In Ralph Kimball's book The Data Warehouse Lifecycle Toolkit, he states "All data warehouse keys must be meaningless surrogate keys. You must not use the original production keys." This statement is highlighted by a light bulb and separated from the rest of the text. It sounds pretty definitive …


Please log in or sign up below to read the rest of the article.

ADVERTISEMENT

Continue reading...

Log In
OR
Sign Up
ADVERTISEMENTS

"Among those whom I like or admire, I can find no common denominator; but among those whom I love, I can: All of them can make me laugh."

- W.H. Auden

ADVERTISEMENT

Sponsors