In the past few years more and more teams have started to adopt agile approaches to data warehousing (DW) and business intelligence (BI). We have been at the forefront of this movement, having developed foundational techniques around agile data modelling, database refactoring, agile database testing, and many more that enable teams to be more agile in their approach to DW/BI development. This blog posting overviews the differences between how a traditional team and a disciplined agile team team approaches DW/BI development. To understand these differences, the following diagram compares the typical level of expended effort creating artifacts on traditional and disciplined agile teams.
There are several interesting differences to between the approaches. Disciplined agile DW teams will:
- Create a high-level conceptual model early. A high-level conceptual model, or more accurately diagam at this point, identifies the critical business entity types and the relationships between them. This provides vital insight into the domain while helping to capture key domain terminology, thus helping to drive consistency of wording in other artifacts (such as user stories and epics). Traditional teams will often make the mistake of over documenting the conceptual model early in the lifecycle, injecting delay into the team (with the corresponding opportunity cost of doing so) as well as the risk of making important decisions when you and your stakeholders have the least knowledge of the actual end goal.
- Evolve a light-weight logical data model (LDM) over time. If your agile DW team does this at all they will keep their LDM very light weight. Traditional teams will often invest heavily in their LDMs as they believe it is a mechanism to ensure quality and consistency through specifying it in. This often proves to be wishful thinking in practice. Disciplined agile teams instead invest their efforts in creating an executable specification in the form of regression tests (more on this below).
- Evolve a detailed physical data model (PDM) over time. Disciplined agile teams realize that a PDM, when created via a data modeling tool with full round-trip engineering (it generates schemas as well as imports existing schemas), effectively becomes the source code for the database. As the requirements evolve the team will evolve the PDM to reflect these new needs, generating schema changes as needed. They can work this way because they are able to easily refactor and regression test their database. This is different from the traditional approach where they often perform detailed modeling up front. This is motivated by the mistaken belief that production database schemas are difficult to evolve, something that agilists know not to be true.
- Develop a comprehensive regression test suite over time. These tests address several important issues. First, they validate the work of the team, showing that their work to date fulfills the requirements as they’ve been described to the team. Second, a regression test suite enables the team to safely evolve their work. Agile developers can make a small change, rerun their regression tests, and see whether they broke something (if so, then they either rollback their change or they fix what they broke). Third, when a test is written before the corresponding database schema or database code is developed, the test effectively becomes a detailed executable specification. Sophisticated agile DW teams will capture the kinds of information that were previously captured in LDMs in executable tests, and are thus much more likely to have consistent schemas than teams that still rely on static LDMs.
- Capture critical meta-data over time. Because the rest of your organization may not be completely agile there is often a need to continue to capture key meta data about data sources. This meta data should be kept as light as possible. If there isn’t a definite need for it then don’t capture it. If someone says “but we might need it someday” then wait until someday and invest in capturing the information at that point. Furthermore, instead of capturing meta data in a static manner (i.e. as documentation) try to identify ways to capture it as tests, or to generate it automatically from other information sources. Any documentation that you write today needs to be maintained over time, slowing you down.
This blog posting is an excerpt from the detailed article Disciplined Agile Data Warehousing.