On Tuesday, February 26th I ran a webcast entitled Disciplined Agile Data Warehousing/Business Intelligence. During the webcast we received several very good questions, some of which we had time for and some of which we didn’t get to. Regardless, we’ve decided to answer all of them here in this blog. In a few cases we’ve had to reword the questions to correct spelling or grammar mistakes and in a few cases we combined questions because they were effectively the same. We have organized the questions and answers into the following categories:
- Start here
- Vertical slices of DW/BI functionality
- Architecture and design
- Refactoring databases
- Other development practices
- Where can we learn more?
Start Here
Where can we download the slides?
- A PDF of the slide deck can be found on Slideshare.net.
- The recording of the presentation can be found on the DAC webinars page.
- We have also created a Disciplined Agile DW/BI poster that you can download from the DAC posters page.
Vertical Slices of DW/BI Functionality
How do we start with high-risk, vertical slices?
Disciplined Agile teams take a risk-value approach to prioritizing their work, an extension to Scrum’s value-driven approach. The basic idea is that disciplined agile teams will implement the highest-risk requirements first so as to prove the architecture with working code early in the lifecycle. This strategy works quite well for DW/BI solutions, just like any other type of solution. To do so, you need to understand the risks that your team faces. For a DW/BI solution, these risks may include:
- Can you access the data in key operational data sources?
- Can you process the volume of incoming data?
- Can you address data quality issues found in operational data sources?
To address these risks, look for high-value requirements whose implementation would force you to address the risks. Implement those requirements first.
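As a sketch of this risk-value ordering, the following Python snippet sorts a hypothetical backlog so that high-risk, high-value work items come first. The item names, the scores, and the simple risk × value weighting are all illustrative assumptions, not part of Disciplined Agile itself.

```python
# Hypothetical sketch: ordering a work-item backlog by combined risk and value.
# In practice the scoring would come from your team's own risk assessment.

def risk_value_order(items):
    """Sort work items so high-risk, high-value items come first."""
    return sorted(items, key=lambda i: i["risk"] * i["value"], reverse=True)

backlog = [
    {"name": "monthly sales report", "risk": 1, "value": 5},
    {"name": "ingest mainframe orders feed", "risk": 5, "value": 4},
    {"name": "dedupe customer records", "risk": 4, "value": 3},
]

for item in risk_value_order(backlog):
    print(item["name"])  # riskiest, most valuable work first
```

The riskiest high-value item (the mainframe feed, in this made-up backlog) rises to the top, which is exactly the work you want to tackle early to prove the architecture.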
How can we fit all of the stages of DW/BI development (e.g. data modeling, staging, profiling, DD, DQ, ETL, reporting, and testing) into a single iteration/sprint?
This can be a struggle for any team new to agile, not just a DW/BI team. The challenge is that the majority of organizations have taken a Tayloristic approach to organizing the way that they work. They have specialists who each do a portion of the work, handing off their portion to the next person once they’ve completed it. It is virtually impossible for specialists to get all of the work done to develop a working, vertical slice of the solution within the timeframe of a two-week iteration, let alone one that is shorter. The overhead of specialists trying to work in a Tayloristic, “software factory” strategy is just too great. Unfortunately, the culture within the data community tends to motivate over-specialization and the overhead surrounding it.
What agile teams need are generalizing specialists, T-skilled cross-functional people who work together collaboratively. Each person has one or more specialties so that they can do something useful, but they also have a general knowledge of the rest of the process and are willing to pick up new skills from one another. When your team is made up of people like this, the wait time between tasks (modelling, development, testing, …) starts to disappear, as does all the bureaucracy (reviews, traceability matrices, …) around coordinating such activities.
The fundamental challenge is that you likely don’t have generalizing specialists right now. As we like to say at Scott Ambler + Associates, you go to war with the army that you’ve got. For now you will need to build your team from specialists, because that’s the type of people you have. Insist that they produce a vertical slice of the solution during the current iteration. There will very likely be a lot of complaining about this at first, often because the team can’t imagine how they can pull it off. If possible, colocate them in an agile team room (sometimes called a tiger team room or war room) to get them working side-by-side. This will help to improve communication between the people involved and provide better ways to collaborate (such as agile modelling at a whiteboard). Push the idea that they should be doing non-solo work – such as pair programming, mob programming, or modelling with others – so as to share skills and get the work done quickly. They will need a strong agile coach to help them learn these new strategies and break their ineffective specialist habits. An important thing to observe is that many other teams have discovered how to work this way – you can too.
Vertical slice is good but if only half of a report is created in a single iteration it may not be useful to the stakeholders. What should we do?
The idea is to get something that works completely done each iteration. The stakeholders, often via the Product Owner (PO), will determine whether what you’ve built can be deployed into production. This is why we use the term “potentially consumable solution”, an improvement over Scrum’s “potentially shippable software” – it’s potentially consumable, but that doesn’t mean it has to be deployed, only that the option to deploy is there.
Having said this, if possible find a way to complete the entire report in a single iteration. That is sometimes easier said than done.
How do we handle data coming from multiple sources during data modeling within a single sprint, when we are expected to deliver a report at the end?
This is a very common occurrence. The solution is that you only model enough for that report at the time. Early in the lifecycle during Inception you will have done a bit of high-level modelling to explore the initial scope and to identify your technical strategy (your architecture). These high-level visions are fleshed out during Construction each iteration via Agile Modeling practices such as just-in-time (JIT) model storming, look-ahead modelling (backlog refinement in Scrum), and even iteration modelling (an aspect of iteration/sprint planning).
Architecture and Design
But many companies are not using Data Vault, and many are reluctant to adopt it?
You don’t need to adopt the Data Vault 2 method to be agile, but it is an approach that we highly suggest due to its practicality and flexibility.
While using Data Vault, how can we overcome challenges with teams supporting and creating data from various companies and geographies? Are there control risks associated with this?
Regardless of the architecture and design methodology that you follow, you will have risks associated with using data from multiple sources. Those risks tend to increase with multiple geographies or multiple companies involved. The greater the risk, the greater the importance of having a database regression test suite in place that validates your work.
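To make the regression-testing idea concrete, here is a minimal sketch of a database regression test in Python, using the standard library’s sqlite3 module as a stand-in for your real database. The customer table and the uniqueness rule it checks are illustrative assumptions.

```python
# A minimal database regression test: verify that the schema itself enforces
# a business rule, so that no application or feed can silently violate it.
import sqlite3

def test_customer_email_is_unique():
    conn = sqlite3.connect(":memory:")  # stand-in for your real database
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
    try:
        # A duplicate insert must be rejected by the schema, not by app code.
        conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
        raise AssertionError("duplicate email should have been rejected")
    except sqlite3.IntegrityError:
        pass  # the schema enforced the rule, as expected
    finally:
        conn.close()

test_customer_email_is_unique()
print("schema regression test passed")
```

Tests in this style can be run by any test runner; as your schema grows, they catch changes that would silently break the rules your data consumers depend on.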
Refactoring Databases
Can we take these database refactorings as technical debt user stories?
Database refactorings are small changes to the design of your database schema (which includes functionality such as triggers or stored procedures) that improve the quality of the design without changing the semantics of the schema in a practical manner. Examples of database refactorings include Rename Column, Introduce Cascading Delete, and Replace One-to-Many with Associative Table. A full catalog of database refactorings can be found here. Because database refactorings are small, you should just do them as a matter of course as you work on the database; they generally aren’t large enough to justify their own work items (such as a technical debt story). I suggest that you read the Spilled Juice Analogy. However, if you wanted to roll a collection of refactorings up into a single technical debt story, I suppose you could do that.
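As an illustration, here is a sketch of the Rename Column refactoring applied with a transition period, again using sqlite3 as a stand-in database. The table, column names, and trigger are illustrative assumptions; the pattern (add the new column, synchronize both, drop the old one later) is the general technique.

```python
# Sketch of the Rename Column refactoring with a transition period.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT)")
conn.execute("INSERT INTO customer (fname) VALUES ('Ada')")

# Step 1: add the better-named column alongside the old one and copy the data.
conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
conn.execute("UPDATE customer SET first_name = fname")

# Step 2: during the transition period, a trigger keeps old writers in sync.
conn.execute("""
    CREATE TRIGGER sync_first_name AFTER UPDATE OF fname ON customer
    BEGIN
        UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
    END
""")

# An old writer still updates fname; the new column follows along.
conn.execute("UPDATE customer SET fname = 'Grace' WHERE id = 1")
row = conn.execute("SELECT first_name FROM customer WHERE id = 1").fetchone()
print(row[0])

# Step 3 (not shown): once every reader and writer has migrated to
# first_name, drop the old column and the trigger.
```

The semantics of the schema are unchanged throughout; old and new consumers both keep working while the rename is in flight.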
Most often, data sources sit in the “business” area, which cares very little about how the software works or needs to work. Isn’t this attempt to clean data at the source something that will put strain on the organization?
I suggest that you read the Spilled Juice Analogy. Does the business want to be able to make decisions based on information that they can trust? Does the business want to reduce their long-term IT costs? Does the business want IT to be able to bring solutions to market quicker? If the answer to any of these questions is yes then they need to start treating data like an asset and invest in concrete quality techniques such as database regression testing and database refactoring.
Other Development Practices
Can we use database virtualization?
Sure, why not?
Automation – how can that be done in a DW/BI project?
- Write regression tests for your database.
- Adopt continuous database integration.
- Write automated deployment scripts for database changes.
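The third bullet can be sketched as a simple migration runner: versioned change scripts applied in order, with a schema_version table recording what has already run so the script is safe to re-run. The change scripts, table names, and the in-memory SQLite database below are illustrative assumptions, not a prescribed tool.

```python
# Sketch of an automated database deployment script: apply versioned schema
# changes in order, tracking progress so re-runs are harmless (idempotent).
import sqlite3

CHANGES = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)",
    2: "ALTER TABLE customer ADD COLUMN created_at TEXT",
}

def deploy(conn):
    """Apply any change scripts newer than the current schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version in sorted(v for v in CHANGES if v > current):
        conn.execute(CHANGES[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    return current  # the version we started from

conn = sqlite3.connect(":memory:")
deploy(conn)  # applies both changes
deploy(conn)  # safe to re-run: nothing left to apply
print(conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0])
```

The same runner, pointed at each environment in turn, is what makes continuous database integration and automated deployment practical.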
How does the practice of spikes apply?
A spike is a bit of code that is used to explore or prove a concept, typically written to pay down a technical risk. In a DW/BI solution that might be a bit of code to:
- See how a data source is accessed
- Explore a feature of your ETL tool
- Work with a BI tool for the first time
- Explore whether a data source can handle expected volume
- And many more technical risks.
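As an example of the volume bullet, a spike might be nothing more than a throwaway script that times the loading of an expected daily feed. The row count, payload shape, and sqlite3 stand-in below are assumptions for illustration; the point is that the code is quick, disposable, and answers one risk question.

```python
# Throwaway spike: can the store absorb an expected daily volume of rows
# in acceptable time? Numbers here are made up for illustration.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, payload TEXT)")

rows = [(i, "x" * 100) for i in range(100_000)]  # pretend daily feed
start = time.perf_counter()
conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
elapsed = time.perf_counter() - start

count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(f"loaded {count} rows in {elapsed:.2f}s")
```

Once the question is answered the spike has done its job; the code itself is typically discarded rather than productionized.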
How do you make sure that documentation stays up to date yet consumes little enough time to fit within a two-week agile iteration/sprint? Without updated documentation, developers who join the project at a later point are in no man’s land.
- Agile/Lean Documentation Strategies
- Practice: Document Continuously
- Practice: Document Late
- When should we create a document on an agile team?
How can enterprise data governance fit into an agile DW/BI mindset?
What lifecycle would be most appropriate for projects implementing COTS software solutions (like ERP) within the company, so that the investment would be maximized?
It depends on the situation that you face, including the skills of the people involved. I would think that your best bet is the Agile/Basic lifecycle that is based on Scrum.
Where Can We Learn More?
At the DAC posters page you can download the Disciplined Agile DW/BI poster (amongst many others).
We have a detailed article entitled Disciplined Agile Data Warehousing that you will find informative.