On Tuesday, August 7 I facilitated a workshop about Database DevOps at the Agile 2018 conference in San Diego. I promised the group that I would write up the results here in the blog. This was an easy promise to make because I knew that we’d get some good information out of the participants, and sure enough we did. The workshop was organized into three major sections:
- Overview of Disciplined DevOps
- Challenges around Database DevOps
- Techniques Supporting Database DevOps
Overview of Disciplined DevOps
We started with a brief overview of Disciplined DevOps to set the foundation for the discussion. The workflow for Disciplined DevOps is shown below. The main message was that to be successful in modern enterprises we need to look at the overall DevOps picture: it is more than just Dev+Ops. Having said that, our focus was on Database DevOps.
Challenges around Database DevOps
We then ran a From/To exercise where we asked people to identify what aspects of their current situation they’d like to move away from and what they’d like to move their organization towards. The following two pictures (I’d like to thank Klaus Boedker for taking all of the following pics) show what we’d like to move from/to respectively (click on them for a larger version).
I then shared my observations about the challenges with Database DevOps, in particular the cultural impedance mismatch between developers and data professionals, the quality challenges we face regarding data, the lack of testing culture and knowledge within the data community, and the mistaken belief that it’s difficult to evolve production data sources.
Techniques Supporting Database DevOps
The heart of the workshop was to explore technical techniques that support database DevOps. I gave an overview of several Agile Data techniques so as to give people an understanding of how Database DevOps works, then we ran an exercise. In the exercise each table worked through one of six techniques (there are several supporting techniques that the groups didn’t work through), exploring:
- The advantages/strengths of the technique
- The disadvantages
- How someone could learn about that technique
- What tooling support (if any) is needed to support the technique
Each team was limited to their top three answers to each of those questions, and each technique was covered by several teams. Each of the following sections has a paragraph describing the technique, a picture of the Strategy Canvas the participants created, and my thoughts on what the group produced. It’s important to note that some of the answers in the canvases contradict each other because each canvas is the amalgam of work performed by a few teams, and each of the teams may have included people completely new to the practice/strategy they were working through.
Vertical Slicing
Just like you can vertically slice the functional aspects of what you’re building, and release those slices if appropriate, you can do the same for the data aspects of your solution. Many traditional data professionals don’t know how to do this, in large part because traditional data techniques are based on waterfall-style development where they’ve been told to think everything through up front in detail. The article Implementing a Data Warehouse via Vertical Slicing goes into this topic in detail.
The advantage of vertical slicing is that it enables you to get something built and into the hands of stakeholders quickly, thereby shortening the feedback cycle. The challenge with it is that you can lose sight of the bigger picture (which is why you want to do some high-level modeling during Inception to get a handle on that bigger picture). To be successful at vertically slicing your work, you need to be able to incrementally model, or better yet agilely model, and then implement that functionality.
Agile Data Modeling
There’s nothing special about data modelling: you can perform it in an agile manner just like you can model other things in an agile manner. Once again, this is a critical skill to learn and can be challenging for traditional data professionals due to their culture around heavy “big design up front (BDUF)”. The article Agile Data Modelling goes into the details of how to do this, and more importantly provides an example.
The advantages of this technique are that you can focus on what you need to produce now and adapt to changing requirements. The disadvantage is that existing data professionals tend to resist evolutionary strategies such as this, often because they prefer a heavy up-front approach. To viably model in an agile manner, including data, you need to be able to easily evolve/refactor the thing that you’re modelling.
Database Refactoring
A database refactoring is a simple change to your database that improves the quality of its design without changing the semantics of the design (in a practical manner). This is a key technique because it enables you to safely evolve your database schema, just like you can safely evolve your application code. Many traditional data professionals believe that it is very difficult and risky to refactor a database, hence their penchant for heavy up-front modeling, but this isn’t actually true in practice. To understand this, see the article The Process of Database Refactoring, which summarizes material from the award-winning book Refactoring Databases.
Database refactoring is what enables you to break the paradigm of “we can’t change the database” held by traditional data professionals. This technique is what enables data professionals to rethink, and often abandon, most of their heavy up-front strategies from the 1970s. Database refactoring does require skill and tooling support, however. Just like you need automated tests to safely refactor your code, you need an automated regression test suite to safely refactor your database.
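To make this concrete, here is a minimal sketch of the transition period of a Rename Column refactoring, using Python’s built-in sqlite3 module. The Customer table and its poorly named FName column are hypothetical; the pattern of adding the new column alongside the old one and keeping them synchronized until all applications have migrated follows the general approach described in Refactoring Databases.

```python
import sqlite3

# Hypothetical schema: a Customer table whose "FName" column we want to
# rename to the clearer "FirstName".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FName TEXT);
    INSERT INTO Customer (FName) VALUES ('Ada'), ('Grace');

    -- Transition period step 1: introduce the new column alongside the old
    -- one and backfill it from the existing data.
    ALTER TABLE Customer ADD COLUMN FirstName TEXT;
    UPDATE Customer SET FirstName = FName;

    -- Transition period step 2: keep the two columns synchronized with a
    -- trigger until every application has migrated, at which point FName
    -- and the trigger are dropped.
    CREATE TRIGGER SynchronizeFirstName
    AFTER UPDATE OF FName ON Customer
    BEGIN
        UPDATE Customer
        SET FirstName = NEW.FName
        WHERE CustomerID = NEW.CustomerID;
    END;
""")

# An application that still writes to the old column ...
conn.execute("UPDATE Customer SET FName = 'Augusta' WHERE CustomerID = 1")

# ... is transparently mirrored into the new column by the trigger.
print(conn.execute(
    "SELECT FirstName FROM Customer WHERE CustomerID = 1").fetchone()[0])
# prints: Augusta
```

The point of the transition period is that old and new schemas coexist, so applications can migrate one at a time instead of in a single risky big-bang release.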
Automated Database Regression Testing
If data is a corporate asset then it should be treated as such. Having an automated regression test suite for a data source helps to ensure that the functionality and the data within a database conform to the shared business rules and semantics for it. For more information, see the article Database Testing.
An automated test suite enables your teams to safely evolve their work because if they break something the automated tests are likely to find the problem. This is particularly important given that many data sources are resources shared across many applications. Like automated testing for other things, it requires skill and tooling to implement. To effectively regression test your database in an automated manner you need to include those tests in your continuous integration (CI) approach.
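As a small illustration, here is a sketch of what such a regression test might look like using Python’s standard unittest and sqlite3 modules. The Account table and its non-negative-balance business rule are hypothetical; the idea is simply that database rules get the same automated test coverage as application code.

```python
import sqlite3
import unittest

def create_schema(conn):
    # Hypothetical business rule: an account balance may never go negative,
    # enforced here directly in the schema via a CHECK constraint.
    conn.execute("""
        CREATE TABLE Account (
            AccountID INTEGER PRIMARY KEY,
            Balance   NUMERIC NOT NULL CHECK (Balance >= 0)
        )
    """)

class AccountRegressionTest(unittest.TestCase):
    def setUp(self):
        # Put the database into a known state before every test.
        self.conn = sqlite3.connect(":memory:")
        create_schema(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_deposit_is_recorded(self):
        self.conn.execute("INSERT INTO Account (Balance) VALUES (100)")
        balance = self.conn.execute("SELECT Balance FROM Account").fetchone()[0]
        self.assertEqual(balance, 100)

    def test_negative_balance_is_rejected(self):
        # The regression suite verifies that the business rule still holds.
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute("INSERT INTO Account (Balance) VALUES (-1)")

# Run the suite programmatically, as a CI job would.
suite = unittest.TestLoader().loadTestsFromTestCase(AccountRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because each test rebuilds its database in setUp, a broken refactoring shows up as a failing test rather than as a surprise in production.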
Continuous Database Integration
Database changes, just like application code changes, should be brought into your continuous integration (CI) strategy. It is a bit harder to include a data source because of the data itself. The issue is side effects between tests: in theory a database test should put the database into a known state, do something, check whether you get the expected results, and then put the database back into its original state. It’s that last step that’s the problem, because all it takes is one test forgetting to do so and you have the potential for side effects across tests. A common strategy, therefore, is to rebuild (or restore, or a combination thereof) your development and test databases every so often to decrease the chance of this – for example, in your nightly CI run. For more information, see the book Recipes for Continuous Database Integration.
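The rebuild step described above can be sketched as follows, again using sqlite3. The migration scripts and test data here are hypothetical stand-ins; in practice they would live in version control alongside the application code, and a real CI job would drop and recreate a server-hosted database rather than an in-memory one.

```python
import sqlite3

# Hypothetical, version-controlled migration scripts, applied in order.
MIGRATIONS = [
    "CREATE TABLE SchemaVersion (Version INTEGER NOT NULL)",
    "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL)",
    "ALTER TABLE Customer ADD COLUMN Email TEXT",
]

# Hypothetical seed data so tests start from a known baseline.
TEST_DATA = [
    ("INSERT INTO Customer (Name, Email) VALUES (?, ?)",
     ("Ada", "ada@example.com")),
]

def rebuild_test_database():
    # Recreating the database from scratch wipes out any side effects left
    # behind by earlier test runs -- every CI build starts from a known state.
    conn = sqlite3.connect(":memory:")
    for script in MIGRATIONS:
        conn.execute(script)
    conn.execute("INSERT INTO SchemaVersion (Version) VALUES (?)",
                 (len(MIGRATIONS),))
    for statement, params in TEST_DATA:
        conn.execute(statement, params)
    conn.commit()
    return conn

conn = rebuild_test_database()
print(conn.execute("SELECT Version FROM SchemaVersion").fetchone()[0])
# prints: 3
```

Recording the schema version in the database itself lets the CI job detect whether an environment is up to date before running the test suite against it.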
Operational Data Monitoring
An important part of Operations is to monitor the running infrastructure, including databases. This information can and should be available via real-time dashboards as well as through ad-hoc reporting. Sadly, I still need to write an article on this topic, but if you poke around the web you’ll find a fair bit of information.
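In the meantime, here is one possible sketch of the idea: periodically polling a database for operational metrics that a dashboard could display. The Orders table, the one-hour staleness threshold, and the metric names are all hypothetical; a production monitor would typically query the DBMS’s own statistics views and push results to a real dashboard or alerting system.

```python
import sqlite3
import time

def collect_metrics(conn):
    # Two illustrative metrics: total order volume, and pending orders that
    # have sat unprocessed for over an hour (a possible sign of trouble).
    total = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
    stale = conn.execute(
        "SELECT COUNT(*) FROM Orders WHERE Status = 'pending' AND CreatedAt < ?",
        (time.time() - 3600,),
    ).fetchone()[0]
    return {"orders_total": total, "orders_stale_pending": stale}

# Hypothetical operational table with one stale and one healthy order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Status TEXT, CreatedAt REAL)")
conn.execute("INSERT INTO Orders (Status, CreatedAt) VALUES ('pending', ?)",
             (time.time() - 7200,))
conn.execute("INSERT INTO Orders (Status, CreatedAt) VALUES ('shipped', ?)",
             (time.time(),))

metrics = collect_metrics(conn)
print(metrics)
# prints: {'orders_total': 2, 'orders_stale_pending': 1}
```

A scheduler would call collect_metrics on an interval and feed the results to whatever dashboarding or alerting tooling the operations team already uses.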
This was a really interesting workshop. We ran it in 75 minutes, but it really deserved a half day to allow for more detailed discussions about each of the techniques. Having said that, I had several very good conversations with people following the workshop about how valuable and enlightening they found it.