Captricity Engineering Machine: How we ship features
December 11, 2014 by Yoriyasu Yano
Last week we released Cap 2.14, the ZZTop release of Captricity, finishing the second cycle of A-Z version names under the theme of “80s pop bands” (our first cycle was themed “Famous people with great mustaches”). Next week we will release the Artisanal Pickling release of Captricity, starting our brand new theme of “Hipster things that Josh hates but secretly loves.” This got me thinking about our process and how we managed to ship 52 releases of Captricity.
Josh describing hipster things that he hates (but secretly loves)
When I tell people that I am the staff engineer leading a young team, they are often curious about our development process. Aside from getting really good at the alphabet game to name our releases, over the past 3 years we've worked hard to nail down our process to regularly ship high quality code.
We’ve found out firsthand that teams with a poorly defined release process WILL FAIL to ship code on time and correctly. New tech co-founders and young companies that are just starting out might find our lessons useful. Here is the seven-step process we’ve settled on after much trial and error:
Step 1: PR/FAQ
First, we need to figure out what to build. Inspired by the “Working backwards” approach at Amazon, we propose new features by writing up a sample press release describing the feature to the customer: a “PR/FAQ,” or Press Release/Frequently Asked Questions. We encourage everyone at the company to submit PR/FAQs and propose new features.
The point of the PR/FAQ is to be a lightweight document that immediately conveys the answers to the three most important questions about the feature: what it is, who cares about it, and why we should build it. The press release format provides a fun, concise way for us to share feature ideas.
The document is shared with a wide audience for comment and is voted on during a bi-weekly meeting where we prioritize the features we want to build in the coming 2-week sprint. It’s a great tool for ensuring that everyone in the company has a say in the product.
Step 2: Work breakdown and User stories
Once a PR/FAQ is vetted and prioritized, we then chunk it into manageable tasks. The chunking is done collaboratively between the feature owner and an engineer to ensure that the tasks that come out are small enough to be accurately estimated.
One of the key parts of shipping code is getting accurate estimates from the engineers so that the sprints can be scoped appropriately. We believe it’s worth putting in effort to get the best estimates possible because it allows engineering to collaborate with business partners and ship products that meet the customers’ expectations. The tasks are described in the form of user stories, which convey what the system needs to do concisely, with just enough information for developers to estimate the amount of work. These user stories usually take the following format:
“As <target audience>, I want to <desired action>, so that I can <reason for wanting said action>.”
In one sentence, we are able to convey the who, what, and why of the desired task, which is useful in having high bandwidth conversations.
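As a rough illustration of the template above and of how estimates feed sprint scoping, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `UserStory` class, the `estimate_days` unit, and the `scope_sprint` helper are hypothetical names, not part of our tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """One task from the work breakdown, phrased as a user story."""
    audience: str          # who wants it
    action: str            # what they want to do
    reason: str            # why they want it
    estimate_days: float   # hypothetical unit; any consistent unit works

    def render(self) -> str:
        """Fill in the one-sentence who/what/why template."""
        return (f"As {self.audience}, I want to {self.action}, "
                f"so that I can {self.reason}.")


def scope_sprint(stories, capacity_days):
    """Walk stories in priority order, taking each one that still fits."""
    selected, remaining = [], capacity_days
    for story in stories:
        if story.estimate_days <= remaining:
            selected.append(story)
            remaining -= story.estimate_days
    return selected


story = UserStory(
    audience="a data entry manager",
    action="export results as CSV",
    reason="analyze them in a spreadsheet",
    estimate_days=2,
)
print(story.render())
```

The smaller and better estimated the stories are, the less likely a simple scoping pass like this is to overfill the sprint.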
Step 3: Coding
At this point, we are finally ready to start writing some code. The first step is to open a new feature branch in git. This ensures that the developer has a stable base to build on in spite of the many concurrent development efforts. In general, the developer works on their own branch until the feature is complete and then we tackle the merge. This ensures the developer doesn’t need to context switch and resolve conflicts while coding the feature. The developer codes in the branch until she is satisfied, at which point she submits the code for review.
Step 4: Code Review
An important part of our ability to ship quality code is the code review process. Code review not only catches bugs, but it also allows for two-way learning. The reviewer can keep tabs on the codebase, while the original developer gets a fresh pair of eyes on the problem.
We use GitHub’s pull request infrastructure for our code reviews because it provides a convenient, commentable diff view. The GitHub workflow also allows us to review code asynchronously, a choice we opted for early on because in-person code reviews tend to be disruptive. In an in-person code review, the reviewer is interrupted from their normal workflow and undergoes an expensive context switch to page in the code they are reviewing. With an asynchronous code review workflow, the reviewer can choose when to review so that they don’t break their workflow.
Step 5: Continuous Integration and Merge
We use Buildbot for our continuous integration (CI), and it is set up to build on every pull request and every merge to master. The build on pull request is an important step, since it guarantees that the full test suite is run during the review phase. It is easy to forget to run through the test suite when you are trying to meet a code freeze deadline.
We have a comprehensive test suite that tests various parts of the system with all kinds of unit, functional, integration, and regression tests. Many potential regressions are caught and stamped out in this stage thanks to the pre-merge CI build. Once the build passes and the code reviewer is satisfied, the feature branch is merged into master.
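The merge rule described above (the build passes and the reviewer is satisfied) can be sketched as a simple gate. This is only an illustration under assumed names: `PullRequest`, `merge_blockers`, and `ready_to_merge` are hypothetical and do not correspond to Buildbot or GitHub APIs.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    branch: str
    build_passed: bool   # result of the pre-merge CI build
    approved: bool       # the code reviewer has signed off


def merge_blockers(pr: PullRequest) -> list:
    """Return the reasons this pull request cannot be merged yet."""
    blockers = []
    if not pr.build_passed:
        blockers.append("CI build has not passed")
    if not pr.approved:
        blockers.append("code review is not approved")
    return blockers


def ready_to_merge(pr: PullRequest) -> bool:
    """A feature branch goes into master only when nothing blocks it."""
    return not merge_blockers(pr)


pr = PullRequest(branch="feature/csv-export", build_passed=True, approved=False)
print(ready_to_merge(pr), merge_blockers(pr))
```

Reporting the blockers, rather than a bare yes or no, makes it obvious which of the two checks is still outstanding.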
Step 6: Staging and Acceptance Testing
At the end of the sprint, all features that have been merged into master are deployed to our staging server. The staging server is an exact replica of the production system, including a copy of the production database. It is important to note that the staging database is disengaged from the production database to avoid data tampering and accidental loss from the new release (like an overly aggressive cascading delete).
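One way to guard against a staging deploy accidentally touching production data is a pre-deploy check on the database settings. The sketch below is an assumption about how such a check could look, not a description of our deployment scripts; the setting keys and host names are hypothetical.

```python
def check_staging_isolated(staging_db: dict, production_db: dict) -> None:
    """Raise if the staging settings still point at the production database."""
    keys = ("host", "port", "name")
    if all(staging_db.get(k) == production_db.get(k) for k in keys):
        raise RuntimeError(
            "staging is configured to use the production database; "
            "refusing to deploy"
        )


# Hypothetical settings, for illustration only.
production = {"host": "db.prod.internal", "port": 5432, "name": "app"}
staging = {"host": "db.staging.internal", "port": 5432, "name": "app"}

check_staging_isolated(staging, production)  # passes silently
```

Failing loudly before the deploy starts is cheaper than discovering a cascading delete in production afterwards.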
Once the staging server is deployed, we go through acceptance testing of the user stories completed in the sprint to ensure that the corresponding features have been implemented correctly. This involves some manual QA. Ideally, acceptance testing would be done in the form of an automated test suite, but we have learned that full-on automated testing can be very hard, especially for the front end. We’ve made some small steps in the right direction with the introduction of Selenium, but we can definitely do better.
This acceptance testing is important because unrelated features can collide and it’s difficult to catch that at the code review stage. This is especially true for front end stylistic changes where simple changes could have global effects (Sue didn’t know about the 5px padding added to the button class by Johnny, and so all the buttons on Sue’s feature look weird). The staging environment is a great place to see how all the new features interact with each other.
We also do scrum-style end-of-sprint demos on the staging server for the whole company. This helps keep everyone at Captricity up to date on all the features getting released. It also lets us individually recognize the efforts of our developers.
Step 7: Deploy to Production
Once everything is copacetic on staging, it’s time to push to production! At this point, we have high confidence that:
- The user wants the feature we’re releasing thanks to the PR/FAQ process
- We’ve built the feature in a way that jibes well with the Captricity roadmap thanks to the work breakdown and user stories provided by our product managers
- The code is sane and tested thanks to code review and CI
- Everyone agrees the feature looks correct thanks to the manual review on staging
The button is pressed, code is shipped, and we pop open the champagne. The only remaining challenge is to think of a hipster thing that Josh loves/hates that starts with “B”--maybe hand-knit beer koozies?
So there you have it. This is how Captricity ships high quality features on a regular basis:
- PR/FAQ
- Work breakdown and User stories
- Coding
- Code Review
- Continuous Integration and Merge
- Staging and Acceptance Testing
- Deploy to Production
We’ve developed this process through trial and error over the last three years, and still we haven’t stopped iterating. There are many things we know we can do better. In the future, we plan on incorporating:
- Better external testing to get third party insight
- Better design input early on in the process
- Greater automation for all parts of the deployment, leading to CDCI (continuous delivery and continuous integration)
- More automated testing to avoid manual QA and acceptance testing
We’re always looking for ways to improve our process. If you have any suggestions or insights you would like to share, please drop us a note! We love to hear feedback from the community. Or if you want to be hands-on about improving our process, we’re hiring! Drop us a line at email@example.com and tell us what you’re passionate about.
We are looking for passionate, energetic people to help us solve interesting problems with real-world impact. From federal government agencies to hospitals and non-profits, the organizations that use Captricity are doing amazing, important work that helps people and communities far outside the usual reach of Silicon Valley.