Demystifying Bug Triage

I adapted this post from an internal guide I made for one of our teams. My goal was to demystify bug triage, lay out the basic hows and whys, and get buy-in from the team. I wanted everyone to feel comfortable triaging the issues reported in the team’s GitHub repositories (or other bug trackers).

The term “triage” comes from medicine, where it’s the process of determining the order in which patients will receive treatment based on the severity and urgency of their medical condition. At Automattic we apply the term “triage” to the processes we use to determine the severity and urgency of bug reports (and the potential positive impact of enhancement requests) so we can prioritize open issues. In other words, it’s how we keep our GitHub repos organized and make sure we can identify the next most important thing to work on.

How to Triage

What processes do we use for triage? Triage is primarily the initial review and prioritization of all new issues as they are opened in GitHub:

  • Add a label identifying the topic, feature, or epic related to the issue.
  • Add a label identifying the type of issue (e.g. bug or enhancement).
  • Add a label identifying the priority, if it’s clearly a high or low priority issue.
  • Check the issue to see if it’s missing any critical information, such as steps to reproduce or the device or app version where the bug occurs.
  • Add the issue to relevant projects or milestones for followup. If it’s a critical/blocking bug, escalate the issue in other ways, such as a direct ping to a team member.
  • Especially important when someone outside the team opened the issue: leave a comment to acknowledge the contribution and set expectations for followup.
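
If you want a quick way to spot issues that still need a triage pass, the checklist above can be sketched in a few lines of Python. This is only an illustration: the `topic:` and `type:` label prefixes, the `triage_gaps` helper, and the shape of the issue dictionary are all assumptions made for this example, not a GitHub convention or the process any particular team uses.

```python
# Hypothetical label prefixes; substitute whatever convention your team agrees on.
REQUIRED_PREFIXES = ("topic:", "type:")


def triage_gaps(issue):
    """Return a list of triage steps that still look undone for one issue.

    `issue` is a plain dict with "labels" (list of strings), "body" (string
    or None), and "milestone" (anything truthy when set). This shape is an
    assumption for the sketch, not a real API response.
    """
    labels = [label.lower() for label in issue.get("labels", [])]
    gaps = []

    # Every issue should carry a topic label and a type label.
    for prefix in REQUIRED_PREFIXES:
        if not any(label.startswith(prefix) for label in labels):
            gaps.append(f"add a '{prefix}' label")

    # Bug reports should include steps to reproduce.
    body = (issue.get("body") or "").lower()
    if "type: bug" in labels and "steps to reproduce" not in body:
        gaps.append("ask for steps to reproduce")

    # Issues should land in a project or milestone for followup.
    if not issue.get("milestone"):
        gaps.append("add to a project or milestone")

    return gaps


new_issue = {
    "title": "Crash when uploading a photo",
    "labels": ["type: bug"],
    "body": "The app crashes every time.",
    "milestone": None,
}

for gap in triage_gaps(new_issue):
    print(gap)
```

A check like this only flags what is missing. Deciding on the right priority and writing a friendly acknowledgment still need a person.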

I also use the term “triage” as an umbrella term for all the processes we use to review issues, and this includes reviewing all open GitHub issues on a regular basis:

  • Make sure that open issues are still valid and complete.
  • Look for trends, e.g. a group of issues related to a specific feature or component.
  • Re-prioritize issues when team goals and priorities change, or in response to trends you identified.

The exact timing for triaging new issues and reviewing existing issues depends on the team and project. If you’re just getting started, I’d suggest triaging new issues at least once per week and reviewing existing issues at least once per quarter (or whenever there’s a larger conversation about what to work on next).
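
To make the periodic review easier to start, you could list the issues nobody has touched in a while. The sketch below assumes each issue is a dictionary with a `number` and an `updated` date, and it treats 90 days as a rough stand-in for "once per quarter". The `needs_review` helper and the sample data are made up for illustration.

```python
from datetime import date, timedelta


def needs_review(issues, today, max_age_days=90):
    """Return the issues whose last update is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [issue for issue in issues if issue["updated"] < cutoff]


# Made-up sample data; real data would come from your bug tracker.
open_issues = [
    {"number": 101, "updated": date(2017, 1, 15)},
    {"number": 102, "updated": date(2017, 7, 30)},
]

for issue in needs_review(open_issues, today=date(2017, 8, 7)):
    print(f"#{issue['number']} last updated {issue['updated']}")
```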

Why to Triage

Why do these processes matter? They make it easier to:

  • Identify related issues that can be fixed at the same time, that show a potential weakness in a particular part of the app, or that point to a potential longer-term project.
  • Gauge the health of the app, in terms of number of issues and their severity.
  • Prioritize issues for regular maintenance.
  • Respond to all reports, especially from external contributors and reporters, to make sure nothing falls through the cracks.

Get Started

If you’re new to triage, here are some next steps you can take to get yourself and your team started:

  1. Agree on a consistent set of labels and what they’ll be used for. If you’re using GitHub, there is a set of default labels you can start with — but most important is to think of what’s useful for your team and how you work.
  2. Set up any projects or milestones you have or are planning to use to organize your work.
  3. Review all open issues (add labels, assign priority, check for completeness, etc.).
  4. Practice labeling new issues with appropriate topic, type, and priority labels. Hold yourself and your team accountable for doing this on all new issues you open.
  5. Identify a triage DRI (“Directly Responsible Individual”) and set a cadence for triaging new issues and reviewing existing issues going forward.

As with any work, be prepared to reflect and iterate on your processes. So far this approach has worked well for me and the teams I work with, but you may need to add or subtract steps to make it fit the way you work.

What do you think? Are your teams already doing this kind of triage? Are there any other steps or processes that you use to keep open issues organized and prioritized?

Wrangling Excellence

Today marked a big change for me at work.

For the past 4+ years, I worked as a Happiness Engineer supporting WordPress.com and the WordPress apps. I spent roughly the first two years working in the WordPress.com Support Forums, and I found that I loved providing public support and troubleshooting the incredible range of issues that arose there. I spent the past two years supporting the WordPress apps, and over time I got more and more involved in testing them as well.

As I spent time developing my own manual testing approach, working with beta testing communities, exploring the support/development feedback loop, and encouraging my coworkers’ troubleshooting skills, I also kept an eye on a team being formed at Automattic around automated testing and bug prioritization. I worked with and learned from them as more discussions arose around testing and quality within our fast-paced, distributed environment. And although I enjoyed helping people use WordPress, I discovered that my favorite work was helping development teams understand our customers’ needs and identify what issues most needed their attention.

Earlier this year, I finally decided to build on my existing coding skills to try my hand at automated testing. With some guidance, I developed the first suite of UI tests for a new editor (codenamed “Aztec”) for the WordPress for iOS app. Later I added a suite of UI tests for the same editor for WordPress for Android. I also worked with a coworker to automate screenshots of the WordPress.com signup flow in multiple languages, to help our internationalization team review those localized flows. Some of this work was part of a trial, as I applied internally to change roles.

That work and study paid off, and today I started my first day as an Excellence Wrangler. I’ll be automating tests, doing manual testing, triaging bug reports, and generally helping our support and development teams communicate and prioritize to create the best experience possible for our customers.

And if that excitement wasn’t enough, I also had a delivery that I’ve been waiting on since I hit my four-year anniversary at Automattic — a new laptop with the WordPress logo:

[Photo: a new laptop with the WordPress logo]

Sharing User Feedback from App Reviews

Over the past year, I’ve been working fairly closely with the mobile app team at Automattic. As I got more involved, I tried to help close the feedback loop with the team by taking advantage of the feedback our users were already giving us — so of course I took a look at our app reviews.

It’s hard to look through app reviews. I mean, on one hand, it’s just emotionally draining to be hit with that barrage of unmediated criticism (although the unmediated praise is wonderful!). But it’s also hard to grok all that feedback when it’s just a stream of comments. So I decided to collect that feedback and present it to the team in an easier-to-digest format. I’ve now gone through that process several times and want to share it in case it helps you process reviews or other feedback from your customers.

Collect and Organize the Feedback

The first step is to gather up all the feedback. I used App Annie, since our mobile app team was already using it. I decided to identify all the reviews from the latest version of our app (in my case, it was the WordPress app on two platforms, iOS and Android) and export them. This conveniently dumped all of the ratings, reviews, and user details into CSV files (one per platform).

Then, I set up a spreadsheet for each platform and focused on a few key details:

  • The user’s rating (from 1 to 5)
  • The review’s title and content (adding a translation where the review was in another language)
  • The main issue in the review
  • Any secondary issues or notes about the review

How did I identify the main and secondary issues? A little analysis.

Analyze the Feedback

To find the main and secondary issues, I read every single review from that version of the app. I picked keywords for the main issues users described and assigned one of these keywords (categories) to each review.

If you’ve ever coded survey responses, this is a similar process. If this is the first time you’ve done this, here are some tips:

  • Read through all or a representative sample of the reviews. (For your first time, and especially for an unfamiliar product, you might need to read all of them.)
  • As you read, make notes about the topics or keywords that come up (more is better at this stage).
  • Pare down your list to a subset of more general keywords. For example, for the WordPress app I used keywords like “Editor,” “Login,” and “Media upload.”
  • Go through the reviews one by one and assign a keyword for the main issue the user described.
  • If the user mentioned more than one issue, or there is additional detail that you think will be helpful later on, add it in the field for secondary issues or notes. For example, I found a number of reviews with the “Editor” keyword that specifically mentioned “limited features” in the editor, so that went into the second field so I could keep track of that sub-issue.

Pro tip: To keep my sanity, I worked from 1-star reviews to 5-star reviews, so the toughest criticism came when I had the most energy and the work got easier and more cheerful as I went.

Once I was done assigning keywords to each review, I organized the spreadsheet by those keywords so I could see which issues were most commonly reported. I made adjustments to the keywords, looked for subsets of related issues, and checked everything for consistency. Finally, I got ready to share my findings.
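
If your spreadsheet gets large, the "which issues were most commonly reported" step can also be done with a short script. This is a minimal sketch, not the process described above: it assumes each review has already been given a `main_issue` keyword by hand, and the sample rows (including the "Praise" keyword) are invented for illustration.

```python
from collections import Counter

# Invented sample rows; in practice these would come from your spreadsheet export.
reviews = [
    {"rating": 1, "main_issue": "Login"},
    {"rating": 2, "main_issue": "Editor"},
    {"rating": 1, "main_issue": "Editor"},
    {"rating": 5, "main_issue": "Praise"},
    {"rating": 3, "main_issue": "Media upload"},
    {"rating": 2, "main_issue": "Editor"},
]


def issue_counts(rows):
    """Count reviews per main-issue keyword, most common first."""
    return Counter(row["main_issue"] for row in rows).most_common()


for keyword, count in issue_counts(reviews):
    share = 100 * count / len(reviews)
    print(f"{keyword}: {count} reviews ({share:.0f}%)")
```

The counting is the easy part. Reading the reviews and choosing consistent keywords is still where the real analysis happens.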

Share the Feedback

I had a few self-imposed guidelines for what I wanted the team to get from this user feedback:

  • Praise for the things we are doing well
  • A clear picture of the top pain points our users experience
  • Suggestions for what action could have the biggest impact

Here’s a template showing how I organized my report:

Overview:
- Number of reviews
- Average rating
- How ratings are distributed (evenly spread? split between 1 and 5 stars?)

Highlights:
- Features or experiences that our users enjoy and appreciate
- 2-3 quotes from positive reviews

What did users mention in their reviews?
- The top three issues mentioned in reviews
- For each issue, an explanation of its impact (how many or what percentage of reviews mentioned it? what were the star ratings for those issues?) and a little context about what exactly users discussed and your interpretation of the source of the problem
- Links to any open bug or enhancement issues the team is already tracking, or any ongoing work related to the issue

Suggestions for followup:
- One or two projects, or open issues in the bug tracker, that the team could make a top priority to help address this feedback
- Any other user feedback (for example, from customer support interactions) that could shed additional light on the feedback in the reviews
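
The numbers in the Overview section of the template are simple to compute from a list of star ratings. Here is one way it might look, with an invented `overview` helper and invented sample ratings, purely as an illustration:

```python
from collections import Counter
from statistics import mean


def overview(ratings):
    """Summarize star ratings: review count, average, and count per star."""
    counts = Counter(ratings)
    return {
        "reviews": len(ratings),
        "average": round(mean(ratings), 2),
        # Include every star value from 1 to 5, even if no one used it.
        "distribution": {stars: counts.get(stars, 0) for stars in range(1, 6)},
    }


# Invented sample ratings, split between very low and very high scores.
sample_ratings = [1, 1, 5, 5, 5, 2, 4, 1]

summary = overview(sample_ratings)
print(f"Number of reviews: {summary['reviews']}")
print(f"Average rating: {summary['average']}")
for stars, count in summary["distribution"].items():
    print(f"{stars} stars: {count}")
```

Looking at the per-star counts next to the average helps show when a middling average hides a split between 1-star and 5-star reviews.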

I shared this with our entire mobile app team (along with the spreadsheets with the raw data), inviting questions and discussion. Although we haven’t taken action on every single issue, it has led to some quick wins, reprioritizing, and planning ahead with our users in mind.

I hope this is useful to you and your team! If you try it out, let me know how it goes. And if you have ideas for how to improve this process, I’d love to learn from you.

The Problem with Averages

If you’re interested in inclusive design, I’d recommend listening to “On Average” from the podcast 99% Invisible. From the episode:

So in 1926, when the army was designing its first-ever fighter plane cockpit, engineers measured the physical dimensions of hundreds of male pilots and used this data to standardize cockpit dimensions. Of course, the possibility of female pilots was never considered. Of course.

The size and shape of the seat, the distance to the pedals and the stick, the height of the windshield, even the shape of the flight helmets were all made to conform to the average 1920’s male pilot. Which changed the way the pilots were selected.

You basically then select people that fit into that and then exclude people that don’t.

Designing for the average and excluding anyone who doesn’t fit that average isn’t, well, inclusive. The episode goes on to discuss how design (including in the military) has become more inclusive — but it’s still something we struggle with.

From what I’ve seen in software design and development, one of the challenges is deciding which user personas and scenarios will be considered in your design, and which issues are only edge cases. Where and how do you draw that line? It’s also a matter of just remembering to think outside your own perspective, to consider cases that you haven’t thought of already. (As designer and fellow Automattician Mel Choyce pointed out, it’s about challenging your own biases by seeking out and really listening to users with different perspectives.)

As a linguaphile, I tend to notice when software design struggles or forgets to include non-English languages. For example, designs that aren’t responsive to languages that take up more space (ahem, German, ahem) or that don’t consider right-to-left languages end up excluding entire populations of potential users in other parts of the world. It can be hard for monolingual designers and developers to know how their products work in other languages, and it’s always satisfying when I have a chance to test a product and suggest language-based enhancements, so people can use our products in any language. It’s one small way I can help democratize publishing for users around the world.