Quality Isn't Vibes: How We Created a Practical Framework to Measure Software Quality
By Mathew Spolin / May 13, 2026
Hoping is not engineering
"if you can’t measure quality, you can’t manage it. And if you can’t manage it, you’re hoping rather than engineering."

When I ask engineers about quality, I get ten different answers. Code coverage. Test pass rates. Customer satisfaction. Time to resolution.
All of them are right, and none of them are complete. Product quality demands attention to every detail.
In software, the problem is that quality often turns into a feeling. A vibe. Teams know when things feel shaky. Leaders sense when trust is eroding. But without shared, measurable signals, quality conversations quickly devolve into opinions and trade-offs driven by whoever speaks loudest.
This isn’t a breakthrough idea, but it’s the one that fundamentally changed how we operate years ago: if you can’t measure quality, you can’t manage it. And if you can’t manage it, you’re hoping rather than engineering.
Why single metrics fail
"The problem isn't the metrics themselves. It's that quality is multidimensional. Optimizing for one dimension while ignoring others creates blind spots."
Researchers at Google attempted to correlate technical debt—one proxy for quality—with 116 existing codebase metrics. The results were, in their words, "disappointing." No single metric predicted engineer assessments of quality. Their linear regression models predicted less than 1% of the variance in survey responses.
The problem isn't the metrics themselves. It's that quality is multidimensional. Optimizing for one dimension while ignoring others creates blind spots. You can have 95% code coverage and still ship bugs that wake people up at 3 AM. You can have great uptime and still lose customers to slow performance.
What you need is a composite view—a framework that captures multiple dimensions of quality that matter to customers, operators, and the business.
How we made software quality measurable: A practical framework
"The goal wasn’t competitively ranking teams. Instead, we created a shared language for quality."
Years ago at AppDirect, we built a quality framework around five pillars. Each measures a different aspect of how well we're delivering for customers:
Security
How well are we responding to known vulnerabilities? This isn't about having zero CVEs. It's about remediation velocity. When a vulnerability is identified, how quickly does it get addressed?
Reliability
Are our services available when customers need them? Classic uptime measurement, but sliced by service and weighted by business impact. An outage during invoice runs matters more than the same outage at 2 AM on Sunday.
Performance
How quickly are customer requests handled? We measure from both the API perspective (p50, p95, p99 latencies) and the front-end perspective. Performance degradation is often invisible until it's catastrophic. Customers don't complain about a slow system; they just leave.
Responsiveness
When something goes wrong, how quickly does the team respond? This measures human behavior, not system behavior. Mean time to acknowledge, mean time to resolve. A team that takes 4 hours to acknowledge a P1 has a responsiveness problem, regardless of how good their code is.
Issue detection
How often do bugs escape to production that customers find first? Instead of measuring what we catch, we measure what slips through. Every customer-reported defect is a signal that our detection mechanisms failed.
Each pillar produces a score from 0-100 based on automated data collection. We weight them based on business priorities and roll them into a single composite score per team.
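To make the roll-up concrete, here is a minimal sketch of combining per-pillar scores into a composite, assuming each pillar has already been normalized to 0-100. The pillar weights and example scores below are illustrative placeholders, not our production values.

```python
# Minimal sketch: roll normalized pillar scores (0-100) into one
# weighted composite per team. Weights and scores are illustrative only.

PILLAR_WEIGHTS = {
    "security": 0.25,
    "reliability": 0.25,
    "performance": 0.20,
    "responsiveness": 0.15,
    "issue_detection": 0.15,
}

def composite_score(pillar_scores: dict[str, float],
                    weights: dict[str, float] = PILLAR_WEIGHTS) -> float:
    """Weighted average of per-pillar scores, returned on the same 0-100 scale."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[p] * pillar_scores[p] for p in weights)
    return round(weighted_sum / total_weight, 1)

# Example: one team's scores for the current reporting period.
team_scores = {
    "security": 82,         # e.g. from vulnerability remediation velocity
    "reliability": 91,      # e.g. impact-weighted uptime
    "performance": 74,      # e.g. p95/p99 latency against targets
    "responsiveness": 88,   # e.g. mean time to acknowledge and resolve
    "issue_detection": 67,  # e.g. share of defects caught before customers
}

print(composite_score(team_scores))  # -> 81.3
```

Adjusting the weights is how business priorities show up in the number; the mechanics stay the same.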
The goal wasn’t competitively ranking teams. Instead, we created a shared language for quality.
What we discovered
"Once the framework was visible, something interesting happened. Teams stopped arguing about whether quality was a problem and started asking which dimension needed attention."
When we first rolled this out, the average score was… fine. Not great. Not terrible.
What stood out was the variance. It was enormous. Practices were all over the map.
Worse, there was no shared language. When one team said "we're solid on quality," they meant something completely different from another team. Quality was tribal knowledge—inconsistent, subjective, and impossible to improve systematically.
Once the framework was visible, something interesting happened. Teams stopped arguing about whether quality was a problem and started asking which dimension needed attention. That shift alone was transformative.
This is the power of measurement: it creates convergence. Without a shared framework, quality stays tribal. With one, it becomes organizational capability.
What actually improved quality
Three practices made the biggest difference.
1. Quality gates that actually gate
This is going to sound so obvious. Automated checks in the delivery pipeline enforced minimum standards. We required specific kinds of testing as code moved through each environment. If critical tests weren’t passing, the service didn’t ship.
What’s surprising is how many organizations still don’t operate this way. Even now, release decisions are often based on confidence, urgency, manual tests, or “it looks fine”—especially as AI-generated code makes it easier than ever to ship changes that feel correct but haven’t been validated.
Doing this was uncomfortable at first. The fear was slower delivery.
The result was the opposite: teams shipped more frequently, with fewer regressions, because failures were caught earlier and closer to the code—before they reached customers.
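As a sketch of what "the service didn’t ship" can mean mechanically: a pipeline step that exits non-zero whenever a critical test suite is not green, so promotion to the next environment simply cannot proceed. The suite names and results format below are hypothetical; most teams would express the same rule directly in their CI/CD configuration rather than a standalone script.

```python
# Minimal sketch of a quality gate: block promotion to the next
# environment unless every critical test suite passed.
# Suite names and the results-file format are hypothetical examples.
import json
import sys

CRITICAL_SUITES = {"unit", "integration", "smoke"}

def gate(results_path: str) -> int:
    """Return 0 if all critical suites passed, 1 (fail the pipeline) otherwise."""
    with open(results_path) as f:
        results = json.load(f)  # e.g. {"unit": "passed", "smoke": "failed"}

    failures = [s for s in sorted(CRITICAL_SUITES) if results.get(s) != "passed"]
    if failures:
        print(f"Gate failed: {', '.join(failures)} did not pass. Not shipping.")
        return 1

    print("Gate passed: all critical suites are green.")
    return 0

if __name__ == "__main__":
    sys.exit(gate(sys.argv[1]))
```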
2. Better visibility, even when it hurts
Improved security and reliability tooling initially made scores look worse. This wasn't because quality declined, but because visibility increased.
That’s a necessary phase. Seeing problems clearly is a prerequisite to fixing them. Once teams had concrete signals, they could address them quickly.
3. Treating quality like a product
The mindset shift was treating quality as a product, not a project. Products get roadmaps, investment, and continuous improvement. Projects get finished and forgotten.
We assigned ownership. We reviewed scores in leadership meetings. We celebrated improvements. We investigated regressions. Quality became something we actively managed, not something we hoped for.
Quality stopped being “everyone’s responsibility” in theory and became something actively managed in practice.
The outcome
"Quality and velocity aren't tradeoffs. When you have confidence in your quality systems, you can move faster."
Over time, scores rose meaningfully. Even more importantly, they converged as practices became more consistent across the organization. Incident volume dropped and customer-visible failures declined.
And here's what's counterintuitive: we didn't ship less. We shipped more. Our delivery predictability reached its highest level ever. We deployed hundreds of times per day.
Quality and velocity aren't tradeoffs. When you have confidence in your quality systems, you can move faster.
What this unlocks
When quality becomes measurable, several things change:
Prioritization becomes objective. Instead of debating whether to fix tech debt or build features, you can see which quality dimensions are degrading and make informed tradeoffs.
Recognition becomes fair. Teams that invest in quality get credit. The work shows up in the numbers, not just in "trust me, we're solid."
Accountability becomes shared. Nobody owns reliability alone. Nobody owns performance alone. These are outcomes achieved together, visible to everyone.
Investment becomes defensible. When someone asks why a team or project needs more resources, you can show exactly how additional investment would move quality metrics that correlate with customer retention.
Getting started
"Make it un-bypassable."
If you want to build a similar framework:
Pick your pillars
Mine may not be yours. What dimensions of quality matter most to your customers? Aim for 4-6 that are orthogonal (measuring different things) and automatable (not requiring manual assessment).
Automate the measurement
If you can't collect the data automatically, you won't sustain the effort. Pull from your observability stack, your ticketing system, your security scanners, your CI/CD pipeline.
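As one illustration of what automated collection can look like, the sketch below pulls incidents from a hypothetical internal ticketing API and computes mean time to acknowledge for a responsiveness-style pillar. The endpoint, field names, and team identifier are placeholders; the point is that this runs on a schedule, not by hand.

```python
# Sketch of automated collection for a responsiveness pillar:
# pull incidents from a (hypothetical) ticketing API and compute
# mean time to acknowledge (MTTA) in minutes.
from datetime import datetime

import requests  # assumes the third-party 'requests' package is installed

TICKETS_URL = "https://tickets.internal.example.com/api/incidents"  # placeholder

def fetch_incidents(team: str, since: str) -> list[dict]:
    """Fetch a team's incidents opened since the given ISO date."""
    resp = requests.get(TICKETS_URL, params={"team": team, "since": since}, timeout=30)
    resp.raise_for_status()
    # Assumed response shape: [{"opened_at": "...", "acknowledged_at": "..."}, ...]
    return resp.json()

def mean_time_to_acknowledge(incidents: list[dict]) -> float:
    """Average minutes from incident opened to first acknowledgement."""
    minutes = [
        (datetime.fromisoformat(i["acknowledged_at"])
         - datetime.fromisoformat(i["opened_at"])).total_seconds() / 60
        for i in incidents
    ]
    return sum(minutes) / len(minutes) if minutes else 0.0

if __name__ == "__main__":
    incidents = fetch_incidents(team="payments", since="2026-01-01")
    print(f"MTTA: {mean_time_to_acknowledge(incidents):.1f} minutes")
```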
Make it visible
Dashboard the scores prominently. Review them in leadership meetings. Visibility creates accountability.
Enforce what matters
Start with one quality gate. Make it un-bypassable. Expand from there. The goal isn't to prevent shipping! It's to prevent shipping broken things.
Expect resistance, then results
Quality metrics and practices are controversial until they work. Give it time. The reduction in customer incidents will speak louder than any debate about deployment friction.
The mindset shift
"We tell our teams: quality isn't vibes. We manage it like a product."
We tell our teams: quality isn't vibes. We manage it like a product.
Your factory—the way you build software—is itself something you optimize. When you treat quality as measurable, you stop treating it as overhead and start treating it as capability.
A rising global quality score, with the numbers visible to everyone. Fewer customer incidents and less churn, with the impact showing up in the business. More predictable deliveries and a trustworthy roadmap. These didn't happen because we hoped harder. They happened because we measured, we enforced, and we improved.
Quality isn't vibes. It's engineering.
Join the conversation
What dimensions of quality matter most in your organization? We're curious how others approach this. If this framework resonates, we'd love to hear your examples.
Join the conversation on Mathew’s original LinkedIn article.
Mathew Spolin leads global engineering at AppDirect, where managing organizational debt across multiple countries, acquisitions, and restructurings has been as critical as managing technical debt.