About ten years into my career, I had the opportunity to work on a large-scale, business-critical talent management software system. One part of the platform I focused on was the application workflow — the feature where candidates applied for jobs.
This workflow might sound straightforward, but it was mission-critical. If it broke, candidates couldn’t apply, customers missed out on talent, and everyone lost valuable time. The challenge was that candidates themselves rarely reported errors; instead, our clients would eventually notice and file a bug report. By then, the damage had already been done.
The Push to Be Data-Driven
At the time, I was still in the habit of making engineering decisions on gut instinct. I’d fix the things I thought were important, lean on my experience, and just “do what felt right.” But one of my PMs pushed me to take a different approach. She encouraged me to become more data-driven, showing me the value of letting the numbers guide our decisions rather than relying on instinct alone.
That mindset shift was eye-opening. It made me realize that our systems weren’t giving us the data we actually needed. We were blind to what candidates were experiencing in real time, and that’s why we were always one step behind when customers reported issues.
Being data-driven, I learned, wasn’t just about analyzing metrics after the fact. It was about instrumenting our systems so they produced the right data in the first place. With that data, we could stop guessing and start knowing.
The Plan
That’s what led me to focus on the application workflow. Since it was the entry point of the candidate experience, it deserved the most visibility. If we could capture and surface data about how it behaved in production, we’d have the signals we needed to make better decisions — not only about fixing bugs but also about prioritizing improvements.
So my task was clear: add monitoring and alerting around the workflow so that if errors occurred, the team would know right away. Instead of gut-driven firefighting, we’d have data-driven awareness.
The Execution
The good news was that implementation wasn’t technically difficult. I wrapped the entire workflow with logging, capturing errors and key signals at every step. We used Splunk as our standard tool for logs, so I set up dashboards to visualize trends and alerts to notify us when something went wrong.
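The original code is long gone, but the shape of the change is easy to sketch. Below is a minimal illustration in Python of the pattern: a decorator wraps each workflow step and emits a structured log event for every success and failure, leaving dashboards and alerting to the log platform. The step name, the event fields, and the `submit_application` function are hypothetical stand-ins, not the actual system.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("application_workflow")

def instrumented(step_name):
    """Wrap a workflow step so every call emits a structured log event.

    Success and failure events share one schema, which makes it easy to
    build dashboards and alerts on top of them in a log platform such as
    Splunk (e.g., trigger when error counts spike for a given step).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            event = {"workflow": "job_application", "step": step_name}
            try:
                result = func(*args, **kwargs)
                event.update(status="ok",
                             duration_ms=round((time.monotonic() - start) * 1000))
                logger.info(json.dumps(event))
                return result
            except Exception as exc:
                event.update(status="error",
                             error_type=type(exc).__name__,
                             duration_ms=round((time.monotonic() - start) * 1000))
                # Log and re-raise: instrumentation observes, it never swallows.
                logger.error(json.dumps(event))
                raise
        return wrapper
    return decorator

# Hypothetical workflow step, for illustration only.
@instrumented("submit_application")
def submit_application(candidate_id, job_id):
    ...  # validate, persist, notify downstream systems
```

On the Splunk side, a saved search over the error events grouped by step is enough for a trend dashboard, and adding a trigger threshold to that same search turns it into an alert.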
The whole thing was a low-cost, high-value change: no noticeable performance impact, no resistance from the team, and immediate benefits.
The Results
Soon after rolling it out, we began catching corner cases and real production issues before customers ever reported them. As the team lead, I found this made my life infinitely easier. Instead of scrambling after a client filed a ticket, we could get ahead of the problem, often fixing it before they even noticed.
I can’t recall specific incidents now — there were many — but I do remember the relief of knowing we were no longer flying blind. The monitoring and alerting gave us confidence and visibility into a business-critical part of the platform.
Lessons Learned
That experience stuck with me, and to this day, I apply the same principle wherever I work:
- Proactive beats reactive. Monitoring and alerting let you surface issues before they get reported, saving time and frustration.
- Data over gut. Trusting data means you don’t have to rely solely on instinct. Numbers give you clarity and confidence.
- Low-cost, high-impact wins exist. Sometimes the simplest changes, like adding logging, can transform how a team operates.
Monitoring and alerting might not be glamorous, but they can be the difference between smooth operations and frustrated customers. And for me, this was one of the first moments in my career when I truly understood the power of being data-driven.
