About You:
- Systems Thinker: You see patterns. When five alerts fire, you identify the one root cause. When metrics look wrong, you question the instrumentation, not just the application.
- Adaptability & Curiosity: You are agile and curious, and you enjoy exploring new tools, technologies, and concepts to continuously improve observability solutions.
- Strong Observability Fundamentals: Previous experience designing and/or implementing observability frameworks.
- You've focused on RED/Golden metrics, adapting them for a range of complex environments.
- You've used statistical analysis to validate data and to drive meaningful insights, alerting capabilities, and log aggregation patterns.
What You'll Do:
Build observability solutions that actually work. This isn't a role where you monitor dashboards all day. You'll work directly with customers to understand their systems, then design, guide, and implement frameworks and solutions that give them the insights they need, translating vague observability pain into observability gain.
- Work directly with customers to identify key operational needs, providing them with customized observability solutions that translate into business value.
- Build and nurture strong relationships with customers, ensuring clear and effective communication around technical requirements and deliverables.
- Interpret and refine abstract customer requests, working with our engineering team to develop clear, practical action items.
What You Must Bring:
- Programming/Scripting Tools: Strong experience with Python, Go, or Bash (other languages welcome)
- Container Orchestration: Kubernetes and Docker (you've debugged pods at 3 AM)
- Cloud Platforms: Hands-on knowledge of AWS, Azure, GCP, or physical data center architecture
- Log Parsing & Automation: Expertise in log processing and automation, with tools such as Ansible, Puppet, Chef, or Salt.
Observability Platform Experience:
- Datadog (preferred): Expertise with APM, custom metrics, log pipelines, and platform integration
- OR equivalent hands-on experience with observability tools such as: Splunk, Dynatrace, New Relic, AppDynamics, Prometheus, Grafana, or any other open-source observability tech
- If you've built complex observability solutions elsewhere and have a strong infrastructure/SRE background, we can bring you up to speed on Datadog
- Bonus: You've worked across multiple platforms (migrations, integrations, or multi-tool environments) and can articulate the trade-offs between them
Nice to Have Experience:
- OpenTelemetry: Proven experience implementing OTel for distributed tracing, metrics, and logging across complex application environments.
- You've instrumented production systems, understand sampling strategies, and can debug why spans aren't propagating correctly
- Java: Experience with JVM internals, garbage collection, and performance characteristics
- For example: debugging memory leaks in production, knowing why that thread pool is sized exactly that way, or reading a heap dump and telling a story
Desired Characteristics:
- A high tolerance for fast-paced, evolving environments and a preference for taking on new challenges.
- An insatiable desire to learn and grow, with a strong commitment to continuous innovation in the observability and cloud monitoring space.
- Ability to communicate technical concepts clearly and directly. We practice "Radical Best Intent" here, which means addressing problems head-on without sugarcoating.