GoDaddy: Metrics as a Service
Role: Sr Dir of SRE (Observability & ITSM)
Overview: Provided a centralized timeseries platform that initially delivered over 1,000 metrics per server, enabling BI, Finance, Program Management, and others to plan a rational migration strategy from private data centers to the public cloud. Followed up with application metrics empowering developers to better understand their applications and deployments.
Situation: GoDaddy needed a unified and scalable way to collect, store, and visualize metrics from its infrastructure and applications. This was particularly crucial for planning a major migration to the public cloud, triaging inflight issues, and for giving developers better visibility into their application performance.
Task: To design, build, and operate a centralized timeseries platform capable of ingesting over 1,000 metrics per server and millions of metrics per second. This platform needed to serve diverse stakeholders, from business intelligence and finance teams to individual developers.
Action:
- Designed and implemented a scalable timeseries platform.
- Initially focused on delivering comprehensive server-level metrics (over 1,000 metrics per server).
- Extended the platform to support application-specific metrics, empowering developers with insights into their application behavior and performance.
- Provided tools and APIs for easy metric submission and consumption.
- Enabled BI, Finance, and Program Management teams to use this data for strategic planning, including the migration from private data centers to the public cloud.
- Implemented wrappers and libraries for developers to easily emit metrics, check application state, and even trigger automated remediation actions.
- Tech Stack Used: Carbon C-relay, Go-carbon, Graphite.
Result: Developers throughout the organization were able to use a simple wrapper and deliver functions that would give state, collect metrics and even remediate active issues. The “Metrics as a Service” platform became a cornerstone of GoDaddy’s data-driven operations. It provided critical data for strategic decision-making regarding cloud migration and equipped developers with the tools to understand and optimize their applications.
Context: This centralized metrics platform was instrumental in GoDaddy’s efforts to modernize its infrastructure, optimize costs, and improve service reliability. It fostered a culture of data-driven decision-making across the organization.
