DigitalOcean is looking to optimize core system performance, ensuring that its products consistently exceed customer expectations and providing transparent insight into system capabilities and efficiency.
Requirements
- Knowledge of Linux kernel, hypervisors, and open-source operating systems
- 3+ years of experience with performance measurement tools such as profilers, perf, eBPF, fio, and MLPerf
- 3+ years developing strategies for managing, monitoring, and analyzing infrastructure, applications, and services
- Proficiency in Golang, Python, and/or C
- Expertise in distributed systems performance, including tracing and debugging methodologies
- Experience with observability platforms such as Splunk, Prometheus, Grafana, Elastic, or Dynatrace
- Experience with Chef, AWX, and/or Kubernetes
Responsibilities
- Develop and implement comprehensive performance metrics, analysis tools, and reporting systems
- Lead initiatives to enhance shared infrastructure, balancing performance optimization with rigorous security standards
- Conduct in-depth performance analysis of the Linux kernel, virtualization layer, storage, and network stack to devise optimization strategies
- Identify system bottlenecks proactively and drive optimizations across the hypervisor software stack
- Work cross-functionally to harness new performance capabilities from evolving hardware architectures
- Enhance test frameworks and pipelines to ensure robust performance validation
- Investigate and resolve virtual machine downtime and performance issues in our production environment
Other
- Bachelor's or Master's degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering, or equivalent work experience
- Demonstrated ability to solve complex problems at scale
- Excellent cross-team collaboration and communication skills
- Professional-level written and spoken English with strong presentation abilities
- Familiarity with x86_64 and/or ARM architectures