Meta Platforms, Inc. is looking to solve complex technical issues in its data centers and improve platform health, while moving beyond 2D screens toward immersive experiences like augmented and virtual reality.
Requirements
- Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues
- Server hardware and components, including storage
- Interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
- HTTP, DNS, RAID, and DHCP
- Debugging, modifying, and developing code in at least one of these scripting or programming languages: Bash, PHP, Python, SQL, Rust, Go, or Perl
- Out-of-band/lights-out server communication methods, including IPMI and serial console
- Using data and metrics to drive decisions
Responsibilities
- Support platform health by resolving and closing complex tickets while addressing the underlying issue (i.e., the root cause), through work including, but not limited to, remote troubleshooting and physical inspection of services in data halls.
- Perform deep dives and root cause analysis of complex technical issues within the data center, ranging from automated tooling to hardware failures and network issues.
- Facilitate collaboration with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation.
- Lead the introduction of new platforms and hardware to the site and geographical area, in collaboration with partners and global resources, shortening the time it takes to bring these products to sustained mass production.
- Use tools and data analysis effectively to identify issues that are larger in scope and affect one or more data centers.
- Drive corrective actions for complex hardware issues, working with internal teams and vendors.
- Solve complex and systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution.
Other
- Requires a Master’s degree (or foreign equivalent) in Computer Science, Computer Software, Computer Engineering, Telecommunications or related field
- Participate in 24/7 on-call rotation
- Coach and mentor team members to evaluate and identify better ways to resolve issues, and define updates to tools and processes.
- Build cross-functional relationships and influence policies and procedures that improve global data center operations.
- Provide engineering support and be a go-to technical resource and Subject Matter Expert for the team, leadership, and cross-functional teams in all aspects of operating and maintaining data center servers.