Meta Platforms, Inc. is looking to build and maintain business-critical distributed storage systems that support its growing technologies and services.
Requirements
- TCP and firewall troubleshooting and configuration
- Linux automation: shell scripting, package management, and optimization; diagnosing and fixing broken Linux servers (software and hardware)
- C++ or Python
- Experience with stress- and load-testing software and frameworks such as fio, stress-ng, tc, or similar
- Performance analysis: profiling and debugging C++ services to identify bottlenecks, memory issues, and inefficiencies
- Experience contributing to a user-space distributed file system such as HDFS, Ceph, or a similar system
- Experience with log collection and aggregation systems as well as time-series metric collection systems (ELK, Splunk, New Relic, or a similar system)
Responsibilities
- Operate, support, develop, and troubleshoot business-critical distributed storage systems.
- Ensure a balance between optimal performance, stability, and growth potential.
- Coordinate work with cross-functional (XFN) teams, track and guarantee SLOs, and ensure steady performance through automated stress and functional testing.
- Identify and resolve performance, software and hardware issues in real time.
- Contribute to a user-space distributed file system such as HDFS, Ceph, or a similar system.
- Investigate, diagnose, and mitigate failures in a large-scale distributed environment.
Other
- Bachelor's degree (or foreign degree equivalent) in Computer Science, Engineering, Information Systems, Analytics, Mathematics, Physics, Applied Sciences, or a related field
- 2 years of experience in the job offered or in a computer-related occupation
- Individual compensation is determined by skills, qualifications, experience, and location
- Meta offers benefits
- Equal Employment Opportunity employer