Meta Platforms, Inc. (Meta) wants to ensure optimal performance and capacity for growth in its software services while reducing the complexity and friction of training machine learning models in production. The company aims to build a scalable, efficient, data-driven, and reliable platform that empowers machine learning engineers.
Requirements
- UNIX or Linux operating system fundamentals
- TCP/IP network fundamentals
- Coding in at least one of the following higher-level programming languages: PHP, Python, C++, or Java
- Software frameworks and APIs
- Performing 'guerrilla capacity planning' for internet service architectures
- Internet service architectures (such as load balancing, LAMP, or CDNs)
- Configuring and maintaining applications using at least one of the following: web servers, load balancers, relational databases, storage systems, or messaging systems
- Relational Databases including MySQL
- Network protocols including at least one of the following: NFS, DHCP, NTP, SSH, DNS, or SNMP
- Maintaining web-based applications using at least one of the following: Apache, Redis Cache, Memcached, or Squid
- Storage Systems including NFS
- Network Management tools like DHCP, NTP, SSH, DNS, or SNMP
- Diagnosing and troubleshooting issues ranging from low-level hardware issues to large scale failures within datacenter clusters
- Experience utilizing high-performance query engines (Presto, Splunk, or Spark) for big data
Responsibilities
- Develop, design, create, modify, and/or test software services to ensure optimal performance and capacity for growth.
- Own back-end data warehouse services, front-end services like Messenger and Newsfeed, and infrastructure components to ensure services run without incident.
- Write and review code, develop documentation and capacity plans, and debug problems in real time in highly complex software systems.
- Serve as an escalation contact for service incidents.
- Build a scalable, efficient, data-driven, and reliable platform that empowers machine learning engineers to own the machine learning training lifecycle end-to-end.
- Reduce the complexity and friction of machine learning model training in production.
- Create and validate growth of models for various services to ensure the long-term sustainability of Meta’s services and products.
Other
- Requires a Master’s degree in Mobile and Internet of Things Engineering, Computer Science, Computer Engineering, Computer Software, Electronics Engineering, Applied Sciences, Mathematics, Physics or related field.
- Requires completion of a university-level course, research project, internship, or thesis in the following:
- Work on problems of diverse scope where analysis of data requires evaluation of identifiable factors.
- Demonstrate good judgment in selecting methods and techniques for obtaining solutions.
- Individual compensation is determined by skills, qualifications, experience, and location.