Macy's is hiring a Lead Machine Learning Engineer to bridge the gap between experimental machine learning model development and reliable production systems. The role oversees the entire lifecycle of ML models: automating ML pipelines, optimizing training and serving, ensuring model governance, and maintaining system stability.
Requirements
- Expertise in managing cloud-based ML infrastructure (GCP, AWS, or Azure) and applying DevOps practices, including containerization, CI/CD pipelines, and infrastructure-as-code tools, to support seamless model deployment, scalability, and system reliability.
- Proficiency in programming languages such as Python, SQL, and Java.
- Experience using TensorFlow, PyTorch, scikit-learn, Kubeflow, pandas, and NumPy.
- Experience with distributed computing frameworks such as Ray or Dask preferred.
- Expertise in data engineering and object-oriented programming, and familiarity with microservices and cloud technologies.
Responsibilities
- Build and optimize ML pipelines for feature engineering, model training, and inference
- Develop low-latency, high-throughput model endpoints for distributed environments
- Maintain cloud infrastructure for ML workloads, including GPUs/TPUs, across platforms like GCP, AWS, or Azure
- Troubleshoot, debug, and validate ML systems for performance and reliability
- Write and maintain automated tests (unit and integration)
- Support discussions with Data Engineers on data collection, storage, and retrieval processes
- Collaborate with Data Governance to identify data issues and propose data cleansing or enhancement solutions
Other
- Work with Data Science Leadership and Stakeholders to understand business objectives, map scope of work, and lead junior colleagues in achieving technical deliverables
- Invest in strong relationships with colleagues and build a successful followership around a common goal
- Drive continuous improvement in performance and functionality, including developing processes for automation
- Foster an environment of acceptance and respect that strengthens relationships and ensures authentic connections with colleagues, customers, and communities
- 5+ years of industry experience working with machine learning tools and technologies
- Familiarity with agile development frameworks and collaboration tools (e.g., Jira, Confluence)
- An ongoing learner who seeks out emerging technology and can influence others to think innovatively
- Energized by fast-paced environments and capable of supporting multiple projects; able to identify primary and secondary objectives, prioritize time, and communicate timelines to team members
- Regularly required to sit, talk, hear; use hands/fingers to touch, handle, and feel
- Occasionally required to move about the workplace and reach with hands and arms
- Requires close vision
- Able to work a flexible schedule based on department and company needs