Machine Learning Evaluation Engineer

Bedrock

Salary not specified

Dec 18, 2025

New York, NY, US • San Francisco, CA, US • CA, US • Remote, US

Bedrock is bringing autonomy to the construction industry by deploying advanced autonomous systems on heavy machinery. The business problem involves translating the complexities of the built world into actionable AI-native evaluations to accelerate the adoption of Bedrock Operator technology.

Requirements

Proficiency in Python and a data warehouse query language
Comfort developing infrastructure within parallelized, cloud-based frameworks
Strong statistical analysis skills (e.g. classification, model fit bias determination, hypothesis testing, and uncertainty quantification)
Experience working with large datasets
2+ years of professional experience analyzing modern ML or robotics system performance on real-world problems
5+ years of professional software engineering, data science, or research experience
Bonus points: A background in applying statistics to ML research or real-world robotics applications

Responsibilities

Design and maintain eval systems: Build pipelines for measuring system performance across open-loop and closed-loop simulation, hardware-in-the-loop systems, and field data from Bedrock Operator-equipped machinery. Excite other teams to gain insights earlier in the development cycle through streamlined workflows.
Develop metrics: Connect product goals and system behavior by bridging real-world specifications to measurable indicators from logged data. Empower confident decision making, from parameter tuning to program planning, by cutting through the noise and delivering objective insights.
Classify data sources for training and testing: Implement infrastructure and classifiers to self-annotate data and allow creation of datasets for a variety of training and evaluation use cases. Leverage models to source rich annotations for massive datasets to accelerate model iteration.
Predict system performance: Model metrics and interpret results from sources ranging from raw sensor data to key leading indicators. Determine whether new construction sites pose hidden challenges and drive business decisions about deployment readiness.
Iterate on complex ML systems running in production environments and understand the complexities that come with them.

Other

Engineers who are currently Senior or Staff level
Flexible roles with consideration for candidates not fitting all criteria or located in other areas (especially SF or NY)
Collaboration with construction veterans and experienced engineers
Work directly impacting how the physical world gets built