Autodesk is looking to build new ML-powered product features that will help its customers imagine, design, and make a better world. This role will focus on building the datasets that power generative AI features in Autodesk products.
Requirements
- Experience with software version control, unit tests, and deployment pipelines
- Strong data modeling, architecture, and processing skills with varied data representations, including 2D and 3D geometry
- Experience with cloud services & architectures (AWS, Azure, etc.)
- Experience with relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra)
- Experience with frameworks such as Ray Data, Metaflow, Hadoop, Spark, and Hive
- Experience with implementing ML models
- Experience working with large data lakes and data streams
Responsibilities
- Own and lead engineering projects in the area of data acquisition, ingestion, and curation
- Organize and curate large, unstructured, disparate multi-modal (text, images, 3D models, video) data sources into a unified format suitable for machine learning
- Develop and deploy highly scalable distributed systems that process, filter, and deliver datasets for use with machine learning
- Conduct experiments on data and analyze the results to provide insights
- Write robust, testable code that is well documented and easy to understand
- Use data analysis, judgment, and interpretation to select the right course of action
- Apply creativity in recommending variations in approach
Other
- BSc or MSc in Computer Science, or equivalent industry experience
- Excellent written communication skills to document code, data analysis, and findings from experiments
- Team player with a high degree of curiosity
- Will not be intimidated by the details of domain-specific file formats, and will have the self-drive and creativity to connect the dots between information stored in different sources to provide new and useful features for machine learning models
- Proficiency in software engineering and cloud-based systems to deliver these features to machine learning projects by creating and deploying scalable data pipelines