Senior Software Engineer, Data

The Allen Institute for AI

$146,880 - $220,320
Aug 14, 2025
Seattle, WA, US

The Allen Institute for AI (Ai2) is hiring a Data Engineer to help integrate a large U.S. patent corpus into the Semantic Scholar platform. This NSF-funded role focuses on high-impact data engineering: linking patent and academic research data, resolving citations, disambiguating inventors and authors, applying topic models, and extending data products and APIs.

Requirements

  • Strong Python engineering skills, especially for building and maintaining data pipelines
  • Experience with SQL and schema design in production settings (PostgreSQL preferred)
  • Familiarity with common ML workflows (training classifiers, tuning models, and deploying for inference), particularly for large-scale or ambiguous structured datasets
  • Comfortable working with structured datasets (XML/JSON/Parquet) and writing ETL code
  • Experience with workflow orchestration tools (Airflow or similar), cloud infrastructure (e.g., AWS and S3), and containers (e.g., Docker)
  • Experience with author disambiguation, entity resolution, or record linkage problems
  • Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale

Responsibilities

  • Build scalable data pipelines (Airflow) for citation resolution and corpus integration
  • Develop and deploy lightweight ML models for inventor disambiguation and author linking
  • Train or adapt a topic model to classify patents using titles, abstracts, claims, and specifications
  • Extend REST APIs to expose linked metadata and topic classifications
  • Contribute to dashboards and tools for evaluating data quality and model precision
  • Collaborate with Ai2 engineers to ensure maintainability, test coverage, and robust deployment
  • Produce reliable, well-documented code and contribute technical designs that support long-term maintainability

Other

  • This role can be performed remotely from any US state.
  • This is a fixed-term position scheduled for 2 years, with the possibility of renewal.
  • Strong communicator with a clear sense of ownership over results
  • Must be able to remain in a stationary position for long periods of time.
  • Ability to communicate information and ideas clearly so that others understand them.