The Washington Nationals are seeking a software engineer to build data workflow solutions that keep Baseball Operations datasets well organized and accessible to R&D analysts and other stakeholders.
Requirements
- Fluency in Python and SQL
- Experience with orchestration tools (e.g., Prefect, Airflow, Dagster)
- Proficiency with MySQL, PostgreSQL, DuckDB, or other relational database systems
- Experience with AWS or other cloud providers
- Some experience with Terraform and/or Ansible is a plus
- Comfortable working on the command line in a Linux environment
Responsibilities
- Build robust data pipelines and ETL processes that pull data from a variety of sources (HTTP APIs, cloud object stores like AWS S3, relational databases) and write to our internal data systems
- Assist with the deployment, orchestration, and monitoring of our data pipelines and machine learning pipelines. We use Prefect for orchestration, running on Amazon ECS with AWS Fargate
- Design and build solutions to make working with our internal datasets easier. This work includes maintaining database tables and views, building out our Apache Iceberg data lakehouse, merging datasets from different sources into consistent formats, and building internal APIs to make data more accessible
- Develop validation processes to monitor data quality and flag potential sources of error
- Help maintain our cloud computing infrastructure, including managing and configuring servers, databases, and other systems
- Research and advocate for new tooling that can aid in timely, accurate, and accessible data delivery
Other
- Bachelor’s degree in computer science, computer engineering, information science, or a related field
- 4+ years of relevant work experience
- Ability to work independently with close attention to detail
- Authorized to work in the United States
- Willing to relocate to the Washington, DC, area for in-person work at Nationals Park (a fully remote option is available for exceptional candidates)