Aquent is seeking a Data Scientist to lead the development and deployment of sophisticated AI capabilities, applying innovation and data-driven strategies to transform how services are delivered and make a significant impact on the industry.
Requirements
- Advanced degree in Computer Science, Data Science, Statistics, Engineering, or a related quantitative field.
- Minimum of 4 years of experience in data science or applied ML/NLP with a strong focus on NLP & Generative AI.
- Proficiency in Python and SQL, coupled with strong engineering practices for building maintainable, testable pipelines.
- Strong experience with Databricks for data processing and pipeline development, including Spark and common lakehouse patterns.
- Demonstrated experience building retrieval-grounded LLM systems or LLM-based information extraction solutions for real-world use cases.
- Experience with document ingestion and parsing, including OCR and handling messy, semi-structured content such as PDFs, tables, forms, and web pages.
- Familiarity with vector databases and retrieval concepts, including indexing, embeddings, hybrid retrieval, reranking, and performance and cost tuning.
Responsibilities
- Architect, build, and refine retrieval-grounded LLM systems, including advanced RAG patterns, to deliver grounded, verifiable answers and insights.
- Design robust pipelines for ingestion, transformation, and normalization of public and internal data, including ETL, incremental processing, and data quality checks.
- Build and maintain document processing workflows across various formats like PDFs, HTML, and scanned content, incorporating OCR, layout-aware parsing, table extraction, metadata enrichment, and document versioning.
- Develop LLM-based information extraction pipelines, applying best practices for schema design, structured outputs, validation, error handling, and accuracy evaluation.
- Own the retrieval stack end-to-end, encompassing chunking strategies, embeddings, indexing, hybrid retrieval, reranking, filtering, and relevance tuning across vector databases or search platforms.
- Implement web data acquisition where necessary, including scraping, change detection, source quality checks, and operational safeguards like retries and rate limiting.
- Establish evaluation and monitoring practices for retrieval and extraction quality, including golden datasets, regression testing, groundedness checks, and production observability.
Other
- Be authorized to work in the United States
- Not require sponsorship of any kind for the duration of the assignment
- Be able to work on a W-2 basis. C2C or 1099 is not permitted for this role
- Have excellent communication skills, with a proven track record of partnering with stakeholders and turning ambiguous requests into adopted solutions
- Be able to work fully remotely from within the United States