World Wide Technology is developing an AI model for its Government Services team to assess data integrity and identify errors for review and resolution.
Requirements
- Strong proficiency in a programming language such as Python for developing data processing pipelines
- Experience with distributed data processing frameworks (e.g., Apache Spark) and SQL for handling large-scale datasets
- Proficiency in working with databases (SQL and NoSQL) and data storage systems (e.g., Oracle Database, Delta Lake)
- Experience working with vector databases or similar technologies (e.g., Pinecone, FAISS)
- Familiarity with orchestration tools (e.g., Apache Airflow, Prefect) for data workflows
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services (e.g., Azure AI Search, BigQuery, Redshift)
- Proficiency with common DevOps practices, including CI/CD pipelines and containerization (Docker)
Responsibilities
- Design and construct scalable data pipelines and ETL processes for ingesting and transforming structured and unstructured data, using OCR, large language models, vision-language models, and reasoning agents.
- Build and optimize a modern cloud data storage environment, including cloud-based data lakes and warehouses, to support advanced analytics and GenAI workflows.
- Implement data integration workflows that prioritize performance, security, and scalability while ensuring data quality and governance.
- Collaborate with data scientists and machine learning engineers to prepare high-quality datasets for model development, training, and inference.
- Monitor and troubleshoot data pipeline performance, addressing bottlenecks and failures to ensure reliability and scalability.
- Employ DevOps practices to set up CI/CD pipelines, automate testing, and ensure reliable deployment of data workflows and services.
Other
- Bachelor's degree in Computer Science or related field, or equivalent experience
- 3–5 years of experience in data engineering
- Strong conceptual problem-solving skills
- This is a full-time, direct-hire position. We are not able to offer visa sponsorship or 1099 status, or to work on a corp-to-corp (C2C) basis, for this role.
- Preferred locations: MO, FL, NC, TX, AZ, IL, MA, VA, AL, LA, GA, MN, OH, MI, WI, IA, SC, NY