The data science team is looking for a Data Analyst to help grow the team's data capabilities and to work with purpose helping others (animals, coworkers, and veterinarians). This role applies data analysis techniques, including natural language processing (NLP), to improve data quality and contribute to machine learning applications.
Requirements
- Experience with Apache Spark infrastructure, or a willingness to learn it quickly
- Strong experience with Python and SQL; experience with PySpark and dbt is a strong plus and an excellent fit for this position
- Familiarity with navigating, extracting, and working with data in Databricks, Snowflake, and AWS
- 2-3 years of experience improving data quality with natural language processing (NLP) techniques
- Familiarity with basic software engineering practices and standards, strong documentation skills, and experience using Git (e.g., previous projects or work on GitHub)
- Experience with object-oriented programming principles is a plus
- Experience using ontologies and hierarchical taxonomies for normalization and machine learning applications
- Interest in machine learning at large, not just generative AI, including named entity recognition (NER), large language models (LLMs), fuzzy matching, and statistics
Responsibilities
- Work with Apache Spark infrastructure
- Improve data quality using natural language processing (NLP) techniques
- Navigate, extract, and work with data in Databricks, Snowflake, and AWS
- Use ontologies and hierarchical taxonomies for normalization and machine learning applications
- Work with generative AI and machine learning at large, including NER, LLMs, fuzzy matching, and statistics
- Apply basic software engineering practices, standards, and documentation skills
- Use Git and GitHub to manage and share project work
Other
- A desire to grow as a data professional while working with purpose to help others (animals, coworkers, and veterinarians)
- Reasonable comfort presenting to stakeholders and breaking down big-picture ideas for both technical and non-technical audiences
- Adaptability to variable project timelines and to open-ended problems and solutions
- Openness to communicating across teams, including being the first to reach out or scheduling skip-level meetings
- Problem-solving ability in real-world applications, anything from outside-the-box thinking to applying simple, basic principles (creativity is encouraged, but solutions should remain maintainable)
- Time management skills suited to the real-world project timeline: intake requests -> requirements discussion -> delegation -> presentation preparation and revisions -> delivery (teamwork is important, and preparation accounts for 85% or more of a project's success!)