EnDyna is looking for a candidate to develop and implement machine learning models that process large-scale structured and unstructured text data in the cybersecurity domain.
Requirements
- Strong proficiency in Python and libraries such as spaCy, NLTK, Hugging Face Transformers, TensorFlow, or PyTorch.
- 8-10 years of experience building, training, and deploying NLP models in production environments.
- Strong knowledge of data engineering concepts: ETL processes, SQL/NoSQL databases, and data pipeline tools (e.g., Apache Spark, Airflow).
- Solid understanding of machine learning methods and predictive modeling.
- Experience working with cybersecurity datasets (e.g., CVE data, threat intelligence feeds, log analysis).
- Familiarity with cybersecurity frameworks (e.g., MITRE ATT&CK, NIST).
- Experience with cloud platforms (AWS, Azure, or GCP).
Responsibilities
- Design, develop, and deploy NLP pipelines for extracting, processing, and analyzing large-scale cybersecurity-related text data (e.g., threat reports, logs, vulnerability disclosures).
- Build and optimize predictive models to identify, classify, and forecast cybersecurity risks and trends.
- Implement advanced NLP techniques such as named entity recognition (NER), topic modeling, sentiment analysis, and text summarization.
- Develop data ingestion and transformation workflows from multiple structured and unstructured data sources.
- Design and maintain scalable data pipelines and data lakes to support analytics and model training.
- Conduct exploratory data analysis (EDA) to identify patterns, anomalies, and actionable insights.
- Perform feature engineering to enhance model accuracy and relevance.
Other
- Bachelor’s or Master’s degree in Artificial Intelligence, Data Science, Computer Science, or a related field.
- Excellent analytical, problem-solving, and communication skills.
- Must be a US citizen.
- Familiarity with containerization (Docker) and CI/CD for ML deployment.
- Free parking and close access to public transit.