Rose International is seeking a candidate to support and maintain critical batch workflows that generate large-scale forecasts, ensuring the stability and reliability of complex, data-intensive systems.
Requirements
- Proficiency with Python (PySpark) and Bash, plus working knowledge of R.
- Experience running Apache Spark on YARN-managed clusters (preferably large-scale, on-premise).
- Familiarity with workflow orchestration tools (e.g., Airflow, Luigi, or custom equivalents).
- Experience with Terraform for infrastructure-as-code management.
- Hands-on experience implementing observability practices using OpenTelemetry (OTEL), Kibana, REST APIs, and custom instrumentation.
- Proficient in GitHub workflows (branching, pull requests, code reviews).
- Must have at least 5 years of experience with Spark, Hadoop, workflow orchestration, observability, and related tooling; please do not submit candidates who do not have these skills.
Responsibilities
- Monitor, restart, and troubleshoot daily batch workflows to ensure timely and reliable forecast generation.
- Diagnose and resolve complex technical issues across R, Python, Bash, and Spark components.
- Enhance existing jobs by contributing code changes in Python (PySpark), Bash, or Terraform.
- Implement and refine monitoring and observability solutions using OTEL, Kibana, REST APIs, and custom metrics.
- Manage version control, code reviews, and pull requests via GitHub.
- Proactively identify opportunities for workflow automation, performance tuning, and infrastructure improvements.
Other
- Must have hands-on experience with GitHub, Hadoop, Spark, and Terraform.
- Exceptional analytical and troubleshooting abilities with a keen attention to detail.
- Strong verbal and written communication skills for effective cross-functional collaboration.
- Ability to balance operational stability with ongoing development and code improvements.
- Proactive, reliability-focused mindset with a drive for continuous improvement.