EPAM is looking to hire a Site Reliability Engineer to architect, implement, and support data science platforms such as Alteryx, Dataiku, and Azure Machine Learning. These platforms are crucial for generating business insights in wealth management, investment banking, and corporate functions.
Requirements
- 2+ years of hands-on administrative experience with data science platforms such as Alteryx Server, Dataiku, or Azure Machine Learning
- Strong MongoDB performance monitoring and optimization skills, with a focus on automation and reliability
- Demonstrated proficiency in DevOps practices using Unix Shell, Python, PowerShell scripting, or other programming languages
- Proven ability to analyze complex problems, design effective solutions, and implement technical improvements at scale
- Unix and/or Windows administration experience (optional)
Responsibilities
- Design and implement robust infrastructure solutions to support enterprise-scale data science platforms across multiple global regions
- Provide expert-level production support for engineering teams and business stakeholders using MongoDB-based data science environments
- Develop automation frameworks that enhance system reliability, performance monitoring, and incident response capabilities
- Troubleshoot and resolve complex technical incidents as a Problem Manager, ensuring minimal disruption to business operations
- Collaborate with cross-functional teams to continuously improve core infrastructure and implement modern data science initiatives
- Ensure all platform changes and enhancements adhere to operational guidelines and compliance requirements across international jurisdictions
Other
- Experience influencing IT stakeholders and business partners in enterprise technology environments
- Willingness to participate in an occasional on-call rotation and provide weekend support for critical activities
- H-1B visa sponsorship is not available for this position