The company is looking to architect, design, build, and integrate solutions across a large-scale enterprise of highly distributed applications, with a particular focus on developing and delivering software solutions on Big Data platforms using the Hadoop ecosystem. The role also involves data modeling, building and supporting CI/CD pipelines, and assisting various teams with their data-related technical issues and needs.
Requirements
- Designing and developing data processing applications using the pandas library in Python (see the sketch after this list)
- Writing automated unit and integration tests in Python
- Testing and optimizing the performance of data processing applications
- Applying statistical methods and predictive modeling techniques with tools including Python and SAS
- Using Jenkins for continuous integration and continuous deployment (CI/CD)
- Working with Oracle and Teradata databases for data storage and management
- Managing data processing application releases using tools such as Application Release Management (ARM)
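A minimal illustrative sketch of the kind of pandas processing and Python unit testing described above; the column names (`customer_id`, `amount`) and the `summarize_transactions` function are hypothetical examples, not part of any actual codebase:

```python
import pandas as pd


def summarize_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw transactions into per-customer totals.

    Hypothetical example: assumes columns 'customer_id' and 'amount'.
    """
    # Drop rows with missing amounts before aggregating.
    cleaned = df.dropna(subset=["amount"])
    return (
        cleaned.groupby("customer_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )


def test_summarize_transactions():
    # Minimal pytest-style unit test over an in-memory frame.
    raw = pd.DataFrame(
        {"customer_id": [1, 1, 2], "amount": [10.0, None, 5.0]}
    )
    result = summarize_transactions(raw)
    assert result.loc[result["customer_id"] == 1, "total_amount"].item() == 10.0
    assert result.loc[result["customer_id"] == 2, "total_amount"].item() == 5.0
```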
Responsibilities
- Architect, design, build, and integrate solutions across a large-scale enterprise of highly distributed applications.
- Develop and deliver software solutions on Big Data platforms using the Hadoop ecosystem.
- Perform data modeling, delivering the logical and physical data models that enable database and data consumption solutions.
- Build and support continuous integration and deployment using CI/CD tools.
- Work with stakeholders, including Analytics, Machine Learning, and Product teams, to assist with data-related technical issues and support their data needs.
- Develop Python- and PySpark-based solutions to facilitate business decisions, and refactor existing solutions (see the PySpark sketch after this list).
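A brief, hypothetical PySpark sketch of the kind of solution described in the last item; the application name, sample rows, and column names are illustrative assumptions rather than details of the actual role:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical PySpark job sketch; table and column names are assumptions.
spark = SparkSession.builder.appName("order-metrics").getOrCreate()

orders = spark.createDataFrame(
    [("2024-01-01", "retail", 120.0), ("2024-01-01", "online", 80.0)],
    ["order_date", "channel", "amount"],
)

# Aggregate daily revenue per sales channel to support business decisions.
daily_revenue = (
    orders.groupBy("order_date", "channel")
    .agg(F.sum("amount").alias("revenue"))
)
daily_revenue.show()
```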
Other
- Coordinate with lines of business to establish the Software Development Life Cycle (SDLC) process.
- Interact with business stakeholders and IT partners, participate in business product backlog refinement sessions, and prioritize the backlog.