Equifax is seeking a Python Developer to develop, maintain, and optimize Python-based web scrapers for collecting, cleaning, and structuring data from diverse sources.
Requirements
- Python development
- Writing Python code to extract data from websites
- Web scraping tools/libraries (Beautiful Soup, Scrapy, requests, Playwright, Selenium)
- Handling large data sets
- HTML, CSS, JavaScript, and XML
- Designing, querying, and managing data in SQL or NoSQL databases
- Data cleaning and validation techniques
Responsibilities
- Develop and maintain Python-based web scrapers to efficiently extract structured and unstructured data from various websites and sources.
- Design scripts to automate repetitive scraping tasks and schedule jobs using tools like cron or Airflow.
- Store and manage scraped data in databases (SQL/NoSQL) or cloud storage solutions.
- Utilize tools and techniques to bypass CAPTCHAs, IP blocking, and other challenges encountered during web scraping.
- Ensure scrapers are optimized for performance and can handle large-scale scraping without crashing or slowing down.
- Process and clean raw scraped data, transforming it into structured formats (e.g., CSV, JSON) and validating it to ensure data quality.
- Collaborate with data analysts, product managers, and other developers to understand data requirements and deliver high-quality results.
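To illustrate the kind of cleaning and validation work described above, here is a minimal sketch using only the Python standard library. The record schema (`name`, `price`) and all function names are hypothetical and chosen for illustration; they do not describe any actual Equifax pipeline.

```python
import csv
import io
import json

# Hypothetical schema: every record must carry these fields.
REQUIRED_FIELDS = ("name", "price")


def clean_record(raw):
    """Return a cleaned copy of one raw record, or None if it fails validation."""
    # Trim whitespace and drop fields whose value is missing.
    record = {key: str(value).strip() for key, value in raw.items() if value is not None}
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return None
    try:
        # Normalize prices such as "$1,299.50" to a float.
        record["price"] = float(record["price"].replace("$", "").replace(",", ""))
    except ValueError:
        return None
    return record


def clean_records(raw_records):
    """Clean all records, skipping invalid ones and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_records:
        record = clean_record(raw)
        if record is None:
            continue
        key = (record["name"].lower(), record["price"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


def to_json(records):
    """Serialize cleaned records to a JSON string."""
    return json.dumps(records, indent=2)


def to_csv(records, fieldnames=REQUIRED_FIELDS):
    """Serialize cleaned records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `clean_records` on a list of scraped dictionaries and passing the result to `to_json` or `to_csv` gives structured output. In practice the validation rules would come from the data requirements agreed with analysts and product managers.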
Other
- A Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related technical field.
- 3+ years of professional experience in software engineering.
- English proficiency of B2 or higher.
- Understanding the importance of respecting website terms of service and avoiding harmful scraping practices.
- Experience with cloud platforms like AWS, Google Cloud, or Azure.
- Understanding of, or experience with, network traffic.
- Experience with the software development life cycle (SDLC) and testing.
- Proficiency with version control systems, particularly Git, for collaborative development and code management.
- Familiarity with CI/CD pipelines.