Wayvia is looking for a Software Engineer to join its Data Acquisition team and engineer efficient systems that empower clients through its products. The role centers on designing and running large-scale, highly distributed pipelines capable of scheduling, orchestrating, and executing tens of millions of data collection tasks daily across a diverse and constantly evolving landscape of sources. This involves tackling complex challenges in high-throughput web crawling, adaptive extraction, intelligent batching, fault-tolerant processing, and seamless distribution of mission-critical datasets.
Requirements
- Experience developing network services that use HTTP(S) protocol versions 1.1, 2, and 3.
- Experience implementing and managing egress proxy networks with MITM traffic interception and routing optimization.
- Experience extending the Chrome DevTools Protocol (CDP) with custom commands to support specialized use cases.
- Experience building and maintaining scalable WebSocket infrastructure optimized for short-lived, high-volume connections.
- Experience designing distributed caching systems that support high-throughput, containerized worker pools.
- Experience deploying and managing geographically distributed proxies with automated failover and load balancing.
- Proficiency with data parsing, transformation, and storage technologies.
Responsibilities
- Design and implement advanced capabilities to detect and counter anti-bot protection methods.
- Engineer complex network requests that accurately mimic real user behavior and environments.
- Develop network services leveraging HTTP(S) protocol versions 1.1, 2, and 3.
- Implement and manage egress proxy networks with MITM traffic interception and routing optimization.
- Extend Chrome DevTools Protocol (CDP) with custom commands to support specialized use cases.
- Build and maintain scalable WebSocket infrastructures optimized for short-lived, high-volume connections.
- Design distributed caching systems to support high-throughput, containerized worker pools.
Other
- This is a remote position open to candidates based in the United States.
- The ideal candidate has an unwavering focus on delivering measurable value to customers.
- The ideal candidate thrives on building resilient, scalable systems, solving real-world data engineering puzzles, and pushing the boundaries of automation and efficiency in large-scale information retrieval.