ProRata is looking to design and build its next generation of large-scale distributed web crawling infrastructure to ingest and process massive amounts of web data.
Requirements
- 7+ years of experience in backend development, with strong proficiency in Java.
- Solid understanding of data structures, algorithms, and system design fundamentals.
- Experience with relational databases (e.g., PostgreSQL, MySQL).
- Hands-on experience with NoSQL databases (e.g., MongoDB).
- Practical knowledge of Docker and an understanding of Kubernetes or container orchestration.
- Proven track record of building and scaling large distributed or data-intensive systems.
- Familiarity with OpenTelemetry or similar frameworks for tracing, metrics, and logging.
Responsibilities
- Design and develop high-scale, fault-tolerant crawler systems that ingest and process massive amounts of web data.
- Own the full lifecycle of crawler components — from architecture and design to deployment, monitoring, and continuous optimization.
- Iterate rapidly to enhance system performance, scalability, and reliability through data-driven experimentation.
- Build automation agents for post-crawl verification.
- Implement observability across crawler services using OpenTelemetry and modern monitoring tools to ensure system transparency and health.
- Champion best practices in distributed systems design, code quality, and operational excellence.
Other
- This position is on-site: the role is based at our Bellevue, WA (or Pasadena, CA) office, and employees are expected to work from the office during regular business hours.
- Collaborate with other engineering teams to ensure seamless integration of the crawlers within the broader ecosystem.
- Prior experience developing or maintaining large-scale web crawlers is highly desirable.
- Experience with ClickHouse for large-scale analytical workloads is a plus.