Software Engineer - Central Engineering

Costco IT

$85,000 - $225,000
Sep 5, 2025
Seattle, WA, US

Costco IT is seeking to ensure the stability, performance, and scalability of the complex enterprise-level IT infrastructure and applications that underpin its global retail and supply chain operations. The role involves driving strategic initiatives to enhance system resilience, reduce mean time to recovery (MTTR), optimize costs, and champion a culture of operational excellence and continuous delivery across diverse technology stacks.

Requirements

  • Deep expertise in managing and optimizing complex enterprise-level systems (e.g., ERP, WMS, e-commerce platforms) across various operating systems (Linux, Windows) and high-volume, high-velocity platforms.
  • Proven expertise in working with cloud platforms (e.g., AWS, Azure, GCP) to architect and implement scalable and efficient platforms and services.
  • Expertise in modern software development tools and practices, including Git, branching and versioning strategies, and continuous integration/continuous deployment (CI/CD) pipelines.
  • Strong proficiency in at least one scripting/automation language (e.g., Python, PowerShell, Bash), in object-oriented programming, and in infrastructure as code (e.g., Terraform).
  • Extensive experience with modern monitoring, logging, and observability tools (e.g., Splunk, Datadog, Prometheus, Grafana).
  • Solid understanding of networking concepts (TCP/IP, DNS, load balancing) and the ability to configure, manage, and troubleshoot network infrastructure, including cloud networking components.
  • Experience with containerization technologies (Docker, Kubernetes).

Responsibilities

  • Oversees the proactive monitoring, analysis, and tuning of critical production systems, databases, and network infrastructure to ensure optimal performance and stability.
  • Implements robust telemetry, monitoring, and alerting solutions to provide real-time visibility into system health and potential issues.
  • Leads root cause analysis (RCA) efforts for major incidents, driving permanent solutions to prevent recurrence.
  • Troubleshoots and optimizes automation, reliability, and monitoring for delivered products.
  • Establishes best-in-class engineering practices for services, ensuring that services and components are well-defined, modular, reusable, secure, reliable, diagnosable, and actively monitored.
  • Drives the adoption of automation and Infrastructure as Code (IaC) principles to streamline deployment, configuration, and operational tasks across on-premises and cloud environments.
  • Champions CI/CD practices for operational changes and collaborates with development teams to embed operational readiness into the software development lifecycle.

Other

  • 15+ years of experience in IT operations, site reliability engineering (SRE), software engineering, or platform engineering, with at least 5 years in a leadership, director, or principal-level role managing and implementing technical delivery within a large-scale, global enterprise environment.
  • Demonstrated ability to lead during critical incidents, perform root cause analysis, and drive problem resolution, taking ownership of and responsibility for critical issues.
  • Excellent problem-solving and analytical skills, with the ability to dissect complex technical challenges and propose innovative solutions.
  • Strong communication and leadership abilities, with a proven track record of collaborating effectively in cross-functional teams, mentoring, and motivating software engineers.
  • If hired, you will be required to provide proof of authorization to work in the United States.