Frontline Engineer

TCGplayer.com

$100,000 - $173,000
Aug 25, 2025
PA, US • TX, US • MN, US • IL, US

The Frontline Engineering team at TCGplayer (part of eBay) plays a pivotal role in ensuring the reliability, availability, and seamless performance of our platform, which serves millions of buyers and sellers globally within the $25B collectible hobbyist space. As the first line of defense for incident response and problem management, you'll have a direct impact on customer trust and satisfaction.

Requirements

  • Direct experience as an incident commander, including managing live incident calls, coordinating triage efforts, and driving communications during high-pressure situations.
  • Hands-on operational experience with AWS in a production environment, specifically executing runbooks, restarting EC2 instances, checking alarms, and pulling logs from CloudWatch.
  • Proficiency with Kubernetes, including troubleshooting containerized workloads, understanding pod health, managing deployments, and interacting directly with Kubernetes clusters.
  • Experience with scripting (Python, PowerShell, or Bash) to automate operational tasks or assist in incident resolution workflows.

Responsibilities

  • Serve as Incident Commander, leading real-time response efforts, managing communication across teams, triaging issues, and driving resolution of high-priority incidents to minimize downtime and business disruption.
  • Execute documented runbooks for troubleshooting and resolving production incidents involving AWS services (EC2, CloudWatch, IAM) and Kubernetes clusters (pods, deployments, scaling).
  • Collaborate closely with engineering teams post-incident, performing root cause analysis, documenting lessons learned, and driving the implementation of durable solutions.
  • Drive operational excellence by measuring and analyzing critical metrics (e.g., MTTR, SLA adherence) to identify improvement opportunities and implement impactful solutions.
  • Continuously refine and update operational runbooks and procedures, ensuring alignment with evolving technologies and business needs.
  • Proactively contribute to long-term strategic initiatives to improve incident management practices.

Other

  • This position is fully remote, with a preference for candidates who work Eastern (EST) or Central (CST) time zone hours.
  • Participation in an on-call rotation and occasional off-hours support for incidents is required.
  • A Bachelor’s degree in a technical field or equivalent experience (5+ years) in system administration, infrastructure engineering, or related roles; relevant certifications are a plus.
  • Strong communication skills with the ability to clearly articulate technical details and strategies to both technical and non-technical stakeholders.
  • Excellent problem-solving skills, with the ability to stay composed and decisive under pressure during high-impact incidents.