Frontline Engineer

TCGplayer.com

$100,000 - $173,000

Aug 25, 2025

PA, US • TX, US • MN, US • IL, US

The Frontline Engineering team at TCGplayer (part of eBay) plays a pivotal role in ensuring the reliability, availability, and seamless performance of our platform, which serves millions of buyers and sellers globally within the $25B collectible hobbyist space. As the first line of defense for incident response and problem management, you'll have a direct impact on customer trust and satisfaction.

Requirements

Direct experience as an incident commander, including managing live incident calls, coordinating triage efforts, and driving communications during high-pressure situations.
Hands-on operational experience with AWS in a production environment, specifically executing runbooks, restarting EC2 instances, checking alarms, and pulling logs from CloudWatch.
Proficiency with Kubernetes, including troubleshooting containerized workloads, understanding pod health, managing deployments, and interacting directly with Kubernetes clusters.
Experience with scripting (Python, PowerShell, or Bash) to automate operational tasks or assist in incident resolution workflows.

Responsibilities

Serve as Incident Commander, leading real-time response efforts, managing communication across teams, triaging issues, and driving resolution of high-priority incidents to minimize downtime and business disruption.
Execute documented runbooks for troubleshooting and resolving production incidents involving AWS services (EC2, CloudWatch, IAM) and Kubernetes clusters (pods, deployments, scaling).
Collaborate closely with engineering teams post-incident, performing root cause analysis, documenting lessons learned, and driving the implementation of durable solutions.
Drive operational excellence by measuring and analyzing critical metrics (e.g., MTTR, SLA adherence) to identify improvement opportunities and implement impactful solutions.
Continuously refine and update operational runbooks and procedures, ensuring alignment with evolving technologies and business needs.
Proactively contribute to long-term strategic initiatives to improve incident management practices.

Other

This position is fully remote with a preference for candidates working within Eastern Standard Time (EST) or Central Standard Time (CST) hours.
Participation in an on-call rotation and occasional off-hours support for incidents is required.
A Bachelor’s degree in a technical field, or equivalent experience (5+ years) in system administration, infrastructure engineering, or related roles. Relevant certifications are a plus.
Strong communication skills with the ability to clearly articulate technical details and strategies to both technical and non-technical stakeholders.
Excellent problem-solving skills, with the ability to stay composed and decisive under pressure during high-impact incidents.