Salesforce is looking for a Software Engineer to architect, build, and scale the infrastructure, tools, and platforms that improve the resiliency, reliability, performance, and scalability of distributed systems running on the MuleSoft Anypoint Platform, especially in high-security environments.
Requirements
- Proven proficiency in Java, Python, Go, and Bash, with experience writing production-quality, maintainable, and testable code for infrastructure and platform automation.
- Hands-on experience with infrastructure as code, CI/CD pipelines, and deployment automation using tools like Terraform, Jenkins, and Spinnaker.
- Proven experience architecting, developing, and operating systems in cloud-native environments (AWS) and managing containerized workloads with Kubernetes.
- Strong understanding of observability engineering, including instrumentation, metrics, logging, and distributed tracing, with experience in OpenTelemetry, Grafana, Splunk, Sumo Logic, or similar platforms.
- Solid knowledge of distributed systems, network protocols (TCP/IP, DNS, HTTP, TLS), and API design standards (REST, RAML, OAS).
- Demonstrated ability to diagnose complex system issues, design for fault tolerance and high availability, and continuously improve reliability through software.
- Familiarity with compliance-bound environments, including FedRAMP, Protected B, or similar, and experience incorporating security and compliance into engineering workflows.
Responsibilities
- Design and develop systems, libraries, and tools that strengthen the resiliency and reliability of distributed services running on the MuleSoft Anypoint Platform.
- Develop and extend monitoring, logging, and alerting capabilities using industry-standard observability platforms (e.g., metrics, tracing, and log aggregation tools) to ensure issues are detected and diagnosed before they impact customers.
- Write production-grade code in Python, Go, or similar languages to automate operational tasks, scale deployment pipelines, and implement self-healing systems.
- Participate in on-call rotations, drive root cause analysis, and deliver software-based solutions that prevent recurrence and reduce mean time to recovery (MTTR).
- Build internal platforms, shared APIs, and systems that enhance developer velocity while improving overall system resilience and operability.
- Optimize and evolve our CI/CD pipelines using Jenkins, Spinnaker, and infrastructure-as-code and orchestration tools such as Terraform and Kubernetes to enable safe and frequent delivery.
- Develop and maintain automated solutions to meet FedRAMP, Protected B, and other regulatory requirements, integrating security and compliance directly into deployment workflows.
Other
- The candidate must be a U.S. citizen (U.S.-born or naturalized) working on U.S. soil, must not hold dual citizenship, and must be able to meet the customer and government screening standards applicable to this role.
- A passion for engineering reliability through software: you drive automation, eliminate toil, and foster a culture of operational excellence.
- A related technical degree is required.
- Experience with chaos engineering, fault injection, or reliability gamedays to proactively validate system resilience and recovery readiness.
- Background in platform-as-a-service (PaaS), internal developer tooling, or building self-service infrastructure that accelerates engineering productivity.