About This Opportunity
Improve enterprise platform reliability as a Site Reliability Engineer (SRE) in Canada. The role focuses on maintaining the stability, availability, and performance of cloud and hybrid systems.
As an SRE/DevOps Engineer, you will be the first responder for incidents across infrastructure environments, including Kubernetes and APIs. You will monitor systems, execute runbooks, and support rapid incident resolution in collaboration with multiple teams. The role calls for analytical problem-solving, effective communication, and technical troubleshooting skills.
Key Responsibilities:
• Monitor system health using observability tools
• Perform incident triage and execute runbooks
• Troubleshoot application and infrastructure issues
• Communicate incident status to stakeholders
• Support deployment operations by following workflows
Requirements:
• 2–5 years in IT operations or DevO...