About This Opportunity
**Responsibilities**:
- Identify technical and process gaps and implement improvements that increase operational reliability and efficiency, and promote stability through automation.
- Support the build and configuration of Kubernetes clusters, including setting up a monitoring framework.
- Help teams perform post-incident reviews to prevent recurrence.
- Help meet performance and stability requirements by working with the team to implement load testing, tracing, monitoring, etc.
- Manage and maintain release pipelines, and assist with manual and automated deployments.
- Perform regular security monitoring to detect possible intrusions.
- Perform and validate scheduled backup operations, ensuring that all required file systems and system data are successfully backed up to the appropriate media.
- Create, modify, and delete user accounts as needed.
- Repair and recover from hardware or software failures as needed. Coordinate and communi...