Job Description
We are hiring a talented Site Reliability Engineer (SRE) to ensure the high availability, scalability, and reliability of our production systems. This role requires strong knowledge of DevOps practices, cloud platforms, automation, and monitoring tools.
Key Responsibilities
- Manage production reliability, availability, and scalability.
- Automate workflows with Infrastructure as Code (Terraform, Ansible).
- Monitor systems using Prometheus, Grafana, and Datadog.
- Collaborate with developers and DevOps teams to improve CI/CD pipelines.
- Implement incident response and disaster recovery strategies.
- Optimize cloud infrastructure across AWS, Azure, or GCP.
- Ensure compliance with security best practices.
Requirements
- Bachelor’s degree in Computer Science, IT, or a related field.
- 3–6 years of experience as an SRE, DevOps Engineer, or in a similar role.
- Expertise in Linux systems and cloud environments.
- Strong knowledge of Kubernetes, Docker, and CI/CD.
- Coding/scripting skills in Python, Go, or Shell.
- Certifications in DevOps or Cloud Platforms are a plus.