Qdrant is a cutting-edge vector database company on a mission to revolutionize how organizations manage and query unstructured data. Our open-source engine and managed cloud solutions power AI-driven search, recommendation, and data discovery at scale. We are a remote-first company, building a global team of passionate engineers to push the boundaries of database infrastructure.
As a Senior DevOps / SRE Engineer on the Cloud Operations team, you will focus on keeping Qdrant Cloud reliable, observable, and secure as usage and infrastructure complexity grow. Your primary responsibility is operational excellence: stability, incident response, and continuous improvement of production systems.
This role is operations-heavy and ideal for engineers who thrive on owning reliability and reducing operational risk at scale.
Tasks
- Operate and maintain production cloud infrastructure at scale
- Own Kubernetes infrastructure, networking, and deployment pipelines
- Improve monitoring, logging, alerting, and operational visibility
- Lead incident response, root cause analysis, and follow-up actions
- Reduce operational toil through automation and better tooling
- Improve reliability, security, and performance of production systems
- Collaborate closely with the Platform and Regions & Clusters teams
- Maintain and evolve runbooks, operational procedures, and alerts
- Participate in on-call rotations and drive continuous reliability improvements
Requirements
Must have
- 5+ years of experience in DevOps, SRE, or infrastructure operations roles
- Strong hands-on experience operating Kubernetes in production
- Solid knowledge of Linux systems, networking, and cloud infrastructure
- Experience working with AWS, GCP, or Azure
- Strong understanding of monitoring, alerting, and incident management
- Experience with infrastructure-as-code and automation tooling
- Comfortable owning on-call responsibilities and production incidents
- Strong operational mindset and clear communication skills
Nice to have
- Experience with Terraform or similar IaC tools
- Familiarity with Prometheus, Grafana, Loki, or OpenTelemetry
- Exposure to security, compliance, or hardening initiatives
- Scripting experience in Python, Bash, or Go
- Experience in SaaS, cloud, or data infrastructure environments
Benefits
- Competitive salary, equity, and benefits
- Fully remote setup with flexible working hours
- Clear ownership of reliability and operational excellence
- Opportunity to work on mission-critical customer-facing infrastructure
- Close collaboration with the Platform and engineering teams
If you enjoy keeping complex systems reliable and improving operations through automation and discipline, we’d love to hear from you.
Recruiting agencies and headhunters: please contact us only via hirebuffer.com?ref=qdrant