Qdrant

Senior Site Reliability Engineer (SRE)

Remote
Employee
Software Development

Qdrant is an Open-Source Vector Database.

We help businesses take advantage of modern AI technologies. We are developing neural search solutions that allow everyone to use state-of-the-art neural network encoders at production scale. At the same time, we help companies integrate our technology into their infrastructure. Our flagship product is the open-source Vector Database: https://github.com/qdrant/qdrant

Among the technical challenges we are facing is the implementation of our cloud infrastructure to serve our engine as a scalable cloud API solution. We are looking for a Site Reliability Engineer to ensure the stable and secure operation of our managed solutions. If you're passionate about Site Reliability Engineering, Python, Go, Kubernetes, and contributing to the growth of a cutting-edge Database as a Service, we want to hear from you! Apply now and become a key player in shaping the reliability and scalability of our DBaaS platform.

Tasks

  • Infrastructure Automation: Design, implement, and manage infrastructure code using Terraform, focusing on the reliability and scalability of our Database as a Service (DBaaS) platform.
  • Programming Mastery: Utilize Python and Go to improve our service quality and develop automation scripts and tools for monitoring, deployment, and maintenance tasks specific to database operations.
  • Kubernetes Expertise: Demonstrate a deep understanding of Kubernetes, ensuring optimal performance, scalability, and reliability for our DBaaS platform.
  • Operator Frameworks: Develop and maintain Kubernetes Operators for automating database platform operations, enhancing the reliability of our services.
  • Multi-Cloud Management: Architect and maintain infrastructure in multi-cloud environments (AWS, GCP, Azure) to provide a resilient and available DBaaS solution.
  • Monitoring and Incident Response: Implement effective monitoring solutions tailored for database services and collaborate on incident response procedures to maintain the high availability of our systems.
  • Service Level Objectives (SLOs) and Agreements (SLAs): Define, measure, and maintain SLOs and SLAs specific to database performance and reliability, actively monitoring and optimizing systems to meet these targets.

Requirements

  • Site Reliability Engineering Focus: Proven experience in a Site Reliability Engineering or similar role, with a strong emphasis on database systems.
  • Programming Languages: Proficiency in Python and Go; experience with other languages is a plus.
  • Kubernetes Skills: Proven hands-on experience managing and optimizing Kubernetes clusters, particularly in the context of database services.
  • Operator Frameworks: Strong background in developing and maintaining Kubernetes Operators, with a focus on database automation.
  • Infrastructure as Code (IaC): Solid understanding and experience with Terraform, Ansible, or Pulumi, specifically applied to database infrastructure.
  • Multi-Cloud Expertise: Experience working with multi-cloud environments (AWS, GCP, Azure), ensuring seamless database operations across platforms.
  • Container Orchestration: Deep understanding of containerization concepts and orchestration tools (Docker, Kubernetes) within the DBaaS context.
  • SLOs and SLAs: Demonstrated experience in defining, implementing, and meeting Service Level Objectives and Agreements, particularly in the context of database reliability.
  • Problem Solving: Strong analytical and problem-solving skills, with keen attention to detail.
  • Communication Skills: Excellent communication and collaboration skills, with the ability to convey complex technical concepts to diverse audiences.

Benefits

  • Working in a passionate international team
  • Competitive salary plus perks
  • Flexible working hours
  • Company events
  • Choose any hardware
  • Remote first/home office
  • Relocation option
Updated: 1 week ago
Job ID: 11106265

Qdrant

11-50 employees
Technology, Information and Internet
