Sonia Solutions

Platform Engineer (all) - Kubernetes Expert

Remote
Employee
Software Development

Let me introduce...

With Sonia, doctors are successful doctors. We create and deploy AI-enhanced solutions that make doctors’ lives easier, patients’ care better, and healthcare systems more efficient. If you’re an intrinsically motivated self-starter who values impactful work, join us in revolutionizing healthcare.

We’re looking for a Platform Engineer (all) (levels: mid to senior) to take ownership of our core infrastructure and internal developer platform. You’ll design, build, and maintain our Kubernetes-based environments on OVHcloud, ensuring they are scalable, reliable, and secure. Partnering closely with engineering and ML teams, you’ll manage our CI/CD pipelines, observability stack, and the critical infrastructure powering our GPU workloads, enabling our teams to ship code faster and more reliably.

This role can be performed remotely from anywhere in Germany or Luxembourg, or in a hybrid setup from our offices in Luxembourg or Berlin.

This is what you’ll own

  • Design, deployment, and management of scalable and secure Kubernetes clusters on OVHcloud.
  • Ownership and advancement of our CI/CD pipelines for automated, reliable application and infrastructure deployments.
  • Implementation and management of our GitOps workflows using tools like ArgoCD or Flux.
  • Management and scaling of GPU workloads in Kubernetes, ensuring optimal performance and resource utilization for our ML teams.
  • Development and maintenance of our observability stack (VictoriaMetrics, VictoriaLogs, Grafana, Tracing) to ensure deep visibility into system health.
  • Management of our cloud infrastructure on OVHcloud, focusing on automation (Infrastructure as Code), cost optimization, and security.
  • Lifecycle management of core platform services, including message brokers (RabbitMQ), databases (PostgreSQL, Redis), and authentication systems (Okta, OIDC, OAuth2).
  • Acting as a key responder for infrastructure incidents; debugging and troubleshooting complex production issues across distributed systems.
  • Supporting and empowering development teams by providing robust self-service tools, clear documentation, and collaborative support.

You’ll thrive in this role if you bring

  • 3-5+ years of professional experience in a Platform Engineering, DevOps, or SRE role.
  • Deep, hands-on experience with Kubernetes in a production environment (cluster management, networking, security, scheduling).
  • Proven experience managing infrastructure on a cloud provider (OVHcloud is a strong plus; AWS, GCP, or Azure experience is also valued).
  • Strong practical knowledge of CI/CD systems (e.g. GitHub Actions) and GitOps principles (ArgoCD, Flux).
  • Proficiency with Infrastructure as Code (IaC) tools like Terraform or Pulumi.
  • Solid understanding of observability principles and tools (e.g. VictoriaMetrics, VictoriaLogs, OpenTelemetry/Tracing, Grafana).
  • Experience managing stateful services in production (e.g. PostgreSQL, Redis, RabbitMQ).
  • Solid scripting skills in Python.
  • A collaborative, user-centric mindset focused on enabling and empowering other engineers.
  • Strong debugging and problem-solving skills in distributed systems.

Nice-to-Haves

  • Experience managing GPU workloads in Kubernetes (e.g. NVIDIA GPU Operator).
  • Familiarity with MLOps frameworks and tools (e.g. MLflow, Argo Workflows).
  • Exposure to CI/CD practices tailored for ML systems.
  • Experience with real-time inference of LLMs (e.g. vLLM, LMCache, llm-d).
  • Deep knowledge of authentication and authorization protocols (OIDC/OAuth2 in combination with Okta).

Why you’ll love working with us

  • Full ownership of a mission-critical platform
  • A team that values curiosity, learning, and experimentation
  • Remote-first setup with the option to work in our Berlin office
  • Competitive salary depending on experience
  • Work on AI infrastructure that directly impacts healthcare innovation

Ready to apply?

If you're passionate about platform engineering and want to work with cutting-edge technologies, we'd love to hear from you!

I'm Margarita and will be guiding you through the application process.

Job ID: 15027032

Sonia Solutions

11-50 employees
Software Development

Sonia is transforming dental practices with an AI-powered SaaS platform that automates time-consuming administrative tasks. By streamlining documentation and billing, we help dent…
