Akvelon

Middle DevOps/SRE Engineer (Serbia, Croatia, Poland, Portugal)

Remote (Serbia)
Employee
Engineering

A fast-growing provider of AI-powered solutions is scaling its operations. With a strong customer base and increasing demand, the existing engineering team is under pressure to handle both infrastructure improvements and customer-facing support.

To meet this growth, the company is looking to add an Infrastructure Engineer to its current two-person team (the new hire will be the third engineer), supporting the Kafka, Redis, OpenSearch, RabbitMQ, and ClickHouse infrastructure behind its products.

Tasks

  • Manage, monitor, and optimize ClickHouse clusters in production, including schema design, query performance tuning, replication configuration, and capacity planning;
  • Operate and maintain Kafka clusters, OpenSearch deployments, and other distributed systems, ensuring high availability and optimal performance;
  • Deploy, configure, and manage containerized applications and stateful workloads on Kubernetes, implementing best practices for resource management and scaling;
  • Implement and maintain GitOps workflows for infrastructure and application deployments, ensuring version-controlled and automated deployment processes;
  • Design and implement comprehensive monitoring, logging, and alerting solutions for distributed systems, enabling proactive issue detection and rapid troubleshooting;
  • Conduct performance analysis, identify bottlenecks, and implement optimizations across distributed systems to meet SLA requirements and improve system resilience;
  • Create and maintain technical documentation, runbooks, and operational procedures while collaborating with development teams to ensure smooth integration and operations.

Requirements

  • Hands-on experience operating distributed systems in production environments, with a strong understanding of distributed computing concepts, data consistency, and fault tolerance;
  • Solid experience with ClickHouse, including cluster management, MergeTree engine families, data modeling, query optimization, and replication strategies;
  • Practical experience deploying and managing applications on Kubernetes, including StatefulSets, persistent volumes, networking, and security configurations;
  • Working knowledge of Apache Kafka (brokers, topics, partitions, consumer groups) and OpenSearch or similar search and analytics engines;
  • Experience with GitOps practices and Infrastructure as Code tools (Terraform, Helm, or similar), with the ability to manage infrastructure through declarative configuration;
  • Proficiency with monitoring and observability platforms (Prometheus, Grafana, or similar) and experience implementing metrics collection and alerting strategies;
  • Hands-on experience with at least one major cloud platform (AWS, GCP, or Azure), including compute, storage, and networking services;
  • Strong scripting and programming skills in Python, Go, or Bash for automation, tooling development, and operational tasks.

Nice to have:

  • Experience with other distributed data systems (Redis, Spark, Flink, etc.);
  • Knowledge of data streaming patterns and event-driven architectures;
  • Strong analytical and troubleshooting skills with ability to diagnose complex distributed systems issues, coupled with clear communication skills for cross-functional collaboration.

Benefits

Working conditions:

  • This role is available only to candidates from Croatia, Serbia, Portugal, or Poland;
  • Duration: 1 year+, with extension possibility;
  • Locations: Serbia, Portugal, Croatia, Poland;
  • Overlap: until 11:00 AM PST at the latest;
  • Employment Type: Full-time.
Job ID: 15033940

Akvelon

501-1000 employees
Software Development