A fast-growing provider of AI-powered solutions is scaling its operations. With a strong customer base and increasing demand, the existing engineering team is under pressure to handle both infrastructure improvements and customer-facing support.
To meet this growth, the company is looking to add an Infrastructure Engineer to a team of two (as the third engineer), supporting Kafka, Redis, OpenSearch, RabbitMQ, and ClickHouse for its products.
Tasks
- Manage, monitor, and optimize ClickHouse clusters in production, including schema design, query performance tuning, replication configuration, and capacity planning;
- Operate and maintain Kafka clusters, OpenSearch deployments, and other distributed systems, ensuring high availability and optimal performance;
- Deploy, configure, and manage containerized applications and stateful workloads on Kubernetes, implementing best practices for resource management and scaling;
- Implement and maintain GitOps workflows for infrastructure and application deployments, ensuring version-controlled and automated deployment processes;
- Design and implement comprehensive monitoring, logging, and alerting solutions for distributed systems, enabling proactive issue detection and rapid troubleshooting;
- Conduct performance analysis, identify bottlenecks, and implement optimizations across distributed systems to meet SLA requirements and improve system resilience;
- Create and maintain technical documentation, runbooks, and operational procedures while collaborating with development teams to ensure smooth integration and operations.
Requirements
- Hands-on experience operating distributed systems in production environments, with strong understanding of distributed computing concepts, data consistency, and fault tolerance;
- Solid experience with ClickHouse, including cluster management, MergeTree engine families, data modeling, query optimization, and replication strategies;
- Practical experience deploying and managing applications on Kubernetes, including StatefulSets, persistent volumes, networking, and security configurations;
- Working knowledge of Apache Kafka (brokers, topics, partitions, consumer groups) and OpenSearch or similar search and analytics engines;
- Experience with GitOps practices and Infrastructure as Code tools (Terraform, Helm, or similar), with ability to manage infrastructure through declarative configuration;
- Proficiency with monitoring and observability platforms (Prometheus, Grafana, or similar) and experience implementing metrics collection and alerting strategies;
- Hands-on experience with at least one major cloud platform (AWS, GCP, or Azure), including compute, storage, and networking services;
- Strong scripting and programming skills in Python, Go, or Bash for automation, tooling development, and operational tasks.
Nice to have:
- Experience with other distributed data systems (Redis, Spark, Flink, etc.);
- Knowledge of data streaming patterns and event-driven architectures;
- Strong analytical and troubleshooting skills with ability to diagnose complex distributed systems issues, coupled with clear communication skills for cross-functional collaboration.
Benefits
Working conditions:
- This role is available only to candidates based in Croatia, Serbia, Portugal, or Poland;
- Duration: 1 year+, with extension possibility;
- Locations: Serbia, Portugal, Croatia, Poland;
- Overlap: until 11:00 AM PST at most;
- Employment Type: Full-time