Omnilex

Legal Content Database Infrastructure Engineer

Employee
Engineering
CHF 8'000 to CHF 12'000 / month

🌟 About You

Do you enjoy being the person who turns chaotic legal information into something a product can actually rely on? Are you energized by the combination of source acquisition, document understanding, data modeling, and AI-oriented processing, where success means clean structure, strong traceability, and fewer downstream surprises? Do you like building systems that keep working even when sources change, formats are inconsistent, and the “easy” path breaks in production?

You might be a great fit if you care equally about getting the data in, making it usable, and ensuring it improves search and AI quality in measurable ways.

🚀 About Omnilex

Omnilex is a young, dynamic AI legal tech startup with roots at ETH Zurich. Our passionate, interdisciplinary team of 14+ people is dedicated to empowering legal professionals in law firms and legal teams by leveraging AI for legal research and answering complex legal questions. We stand out by tackling unique challenges, including our combination of external data, customer-internal data, and our own innovative AI-first legal commentaries.


🛠️ Your Responsibilities

As a Legal Content Database Infrastructure Engineer, you will build and operate the pipelines that transform raw legal materials into reliable, structured, AI-ready assets for search, analytics, and LLM-based workflows.

  • Source integration: Build and maintain robust ingestion flows for legal content from websites, APIs, bulk imports, customer repositories, and document collections across jurisdictions.
  • Extraction & structuring: Turn messy inputs such as HTML, PDFs, XML, and semi-structured documents into clean representations with sections, metadata, references, and document relationships.
  • Normalization & schemas: Design pragmatic typed schemas for legal materials such as statutes, decisions, commentaries, citations, authorities, dates, and jurisdiction-specific metadata.
  • AI-ready preprocessing: Implement chunking, sectioning, citation linking, deduplication, summarization, tagging, and enrichment pipelines that improve retrieval quality and answer traceability.
  • Operational robustness: Make ingestion resilient to real-world failure modes such as source changes, inconsistent markup, missing fields, parser drift, retries, and rate limits.
  • Search & indexing support: Shape extracted data so it performs well in downstream indexing, retrieval, ranking, and legal research workflows.
  • Quality & observability: Introduce validation checks, coverage metrics, regression tests, lineage, and versioning so pipeline changes are measurable and safe.
  • Performance & cost: Improve throughput, runtime behavior, database efficiency, token usage, and overall cost-awareness across the data processing stack.
  • Cross-functional collaboration: Work closely with engineers, legal experts, and product teams to understand which data properties matter most and translate them into reliable pipeline behavior.


✅ Minimum qualifications

  • Strong hands-on experience in data engineering or backend engineering with a focus on ingestion, transformation, or document processing in production.
  • Proficiency in TypeScript and solid engineering fundamentals.
  • Experience working with heterogeneous data sources and messy document formats, including structured and semi-structured content.
  • Strong practical skills with SQL / PostgreSQL, data modeling, and building reliable processing pipelines.
  • Experience using AI methods pragmatically for data preparation tasks such as classification, tagging, chunking, summarization, or retrieval-oriented preprocessing.
  • A strong debugging and ownership mindset: you can identify where a pipeline breaks and make it dependable.
  • Ability to work full-time and be on-site in Zurich at least two days per week (hybrid).

🎯 Preferred qualifications

  • Experience with web scraping, crawling, or source-specific extraction pipelines.
  • Familiarity with parsing/document tooling for PDFs, HTML, XML, and OCR-related workflows.
  • Experience with Azure, Docker, CI/CD, and queue- or worker-based pipeline architectures.
  • Familiarity with search and retrieval systems, including indexing strategies, relevance trade-offs, embeddings, or citation-aware retrieval.
  • Working proficiency in German and proficiency in English.
  • You have a Swiss work permit or EU/EFTA citizenship.
  • Knowledge of and experience with legal systems, in particular those of Switzerland, Germany, and the USA. 🧑‍⚖️


🤝 Benefits

  • Foundational impact: your work directly improves what our AI can know, retrieve, explain, and cite.
  • End-to-end ownership: shape the pipeline from raw source acquisition to structured, searchable, AI-ready content.
  • Technical depth: work on real-world challenges across extraction, schema design, document intelligence, and data reliability.
  • Team: grow with an interdisciplinary team at the intersection of legal tech, data systems, and AI.
  • Compensation: CHF 8’000–12’000 per month + ESOP (employee stock options), depending on experience and skills.

We’re excited to hear from candidates who want to build the data foundation behind trustworthy legal AI. Apply today by pressing the Apply button.

Job ID: 15839469

Omnilex

11-50 employees
Technology, Information and Internet