Codepan – founded in 2014 – is a Berlin-based AI Innovation Hub. Our team of passionate data scientists, engineers, and technologists applies machine learning to solve real-world problems for clients as well as to incubate and accelerate our own AI product ideas.
Codepan is currently developing an AI-based product that applies state-of-the-art LLM technologies to intelligent document processing.
Tasks
As a Data Engineer in our team, you'll architect, build, and maintain advanced data pipelines and storage solutions. You'll play a pivotal role in enabling our analytics and AI teams to work efficiently with large datasets, including those used for training and deploying Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) models.
Key Responsibilities:
- Design and optimize scalable data pipelines to support advanced analytics, machine learning, and AI projects, with a particular focus on LLM and RAG applications.
- Develop robust data warehousing solutions that ensure fast, reliable access to large volumes of data, optimizing for query performance and system scalability.
- Collaborate with AI research and development teams to understand data requirements and ensure the seamless integration of AI models with our data ecosystem.
- Implement data governance and security measures, adhering to best practices and regulatory standards to safeguard sensitive information.
- Utilize and advocate for cloud-based technologies and services to enhance our data processing capabilities, ensuring our infrastructure is both flexible and cost-effective.
- Regularly evaluate and adopt new tools and technologies to keep our data infrastructure at the forefront of industry standards, particularly those enhancing LLM and RAG capabilities.
- Simplify complex data flows, making data easily accessible for non-technical stakeholders while maintaining the integrity and confidentiality of the data.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, Information Systems, or a related field.
- 5+ years of proven experience in data engineering, with a track record of developing scalable data solutions.
- Strong technical expertise in SQL/NoSQL databases, Python, Java, and ETL processes.
- Experience with cloud platforms (Azure or GCP) and familiarity with big data technologies.
- A keen interest in AI technologies, especially LLMs and RAG models, with a desire to stay updated on the latest trends and techniques.
- Solid foundation in data security principles and a commitment to implementing privacy-compliant data management practices.
- Excellent problem-solving skills, ability to work collaboratively in a team environment, and strong communication skills for explaining complex technical concepts.
Join us to contribute to cutting-edge projects in AI and analytics, leveraging your expertise to create impactful data solutions.
Salary: INR 30–35 lakh per annum; remote position based in India.