We're building an AI-powered conversational system for drive-thru automation. As our Data Engineer, you'll design and implement the infrastructure that powers our multi-stage LLM pipeline, covering everything from data capture and processing to model training and deployment.
Tasks
- Build scalable real-time data pipelines for audio processing, LLM interactions, and model training
- Design comprehensive data storage solutions across object storage, NoSQL, and analytical databases
- Implement data quality management with filtering, normalization, and enrichment capabilities
- Create automated processes for data preparation, model evaluation, and continuous improvement
- Develop observability systems with monitoring, alerting, and performance dashboards
- Establish data security and compliance protocols, including privacy protection measures
- Build resilient data systems with error recovery, backup, and integrity verification
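To give a flavor of the data-quality work described above, here is a minimal sketch of filtering and normalizing a transcribed drive-thru utterance. This is purely illustrative, not part of our actual stack; the `Utterance` schema, field names, and confidence threshold are hypothetical:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Utterance:
    """A single transcribed customer utterance (hypothetical schema)."""
    text: str
    confidence: float  # ASR confidence score in [0, 1]

def normalize_utterance(u: Utterance, min_confidence: float = 0.6) -> Optional[str]:
    """Filter low-confidence transcripts and normalize the rest.

    Returns cleaned text, or None if the record should be dropped.
    """
    if u.confidence < min_confidence or not u.text.strip():
        return None  # filter step: drop noisy or empty transcripts
    text = u.text.lower().strip()
    text = re.sub(r"\s+", " ", text)         # normalize: collapse whitespace
    text = re.sub(r"[^a-z0-9' ]", "", text)  # normalize: strip punctuation/noise
    return text
```

In a production pipeline this kind of logic would typically live behind a data-quality framework rather than ad-hoc functions, but the filter-then-normalize shape is the same.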
Requirements
What You'll Need
- Experience designing data pipelines for AI/ML applications
- Expertise with Apache Airflow for workflow orchestration
- Strong knowledge of Apache Spark for large-scale data processing
- Experience with Apache Kafka for real-time event streaming
- Proficiency with object storage systems (S3/MinIO) and database technologies (Cassandra/ScyllaDB, ClickHouse)
- Understanding of monitoring tools (OpenTelemetry) and observability platforms
- Experience implementing data security and compliance measures
- Advanced Python programming skills
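As a rough illustration of the monitoring and alerting side of the role, the toy sketch below tracks a rolling window of request latencies and flags a p95 breach. It uses only the standard library and is not how OpenTelemetry works; the class name and threshold are invented for the example:

```python
import statistics
from collections import deque

class LatencyMonitor:
    """Rolling window of request latencies with simple threshold alerting (toy sketch)."""

    def __init__(self, window: int = 1000, p95_threshold_ms: float = 500.0):
        self.samples = deque(maxlen=window)  # keep only the most recent samples
        self.p95_threshold_ms = p95_threshold_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p95(self) -> float:
        # quantiles with n=20 yields 19 cut points; the last one is the 95th percentile
        return statistics.quantiles(self.samples, n=20)[18]

    def should_alert(self) -> bool:
        # require a minimum sample count before alerting to avoid noise on startup
        return len(self.samples) >= 20 and self.p95() > self.p95_threshold_ms
```

A real deployment would export such metrics to an observability backend and dashboard rather than computing them in-process.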
Preferred Experience
- Audio data processing and conversational AI systems
- LLM training and fine-tuning pipelines
- Data quality frameworks (Great Expectations) and versioning tools (LakeFS, DVC)
- Kubernetes for container orchestration
- Multi-region deployment and distributed systems
Benefits
- Build cutting-edge conversational AI systems with real-world impact
- Work with modern, open-source technology stack
- Help shape the future of automated customer service
- Competitive compensation and flexible work arrangements
If you're passionate about building robust data systems for AI applications and excited by complex real-time data challenges, we'd love to talk.