We are looking for a Senior Applied AI / ML Engineer who combines strong AI/ML fundamentals with a disciplined, evaluation-driven approach to improving production AI systems. This is not a research-only role. You will work on real systems used by real users: diagnosing failures, designing metrics, running structured experiments, and continuously improving AI behavior at scale.
Details
Position: Senior Applied AI / ML Engineer
Experience: 5+ years preferred
Employment Type: Full-time
Location: Fully Remote (Ukraine, Europe, or within ±3 hours of CET)
English: Upper-Intermediate / Advanced
Start Date: ASAP
The Role in a Nutshell
You will improve production AI systems through evaluation, experimentation, and system design.
This role is heavily focused on:
Diagnosing failures in agent workflows
Designing evaluation metrics and KPIs
Improving prompts and agent behavior
Running structured experiments and measuring impact
You will not work in isolation on academic research. Instead, you will optimize systems that power real product features.
Responsibility Breakdown
AI evaluation & KPI design – ~30%
Prompt and agent system design – ~30%
ML systems (ranking, optimization, etc.) – ~30%
Engineering integration – ~10%
What You’ll Work On
AI Evaluation & System Quality (Core Focus)
Design evaluation strategies for LLM and multi-agent workflows
Define metrics and KPIs for AI system performance
Build and maintain evaluation datasets
Debug production AI failures systematically
Compare system behavior against baselines
Quantitatively measure improvements
This is a core responsibility of the role.
Multi-Agent AI Systems
Improve agent orchestration and pipeline workflows
Diagnose failures across agent chains
Refine system prompts and interaction logic
Improve reliability, latency, and output quality
ML & AI Systems
You will contribute to areas such as:
Recommendation systems (ranking & personalization)
Itinerary optimization & constraint-based planning
LLM-based reasoning systems
Optional: computer vision pipelines
Depth in one of these areas matters more than shallow experience across all of them.
Tech Environment
Go (primary production language)
Python (ML workflows)
Postgres, Redis
Internal services & production systems
You don’t need to be a Go expert initially, but you should be comfortable reading and modifying production backend code.
Backend engineers handle infrastructure-heavy service development — your focus will be AI system behavior, correctness, and evaluation.
What We’re Looking For
Strong AI/ML Fundamentals
You understand the theory behind what you build and can choose appropriate methods for a given problem.
Examples of relevant areas:
Evaluation metrics (precision, recall, F1, etc.)
Ranking & recommendation systems
Embeddings & similarity
Experimentation methodology
Not required:
Academic publications
Advanced theoretical math
Large-scale model training experience
Evaluation-Driven Mindset (Most Important Signal)
Think in metrics and baselines
Design experiments instead of guessing
Measure improvements quantitatively
Debug failures methodically
Experience with LLM Systems
Prompt design
Agent workflows
Evaluation of LLM outputs
Production LLM integrations
Ability to Ship Production Systems
Turn ideas into working systems
Iterate based on measurable results
Balance experimentation with delivery
Programming Ability
Comfortable writing production code in at least one language (Python, Go, or similar)
Able to learn and adapt to new languages as needed