Dear applicants: applications that do not include salary expectations and a link to an active LinkedIn profile will not be considered. Thank you for your understanding.
Location: San Francisco, CA (In-person)
Employment Type: Full-Time
Equity: 0.5% – 1%
Visa: Not available
Experience: 1+ years (exceptional new grads welcome)
We are a seed-stage, research-driven ML company focused on mechanistic understanding of model architectures and optimizers. We are hiring ML Engineers to implement research ideas reliably and operate full training pipelines end-to-end. This is not a research-only role; it is research engineering at scale.
The team studies:
- Optimizer–architecture co-design
- Orthogonalized optimizers and manifold-based training
- Sparse attention mechanics
- Data-efficient reasoning models
- Learning dynamics in data-sparse regimes
The environment blends academic rigor with industrial compute and speed. The team is deliberately long-term oriented and avoids premature commercialization pressure.
You will:
- Translate research papers into working PyTorch/JAX implementations
- Run distributed transformer training
- Debug divergence and instability
- Optimize throughput
- Build full pipelines (data → training → evaluation)
- Reason about learning dynamics and architecture tradeoffs
The bar is slope and research intuition, not years of experience.
What You’ll Own
- Reliable implementation of novel architectures
- Distributed transformer training at scale
- Training stability and performance debugging
- Evaluation frameworks
- Optimization reasoning alongside researchers
Must-Have Requirements
- Strong PyTorch or JAX proficiency
- Hands-on transformer training experience
- Experience with distributed training setups
- Experience debugging training divergence and instability
- Ability to read and implement research papers
- Research intuition around optimization and learning dynamics
- High growth slope
Nice to Have
- Megatron-LM, DeepSpeed, xformers
- End-to-end pipeline ownership
- Research-engineering team experience
- Mathematical depth (optimization, information theory, etc.)
- Competitive programming / theory-heavy background