Dear applicants, please note that applications without salary expectations and an active LN profile will not be considered. Thank you for your understanding.

HYBRID IN NEW YORK

We are looking for two Senior Site Reliability Engineers to build and scale reliability foundations for a rapidly growing fintech platform. This role focuses on architecting resilient infrastructure, strengthening observability, and establishing sustainable SRE practices as systems scale from thousands to millions of users. You will lead incident response, design highly available cloud architectures, and ensure engineering teams can ship quickly without compromising reliability. The position requires deep AWS expertise, strong infrastructure-as-code experience, and a proactive reliability mindset. You will partner closely with feature teams to design scalable databases, async workflows, and data pipelines. This is a high-impact hybrid role based in NYC for engineers who thrive in fast-scaling environments.

Details

Location: NYC (Hybrid)

Work Model: Hybrid

Employment Type: Full-time

Industry: Financial Technology

Start Date: ASAP

Key Responsibilities

Lead incident response and establish sustainable on-call processes

Create comprehensive runbooks and foster a blameless postmortem culture

Architect highly available, scalable cloud infrastructure on AWS

Design auto-scaling, health checks, and graceful degradation strategies

Implement and evangelize modern observability tooling (monitoring, logging, tracing)

Develop infrastructure as code using Terraform or CloudFormation

Build and improve CI/CD pipelines with advanced deployment strategies (blue/green, canary)

Partner with engineering teams to embed reliability into feature design

Improve database performance, async workflows, and data pipeline reliability

Reduce MTTR through systematic process and tooling improvements

Requirements

5+ years of SRE/DevOps experience OR 7+ years of software engineering with a strong infrastructure focus

Proven experience leading incident response for high-availability production systems

Strong AWS expertise (EC2, Fargate, networking, scaling strategies)

Experience with infrastructure as code (Terraform preferred)

Hands-on experience implementing observability solutions (Datadog, Prometheus, ELK, etc.)

Experience designing CI/CD pipelines and deployment automation

Strong knowledge of scalable system design and production reliability practices

Excellent documentation and cross-team communication skills

Nice to Have

Experience scaling fintech or regulated systems

Experience working in high-performance engineering cultures

Evidence of an entrepreneurial or high-initiative background

Experience designing async workflow infrastructure or high-scale data pipelines

Interview Process

Recruiter Screen

Hiring Manager Screen

Case Study / Panel Interview

Onsite Interviews

Culture / CEO Interview

Offer

Senior SRE, Software Engineering (AWS / Scaling Infrastructure)