ADMINISTRATING

Senior SRE, Software Engineering (AWS / Scaling Infrastructure)

New York, New York
Work Type: Full Time
HYBRID IN NEW YORK

We are looking for two Senior Site Reliability Engineers to build and scale reliability foundations for a rapidly growing fintech platform. This role focuses on architecting resilient infrastructure, strengthening observability, and establishing sustainable SRE practices as systems scale from thousands to millions of users. You will lead incident response, design highly available cloud architectures, and ensure engineering teams can ship quickly without compromising reliability. The position requires deep AWS expertise, strong infrastructure-as-code experience, and a proactive reliability mindset. You will partner closely with feature teams to design scalable databases, async workflows, and data pipelines. This is a high-impact hybrid role based in NYC for engineers who thrive in fast-scaling environments.

Details
Location: NYC (Hybrid)
Work Model: Hybrid
Employment Type: Full-time
Industry: Financial Technology
Start Date: ASAP

Key Responsibilities
Lead incident response and establish sustainable on-call processes
Create comprehensive runbooks and foster a blameless postmortem culture
Architect highly available, scalable cloud infrastructure on AWS
Design auto-scaling, health checks, and graceful degradation strategies
Implement and evangelize modern observability tooling (monitoring, logging, tracing)
Develop infrastructure as code using Terraform or CloudFormation
Build and improve CI/CD pipelines with advanced deployment strategies (blue/green, canary)
Partner with engineering teams to embed reliability into feature design
Improve database performance, async workflows, and data pipeline reliability
Reduce MTTR through systematic process and tooling improvements

Requirements
5+ years of SRE/DevOps experience OR 7+ years of software engineering with a strong infrastructure focus
Proven experience leading incident response for high-availability production systems
Strong AWS expertise (EC2, Fargate, networking, scaling strategies)
Experience with infrastructure as code (Terraform preferred)
Hands-on experience implementing observability solutions (Datadog, Prometheus, ELK, etc.)
Experience designing CI/CD pipelines and deployment automation
Strong knowledge of scalable system design and production reliability practices
Excellent documentation and cross-team communication skills

Nice to Have
Experience scaling fintech or regulated systems
Experience working in high-performance engineering cultures
Evidence of an entrepreneurial or high-initiative background
Experience designing async workflow infrastructure or high-scale data pipelines

Interview Process
Recruiter Screen
Hiring Manager Screen
Case Study / Panel Interview
Onsite Interviews
Culture / CEO Interview
Offer
