At January, we're transforming the lives of borrowers by bringing humanity to consumer finance. Our data-driven products empower financial institutions to streamline their collections, providing borrowers with straightforward and compassionate solutions to regain financial stability and control over their lives. We're not just expanding access to credit. We're restoring dignity and paving the way for millions to achieve financial freedom.

About the Role

As a Senior SRE, you will ensure the reliability, scalability, and performance of January's production and internal systems as we scale from thousands to millions of borrowers. You'll establish SRE practices from the ground up: architecting resilient infrastructure, implementing proactive monitoring solutions, and building sustainable on-call processes that evolve with our rapid growth. Your work will directly tackle our current scaling challenges, including database optimization, async workflow infrastructure, and data pipeline reliability, while ensuring our engineering team can ship with confidence.

What You’ll Work On

Lead incident response and establish sustainable on-call practices, including comprehensive runbooks, blameless postmortems, and systematic improvements that reduce MTTR
Develop and maintain self-service observability solutions using modern monitoring tools that provide actionable insights for troubleshooting and performance optimization
Create and maintain infrastructure as code (using Terraform or CloudFormation) that allows for consistent, scalable, and secure cloud environments on AWS
Partner closely with feature teams to architect resilient infrastructure for critical components (databases, networking, async workflows, data pipelines) that scale seamlessly
Work closely with DevX to design and implement robust CI/CD pipelines with advanced deployment strategies (blue/green, canary) that enable teams to ship confidently and rapidly
Advocate for best practices early in feature design, ensuring we design with reliability in mind and future-proof our services

What You Bring to the Table

Expertise in leading incident response for high-availability production systems, conducting thorough root cause analysis, and fostering a blameless postmortem culture
Experience designing highly available deployment architectures across multiple targets (e.g. EC2, Fargate), with expertise in auto-scaling, health checks, and graceful degradation strategies
Track record of implementing effective monitoring & observability solutions (e.g. Datadog, Prometheus, ELK), and evangelizing best practices
Strong knowledge of AWS cloud services and infrastructure-as-code practices using tools like Terraform
Experience with CI/CD pipelines and automation to enable reliable, efficient deployments
Excellent communication skills with experience documenting processes and collaborating across engineering teams

We encourage you to apply even if your experience isn’t an exact match. We value professional development and on-the-job learning!

We are currently hiring for this position in our New York office.

As a New York City-based company, we are dedicated to transparent, fair, and equitable compensation practices that reflect our commitment to fostering an environment where all team members are valued and supported. We encourage individuals from all backgrounds to apply.

We are an equal opportunity employer committed to diversity and inclusion in the workplace. We do not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, age, veteran status, or any other legally protected characteristic.
