Ritvik Aryan Kalra

00 / Orientation

The whole training stack.
Sandboxes, environments, data, rewards, and training for Sarvam's coding models. Read top to bottom; take your time.

Current work

Sarvam AI · 2025–

The end-to-end loop behind Sarvam's coding models — execution fleet underneath, the task pipeline on top, training and measurement closing the loop.

Sandbox infra

Self-built execution fleet for agent rollouts — thousands of concurrent sandboxes.

1000s of concurrent sandboxes · self-built fleet

fleet · live (illustrative) · legend: running, pass, fail, idle

Every agent rollout in the building lands on this floor — a self-hosted execution fleet scaling to thousands of concurrent sandboxes, warm and placement-fast. Capacity, images, observability, and the on-call: owned end to end.

sandboxes at scale · self-hosted · fleet observability

The task factory

Raw software work to training-grade tasks, automatically.

build_tasks(raw_swe) → envs · data · rewards — one automated pipeline

a · Environments

Real software work becomes runnable, graded RL environments — automatically, at scale.

the line · illustrative · raw work → graded

b · Curation

Synthetic generation plus ruthless filtering — most candidate data is cut, on purpose.

the gate · illustrative · seen → kept · most cut

c · Rewards

Every reward signal is stress-tested against gaming before it shapes a gradient.

the audit · illustrative · audited → shipped / cut

One automated line from raw software work to training-grade tasks: environment construction, synthetic data and curation, and reward signals stress-tested against gaming before they ever shape a gradient. This is the factory the run trains on.

environments at scale · synthetic + curation · anti-gaming rewards

Training

The full post-training ladder, run end to end — SFT through on-policy RL.

post-training, end to end · pip-sql-1.3b (’23) → Sarvam coding models

the descent, live · loss ↓

The full post-training ladder — SFT through on-policy RL — run end to end, landing meaningful gains on internal benchmarks. The arc runs from pip-sql-1.3b in ’23, a 1.3B model matching models 7× larger, to the coding models behind Sarvam’s agents today. Owning data → rewards → weights means a regression gets chased to its source.

SFT → on-policy RL · data → weights · pip-sql → Sarvam

Evaluation

The measurement layer that keeps every reported gain honest.

benchmarking infra · per-trace forensics · Samvaad V2V evals

per-trace replay · illustrative · trace forensics

Eval infrastructure for benchmarking the coding models — rollouts at scale, per-trace diagnosis of where an agent got stuck, integrity checks against contamination — plus Samvaad, evals for voice-to-voice agents. A number that can’t be defended doesn’t ship.

benchmarking infra · per-trace forensics · Samvaad · V2V

The road here

2022 → now. Everything above stands on this — voice agents on live traffic, a founding-MLE model, and the systems work underneath it all.

Before this

2022–25

The experience the training stack is built on, most recent first.

01 · Sarvam: voice agents · ’24–25

telephony-scale traffic · live V2V platform

live traffic · illustrative · voice-to-voice

Tuned the agentic LLM behind a voice-to-voice platform serving telephony-scale production traffic: applied GEPA on that production traffic and built the Samvaad evals and monitoring that made doing so safe.

telephony-scale V2V · GEPA on prod · Samvaad evals

02 · Pipable: founding MLE · ’23–24

pip-sql-1.3b ≈ models 7× larger

pip-sql-1.3b · NL → SQL

Founding ML engineer. Led pip-sql-1.3b — an NL→SQL model built with RL and deep learning, matching models 7× larger — and shipped pip-library-etl-1.3b, turning codebases into retrievable, model-ready context.

pip-sql-1.3b · NL → SQL · RL · 7× smaller

03 · IIIT Hyderabad: systems · ’22–23

Slurm GPU cluster · Sprinklr internship

slurm · illustrative · IIIT-H GPU cluster

Student sysadmin running IIIT-H’s Slurm GPU cluster for ML workloads; also built a reverse proxy still used by alumni worldwide and carried out a course-management migration. At Sprinklr: a test-analytics pipeline used across a large engineering org, plus Kafka health monitoring.

Slurm / GPU · reverse-proxy · Kafka / Elasticsearch

Research

Algorithmic fairness · 2023–

Peer-reviewed work on algorithmic bias and gender disparities in recommendation systems.

P1 · Exploring Gender Disparities in Bumble’s Match Recommendations

SIG GlobDev Pre-ICIS 2023 · arXiv:2312.09626

A mixed-methods study of gendered disparities in match recommendations on a large dating platform, combining quantitative analysis of recommendation outcomes with qualitative reading of how the system treats users differently by gender. Presented at the SIG GlobDev Pre-ICIS Workshop 2023.

[PDF][arXiv][BibTeX]

P2 · Unveiling Algorithmic Bias and Bridging Gender Disparities: Case Studies from a Gaming and a Dating Platform in India

IJGS · in press

A pair of algorithmic-bias case studies from a gaming platform and a dating platform in India, arguing that the same structural bias patterns recur across very different products and pointing toward mitigation. International Journal of Gender Studies (IJGS) — in press.

[PDF][arXiv][BibTeX]

Off the clock

the human behind the stack

🎹 keytar ⌨️ mechanical keyboards 🎲 board games ☕ coffee

I play the keytar, build mechanical keyboards, lose at board games, and take coffee too seriously. I also keep a blog — coding practices, Effective Java notes, and a running keyboard build log.
