Backend engineer who's been in the weeds at two NYC startups. I like hard infrastructure problems. Currently at Terra (YC W24).
Cornell University, B.S. Computer Science, '25.
Built Python pipelines to analyze meme-based sentiment during the Ukraine war, scraping 5,000+ Reddit posts. Published by the Brookings Institution.
Led labs and office hours for 40+ students in Intermediate Web Development (CS 2300). Taught SQL indexing, partitioning, and normalization.
Cornell Brooks TPI · Future Leaders Spotlight: AI & Democracy · Remote · Summer 2024 · Rails backend, RSpec test coverage to 90%, PostgreSQL optimization.
Ithaca, NY · Spring 2024 · Led team of 8 building a 2D Java game; built UI framework and level editor.
This came out of a problem I kept running into at Terra. We process a lot of messy Excel and CSV files from suppliers, turning them into structured product data. The pipeline that handles this would occasionally fail partway through a file, and there was no clean way to retry without risking duplicate rows or half-written data. I wanted to build a standalone system that solves this correctly from the ground up.
The core flow is: a spreadsheet gets uploaded to GCS, which triggers an event into Pub/Sub, and workers on Cloud Run pick up the file, parse it, validate the rows, and write structured output to PostgreSQL. Every job gets a deterministic idempotency key based on the file contents. Before writing anything, the worker checks a Redis dedup store to see whether that file has already been processed. If a worker crashes mid-parse, the message goes back to the queue and gets retried safely. Same file, same key, same result. Jobs that fail after max retries go to a dead letter queue.
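To make the dedup path concrete, here's a minimal sketch assuming redis-py; the helper names, key format, and TTL-free marker are illustrative placeholders, not the actual implementation.

```python
# Minimal sketch of the dedup check, assuming redis-py. parse_and_validate,
# write_rows, and the key names are hypothetical placeholders.
import hashlib
import redis

r = redis.Redis(host="localhost", port=6379)

def idempotency_key(file_bytes: bytes) -> str:
    # Deterministic: the same file contents always map to the same key.
    return hashlib.sha256(file_bytes).hexdigest()

def parse_and_validate(file_bytes: bytes) -> list[dict]:
    ...  # placeholder for the real spreadsheet parsing and row validation

def write_rows(rows: list[dict], job_key: str) -> None:
    ...  # placeholder: writes to PostgreSQL keyed by job_key, so retries upsert

def handle_message(file_bytes: bytes) -> None:
    key = idempotency_key(file_bytes)

    # Dedup check: if an earlier delivery already finished this file, ack and exit.
    if r.get(f"done:{key}"):
        return

    rows = parse_and_validate(file_bytes)
    write_rows(rows, job_key=key)

    # Mark complete only after the write lands. A crash before this line leaves
    # the key unset, so the Pub/Sub redelivery simply runs the job again, and
    # the keyed writes keep the retry from producing duplicate rows.
    r.set(f"done:{key}", "1")
```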
The piece I'm most focused on right now is the replay system. The goal is to be able to resubmit any failed job or entire batch without worrying about side effects. Most pipelines treat reprocessing as an afterthought, but I want it to be a first-class thing. I'm also adding structured logging, per-job tracing through the full lifecycle, and basic metrics for success rates and latency.
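Roughly, replay amounts to re-publishing the original message and letting the idempotency key do the work. A sketch assuming the google-cloud-pubsub client; the project, topic, and attribute names are placeholders.

```python
# Hypothetical replay helper: re-publish a failed job's original message.
# Project/topic names and the shape of `job` are illustrative.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic = publisher.topic_path("example-project", "spreadsheet-jobs")

def replay(job: dict) -> None:
    # The worker recomputes the same idempotency key from the same file,
    # so a replay is safe even if the original run partially succeeded.
    future = publisher.publish(
        topic,
        data=job["payload"],             # original Pub/Sub message body (bytes)
        replay="true",                   # attributes let the worker trace this as a replay
        original_job_id=str(job["id"]),
    )
    future.result()                      # block until the broker acks the publish
```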
Once the core is solid I'm planning to add failure simulation: crash workers mid-job, inject duplicate requests, force partial writes, and verify the system produces the correct output every time regardless.
Our payment infrastructure had no way to detect when Stripe and our database fell out of sync. Stripe processes a payment, we write the result to our database, and if that second step fails for any reason there's no error, no alert, nothing. The two systems just quietly disagree. I noticed this while reviewing our payment flows and started mapping out where it could happen.
I found four places. Each was an operation that touched Stripe and our database in sequence, with nothing watching what happened in between. A refund could process on Stripe's side and never land in our records. A payment could go through but fail to create an order on our end. Creator earnings could simply never be calculated if a specific payment event was missed. And a payout could be marked as completed in our system before the actual bank transfer was attempted, leaving money stuck in a permanent "completed" state that never actually moved. None of it produced a visible error.
Patching each flow individually would have been the wrong call. The real problem is structural: you can't make two external systems agree at the exact moment of writing. The right answer is to accept that inconsistencies will occasionally happen and catch them after the fact.
I built a background job that runs on a schedule, looks back over the last 24 hours, and compares what Stripe says happened against what our database recorded, across payments, refunds, invoices, fees, and payouts. Any gap triggers an immediate Slack alert. The job only detects and reports; it never silently corrects, because you want to understand a new failure mode before you automate a fix for it.
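A simplified sketch of one pass (payments only), assuming the official stripe and requests libraries; the `db` handle, table schema, and webhook URL are hypothetical stand-ins for the real ones.

```python
# Reconciliation sketch: compare Stripe's view of the last 24 hours against
# our own records and report any gap to Slack. Detect and alert only.
import os
import time
import stripe
import requests

stripe.api_key = os.environ["STRIPE_API_KEY"]
SLACK_WEBHOOK = os.environ["SLACK_WEBHOOK_URL"]   # hypothetical incoming webhook

def reconcile_payments(db) -> None:
    since = int(time.time()) - 24 * 3600

    # Everything Stripe says succeeded in the last 24 hours.
    stripe_ids = {
        pi.id
        for pi in stripe.PaymentIntent.list(created={"gte": since}).auto_paging_iter()
        if pi.status == "succeeded"
    }

    # Everything our database recorded for the same window (illustrative schema).
    recorded = {
        row[0]
        for row in db.execute(
            "SELECT stripe_payment_intent_id FROM payments "
            "WHERE created_at >= now() - interval '24 hours'"
        )
    }

    missing = stripe_ids - recorded
    if missing:
        # Report only; a human decides what the fix should be.
        requests.post(SLACK_WEBHOOK, json={
            "text": f"Reconciliation: {len(missing)} Stripe payment(s) missing "
                    f"from the database: {sorted(missing)[:10]}"
        })
```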
Refine's product is a search engine, so the database ingests a continuous stream of search events from customer sites around the clock. ClickHouse, which is built for analytical queries over large volumes of time-series data, was storing all of it. The analytics dashboard was reading from it directly, and the problem was that every single load was scanning the full event history to compute what to show, roughly 50 million records on every request. You could see it in how long the page took to come up.
The first thing I changed was how the data was structured for querying. Instead of aggregating across the raw event history on every request, I set up materialized views that pre-aggregated the data by hour, day, and week as events came in. A query for "last 7 days" went from scanning tens of millions of rows to reading a few hundred pre-computed ones. That handled most of the latency.
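To show the shape of one of those rollups, here's a sketch using the clickhouse-connect Python client; the table and column names (search_events, site_id, ts) are illustrative, not Refine's actual schema.

```python
# Hourly rollup sketch: a ClickHouse materialized view that pre-aggregates
# search events as they arrive, so dashboard queries read rollup rows instead
# of the raw event history. Schema names are hypothetical.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS search_events_hourly
    ENGINE = SummingMergeTree()
    ORDER BY (site_id, hour)
    AS SELECT
        site_id,
        toStartOfHour(ts) AS hour,
        count() AS searches
    FROM search_events
    GROUP BY site_id, hour
""")  -- note: without POPULATE this only covers events inserted after creation

# A "last 7 days" query now reads a few hundred rollup rows. sum() is still
# needed because SummingMergeTree collapses rows asynchronously on merge.
rows = client.query("""
    SELECT hour, sum(searches) AS searches
    FROM search_events_hourly
    WHERE hour >= now() - INTERVAL 7 DAY
    GROUP BY hour
    ORDER BY hour
""").result_rows
```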
The second change was caching. The results of a dashboard query for a given time window don't change until new data arrives, so hitting ClickHouse on every load was unnecessary. I added a Redis cache in front of the queries. Between the two changes, response times for uncached requests came down from 300ms to around 50ms, and repeated requests for the same window came back from the cache instantly.
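A minimal read-through sketch of that cache, assuming redis-py and the clickhouse-connect client from the rollup sketch above; the key format and TTL are illustrative.

```python
# Read-through cache in front of the rollup query: hit Redis first, fall back
# to ClickHouse, and store the result with a short TTL.
import json
import redis
import clickhouse_connect

r = redis.Redis(host="localhost", port=6379)
ch = clickhouse_connect.get_client(host="localhost")

def dashboard_series(days: int = 7) -> list:
    key = f"dash:series:{int(days)}d"

    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: no ClickHouse round trip

    rows = ch.query(
        f"SELECT hour, sum(searches) AS searches "
        f"FROM search_events_hourly "
        f"WHERE hour >= now() - INTERVAL {int(days)} DAY "
        f"GROUP BY hour ORDER BY hour"
    ).result_rows

    result = [[hour.isoformat(), int(searches)] for hour, searches in rows]
    r.set(key, json.dumps(result), ex=60)    # short TTL; values only change as new events arrive
    return result
```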
When I built the promotions system at Terra, the first question wasn't what features it needed. It was what a discount actually does to the payment stack. It changes what Stripe charges, what the creator gets paid, and how refunds are calculated if the order is returned. Getting any of those wrong means money ends up in the wrong place.
At Terra, creators sell products through their own storefronts. The core design question was who absorbs the cost of a discount. The answer we landed on was simple: the creator always takes the hit. If a creator issues a 20% off code, their payout is reduced by that amount. Terra's cut stays fixed. The Stripe charge reflects the discounted total, and the payout calculation works backwards from there.
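A toy example of that split, assuming a flat platform fee purely for illustration; the amounts and fee structure are made up, not Terra's actual pricing.

```python
# Creator-funded discount: the Stripe charge reflects the discounted total,
# Terra's cut stays fixed, and the creator's payout absorbs the difference.
def split_order(list_price_cents: int, discount_cents: int, terra_fee_cents: int):
    charge_cents = list_price_cents - discount_cents        # what Stripe charges the buyer
    creator_payout_cents = charge_cents - terra_fee_cents   # creator absorbs the discount
    return charge_cents, creator_payout_cents, terra_fee_cents

# A $50 product with a 20%-off code and a $5 platform fee:
# the buyer pays $40, Terra still takes $5, and the creator's payout drops to $35.
print(split_order(5_000, 1_000, 500))   # (4000, 3500, 500)
```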
The harder problem was what happens when a creator wants to discount beyond their margin. We allowed it, but Terra can't lose money in the process, so any order where the discount would have eaten into Terra's cut gets an automatic adjustment. The creator sees exactly what they're agreeing to before they publish the code.
Stacking was the other non-trivial piece. Creators could run sitewide promotions alongside individual coupon codes, so we had to make explicit decisions about when they could be combined and which took priority when multiple qualified at once.
One bug that came up in production: address updates at checkout were silently wiping the coupon discount from the order. The customer had seen the discounted total, but by the time the order confirmed the discount was gone. Once we identified the cause we fixed it and ran a backfill job to correct every affected order.