Case study · Strategy · Mar 2026 · 6 min read

PaperLoop: Problem Space & Discovery

How 14 educator interviews killed an OCR wrapper and led to a Gemini-powered scanner shipping on the Play Store.

Shanjit Thokchom, Aspiring Product Manager & Builder

The problem — "I spend Sunday typing what I already wrote."

The first time I watched a chemistry teacher prepare a Class 10 unit test, she was working from two documents at once: a handwritten A4 draft she had written the night before, and a Microsoft Word file she was typing from scratch on her laptop. The handwritten version had all of the intelligence — question numbering, section breaks, marks weighting, which diagrams to include. The Word file was pure duplication. She had been doing this for eleven years.

Across six schools in and around Shillong, the pattern was identical. Teachers wrote exam drafts by hand because it was faster to think on paper, and then re-typed the entire thing in Word before it went to the photocopier. A mid-sized school ran roughly 14–20 exam cycles a year across classes and subjects. On average, 90 minutes per cycle was lost to retyping alone — roughly three working days per teacher per year, almost all of it on a weekend, almost none of it pedagogy.
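As a rough check on that figure, the arithmetic can be sketched as follows. It takes the 90-minute average and the 14–20 cycle range at face value and assumes each teacher goes through every cycle; the 8-hour working day is my assumption, not something the interviews measured.

```python
# Back-of-envelope check of retyping time per teacher per year.
MINUTES_PER_CYCLE = 90        # average retyping time per exam cycle (from the interviews)
CYCLES_PER_YEAR = (14, 20)    # low and high end of the range for a mid-sized school
HOURS_PER_WORKING_DAY = 8     # assumption for converting hours into "working days"


def retyping_days(cycles: int) -> float:
    """Convert a number of exam cycles into working days spent retyping."""
    hours = cycles * MINUTES_PER_CYCLE / 60
    return hours / HOURS_PER_WORKING_DAY


low, high = (retyping_days(c) for c in CYCLES_PER_YEAR)
print(f"{low:.1f} to {high:.1f} working days per year")
```

Under those assumptions the range brackets three working days, which is where the headline number comes from.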

Who we spoke to — 14 educators, 6 schools, one cheap channel

I ran 14 semi-structured interviews over four weeks. Nine were conducted on WhatsApp voice notes — teachers replied between periods, while correcting papers, while commuting — because it met them where they actually had three minutes. Five were in person, often in a staff room with chai, because at that point we were beyond "what's the problem" and into "show me your actual file."

The recruitment mix intentionally skewed away from tier-1 metros: four teachers from government schools, six from private schools charging under ₹60,000 a year, two from higher-end private schools, and two independent tuition instructors. Deliberately nobody from an international school — those buyers already have enterprise tools, and their workflow isn't where value compounds.

Three questions anchored every conversation: (1) walk me through your last exam, literally every step; (2) what's the most annoying part you'd never mention to a vendor; (3) if this tool existed and worked, would it replace something, or sit next to it?

The pattern broke before interview #6

By the sixth interview, the framing had locked in. The bottleneck was not writing the exam — teachers liked writing. It wasn't grading either. It was the middle step: the thirty-to-ninety-minute translation from handwritten draft into a document Word would accept, with chemistry subscripts in the right place, MCQ grids that didn't break across pages, and marks annotations like [5] lining up flush-right on the margin.

Two decisions fell out of that, early:

We killed the OCR-first framing. The instinct from every engineer I talked to was "just wrap Tesseract and ship." But raw OCR output forces teachers to proofread twice: once to catch OCR errors, once to reapply formatting. Every tier-2 scanner app on the Play Store had already shipped this; none of them had stuck. The workflow didn't save time — it redistributed it, which is worse, because the new distribution happens when the teacher is more tired.

We bet on Gemini Vision understanding structure, not characters. Handing a raw image to a multimodal model and asking it to produce a structured representation of the exam — questions numbered, sections delimited, marks attached, notation preserved — meant we could skip the OCR-proofread step entirely and emit a print-ready PDF directly. This was the risky assumption. It was also the reason to build the product.
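To make "structure, not characters" concrete, here is a minimal sketch of what such a structured representation could look like. The class names (Exam, Section, Question) and the render_line helper are hypothetical, chosen for illustration; this is not PaperLoop's actual schema, and the model call that would fill it in is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    number: str   # kept as text so labels like "2(a)" survive
    text: str     # notation preserved as written, e.g. "H₂SO₄"
    marks: int


@dataclass
class Section:
    title: str
    questions: list[Question] = field(default_factory=list)


@dataclass
class Exam:
    title: str
    sections: list[Section] = field(default_factory=list)

    def total_marks(self) -> int:
        """Sum the marks across every question in every section."""
        return sum(q.marks for s in self.sections for q in s.questions)


def render_line(q: Question, width: int = 60) -> str:
    """Lay out one question with its marks annotation flush-right."""
    left = f"{q.number}. {q.text}"
    tag = f"[{q.marks}]"
    # Always keep at least one space between the text and the tag.
    pad = max(1, width - len(left) - len(tag))
    return left + " " * pad + tag


exam = Exam(
    title="Class 10 Chemistry Unit Test",
    sections=[
        Section(
            title="Section A",
            questions=[
                Question("1", "Define a mole.", 2),
                Question("2", "Balance: H₂ + O₂ → H₂O", 5),
            ],
        )
    ],
)

line = render_line(exam.sections[0].questions[1])
print(line)
print("Total marks:", exam.total_marks())
```

The point of a shape like this is that numbering, section breaks, and marks travel with the content, so layout (such as the flush-right marks tag) becomes a rendering step rather than something the teacher reapplies by hand.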

What we decided not to build

Discovery work is usually evaluated on what it decides to build. This one was defined by what we cut.

No cloud accounts at launch. Every teacher I spoke to had at least one story about a school-mandated platform that locked their work behind an expired subscription. Zero friction was non-negotiable: install → scan → PDF, no signups.
No grading features in v1. Grading is a bigger, juicier opportunity. Shipping it at launch would have doubled the scope, halved the focus, and put us into the teeth of an entrenched market (ExamSoft, Crowdmark). It goes on the roadmap for v2.
No "AI writes the exam for you." Teachers explicitly did not want this. The creative act was theirs. Replacing it signalled the product didn't respect their work. We wrote that into the positioning.
No English-only build. Three of fourteen teachers taught bilingually (Khasi/English, Hindi/English). A scanner that quietly broke on Devanagari characters in a science MCQ was a category of failure we couldn't ship. Gemini handled these scripts natively, so bilingual support stayed in scope.

The metric that mattered

For a discovery phase, the only honest metric is "did we ship the right thing?" Early answer: yes, and fast. Four weeks after the Play Store listing went live, we had installs from teachers in cities where I had never interviewed anyone (Jaipur, Coimbatore, Siliguri), meaning the positioning sentence ("handwritten to print-ready in seconds") was clear enough to travel by word of mouth alone, without a paid growth loop behind it. Of the first 200 installs, the top three uninstall-reason codes in the Play Console were "not what I expected" (12%), device issue (6%), and duplicate download (3%). "Not what I expected" is the one I watch hardest: it's the signal that the positioning and the actual product have drifted apart. At 12% and holding, it's acceptable; above 18%, I'd be rewriting the store listing.

The deeper signal: among teachers who scanned twice in the first week, retention to week four was above 80%. One successful scan isn't a habit. Two is. That's the metric I now run the roadmap against.
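For readers who want to track the same number, the cohort calculation can be sketched roughly as below. The User fields, the definition of "active in week four", and the sample records are all assumptions made up for illustration; they are not PaperLoop's analytics schema or its real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    scans_week1: int     # scans in the first 7 days after install
    active_week4: bool   # assumed definition: at least one scan during week four


def week4_retention(users: list[User], min_scans_week1: int = 2) -> Optional[float]:
    """Share of the 'scanned at least N times in week one' cohort still active in week four."""
    cohort = [u for u in users if u.scans_week1 >= min_scans_week1]
    if not cohort:
        return None  # no cohort, so no meaningful retention figure
    return sum(u.active_week4 for u in cohort) / len(cohort)


# Made-up records, only to show the shape of the calculation.
sample = [
    User(scans_week1=1, active_week4=False),
    User(scans_week1=2, active_week4=True),
    User(scans_week1=3, active_week4=True),
    User(scans_week1=2, active_week4=False),
    User(scans_week1=0, active_week4=False),
    User(scans_week1=5, active_week4=True),
]

retention = week4_retention(sample)
print(f"Week-four retention among two-scan users: {retention:.0%}")
```

Changing min_scans_week1 to 1 gives the comparison cohort, which is how the "one scan isn't a habit, two is" claim can be checked against real install data.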

What I'd do differently

I'd start the WhatsApp channel earlier. Four of the most useful interviews happened because a teacher I'd already spoken to forwarded my voice note to two colleagues unprompted. That compounding was free; I should have been asking for it by interview #2. Discovery is cheaper than founders tell themselves — the cost is almost entirely coordination, not conversation.
