The problem nobody talks about
Bank statements come as PDFs. That's fine if you're a human reading them, but the moment you need that data in a spreadsheet, an accounting tool, or literally any other format — you're stuck.
Copy-pasting from a PDF mangles columns. OCR tools get confused by multi-currency layouts. And every bank formats its statements differently. There's no standard, no API, no universal parser.
I'd been annoyed by this for years. Then I realized: if it annoys me, it probably annoys thousands of accountants, bookkeepers, and small business owners too. That's a product.
The 2-week timeline
Here's how the build actually went:
| Day | What happened |
|---|---|
| 1–2 | Research: PDF parsing libraries, competitor analysis, market sizing |
| 3–4 | Core parser prototype — single bank format working |
| 5–7 | Next.js frontend, file upload, drag-and-drop UI |
| 8–10 | Added 4 more bank parsers, error handling, edge cases |
| 11–12 | Self-hosted deployment: Docker, Cloudflare Tunnel, domain setup |
| 13 | Security audit: no files stored on disk, memory-only processing |
| 14 | Launch — live at convertstatement.com |
Two weeks, start to finish. Not two weeks of "thinking about it" followed by three months of building. Two actual weeks of writing code.
Tech decisions that mattered
Why Next.js (again)
I initially considered a Python backend with Flask, since the PDF parsing ecosystem in Python is more mature. But one person maintaining two stacks is a terrible idea, so I went with Next.js for everything:
- Same framework as every other product I'll build (code reuse)
- Server-side PDF processing via Route Handlers
- React frontend with instant preview of parsed results
The tradeoff: JavaScript PDF libraries are less battle-tested than Python's. Worth it for the consistency.
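For a sense of what server-side PDF processing in a Route Handler can look like, here's a minimal sketch. The file path, the `file` form field name, and the `extractTransactions` placeholder are assumptions for illustration, not the actual ConvertStatement code. It relies only on the standard `Request`, `FormData`, and `Response` APIs and keeps the upload in memory.

```typescript
// app/api/convert/route.ts (hypothetical path)

// Placeholder for the real parsing step; here it only reports the byte count.
async function extractTransactions(buffer: ArrayBuffer): Promise<{ bytes: number }> {
  return { bytes: buffer.byteLength };
}

export async function POST(request: Request): Promise<Response> {
  const form = await request.formData();
  const entry = form.get("file"); // assumed form field name

  // A missing field comes back as null, a plain text field as a string.
  if (entry === null || typeof entry === "string") {
    return Response.json({ error: "Please attach a PDF statement." }, { status: 400 });
  }

  // The upload stays in memory as an ArrayBuffer; nothing is written to disk.
  const buffer = await entry.arrayBuffer();
  const result = await extractTransactions(buffer);
  return Response.json(result);
}
```

In a real handler, the placeholder would hand the buffer to whichever bank parser matches the statement.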
The parser architecture
Each bank gets its own parser module. Here's what the interface looks like:
```typescript
interface BankParser {
  bankName: string;
  // Inspects raw PDF text and reports whether this bank issued the statement.
  detect: (text: string) => boolean;
  // Runs the format-specific extraction on the PDF bytes.
  parse: (buffer: ArrayBuffer) => Promise<Transaction[]>;
}

interface Transaction {
  date: string;
  description: string;
  amount: number;
  currency: string;
  balance?: number; // optional: not every row carries a balance
}
```

The detect function examines raw PDF text to identify which bank issued the statement. Once detected, the correct parser handles the format-specific extraction. This keeps things clean: adding a new bank is just adding a new module that implements the interface.
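To illustrate how detection and dispatch could fit together, here's a small sketch of a parser registry. The `selectParser` helper, the two example banks, and their marker strings are hypothetical; only the interface fields come from the definitions above. The `parse` bodies are stubs.

```typescript
interface Transaction {
  date: string;
  description: string;
  amount: number;
  currency: string;
  balance?: number;
}

interface BankParser {
  bankName: string;
  detect: (text: string) => boolean;
  parse: (buffer: ArrayBuffer) => Promise<Transaction[]>;
}

// Hypothetical parsers: the bank names and marker strings are made up.
const exampleBankA: BankParser = {
  bankName: "Example Bank A",
  detect: (text) => text.includes("EXAMPLE BANK A"),
  parse: async () => [], // format-specific extraction would go here
};

const exampleBankB: BankParser = {
  bankName: "Example Bank B",
  detect: (text) => text.includes("EXAMPLE BANK B"),
  parse: async () => [],
};

const parsers: BankParser[] = [exampleBankA, exampleBankB];

// Return the first parser whose detect() matches, or undefined if none does.
function selectParser(
  text: string,
  registry: BankParser[] = parsers,
): BankParser | undefined {
  return registry.find((parser) => parser.detect(text));
}
```

With this shape, supporting a new bank means appending one module to `parsers`, and an `undefined` result is a natural place to show a "this bank isn't supported yet" message.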
Self-hosting on a Mac Mini
The entire product runs on a Mac Mini M4 sitting in my apartment:
- Next.js builds as a standalone Docker container
- Cloudflare Tunnel routes convertstatement.com traffic to the container
- No AWS, no Vercel, no monthly cloud bill
Total hosting cost: electricity. The Mac Mini draws about 5 watts at idle. That's roughly $5/year.
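The back-of-envelope math behind that figure, as a quick sketch. The 5 watt draw is the number above; the electricity price of $0.12 per kWh is an assumption, so plug in your own tariff.

```typescript
// Rough annual electricity cost for an always-on machine at idle.
const idleWatts = 5;           // idle draw quoted in the post
const hoursPerYear = 24 * 365; // 8,760 hours
const pricePerKwh = 0.12;      // ASSUMPTION: USD per kWh, varies by region

const kwhPerYear = (idleWatts * hoursPerYear) / 1000; // 43.8 kWh
const annualCostUsd = kwhPerYear * pricePerKwh;       // about 5.26 USD

console.log(`${kwhPerYear.toFixed(1)} kWh/year, about $${annualCostUsd.toFixed(2)}/year`);
```

Draw under load is higher than at idle, so treat this as a floor rather than an exact bill.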
What I'd do differently
Not everything went smoothly. A few lessons:
- PDF text extraction is chaos. Every library extracts text slightly differently. Some preserve column alignment, some don't. I burned two full days on this before finding a configuration that worked across all five bank formats.
- Test with real statements earlier. I started with synthetic test data. Real bank statements have weird edge cases: multi-page transactions, currency symbols in unexpected places, date formats that change mid-document. My initial regex-based parser handled maybe 60% of real documents, and I had to rewrite the core extraction logic on day 9.
- Security first, not security later. Processing financial documents means people trust you with sensitive data. I added memory-only processing (no file persistence) on day 13. Should have been day 1.
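As one example of the kind of edge case real statements throw at you, here's a sketch of an amount normalizer that tolerates currency symbols before or after the number, both decimal conventions, and accounting-style negatives. The `parseAmount` helper is hypothetical and not the parser's actual code, and it assumes amounts with a fractional part always have exactly two decimal digits, which won't hold for every currency.

```typescript
// Hypothetical helper: turn a raw amount cell into a number, or null if no digits are found.
// ASSUMPTION: a fractional part, when present, is exactly two digits long.
function parseAmount(raw: string): number | null {
  const trimmed = raw.trim();

  // Negative if there is a minus sign anywhere ("-5.00", "5.00-", "€-5.00")
  // or the value is wrapped in parentheses ("(12.50)").
  const negative =
    trimmed.includes("-") || (trimmed.startsWith("(") && trimmed.endsWith(")"));

  // Drop currency symbols, currency codes, spaces, and sign markers.
  const digits = trimmed.replace(/[^\d.,]/g, "");
  if (!/\d/.test(digits)) return null;

  // The last "." or "," is the decimal separator only if two digits follow it.
  const lastSeparator = Math.max(digits.lastIndexOf("."), digits.lastIndexOf(","));
  const hasDecimals =
    lastSeparator !== -1 && digits.length - lastSeparator - 1 === 2;

  const normalized = hasDecimals
    ? digits.slice(0, lastSeparator).replace(/[.,]/g, "") +
      "." +
      digits.slice(lastSeparator + 1)
    : digits.replace(/[.,]/g, "");

  const value = Number(normalized);
  if (!Number.isFinite(value)) return null;
  return negative ? -value : value;
}
```

Running a helper like this against a folder of real statements early on is a cheap way to surface the formats that synthetic data never covers.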
Launch checklist
What I checked before going live, including two items that didn't make the launch:
- All 5 bank parsers producing correct output
- File upload handles PDFs up to 10MB
- No uploaded files persisted to disk
- Error messages are human-readable (not stack traces)
- Mobile layout works for upload and results
- Stripe integration for premium features (deferred to v2)
- Rate limiting on the upload endpoint (added post-launch)
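A couple of these checks are easy to sketch in code. The `checkUpload` helper below is hypothetical: it enforces the 10MB limit from the checklist, looks for the `%PDF-` signature that PDF files normally start with, and returns a human-readable message instead of throwing. The exact message wording is made up.

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10MB limit from the checklist

type UploadCheck = { ok: true } | { ok: false; message: string };

// Hypothetical helper: validate an upload before handing it to a parser.
function checkUpload(bytes: Uint8Array): UploadCheck {
  if (bytes.byteLength === 0) {
    return { ok: false, message: "The file is empty. Please upload a PDF statement." };
  }
  if (bytes.byteLength > MAX_UPLOAD_BYTES) {
    return {
      ok: false,
      message: "That file is larger than 10MB. Try exporting a shorter date range.",
    };
  }

  // "%PDF-" as bytes; a file that doesn't start with it is probably not a PDF.
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d];
  const looksLikePdf = signature.every((byte, i) => bytes[i] === byte);
  if (!looksLikePdf) {
    return {
      ok: false,
      message: "That doesn't look like a PDF. Please upload the statement as a PDF file.",
    };
  }

  return { ok: true };
}
```

Returning a result object rather than throwing keeps stack traces out of the response and leaves the caller to decide how to display the message.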
The numbers (or lack thereof)
I'm not going to pretend there's a hockey-stick growth chart here. The product launched, it works, and a handful of people have used it. The real test is whether organic search traffic picks up over the next 60 days — that's my kill-or-keep threshold.
If it doesn't get traction, I'll write a post-mortem and move on to the next product. That's the factory model: build, measure, decide. No emotional attachment to any single product.
What's next
I'm already scoping the next product idea. This blog will document the process, same as this post — from initial idea validation through launch.
If you're building something similar, or if you have bank statements that won't parse correctly, check out ConvertStatement. And if you want to follow along with the factory journey, the next post is probably already in the works.