An interview with Sandeep Dinesh, Co‑Founder and CTO of Mercoa (YC W23), on building AI agents, doing OCR with LLMs, and lessons learned from AI in the trenches
I'm impressed that Gemini 2.5 is their go-to OCR tool and outperforms the custom APIs. So many existing paradigms are being disrupted just by how powerful the models are becoming.
Thanks for sharing these perspectives! One bit that really stood out to me:
> We’ve completely ignored fine-tuning. There’s an emerging consensus that it’s too time-consuming and expensive to do relative to other approaches, especially with the foundational models themselves evolving so fast. You’d have to fine tune again for every model you want to try every time a new one releases.
Super interesting (and helpfully opinionated)!