An interview with Sandeep Dinesh, Co‑Founder and CTO of Mercoa (YC W23), on building AI agents, doing OCR with LLMs, and lessons learned from AI in the trenches
Thanks for sharing these perspectives! One bit that really stood out to me:
> We’ve completely ignored fine-tuning. There’s an emerging consensus that it’s too time-consuming and expensive to do relative to other approaches, especially with the foundational models themselves evolving so fast. You’d have to fine tune again for every model you want to try every time a new one releases.
I went to a Google meetup, and the lead of the Gemini coding agent team said the exact same thing.
Super interesting (and helpfully opinionated)!
I'm impressed that Gemini 2.5 is their go-to OCR tool / outperforms the custom APIs. So many existing paradigms are being disrupted just by how powerful the models are becoming.