Interview: Chief Scientist Angshu Guha Talks Data Science

January 21, 2015 by Madison Jacobs

Recently, I sat down with Angshuman "Angshu" Guha, Chief Scientist here at Captricity, to chat about his past work at Google and Microsoft, his love for data and who inspires him.

1. In your 'Meet the Team' blog post a few months back, you talked briefly about your past lives at Microsoft and Google. I’m curious... what was the coolest project you worked on at both companies and why?

I worked on handwriting recognition at Microsoft for a long time. That was the coolest project I have ever worked on. Or perhaps I should say it was a sequence of cool projects spanning more than a decade. And among those, one of the coolest projects was developing a topology and a training method for a neural network that could decipher both cursive and print handwriting in several languages. I am proud of that work.

I worked on several teams at Google. The coolest project I was involved in was their web search ranking -- “Search Quality” is the name they use. It’s quite impressive, especially given that it was developed in the old-fashioned software engineering way of tweaking and tuning, nothing as sexy as modern machine learning. And yet it performs extremely well, and what was written for the US English web was adaptable to the web world-wide.

2. Amazing! Now, you know what’s next. Why join Team Cap? What inspired you to come along for the ride?

Over my career, I have realized that there are two things that make me want to wake up in the morning and get to work: (1) interesting tough problems that need to be solved and (2) smart and personable people to work with. Those two reasons were sufficient for me to join Captricity. But it turned out there were more! It was a fantastic opportunity to work on fascinating machine learning problems including OCR. It is also a privilege to work with and for someone like Kuang -- I love his vision and the way he tempers his sharp mind with empathy.

3. So, you’re the Chief Scientist here at Captricity. When did you first realize your love for data and the science behind it?

I don’t remember ever not being interested in science. Mathematics was my first love… but that’s another story. The love for data -- what might be called a love for empirical sciences -- came naturally as I was training in computer science through undergrad as well as grad school. But it matured into something substantial after I joined Microsoft and started working on handwriting recognition. Here was a situation where you couldn’t just write a program to crack the problem -- the only way to make progress was to automatically learn from examples and the only way to measure your progress was using large amounts of data. That’s when I realized subjective opinions mattered little compared to what the data proved or disproved. And I believe the world is increasingly facing these problems that are dominated by data.

4. Let's talk data science at Captricity. Can you give us a sneak peek of what you're working on in 2015?!

These days the term “data science” is sometimes used in confusing ways. It probably has several aspects to it, or shades of meaning -- statistical modeling, predictive analytics, data mining, machine learning, etc. My experience and interest are mostly in machine learning.

I am working on several machine learning projects at Captricity. But probably the most important and exciting one is developing an Optical Character Recognition engine. We hope to build an OCR system using deep learning that will be able to handle both human handwriting and machine print with state-of-the-art accuracy. It’s a hard machine learning problem. On one hand, we have a volume of training data that will be the envy of many. On the other hand, it’s also a difficult kind of data -- handwriting mixed with print, free flowing text mixed with form fields mixed with multiple choice questions, different writing sizes mixed together and so on. We have plenty of ideas, including some secret sauces, and we are very optimistic. We are making solid progress. The idea is to solve OCR in a way that is most applicable to our data, exploiting constraints that are peculiar to us, as well as tackling difficulties a general purpose commercial OCR engine might choose to ignore.

5. Final question. What data scientist(s) do you look up to? How do they inspire you to push the envelope in your everyday work at Captricity?

A lot of computer scientists have inspired me. Let’s just name three. Edsger Dijkstra was one of the most brilliant computer scientists whose work I have studied, though the term “data scientist” may not be appropriate for him. Geoff Hinton is one of the giants of machine learning, especially neural nets. And David Rumelhart. He was another pioneer in deep learning before it was called deep learning and before it became fashionable.

All three produced simple but powerful ideas. They could all express themselves elegantly and clearly. Hinton and Rumelhart allowed their intuitions to lead them to new insights and inventions and sought mathematical validations only after the fact.

I am inspired by the values they project -- work hard, pay attention to intuition but let data be the last arbiter, prefer simplicity over unnecessary complexity, don’t get too bogged down with conventional wisdom as to what is possible and what is not.
