Captricity’s secret weapon: crowdsourcing
July 09, 2013 by webmaster
There are some things that people are really good at, like intuitively knowing when something is in context or not. We can tell in a flash that a bowl of fruit does not belong in our shower stall. There are other things that computers are really good at, such as processing vast amounts of quantifiable information very quickly, like calculating pi to 300 decimal places.
It turns out that extracting data from documents needs both of these skill sets, human and computer, if it is to be both fast and accurate, particularly for hard problems like extracting data from handwritten forms. Optical character recognition (OCR) software is just not good at reading human handwriting. Humans are best at reading human handwriting. Captricity has developed a unique solution that leverages the best of both worlds.
How does Captricity crowdsource?
1) As soon as a document is uploaded to Captricity, our special alignment algorithms match it up to the template document on file. We use that template to “shred” every single page into a number of unique fields.
2) Captricity then uses sophisticated machine learning to package up these fields (we call them “shreds”) into quickly identifiable packets. We use some cool cognitive science tricks to create packages of data that both protect the privacy of the original document and make it easy for human workers to validate the data.
3) Our machine learning algorithms take a first pass at reading and processing the data.
4) Between one and five people verify every single piece of data.
5) Captricity’s algorithms take all the data fields and shreds and piece them back together into a complete data set.
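To make steps 3 through 5 a little more concrete, here is a minimal Python sketch of how a machine first pass, a small number of human votes, and reassembly could fit together. It is an illustration only, not Captricity’s actual code: the function names (`verify_shred`, `reassemble`), the "two matching votes" rule, and the idea of counting the machine’s guess as one vote are all assumptions made for the example. The only detail taken from the steps above is that between one and five people look at each shred.

```python
from collections import Counter
from itertools import islice

MAX_WORKERS = 5  # from the post: between one and five people verify each shred


def verify_shred(machine_guess, worker_answers, agree_needed=2):
    """Return (accepted_value, workers_asked) for one shred.

    Hypothetical rule: the machine's first-pass guess counts as one vote,
    workers are asked one at a time, and we stop as soon as any value
    collects `agree_needed` votes. If nobody agrees after MAX_WORKERS
    answers, the value is None (unresolved).
    """
    votes = Counter()
    if machine_guess is not None:
        votes[machine_guess] += 1
    asked = 0
    for answer in islice(worker_answers, MAX_WORKERS):
        asked += 1
        votes[answer] += 1
        if votes[answer] >= agree_needed:
            return answer, asked
    return None, asked


def reassemble(shred_index, verified):
    """Piece verified shreds back into one record per document.

    `shred_index` maps shred id -> (document id, field name);
    `verified` maps shred id -> accepted value.
    """
    records = {}
    for shred_id, (doc_id, field_name) in shred_index.items():
        records.setdefault(doc_id, {})[field_name] = verified[shred_id]
    return records


if __name__ == "__main__":
    # Made-up data: three shreds cut from one form.
    shred_index = {
        "s1": ("form-1", "first_name"),
        "s2": ("form-1", "last_name"),
        "s3": ("form-1", "age"),
    }
    machine = {"s1": "Maria", "s2": "Lopez", "s3": "1O"}
    workers = {"s1": ["Maria"], "s2": ["Lopes", "Lopez"], "s3": ["10", "10"]}

    verified = {
        sid: verify_shred(machine[sid], workers[sid])[0] for sid in shred_index
    }
    print(reassemble(shred_index, verified))
```

Under this assumed rule, a shred the machine reads correctly needs only one person to confirm it, while a shred the machine misreads (the letter O in place of a zero in the made-up age field) is settled by two people who agree with each other. That is one plausible way the number of verifiers per shred could range from one to five.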
This entire process takes anywhere from under an hour for a couple dozen pages to a day or so for thousands.
How can we possibly do that?
That’s where the magic of crowdsourcing comes in. Crowdsourcing has come of age along with the Internet. It refers to groups of people connected to a larger Internet service, community or marketplace. Each of these people performs tiny bits of a larger task. Wikipedia is perhaps the most famous example, where individuals across the world collaborate to write and update pieces of longer articles. Kickstarter, a platform that aggregates many small payments to support new projects or companies, offers a novel version of crowdsourcing dubbed “crowdfunding”. Clay Shirky, author of Here Comes Everybody, talks about crowdsourcing as a way of efficiently redistributing excess capacity. With crowdsourcing, the action of the group adds up to more than all of the individual contributions by themselves. The whole is more than the sum of its parts: synergy in action.
Captricity’s application of crowdsourcing is a stellar example of this synergy. Each shred by itself isn’t worth a lot. A first name alone doesn’t tell us much. But add up all the shreds in a given document and now you have machine-readable data that can be fed into a CRM system or an analytics tool. Take thousands of documents, made up of millions of shreds, and the data begins to drive real and significant value. In some cases, that data can have a world-changing impact, as with critical public health information or environmental impact data.
We use Amazon’s Mechanical Turk crowdsourcing marketplace to send microtasks to “Turkers” who verify the shreds for micropayments. Most of our Turkers are college-educated, live in the U.S. or India, and process microtasks in between other work. Turkers are generally using their spare time to supplement their incomes with these micropayments. The result? You get your data “transcribed” with an unmatched degree of accuracy.
Captricity offers a groundbreaking collaboration between humans and computers. By drawing the best from both worlds, we are able to dramatically reduce the number of hours people have to sit and do the mind-numbing task of long-form data entry, a task that most people experience as exhausting and demoralizing.