For some time now, I have been watching Mocavo.com make its move to join the major leagues of genealogical database providers. I have seen a constant stream of innovations and additions to their large, and still growing, free database. I had already written recent posts about their newest additions when some more interesting posts came along to share. The first involves an image of a page out of a book. Here is what the programmers at Mocavo.com have found:
We have continued to improve our handwriting detection and recognition tools. In doing so, we stumbled upon another exciting new feature that we think will help change the way people learn about their family history. We are excited to share that we have developed the ability to very easily extract pictures, photographs and other images from our historical books. It’s not exactly like stumbling upon penicillin, but we were pleasantly surprised at how perfectly we are able to identify these images!
Hence the image above. They predict that they will soon be able to add image-specific search capabilities to Mocavo.com. If you would like to see how serious they are about increasing their capabilities, read their post entitled "The Mean, Lean, Green Mocavo Machine." Here is a short excerpt to get you started:
With over 500 multi-core Dell Datacenter grade servers under the hood we have the ability to perform OCR on over 1 million documents per day. In fact, we’re in final stages of re-engineering our OCR process to increase that number to over 5 million, all without affecting the performance of the website whatsoever!
The processed documents have to go somewhere, and we’re pleased to announce that we have increased our storage capacity to over 1 Petabyte! That’s a lot of spinning platters, check out below how we keep them all spinning!
They may join the major leagues sooner than you think.