
Thursday, December 26, 2013

Will all the books in the world be digitized?

I guess my first comment on this subject would be, don’t hold your breath. But this is a real concern for genealogists and others and cannot be dismissed quite that cavalierly.  This is especially true when there are some very large companies in the world that have as a goal the digitization of every book and in some cases, every record in the world.

If you believe it, the Wikipedia article on Google Books claims there are exactly 129,864,880 known printed books in the world. I am always suspicious of very large exact numbers, especially when counting every book published around the world since movable type was invented seems highly unlikely. Of course, this number is wrong the second one more book is published. It was published back on 5 August 2010, so today it is wrong anyway. See “You can count the number of books in the world on 25,972,976 hands.” How did Google arrive at the number? See “Books of the world, stand up and be counted! All 129,864,880 of you.”

But let’s assume that the number really is around 130 million or so. Could Google digitize all those books? Well, if they had the books available to scan, yes, they could. According to some estimates, Google has scanned over 30 million books since starting in 2004 and has done over 10 million in the last year. At that rate, they would be “done” in about ten years. But the real question is not whether Google is going to digitize every last book in the world, but whether anyone at all is going to do so. If you think about it for a minute (or more, as the case may be), you will soon realize that there are some apparently insurmountable obstacles to achieving this goal. There are the physical limitations of access created by national boundaries and attitudes. Do you really believe that every library in the world is just going to sit there and let Google (or anyone else) waltz in and start scanning away?
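For anyone who wants to check the "about ten years" figure, here is a rough sketch of the arithmetic in Python. It uses only the numbers quoted in this post and assumes, unrealistically, that the scanning rate stays constant and that no new books are published in the meantime. The function name and the constant names are made up for this illustration.

```python
# Back-of-the-envelope estimate using only figures quoted in this post.
# Assumptions (not facts): a constant scanning rate and a fixed total.

TOTAL_BOOKS = 129_864_880      # Google's August 2010 count
ALREADY_SCANNED = 30_000_000   # "over 30 million" scanned so far
SCANS_PER_YEAR = 10_000_000    # "over 10 million" in the last year


def years_to_finish(total, scanned, per_year):
    """Return the years needed to scan the remaining books at a fixed rate."""
    remaining = max(total - scanned, 0)
    return remaining / per_year


years = years_to_finish(TOTAL_BOOKS, ALREADY_SCANNED, SCANS_PER_YEAR)
print(f"Roughly {years:.1f} years to go")
```

Because both "over 30 million" and "over 10 million" are lower bounds, the result is only a ballpark, and it says nothing about whether the remaining books can actually be reached.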

Don’t think I have ignored the issue of copyrights. Really, copyright isn’t an issue with the digitization of books; it is only an issue with displaying the digitized books or making them available online. Let me give you an example of one problem. This problem hits home because it is sitting in the Mesa FamilySearch Library. Many of the books in the Mesa FamilySearch Library are essentially unique. They are very limited editions. What's more, they are extremely unlikely to have been included in Google's estimate of the number of books. So the question of whether all the world's books will be digitized is not a legal issue, nor is it a digitization issue. In the end, it is a purely practical problem of making all of the books available to be digitized. Now, I should mention that many of the books in the Mesa FamilySearch Library have already been digitized and are available online on FamilySearch.org. But under present policies and procedures, the remaining books that are under copyright and are unique or in limited editions will likely never be digitized. In this context, "never" means until the copyrights run out, and that is a very long time, even assuming that the United States legislature passes no further extensions of copyright coverage.

So the answer to the question is 42.

Just in case that answer is not satisfying, here is the full quotation:

"Good Morning," said Deep Thought at last.
"Er... good morning, O Deep Thought," said Loonquawl nervously, "do you have... er, that is..."
"An Answer for you?" interrupted Deep Thought majestically. "Yes, I have."
The two men shivered with expectancy. Their waiting had not been in vain.
"There really is one?" breathed Phouchg.
"There really is one," confirmed Deep Thought.
"To Everything? To the great Question of Life, the Universe and everything?"
"Yes."
Both of the men had been trained for this moment, their lives had been a preparation for it, they had been selected at birth as those who would witness the answer, but even so they found themselves gasping and squirming like excited children.
"And you're ready to give it to us?" urged Loonquawl.
"I am."
"Now?"
"Now," said Deep Thought.
They both licked their dry lips.
"Though I don't think," added Deep Thought, "that you're going to like it."
"Doesn't matter!" said Phouchg. "We must know it! Now!"
"Now?" inquired Deep Thought.
"Yes! Now..."
"All right," said the computer, and settled into silence again. The two men fidgeted. The tension was unbearable.
"You're really not going to like it," observed Deep Thought.
"Tell us!"
"All right," said Deep Thought. "The Answer to the Great Question..."
"Yes...!"
"Of Life, the Universe and Everything..." said Deep Thought.
"Yes...!"
"Is..." said Deep Thought, and paused.
"Yes...!"
"Is..."
"Yes...!!!...?"
"Forty-two," said Deep Thought, with infinite majesty and calm."
― Adams, Douglas. The Hitchhiker's Guide to the Galaxy. Ballantine, 1980.






4 comments:

  1. Because the 'Inside Google Books' article refers to "books of the world", I would hope that the count isn't limited to those published in the English language. However, there's no explicit mention of the issue, James.

    While I can easily imagine OCR works OK for other Latin-based languages, I admit that I have no knowledge of its usage for other scripts, e.g. Cyrillic, Japanese, Chinese, Korean.

    There must be a large number of published works in the associated languages but you wouldn't find many of them in a US/UK library.

    Maybe I'm just being pessimistic.

    1. Hmm. I think I will look into the issue of OCR in alternative scripts. That sounds interesting. Thanks for the comment and the idea.

  2. All I can say is… this would require a lot of time to process metadata and other details associated with the digitized books, should it eventually come to pass. Although digitized books do pave the way to easier access, it’s a matter of how well they’ll handle the digital catalog and how extensive it will be that we can measure usefulness. Because digitizing for the sake of just having digital copies isn’t exactly productive.

    Ruby Badcoe

    1. Yes, good point. But I think that most of the work has already been done by the libraries as far as metadata and cataloging.
