The Borneo Post (Sabah)

Digital books vs digitized books


A BOOK is a book is a book. However, we have reached a time in technology development when it is becoming harder to recognize a “book” right away.

Recently, during a professional development session for a group of media specialists at a state reading conference, the participants were asked to define the word “book.”

Most answers focused on an item made of a number of pages (made of paper) with words on them. The participants were then shown a book on tape and asked whether it was a book.

They all agreed that it was. Next they were shown a child’s CD storybook and asked whether it was a book. Again, they agreed that it was. Finally they were shown a handheld computer running a program and asked the same question. They weren’t sure. Dictionaries give the following definitions of the word “book.”

Book: Set of written, printed, or blank pages fastened along one side and encased between protective covers. A printed or written literary work.

Encarta Pocket English Dictionary: 1. Bound collection of pages. 2. Published work. 3. Bound set of blank pages. 4. Set of things bound together. 5. Division of literary work. 6. Set of rules. 7. Bookmaker’s record. 8. Script or libretto. 9. Number of tricks needed in scoring. 10. Imaginary record. 11. Record about sports opponents.

It seems that everyone is rushing to put the contents of as many books as possible on the Internet. Carnegie Mellon University, for instance, intended to develop an online library of one million books by 2019.

The Google search engine is upping the ante with a collection of 15 million books from Stanford University, Harvard University, the University of Michigan, and the New York Public Library, which it plans to have online by 2018. Unfortunately, many of these projects are not realizing their full potential because they are providing access only to “digitized books” rather than to “digital books.” There is a subtle but important difference.

A “digitized book” is essentially no different from a set of photographs of the pages in a printed book. You can view a page and read its contents, in the same way that you can take a book off a shelf and read it.

This is certainly useful because the digitized book may be one that you would otherwise never be able to find locally. On the other hand, what happens if you want to find specific information in the book?

A computer does not “know” what you are viewing on any given page. Text, diagram, or photograph: all of them “look” the same to a computer when there is just a photograph of the page.

Unless the digitized book has a thorough table of contents or index, you must resort to skimming each page.

This is not efficient. Surely, this is something that computers should be able to do for us, isn’t it? They can, but they need human help.

A “digital book” is much more than a collection of page photographs and it is, therefore, far more powerful. Although its pages may look no different from those of a digitized book, the words are recognized as such by a computer.

So if you want to find all occurrences of a specific word throughout a digital book, this can be done easily, usually in a matter of seconds or less. The starting point is to take photographs of the printed pages, and this is usually where most of the “million book projects” stop.

They produce a digitized book without going to the additional effort of converting it to a digital book. This would require them to use Optical Character Recognition (OCR) software to analyze the image of a page, determine which parts constitute readable text, convert it into letters of the alphabet, assemble these into words, then compare the words to a dictionary to detect misspellings.
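The last of those steps, comparing recognized words to a dictionary, can be sketched in a few lines of Python. This is only an illustration and not how any particular OCR product works: the tiny word list `KNOWN_WORDS` and the function name `flag_suspect_words` are assumptions made up for the example.

```python
import re

# Hypothetical, tiny word list; a real checker would load a full dictionary.
KNOWN_WORDS = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def flag_suspect_words(ocr_text, known_words=KNOWN_WORDS):
    """Return the words in OCR output that are not found in the dictionary."""
    # Split the recognized text into runs of letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", ocr_text)
    # A word is suspect if its lowercase form is not a known word.
    return [w for w in words if w.lower() not in known_words]

# Simulated OCR output in which "i" and "o" were misread as digits.
sample = "The qu1ck brown f0x jumps over the lazy dog"
suspects = flag_suspect_words(sample)
print(suspects)
```

A check like this can only point to words that look wrong. Deciding what the word should have been is still left to a person or to a further correction step.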

Ideally, so long as the page image is clear, the OCR software should be able to achieve over 90% accuracy in converting pictures to words. But no OCR software is perfect and if the original page is blurry or, in the case of old books, the letters themselves are not formed clearly, the accuracy drops off rapidly.
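Some rough arithmetic shows why even a high accuracy figure leaves plenty of work. The sketch below assumes that accuracy is measured per character and that errors are independent of one another. The page size of 2,000 characters and the five-letter word are likewise assumptions chosen for illustration, and 90% is used as the low end of the range mentioned above.

```python
def expected_char_errors(chars_per_page, accuracy):
    """Expected number of misread characters on one page."""
    return chars_per_page * (1.0 - accuracy)

def chance_word_is_correct(word_length, accuracy):
    """Probability that every character of a word is read correctly,
    assuming each character is recognized independently."""
    return accuracy ** word_length

errors = expected_char_errors(2000, 0.90)   # assumed 2,000-character page
word_ok = chance_word_is_correct(5, 0.90)   # assumed five-letter word
print(round(errors), round(word_ok, 2))
```

Under these assumptions a single page would contain about 200 misread characters, and only about 59% of five-letter words would come through fully intact.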

There is still no substitute for a human editor to fix words that are incorrectly recognized.

This takes time and, of course, money.

That’s why Carnegie Mellon and Google are not creating digital books: they cannot afford it. Askmelaws.com, which started as a law book website, is a Malaysian initiative to produce digital books.

A digital book can be created as a learning tool that students can use to build not just a database but also an interactive web infobase.

The digital book stores text and multimedia information which is highly compressed, full-text indexed, and multiuser editable. In addition to instantaneous searching, it can provide real-time information updating and sophisticated hypertext linking.

Every word in the digital book is automatically indexed. Any piece of information can be found in an instant using the powerful search engine. With full-text indexing, all contents are automatically indexed and compressed the instant the information is added to the file.
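One common way to index every word is an inverted index, which maps each word to the pages on which it appears. The Python sketch below illustrates the general idea only. It is not a description of how any particular product works, it leaves out compression, and the function names and sample pages are made up for the example.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the sorted list of page numbers containing it."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}

def find_pages(index, word):
    """Look up a word without rereading any page text."""
    return index.get(word.lower(), [])

# Made-up sample pages standing in for the text of a digital book.
pages = [
    "A contract requires offer and acceptance.",
    "Acceptance must be communicated to the offeror.",
    "Consideration supports the contract.",
]
index = build_index(pages)
print(find_pages(index, "contract"))    # pages 1 and 3
print(find_pages(index, "Acceptance"))  # pages 1 and 2
```

Because the index is built once, when the text is added, each later search is a single lookup rather than a scan of every page.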

Using a digital book rather than a digitized book or hard copy to study and prepare for exams will be a breeze, because the student can search through thousands of pages of voluminous material much faster than a student using traditional hard-copy books or other digital resources.
