

Description of Electronic Publishing and Adobe® PDF

Electronic Publishing is the process of scanning paper documents and converting them to a digital format for viewing, searching or printing.

Description of Process


Step 1 —  Scan to Tiff

TIFF is the image file format produced during the scanning process. 

Sample TIFF scanned for Stanford University from a book published in 1891. From this stage, the image can be converted to a variety of formats: PDF, text, HTML, etc.

Note: Documents that require special handling, such as contrast enhancement or odd-size or flimsy paper stock, will be priced individually.

Step 2 —  Create PDF - necessary step for ease of viewing documents

Option 1 - Basic PDF file format  

PDF Image Only is produced from TIFF images without OCR (optical character recognition). The document is not searchable, and the file size is relatively large. This is the lowest-cost PDF solution and a good option for document archiving.

Sample of the Historic Resources Inventory for the City of Gilroy. These documents have been indexed for the user by Architectural Style, APN and Architect. 


Option 2 - The next best PDF file format

PDF Image + Hidden Text is produced from TIFF images, using OCR to create a hidden, searchable text layer. The visible layer is an exact bitmap image of the original document. Without error correction of the hidden text layer, the document is searchable only to the extent that the OCR software recognized the text correctly. This is also a low-cost PDF solution. The quoted price is based on processing without extra OCR correction.

Sample of a document scanned and converted to PDF Image + Hidden Text for Stanford University from a book published in 1891.

Sample of underlying Hidden Text taken from above sample. This level of accuracy was achieved without additional error correction*. 

* Note: If additional error correction of the hidden text layer is required, the additional cost will depend on the quality of the original document, which directly affects OCR success. This price is an estimate only and will be refined after working with the original documents.
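To illustrate why an uncorrected hidden text layer limits searching, here is a minimal Python sketch. It is not the software used in this process; the function names and the sample strings are hypothetical, and the accuracy measure is only a rough approximation based on the standard library's difflib module.

```python
from difflib import SequenceMatcher


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Approximate share of the reference characters that the OCR text reproduces."""
    if not reference:
        return 1.0
    matcher = SequenceMatcher(None, reference, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


def is_searchable(hidden_text: str, term: str) -> bool:
    """A search succeeds only if the term appears in the hidden text layer."""
    return term.lower() in hidden_text.lower()


# Hypothetical example: the page reads "modern", but OCR misread "rn" as "m".
printed = "a modern printing press"
hidden = "a modem printing press"

print(is_searchable(hidden, "printing"))  # True: recognized correctly
print(is_searchable(hidden, "modern"))    # False: the misread word cannot be found
print(round(character_accuracy(printed, hidden), 3))
```

The visible page image still shows the word correctly; only the search fails, which is the case that additional error correction of the hidden layer addresses.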


Option 3 -  The Premium PDF file format

PDF Normal: The images are processed through OCR, all suspects are corrected to a 99.95% accuracy level, fonts are matched to the original, and images (if present) are scanned and inserted. The image layer is replaced with 100% searchable text, which results in a much cleaner-looking document and an extremely small file size. The compact file size makes this an ideal format for presenting on a website. This price is estimated; actual cost will be determined after working with the original documents.

Sample taken from a PDF file we created from a 1920 Olympic Review magazine. This was a very high-end, complex conversion, given the age of the originals (especially the photos) and the unique fonts.
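As a rough guide to what a 99.95% accuracy level means in practice, the short Python sketch below converts it into an expected number of remaining character errors. The figures of 2,000 characters per page and 300 pages are assumptions chosen for illustration, not measurements from any sample here, and the function name is hypothetical.

```python
def expected_residual_errors(char_count: int, accuracy: float = 0.9995) -> float:
    """Expected number of incorrect characters left at a given accuracy level."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return char_count * (1.0 - accuracy)


# Assumed figures for illustration only.
chars_per_page = 2000
pages = 300

per_page = expected_residual_errors(chars_per_page)
per_book = expected_residual_errors(chars_per_page * pages)
print(f"about {per_page:.1f} error(s) per page, {per_book:.0f} in a {pages}-page book")
```

Under these assumptions, 99.95% accuracy works out to roughly one incorrect character per page.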


Search Instructions: To search a PDF document, click on the "binoculars" in Acrobat Reader (which should have loaded upon clicking on the sample link), and then enter a word you see on the page. If it has been OCR'd successfully, the word will be highlighted on the page.

If you do not have Acrobat Reader, click here to download (necessary only once).

 


Step 3 —  Create a text-based document, usually in Microsoft Word format

Text format: The images are processed through OCR, all suspects are corrected to a 99.95% accuracy level, and the document is reformatted to match the original.

Sample


Keyboarding: Where OCR is not possible, the document is keyboarded from a printout of the original TIFF. Cost for this process is based upon character count.

* Note: This price is based upon the average character count for a typical text document. The exact price will be determined according to the actual character count. Special transcription of handwritten text will be priced separately.
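The relationship between the estimated price and the exact price can be sketched in Python as follows. This is only an illustration: the rate and its unit (per 1,000 characters) are hypothetical inputs supplied by the caller, not Altoscan's actual pricing, and the function names are invented for the example.

```python
def keyboarding_cost(char_count: int, rate_per_thousand_chars: float) -> float:
    """Cost for a given character count at a caller-supplied (hypothetical) rate."""
    if char_count < 0 or rate_per_thousand_chars < 0:
        raise ValueError("character count and rate must be non-negative")
    return char_count / 1000 * rate_per_thousand_chars


def price_adjustment(estimated_chars: int, actual_chars: int,
                     rate_per_thousand_chars: float) -> float:
    """Difference between the exact price and the estimate based on average count."""
    return (keyboarding_cost(actual_chars, rate_per_thousand_chars)
            - keyboarding_cost(estimated_chars, rate_per_thousand_chars))
```

A positive adjustment means the document contained more characters than the average used for the estimate; a negative one means it contained fewer.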




Altoscan
2280 Grass Valley Highway
Suite 232

Auburn, CA 95603

Tel 530.268.0477 | Toll-free 877.252.8294 | Fax 530.268.9060

email: info@altoscan.com