Description of Process

Step 1 - Scan to TIFF
TIFF is the image file format produced during the scanning process. Sample: a TIFF scanned for Stanford University from a book published in 1891. From this stage, the image can be converted to a variety of formats: PDF, text, HTML, etc.

Note: Documents that require special handling, such as contrast enhancement or odd-size or flimsy paper stock, will be priced individually.

Step 2 - Create PDF (a necessary step for ease of viewing documents)
Option 1 - Basic PDF file format

PDF Image Only is produced from TIFF images without OCR (optical character recognition). The document is not searchable and the file size is relatively large. This is the lowest-cost PDF solution and a good option for document archiving.

Sample: the Historic Resources Inventory for the City of Gilroy. These documents have been indexed for the user by Architectural Style, APN and Architect.

Option 2 - The next best PDF file format

PDF Image + Hidden Text is produced from TIFF images, using OCR to create a searchable hidden text layer. The visible layer is an exact duplicate bitmap image of the document itself. Without error correction of the hidden text layer, the document is searchable only to the extent that the OCR software recognizes the text successfully. This is also a low-cost PDF solution. The quoted price is based upon processing without extra OCR correction.

Sample: a document scanned and converted to PDF Image + Hidden Text for Stanford University from a book published in 1891.

Sample: the underlying Hidden Text taken from the above sample. This level of accuracy was achieved without additional error correction*.

* Note: If additional error correction of the hidden text layer is required, the additional cost will depend upon the quality of the original document, which directly affects OCR success. This price is an estimate only and will be refined after working with the original documents.
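To illustrate why an uncorrected hidden text layer limits searching, here is a minimal Python sketch. The function name find_hits and both sample strings are hypothetical and are not taken from any of the samples above; the "m" read as "rn" mistake is simply one common kind of OCR confusion.

```python
def find_hits(hidden_text: str, query: str) -> list[int]:
    """Return the start offset of every case-insensitive match of query."""
    haystack = hidden_text.lower()
    needle = query.lower()
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


# What a reader sees on the page image (hypothetical sentence).
page_text = "The modern library was founded in 1891."
# What uncorrected OCR might store in the hidden layer: "m" misread as "rn".
hidden_text = "The rnodern library was founded in 1891."

print(find_hits(hidden_text, "library"))  # recognized correctly, so it is found
print(find_hits(hidden_text, "modern"))   # misrecognized, so the search finds nothing
```

A search only ever consults the hidden layer, so a word that is clearly legible on the page image can still be unfindable until that layer is corrected.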
Option 3 - The Premium PDF file format

PDF Normal: In this case, the images are processed through OCR, all suspects (characters the OCR software flags as uncertain) are corrected to a 99.95% accuracy level, fonts are matched to the original, and images (if present) are scanned and inserted. The image layer is replaced with 100% searchable text, which results in a much cleaner-looking document and an extremely small file size. The compact file size makes this an ideal format for presenting on a website. This price is estimated. Actual cost will be determined after working with the original documents.

Sample: taken from a PDF file we created from a 1920 Olympic Review magazine. This was a very high-end, complex conversion considering the age of the originals, especially the photos, and the unique fonts.
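To put the 99.95% figure in concrete terms, the following Python sketch works out how many residual errors that level allows for a given amount of text. It assumes that accuracy is measured per character, which this page does not state; the function name max_residual_errors is hypothetical.

```python
from fractions import Fraction
import math


def max_residual_errors(char_count: int, accuracy: str = "0.9995") -> int:
    """Largest number of wrong characters that still meets the accuracy level.

    Assumption: accuracy is counted per character. The accuracy is passed as
    a decimal string so the arithmetic is exact (no floating-point rounding).
    """
    error_rate = 1 - Fraction(accuracy)  # 0.0005, i.e. 1 error in 2,000
    return math.floor(char_count * error_rate)


print(max_residual_errors(2_000))    # a page of about 2,000 characters
print(max_residual_errors(100_000))  # a document of about 100,000 characters
```

Under that assumption, 99.95% accuracy corresponds to at most one wrong character in every 2,000.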
Search Instructions: To search a PDF document, click on the "binoculars" in Acrobat Reader (which should have loaded when you clicked on the sample link), and then enter a word you see on the page. If it has been OCR'd successfully, the word will be highlighted on the page.

If you do not have Acrobat Reader, you will need to download it first (necessary only once).
Step 3 - Create a text-based document, usually in Microsoft Word format

Text format: In this format the images are processed through OCR, all suspects are corrected to a 99.95% accuracy level, and the document is reformatted to match the original.

Keyboarding: Where OCR is not possible, the document is keyboarded from the original TIFF printout. Cost for this process is based upon character count.

* Note: This price is based upon the average character count for a typical text document. The exact price will be determined according to the actual character count. Special transcription of handwritten text will be priced separately.
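As a rough illustration of pricing by character count, the Python sketch below counts the characters in a text and applies a rate per 1,000 characters. Everything here is an assumption made for illustration: the function name keyboarding_cost, the choice to exclude whitespace by default, and the rate itself, which is a placeholder and not a quoted price.

```python
from decimal import Decimal, ROUND_HALF_UP


def keyboarding_cost(
    text: str,
    rate_per_1000_chars: Decimal,
    count_whitespace: bool = False,
) -> Decimal:
    """Estimate a keyboarding charge from the actual character count.

    Assumptions: whitespace is not billed unless count_whitespace is True,
    and the result is rounded to two decimal places.
    """
    if count_whitespace:
        chars = len(text)
    else:
        chars = sum(1 for ch in text if not ch.isspace())
    cost = Decimal(chars) * rate_per_1000_chars / Decimal(1000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Placeholder rate for illustration only; not a real price.
placeholder_rate = Decimal("10.00")
print(keyboarding_cost("abc def", placeholder_rate))
```

Running the same function on the average character count and then on the actual count shows how an estimate would be refined into the exact price.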