ExpertCentral.com, an About company
View question by Expert saintly
Question History!
From : rclacy
To : saintly
User Comment : Saintly has been a tremendous amount of help!
Message Status : Public

[08-16-2000] rclacy : Dear Saintly,

I've been thinking about converting my paper files into some sort of digital or electronic format. I have a scanner. I read an article or two about doing this last year and I know that there are several different issues to consider, but I don't know what they are! I lost the magazine articles. And I'm not sure what to look under in searching the net. I'm a missionary and my youngest files from seminary are already over 20 years old. Can you point me in the right direction?
[08-16-2000] saintly :
First... this probably isn't going to work quite like you're expecting (assuming you're expecting what most people expect)... :) Scanners are designed to scan images and pictures, not text. Many people want to scan in documents and end up with something like a WordPerfect file, with the text on the page converted to electronic text. This rarely works.

In the few cases where this works well, the pages are specially designed to be scanned in with specific fonts and precise page layouts. What you will probably end up with, unless you are willing to spend a lot of time converting and buy some more expensive software, is a big collection of images.

Each page you scan would end up as a big image that you could look at later, zoom in on, scroll around or whatever. On the plus side, you may be able to store all of your old paperwork on one or two CDs.

Assuming that's all OK with you, here are some things to consider:

Scanner speed vs. how fast you want to get this done. There are bulk/high-speed scanners that can scan stacks of documents very quickly.

Resolution. Unless you need sharp detail in your documents, scan them in at as low a resolution as you can stand to read. The lower the resolution, the smaller the image file.

Color, greyscale and black-and-white. Most scanners allow you to scan in either the full range of colors, shades of grey (greyscale) or strictly black-and-white. If scanning in printed material, black-and-white is usually fine. Scanning in handwritten documents usually means you have to use greyscale: the darkness of handwriting is not consistent to the computer, so going B&W can make it much more difficult to read later. Reserve color for where you desperately need it; it bloats image sizes.

Archival media. All re-writable media (like floppies, hard drives, zip disks, rewritable CDs) decay. Floppies and zip disks have a lifespan of about 5-10 years, hard drives are in the 10-15 year range (if you're lucky), and rewritable CDs may last 20. Write-once CDs have a lifespan of over 100 years if stored correctly. And they're cheaper than other media as well. As tempting as it is, do not buy in bulk (like those 50-packs). Get a name brand: Sony, Maxell, 3M. A 10-pack should be fine, unless you have a lot more documents than I'm imagining.

Image format. BMP vs TIFF vs JPG. JPG is usually the best image format, but many scanners will not output it by default. If you just use the BMPs or TIFFs the scanner spits out, you may be able to get 500-1000 pages on each CD. If you convert each to JPG, you can get probably close to 5000 on each CD.
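To put rough numbers on the resolution and color points above, here is a small sketch that estimates the uncompressed size of one scanned page and how many such pages would fit on one CD. The page size (US Letter), the bit depths and the function names are my own assumptions for illustration, not anything a particular scanner reports, and real BMP/TIFF/JPG files will differ depending on compression.

```python
# Rough, uncompressed size estimates for a scanned page.
# Assumptions (not from the discussion above): US Letter paper (8.5 x 11 in)
# and a 650 MB CD taken as 650 * 1024 * 1024 bytes.

BITS_PER_PIXEL = {"bw": 1, "grey": 8, "color": 24}
CD_BYTES = 650 * 1024 * 1024


def page_bytes(dpi, mode, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one page scanned at `dpi` in `mode`."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


def pages_per_cd(dpi, mode):
    """How many uncompressed pages of this kind fit on one CD."""
    return CD_BYTES // page_bytes(dpi, mode)


if __name__ == "__main__":
    for dpi, mode in [(300, "bw"), (100, "grey"), (150, "grey"), (150, "color")]:
        size_kb = page_bytes(dpi, mode) / 1024
        print(f"{dpi} dpi {mode:5}: {size_kb:8.0f} KB/page, "
              f"{pages_per_cd(dpi, mode)} pages per CD")
```

Under these assumptions a 300 dpi black-and-white page comes to about 1 MB uncompressed, or roughly 650 pages per disc, which is in line with the 500-1000 figure above. Color at the same resolution is 24 times larger than black-and-white, which is why it is worth reserving.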

Does that help? That is a summary of each topic I could think of. If you want more expansion on any of them, please reply back and I'll tell you more.
[08-17-2000] rclacy : Saintly,

I have an Epson 636U scanner with a USB connection. I have TextBridge Pro 9.0. Scanning in a document so that it opens in WordPerfect actually works pretty well. Not 100% perfect, of course, but pretty well.

Thanks for the info about the hardware. I was aware that I would need a CDRW in order to make this work.

What about software? Once I have scanned in the documents, I have to be able to retrieve them. A filing system is only as good as your ability to find what you've stored.

I'm thinking that I would gradually convert my paper files onto a CD, and even then I probably won't be able to get all my files into an electronic format.

I also assume that I will have to move the data in a few years onto some other medium.

Thanks for your help
[08-17-2000] saintly :
I'm glad you've had good experiences with TextBridge. It didn't work quite as well for us here.

There are some disadvantages to using CD-RW discs. First, they aren't readable in many other CD/DVD drives. CD-RW discs (unlike the write-once discs) can generally only be read in CD-RW drives and in newer drives built to read them. You may not be able to just take the RW disc over to someone else's computer and use it.

CDs are also not designed to be written to gradually. The most efficient use of the CD is to collect 650MB of information on a hard drive or Jaz disk or something and write it all at once to the CD. Each use of a CD tends to create a "session"; each session takes up overhead space, and there is a limit on how many sessions you can write to the CD. Older CD drives won't read multi-session CDs either. There are some software drivers that do let you write gradually, but these CDs have to be read in computers with the special software installed.
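If you collect files in a staging folder on the hard drive first, a short script can tell you when you have about a disc's worth. This is only a sketch: the folder name `cd_staging`, the 95% "ready" threshold and the function names are hypothetical, and the usable space on a real disc is a little under the nominal 650MB because the disc's file system takes some room.

```python
import os

CD_CAPACITY = 650 * 1024 * 1024  # nominal 650 MB disc, in bytes


def folder_size(root):
    """Total size in bytes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def burn_report(used, capacity=CD_CAPACITY):
    """Describe how full the staging folder is relative to one disc."""
    percent = 100 * used / capacity
    if used > capacity:
        return f"{percent:.0f}% of a disc: too big, move some files to the next batch"
    if percent >= 95:  # assumed threshold for "close enough to full"
        return f"{percent:.0f}% of a disc: ready to burn"
    return f"{percent:.0f}% of a disc: keep collecting"


if __name__ == "__main__":
    staging = "cd_staging"  # hypothetical folder name
    if os.path.isdir(staging):
        print(burn_report(folder_size(staging)))
    else:
        print(f"No folder named {staging!r} here; edit the path above.")
```

Run it whenever you finish a scanning session; once it reports the folder is ready, burn the whole folder in one go and start a fresh one.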

For indexing and filing the records electronically... you need to decide how much searching/sorting capability you want to have on these files. At present (on paper), you don't really have any. All your files may be categorized into folders, perhaps with a card catalogue system. You can easily replicate this, but adding more capabilities takes extra work. From simplest to most complex/expensive:

Files are organized in hierarchies of folders and subfolders on the CD.

Only the file names and folder names help you find specific files. You can use free third-party software programs to search through all files in all directories looking for specific text in them.

Advantages: Almost no effort involved, replicates your current filing system almost exactly.
Disadvantages: Limited searching of data (none by default), no index
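As a rough idea of what such a search tool does, here is a minimal sketch that walks a folder tree and looks for matching file names, or for a phrase inside plain-text files. The function names and the list of extensions are my own choices for illustration. Note that the text search only helps for documents saved as text; it cannot read the words in a scanned image.

```python
import fnmatch
import os


def find_files(root, pattern):
    """Return paths under `root` whose file name matches `pattern`.

    Matching is case-insensitive and uses shell-style wildcards,
    e.g. "*sermon*" or "1978-*.jpg".
    """
    pattern = pattern.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatchcase(name.lower(), pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def find_text(root, phrase, extensions=(".txt", ".htm", ".html")):
    """Return paths of text files under `root` that contain `phrase`."""
    phrase = phrase.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue  # skip images and other non-text files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                if phrase in fh.read().lower():
                    hits.append(path)
    return sorted(hits)
```

For example, `find_files("D:/", "*greek*")` would list every file on a disc mounted at `D:/` (an assumed drive letter) with "greek" in its name, which is why descriptive file and folder names matter so much with this option.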

Files are organized in hierarchies, with HTML index files

As above, but you create HTML documents in the folders and subfolders describing the files in and below them. You can have extensive cross-linking, describe any information you like about the file, arrange it with graphics, or enhance it with JavaScript to make a searchable database. If you use one big index file instead of, or in addition to, lots of little ones, you can even search on it fairly well with the browser's own find feature. (You can break the big one down into smaller files, or concatenate the small files to make the big one.)

Advantages: Free. Indexes can be read in any browser on any other computer, marginally searchable. If you export the files as plain text or images (not WordPerfect/Word files), you can even read them in Netscape as well.
Disadvantages: Knowledge of creating web pages required, some effort required to make the pages, and search features are not that powerful unless you go to a lot of extra effort to make a JavaScript/Java database as well.
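The bare-bones version of the "one big index file" can be generated rather than typed by hand. The sketch below is one assumed way to do it, with hypothetical function names: it walks the folder you plan to burn and writes a single `index.html` at the top, with one heading per folder and one link per file. Descriptions, cross-links and graphics would still have to be added by hand.

```python
import html
import os
from urllib.parse import quote


def build_index(root, title="Document index"):
    """Return one HTML page linking to every file under `root`."""
    lines = ["<html><head><title>%s</title></head><body>" % html.escape(title),
             "<h1>%s</h1>" % html.escape(title)]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk folders in a stable, alphabetical order
        rel_dir = os.path.relpath(dirpath, root)
        heading = "(top level)" if rel_dir == "." else rel_dir.replace(os.sep, "/")
        # Leave out any index.html so the page does not list itself.
        files = sorted(f for f in filenames if f != "index.html")
        if not files:
            continue
        lines.append("<h2>%s</h2>" % html.escape(heading))
        lines.append("<ul>")
        for name in files:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            rel_path = rel_path.replace(os.sep, "/")  # browsers want forward slashes
            lines.append('<li><a href="%s">%s</a></li>'
                         % (quote(rel_path), html.escape(name)))
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines) + "\n"


def write_index(root):
    """Write the page to `root`/index.html and return that path."""
    out_path = os.path.join(root, "index.html")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(build_index(root))
    return out_path
```

Because the links are relative, the page keeps working after the folder is burned to CD, and the browser's own "find in page" gives you the marginal searching mentioned above.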

Files are stored any convenient way, with a database on the disc.

You create a database file in MS Access (comes with MS Office) or FileMaker Pro (costs more, but much more powerful, flexible and easier to use) to describe your files, entering each file manually. You can set up as much extra information as you like: keywords, date file was entered, date file was first created, summary, or even load the entire file into the database.

Advantages: Powerful searching capabilities, easy exports and summaries. Corresponds more to having a separate card-catalogue filing system for the other files on CD.
Disadvantages: Costs money to buy the software, takes time and effort to set it up, requires the database software (FileMaker can make standalone database applications if you pay extra for that feature)
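To show the shape of such a catalogue without tying it to Access or FileMaker, here is a sketch using SQLite through Python's built-in `sqlite3` module instead. That is a substitute I am choosing for illustration, not one of the products named above, and the table layout, column names and function names are all assumptions. The idea is the same: one row per file, with whatever descriptive fields you want to search on.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the catalogue; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               disc TEXT NOT NULL,       -- label written on the CD
               file_path TEXT NOT NULL,  -- where the file sits on that CD
               title TEXT NOT NULL,
               created TEXT,             -- date the paper original was made
               keywords TEXT,
               summary TEXT
           )"""
    )
    return conn


def add_document(conn, disc, file_path, title, created="", keywords="", summary=""):
    """Enter one file into the catalogue by hand."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO documents (disc, file_path, title, created, keywords, summary)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (disc, file_path, title, created, keywords, summary),
        )


def search(conn, word):
    """Find documents whose title, keywords or summary mention `word`."""
    like = "%" + word + "%"
    rows = conn.execute(
        "SELECT disc, file_path, title FROM documents"
        " WHERE title LIKE ? OR keywords LIKE ? OR summary LIKE ?"
        " ORDER BY disc, file_path",
        (like, like, like),
    )
    return rows.fetchall()
```

A search result tells you which disc to pull and where on it the file lives, much like a card catalogue pointing at a shelf. Keeping the `disc` column is what lets one catalogue cover many CDs.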

Files are loaded into a specialized document management database

You buy or commission a special software package designed to store, index and file documents.

Advantages: ?? Depends on the software, but searching capabilities are going to be the best. May be able to store several revisions of the same document, will require less effort than designing a similar database on your own.
Disadvantages: $$ Such products tend to be designed for law firms or medical records and priced out of the range of most people.

Here at the hospital, we have our own medical records software designed in-house. Our CDs of scanned images use the HTML indexing method described earlier though.

Does this help? I can go into more detail on a particular option if you like as well.
[08-17-2000] saintly :
One option I almost forgot to mention. It is also possible to collect, organize and index your documents into a semi-standalone package called an Electronic book. These also have search capabilities and features to search for words and phrases in the indexed documents. You can build fairly complex collections of documents.

I don't know very much about what is involved in making them, but Expert TexasT does; she has made some impressive e-books. If this solution interests you, you can click on the "Ask a Question" button for Software in her profile (linked above). She would know more about the benefits and drawbacks of using that method.
[08-20-2000] rclacy : Dear Saintly,

You really are being very helpful. And maybe I used the wrong designation for the CD drive? The CD drive that allows you to burn CDs once, not write and re-write the same one over and over again, is called CD-R? Is that correct? The CD that is created in that kind of drive could be taken to another computer, as you say, and read like a "normal" CD?

Second, just to confirm what you said. It's better to collect a CD's worth of info, 650 MB, and then burn the CD, not do it gradually? Right?

Any other tips like that that you can think of?

[08-20-2000] saintly :
Yes... CD-R refers to write-once/permanent CDs that work in almost all CD/DVD drives; CD-RW refers to the rewritable ones that generally only work in CD-RW drives and degrade over time.

It is much better to collect 650MB of data and burn it to the CD all at once. CDs written that way can take advantage of the full 650MB and do not need a multisession-capable CD drive to read them. They look almost exactly like the pressed ("silver") CDs. Some of the oldest CD drives still have a hard time reading even those, but it is the most standard format.

Hope that helps!

Copyright © 2000, Inc. All Rights Reserved.