The Value of Electronic Documents
When a case involves electronic documents, it is common to render them into a standard format before reviewing or producing them. The most common formats are TIFF (tagged image file format), PDF (portable document format) and HTML (hypertext markup language). Although these alternatives differ in some respects, we can refer to them collectively as printouts.
A printout may not accurately represent the contents of an electronic document. The rendered version may be missing potentially critical information, and it may contain other information that is distracting or misleading. In Microsoft Word, for example, one can insert a date field or macro in place of a typed date (other kinds of dynamic fields are also available). Internally, the document contains a command to display the date rather than the date itself, and Word fills in today’s date every time the document is opened. A printout therefore shows the date on which the document was rendered, which might be months after it was written. Other data may also be embedded in a document, including voice annotations, additional documents (such as a spreadsheet, only part of which is displayed), hidden text, comments and other information. With only a printout, there is no way to tell what information is missing.
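To make the date-field example concrete, here is a minimal sketch that assumes the Office Open XML format used by modern Word (.docx) files, in which the body text is stored as XML in a ZIP part named `word/document.xml`; older binary .doc files use a different internal layout. The sample fragment is hand-written and simplified, and the helper name `list_fields` is hypothetical. It shows that the file records the field instruction (`DATE`) next to a cached copy of the last value displayed, whereas a printout keeps only the displayed value.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in .docx files (Office Open XML).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, simplified fragment of word/document.xml containing a date field.
SAMPLE_DOC = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p>
    <w:r><w:t xml:space="preserve">Memo dated </w:t></w:r>
    <w:fldSimple w:instr=" DATE ">
      <w:r><w:t>March 3, 2003</w:t></w:r>
    </w:fldSimple>
  </w:p>
</w:body></w:document>"""


def list_fields(document_xml: str) -> list[tuple[str, str]]:
    """Return (instruction, cached result) pairs for simple fields.

    This sketch handles only <w:fldSimple>; Word can also encode fields as
    begin/separate/end <w:fldChar> runs, which are not handled here.
    """
    root = ET.fromstring(document_xml)
    fields = []
    for fld in root.iter(f"{{{W}}}fldSimple"):
        # The instruction is what Word re-evaluates when it updates the field.
        instruction = fld.get(f"{{{W}}}instr", "").strip()
        # The cached result is only the value shown at the last update.
        cached = "".join(t.text or "" for t in fld.iter(f"{{{W}}}t"))
        fields.append((instruction, cached))
    return fields


print(list_fields(SAMPLE_DOC))  # [('DATE', 'March 3, 2003')]
```

A real review tool would first open the .docx with Python's `zipfile` module and read `word/document.xml`; the sketch skips that step so it runs without a sample file.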
The problematic relationship between electronic and paper versions of documents is most apparent in databases and spreadsheets. These electronic “documents” enjoy widespread business use but are not typically designed to be printed. A spreadsheet printout, for example, eliminates the distinction between static data and formulas; numbers are all you see.
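The loss of the formula/data distinction can be illustrated with a small sketch. It assumes the Office Open XML spreadsheet format (.xlsx), where each worksheet part stores a cell's formula in an `<f>` element and its last calculated value in a `<v>` element. The worksheet fragment and the helper names `describe_cells` and `printed_view` are hypothetical. Text cells, which store an index into a separate shared-strings part, are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# SpreadsheetML namespace used in .xlsx worksheet parts (Office Open XML).
S = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical, simplified fragment of xl/worksheets/sheet1.xml.
SAMPLE_SHEET = f"""<worksheet xmlns="{S}"><sheetData>
  <row r="1">
    <c r="A1"><v>12</v></c>
    <c r="B1"><v>3.5</v></c>
    <c r="C1"><f>A1*B1</f><v>42</v></c>
  </row>
</sheetData></worksheet>"""


def describe_cells(sheet_xml: str) -> dict[str, dict[str, Optional[str]]]:
    """Map each cell reference to its formula (if any) and stored value."""
    root = ET.fromstring(sheet_xml)
    cells = {}
    for cell in root.iter(f"{{{S}}}c"):
        formula_el = cell.find(f"{{{S}}}f")
        value_el = cell.find(f"{{{S}}}v")
        cells[cell.get("r")] = {
            "formula": formula_el.text if formula_el is not None else None,
            "value": value_el.text if value_el is not None else None,
        }
    return cells


def printed_view(sheet_xml: str) -> dict[str, Optional[str]]:
    """What a printout shows: values only, with every formula discarded."""
    return {ref: c["value"] for ref, c in describe_cells(sheet_xml).items()}


print(describe_cells(SAMPLE_SHEET)["C1"])  # {'formula': 'A1*B1', 'value': '42'}
print(printed_view(SAMPLE_SHEET))  # {'A1': '12', 'B1': '3.5', 'C1': '42'}
```

In the printed view, C1 is just the number 42, indistinguishable from a typed-in constant like A1.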
Revision Markup and Comments
In Microsoft Word, the Track Changes function records each change as revisions are made to a document. Users sometimes mistakenly believe that turning off the display of these changes, on screen or in the printed document, removes the marked text. It does not. Unless the changes are explicitly accepted or rejected, the revision markup remains in the document and is very easy to reveal.
All evidence of these revisions, and of what they changed, would be lost if the document were rendered without the markup. These marks could provide critical information, for example, by showing how an earlier version of a document described an event or specified tentative contract terms.
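The following minimal sketch shows why hiding the markup does not remove it. It assumes the .docx (Office Open XML) format, where tracked insertions are wrapped in `<w:ins>` elements and deletions in `<w:del>` elements whose text is kept in `<w:delText>`. The contract-style sample paragraph, its author and date, and the helper names are hypothetical. Formatting changes and paragraph-mark revisions are ignored.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_INS, W_DEL = f"{{{W}}}ins", f"{{{W}}}del"
W_T, W_DELTEXT = f"{{{W}}}t", f"{{{W}}}delText"

# Hypothetical paragraph with one tracked deletion ("30") and insertion ("90").
SAMPLE_DOC = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p>
    <w:r><w:t xml:space="preserve">Delivery is due within </w:t></w:r>
    <w:del w:id="1" w:author="J. Smith" w:date="2003-02-10T09:00:00Z">
      <w:r><w:delText>30</w:delText></w:r>
    </w:del>
    <w:ins w:id="2" w:author="J. Smith" w:date="2003-02-10T09:00:00Z">
      <w:r><w:t>90</w:t></w:r>
    </w:ins>
    <w:r><w:t xml:space="preserve"> days.</w:t></w:r>
  </w:p>
</w:body></w:document>"""


def displayed_text(document_xml: str) -> str:
    """Text as shown with markup hidden: insertions kept, deletions omitted."""
    root = ET.fromstring(document_xml)
    # Deleted text lives in <w:delText>, so joining only <w:t> skips it.
    return "".join(t.text or "" for t in root.iter(W_T))


def list_revisions(document_xml: str) -> list[tuple[str, str, str]]:
    """Return (kind, author, text) for each tracked insertion or deletion."""
    root = ET.fromstring(document_xml)
    revisions = []
    for el in root.iter():
        if el.tag not in (W_INS, W_DEL):
            continue
        kind = "inserted" if el.tag == W_INS else "deleted"
        text = "".join(
            t.text or "" for t in el.iter() if t.tag in (W_T, W_DELTEXT)
        )
        revisions.append((kind, el.get(f"{{{W}}}author", ""), text))
    return revisions


print(displayed_text(SAMPLE_DOC))  # Delivery is due within 90 days.
print(list_revisions(SAMPLE_DOC))
# [('deleted', 'J. Smith', '30'), ('inserted', 'J. Smith', '90')]
```

A printout made with the markup hidden matches `displayed_text`; only the native file still shows that the term was originally 30 days and who changed it.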
Hidden Text
Characters can be formatted as “hidden.” By default, hidden text is neither displayed nor printed unless the user explicitly changes those settings, yet it remains in the document. Users can employ hidden text to include notes or other information, or to hide sections that have been revised.
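As a minimal sketch, again assuming the .docx (Office Open XML) format, hidden text is ordinary run text whose run properties include a `<w:vanish>` element. The sample fragment and helper names below are hypothetical, and the sketch checks only formatting applied directly to each run; text hidden through a named style would not be detected.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: one visible run followed by one hidden run.
SAMPLE_DOC = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p>
    <w:r><w:t xml:space="preserve">Draft approved.</w:t></w:r>
    <w:r>
      <w:rPr><w:vanish/></w:rPr>
      <w:t xml:space="preserve"> Confirm the figures before sending.</w:t>
    </w:r>
  </w:p>
</w:body></w:document>"""

# w:val values that switch an on/off run property off.
OFF_VALUES = {"false", "0", "off"}


def is_hidden(run: ET.Element) -> bool:
    """True if the run's direct properties mark it hidden via <w:vanish>."""
    rpr = run.find(f"{{{W}}}rPr")
    vanish = rpr.find(f"{{{W}}}vanish") if rpr is not None else None
    if vanish is None:
        return False
    # A bare <w:vanish/> means "on"; w:val="false" (or "0"/"off") turns it off.
    return vanish.get(f"{{{W}}}val", "true") not in OFF_VALUES


def split_hidden(document_xml: str) -> tuple[str, str]:
    """Return (visible text, hidden text) collected from the document's runs."""
    root = ET.fromstring(document_xml)
    visible, hidden = [], []
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.findall(f"{{{W}}}t"))
        (hidden if is_hidden(run) else visible).append(text)
    return "".join(visible), "".join(hidden)


visible, hidden = split_hidden(SAMPLE_DOC)
print(visible)       # Draft approved.
print(repr(hidden))  # ' Confirm the figures before sending.'
```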
Hidden text is an even more severe problem in spreadsheets. In Excel, you can hide entire columns or rows of information, or just specific cells. Data are routinely concealed by making a column narrower than the text it contains, so a printout shows only part of the contents. You can also hide text by covering it with graphics, such as graphs, drawings or logos, or by coloring the text in a cell the same as its background, which also keeps it out of a printout. To see the complete contents of a spreadsheet, you need to view it as a spreadsheet; ordinary printouts do not reveal this information.
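Some of these concealment techniques leave direct traces in the native file. The sketch below assumes the .xlsx (Office Open XML) format, in which hidden columns and rows carry a `hidden` attribute and column widths are stored on `<col>` elements. The worksheet fragment, the helper name `find_concealed` and its `narrow_width` threshold are hypothetical choices for illustration. Background-matching font colors (stored in the styles part) and covering graphics (stored in drawing parts) are not checked.

```python
import xml.etree.ElementTree as ET

S = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical worksheet: column B squeezed to a sliver, column C and row 2 hidden.
SAMPLE_SHEET = f"""<worksheet xmlns="{S}">
  <cols>
    <col min="2" max="2" width="0.5" customWidth="1"/>
    <col min="3" max="3" width="9" hidden="1"/>
  </cols>
  <sheetData>
    <row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2</v></c><c r="C1"><v>3</v></c></row>
    <row r="2" hidden="1"><c r="A2"><v>4</v></c></row>
  </sheetData>
</worksheet>"""

TRUE_VALUES = {"1", "true"}


def find_concealed(sheet_xml: str, narrow_width: float = 1.0) -> dict[str, list[int]]:
    """Report hidden rows, hidden columns and suspiciously narrow columns.

    Columns are numbered from 1 (A=1, B=2, ...). narrow_width is an arbitrary
    threshold in Excel's column-width units (roughly character widths).
    """
    root = ET.fromstring(sheet_xml)
    report = {"hidden_columns": [], "narrow_columns": [], "hidden_rows": []}
    for col in root.iter(f"{{{S}}}col"):
        # One <col> element can describe a whole span of adjacent columns.
        span = range(int(col.get("min")), int(col.get("max")) + 1)
        if col.get("hidden") in TRUE_VALUES:
            report["hidden_columns"].extend(span)
        elif float(col.get("width", "inf")) < narrow_width:
            report["narrow_columns"].extend(span)
    for row in root.iter(f"{{{S}}}row"):
        if row.get("hidden") in TRUE_VALUES:
            report["hidden_rows"].append(int(row.get("r")))
    return report


print(find_concealed(SAMPLE_SHEET))
# {'hidden_columns': [3], 'narrow_columns': [2], 'hidden_rows': [2]}
```

The values in B1, C1 and A2 are all still present in `sheetData`; only their presentation keeps them off the printed page.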
The ability to conceal information is not limited to Microsoft applications; tools from other vendors are prone to the same problems. Nor are these problems necessarily the result of intentional efforts to make information difficult to extract from electronic documents. Often they reflect aesthetic or functional judgments about how best to display data for the business purposes for which the document was written. Reviewing documents in their native format offers the best chance of understanding their full content.
Although we have a tradition of seeing computers primarily as a means of preparing paper documents, an increasing percentage of computer documents are never designed to be printed directly. By some estimates, 80 percent or more of electronic documents are never printed. Moreover, an electronic document is only a potential printout: the same document can be printed in many ways, each of which reveals only a fraction of the information in the original. Anyone who accepts a printout as the true representation of an electronic document risks losing case-determining information.
Only electronic review of documents in their native format can make all of their information accessible.
About our author . . .
Dr. Herbert Roitblat is the inventor of the core DolphinSearch technology. He was a professor for seven years at Columbia University and for 16 years at the University of Hawaii, where he studied language understanding and memory and modeled dolphin echolocation performance. He has published more than 75 papers and book chapters on cognitive science and dolphin echolocation. He is the Chief Scientist at DolphinSearch and is currently finishing a book on electronic discovery.