How Full is the Full in Full-Text?
A comparative study of paper periodicals with their web-based equivalents in the Ebsco, Information Access Company (IAC), UMI, and Wilson databases.
 
Carol Franck
Holly Chambers
SUNY Potsdam
 
Poster Session presented at the ALA annual conference
Washington D.C.
June 27, 1998


Abstract:

The availability of “full-text” periodical articles via the World Wide Web is an area of exploding growth and change.  How accurately do the vendors supplying such services replicate the periodicals they purport to deliver?  Such an evaluation is vital when deciding on serial cancellations, subscriptions, and other costly commitments.  This study examined two current issues and one older issue for each of 26 journals and magazines.  The paper copy was compared with its online equivalent in the UMI, Information Access Company (IAC), Wilson, and Ebsco databases.  A standardized chart of components found in paper periodicals was developed and used to characterize each issue.  These charts were then used to determine the comprehensiveness and, for some components, the quality with which each vendor covered each issue.  Content was examined for representation of type (e.g. editorials, book reviews, articles) and for completeness of text (e.g. author affiliation, sidebars, footnotes).  Format was examined for handling and inclusion of graphics (tables, figures, and pictures), mathematical and other symbols, and non-standard text (e.g. poetry, dialogue).  There were substantial differences between vendors in the fidelity of their online products to the paper equivalent.  UMI provided the most accurate representation, followed by IAC, Ebsco, and Wilson.

Introduction:

The term "full-text" has no standard definition when it comes to electronic databases.  It implies that substantive content will be presented, but precisely what will be covered is not clear.  Vendors of full-text periodical collections have begun offering their products through the World Wide Web, but what exactly are they offering?  A question of great importance to librarians, as they consider transitions to online resources and possible cancellations of print subscriptions, is how accurately vendors reproduce the paper content of a periodical through the World Wide Web.  Recent studies of vendor products focus more on interface features and listings of journals offered than on issue-level specifics.  This study evaluates online representation of periodicals by comparing paper issues with their Web-based versions in four different "full-text" databases.
 
Methodology:

We selected twenty-six periodicals (figure 1) currently received by the SUNY Potsdam libraries which were also present in web-accessible periodical databases from four vendors:  IAC's Expanded Academic ASAP (through SearchBank), UMI's Periodical Abstracts Research II (through ProQuest), Wilson's Omnifile (through WilsonWeb), and Ebsco's Academic Search FullTEXT Elite (through EbscoHost).  Three issues of each title were used for the study:  the earliest issue of the year following that in which the title first appeared in all four databases, and the two most recent issues present in the library by January 1998.  Each issue in hand was searched for and compared with the online equivalent in the four databases.  Information about each online issue was recorded on a standard form (figure 2).  General comments about each vendor interface were recorded.  Fourteen of the titles were searched on a Dell Pentium using Netscape Communicator between December 18, 1997 and February 11, 1998.  Each of the fourteen was completely searched within three days to establish comparative lag times.  Twelve titles were searched on a Power Macintosh 7200/120 using Netscape Communicator between December 20, 1997 and May 11, 1998.  No attempt was made to search these twelve within a short time frame.  Note that this study did not specifically rate vendor interfaces and features, though some points are noted.
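The sample sizes implied by this design can be checked with a short calculation.  The following is only an illustrative sketch; the variable names are ours and not part of the study, and the total number of issue-to-database comparisons is derived here from the design described above rather than reported directly.

```python
# Sample design as described in the Methodology (names are illustrative).
titles_pc = 14          # titles searched on the Dell Pentium
titles_mac = 12         # titles searched on the Power Macintosh
issues_per_title = 3    # one older issue plus the two most recent
databases = 4           # IAC, UMI, Wilson, Ebsco

titles = titles_pc + titles_mac           # 26 titles
issues = titles * issues_per_title        # 78 paper issues
comparisons = issues * databases          # derived: one lookup per issue per database

print(titles, issues, comparisons)
```

The figure of 78 issues matches the totals quoted in the Discussion.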

Results:

Quantitative analyses (totals and percentages)  were compiled for the following items:

Figures 3 - 6 display the entry status of the issue (full-text, indexed only, etc.), the percent of main content types in full-text (articles, book reviews, etc.), and the percent of issues having complete internal content elements (graphics, sidebars, footnotes, etc.).  The black binders on the table include additional charts of our findings (i.e. number of issues for which internal content elements or graphics were incomplete, and the quality of the graphics), the raw numbers, comments on each issue, and printouts of selected examples.

Quality of the graphics varied considerably.  Factors which contributed to the wide quality range include whether the graphic was treated as a picture file or the information was re-keyed, and whether the material was presented in HTML or PDF form.  For example, tables were often successfully re-keyed, but a re-keyed chart could easily lose some usability.  Formatting was a big variable in re-keyed graphics.  When the graphic was treated as a picture file, the quality was almost always higher, with the main problem being images that were too dark.

No database completely reproduced the paper issues.  All points studied, including all of the main content types and internal content elements, were missing from some full-text issues in every database.  Certain types of material, such as advertisements, classifieds, and back covers, were almost always missing.  Fidelity of the material online to the issue in hand ranged from high to, in several cases, unusable or illegible.

The time lag of journal data entry into vendor databases varied considerably.  UMI was the most prompt in entering data, followed by Ebsco and SearchBank.  Wilson had a longer time lag.  Vendor policies varied in other areas as well.  For example, Wilson does not include the issue number anywhere, and UMI systematically includes the table of contents for each issue.

Figure 1:  List of Journals used in this study
Figure 2:  Survey form created for this study
Figure 3:  Percentages of Issues which are Full-Text, Indexed Only, or Not in the Database for each vendor
Figure 4:  Percentages of Content Items (articles, book reviews, editorials,...) found in the Full-Text issues for each vendor
Figure 5:  Percentage of Full-Text Issues with Complete Internal Content Elements
Figure 6:  Percentage of Full-Text Issues with Complete Graphics
Note: Representative examples of printouts from the four databases shown during the poster session are not reproduced here.

Discussion:

Notable differences exist among the four databases.  Material which was primarily text could be handled by all the vendors, but graphics and uncommon features (foldouts, oversized volumes) caused difficulty and some items were not present or mentioned at all (map inserts).  Even the small sample of periodicals examined revealed significant errors and omissions on the part of all four vendors.  Problems encountered included missing items, missing or low quality graphics of all types, mismatched citations and content, significant formatting problems for tables and poetry, and great difficulty representing non-ASCII symbols such as are found in equations.  Most disturbing were the instances where the vendors failed to mention when something had been omitted, such as charts or other figures – potentially key elements of an article's content.  The assumption that an article available in "full-text" is correctly described, even if not complete, is false.

Overall, UMI came closest to representing the full content of a paper issue electronically.  UMI had the highest percentage of full-text issues as well as the most complete representation of content through its scanned PDF files.  It also had the shortest lag time for entering data; 75 of the 78 issues studied were available in full-text.  UMI offers three types of "full-text" entries: text only, text + graphics, and scanned PDF files.  In terms of graphics quality, the text + graphics option was uniformly high, but that option was not always available.  The PDF files had darker, blurrier graphics, a significant problem with pictures both on the screen and in printouts.

IAC and Ebsco ranked in the middle.  IAC had more graphics than Ebsco and its scanned PDF files were of excellent quality, better than UMI's, though only available for selected items.  Ebsco had a shorter lag time and graphics of generally higher quality (though fewer of them), and some of its graphics were in color.  In Ebsco, tables were often re-typed rather than given as picture files; this tended to make them less usable.  Comparing the oldest issues with the newest, it is clear that both SearchBank and Ebsco are improving their graphics capabilities.

Wilson's lack of graphics capability and its lengthy data entry lag time put it in fourth place.  The limited graphics present were re-entered as text, not included as picture files.  Only 40 of the 78 issues examined were present in Wilson at the time of the study. Wilson's strengths presently lie in its subject indexing and abstracting, not in its full-text capabilities.
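The two issue counts quoted in this discussion can be expressed as percentages of the 78 issues examined.  The sketch below is illustrative only; the function name `percent_of_issues` is ours.  Note that the two counts measure different things: the UMI figure counts issues available in full-text, while the Wilson figure counts issues present in the database at all.

```python
TOTAL_ISSUES = 78  # 26 titles x 3 issues each

def percent_of_issues(count, total=TOTAL_ISSUES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

umi_full_text = percent_of_issues(75)    # UMI issues available in full-text
wilson_present = percent_of_issues(40)   # Wilson issues present in the database

print(f"UMI full-text: {umi_full_text}%")
print(f"Wilson present: {wilson_present}%")
```

By this reckoning, roughly 96% of the issues were in full-text in UMI, while only about half were present in Wilson.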

Full-text versions of periodicals are also available on the Web through other subscription models, such as individual subscriptions through the publisher, or OCLC's Electronic Collections Online.  Periodicals available through these sources were not looked at in this study, and may more accurately represent the paper version.
 
Conclusions:

The four web-accessible full-text collections examined in this study do not, in their present incarnations, provide complete representations of print periodicals.  Certain common features, available in paper, but often left out by the vendors, may be the meat of an issue for some readers.  Notices of professional meetings, letters to the editor, or book reviews may be the most important items for some people.  In a magazine, the classifieds or the advertisements may be of most interest, and certainly vendor graphics are not yet adequate to get the feel of a newsstand glossy.  Also, basic points such as whether each article is there and represented accurately, or whether a periodical will be archived, must still be addressed.

There are other limitations to these products.  Libraries which purchase access to these products need to realize that merely adequate service usually requires a fairly powerful computer, a reasonably fast internet connection, and access to laser printing.  In certain cases, the only usable version of an article is the PDF file, which requires Acrobat Reader to open, can be large, and sometimes is not legible on the computer screen.

The four products studied provide subject access, indexing and abstracting for hundreds to thousands of different journals, and good, full-text coverage of articles in many cases.  Libraries will, and should, subscribe to these databases for the many useful services they can provide, but they are not (yet) a complete substitute for paper periodicals and should not be thought of as such.



Copyright 1998.
No portion of this page may be reproduced without the permission of the authors.
This page is produced and maintained by Carol Franck (franckcr@potsdam.edu)
Last Updated: June 24, 1998.