Crawford's Compressed Display

From JOCRAW@macc.wisc.edu Wed Feb 14 15:52:00 1996
Date: Thu, 2 Feb 1995 08:33:24 -0500
From: Josephine Crawford
Reply to: usmarc@loc.gov
To: Multiple recipients of list
Subject: Compression study : File 3 of 3

File 3 of 3                                  Josephine Crawford
re: MARBI Proposal 95-2                      Univ of Wisconsin
1/31/95                                      jocraw@macc.wisc.edu
 
COMPRESSED OPAC DISPLAY

(with some observations at the end)

SU;TS BIOTECHNOLOGY [single keyword search of topical subject headings]

------------------------------------------------------------------------------

   LINE   ITEMS           LIST OF ENTRIES          129 LINES MATCH YOUR SEARCH
     1.     1  Acremonium--Biotechnology.
     2.   124  Agricultural biotechnology.
     3.     4  Algae--Biotechnology.
     4.     3  Amino acids--Biotechnology.
     5.     1  Amylodextrins--Biotechnology--Congresses.
     6.     9  Animal biotechnology.
     7.    24  Animal cell biotechnology.
     8.     2  Antibiotics--Biotechnology.
     9.     1  Antisense DNA--Biotechnology.
    10.     1  Antisense RNA--Biotechnology.
    11.     1  Archaebacteria--Biotechnology--Congresses.
    12.     1  Arid regions plants--Biotechnology--Congresses.
    13.     1  Aromatic plants--Biotechnology.
    14.     1  Aspergillus--Biotechnology.
    15.     3  Bacillus (Bacteria)--Biotechnology.
    16.     3  Bacillus subtilis--Biotechnology.
    17.     2  Barley--Biotechnology.
    18.     1  Beta lactam antibiotics--Biotechnology.
    19.     1  Bilayer lipid membranes--Biotechnology.
    20.     1  Biological pest control agents--Biotechnology.
    21.     1  Biomass chemicals--Biotechnology.
    22.     1  Biomimetics--Biotechnology--Congresses.
    23.     1  Biopolymers--Biotechnology--Congresses.
    24.   634  Biotechnology.
    25.   108  Biotechnology industries.
    26.     3  Biotechnology industry.
    27.     5  Biotechnology laboratories.
    28.     1  Blood coagulation factors--Biotechnology--Congresses.
    29.     2  Blood products--Biotechnology.
    30.     1  Blood proteins--Biotechnology--Congresses.
    31.     1  Carbohydrates--Biotechnology--Congresses.
    32.     4  Cellulose--Biotechnology.
    33.     1  Chiral drugs--Biotechnology.
    34.     1  Clostridium--Biotechnology.
    35.     1  Coal--Biotechnology--Congresses.
    36.     2  Corn--Biotechnology.
    37.     2  Cyanobacteria--Biotechnology.
    38.     2  Drugs--Biotechnology.
    39.     1  Endophytic fungi--Biotechnology.
    40.     6  Environmental biotechnology.
    41.    17  Enzymes--Biotechnology.
    42.     2  Erythrocytes--Biotechnology--Congresses.
    43.     1  Erythropoietin--Biotechnology--Congresses.
    44.     1  Flavor--Biotechnology--Congresses.
    45.     1  Flavoring essences--Biotechnology.
    46.    31  Food--Biotechnology.
    47.     3  Forestry biotechnology.
    48.    11  Fungi--Biotechnology.
    49.     1  Gelidium--Biotechnology--Congresses.
    50.     1  Glycoproteins--Biotechnology--Congresses.
    51.     1  Growth factors--Biotechnology.
    52.     1  Growth promoting substances--Biotechnology.
    53.     1  Herbicide-tolerant crops--Biotechnology.
    54.     1  Hops--Biotechnology--Congresses.
    55.     3  Immobilized enzymes--Biotechnology.
    56.     1  Immunoglobulins--Biotechnology.
    57.     1  Information storage and retrieval systems--Biotechnology.
    58.     1  Insect cell biotechnology.
    59.     1  Insulin--Biotechnology.
    60.     1  Legumes--Biotechnology.
    61.     1  Lignin--Biotechnology.
    62.     4  Lignocellulose--Biotechnology.
    63.     2  Lipid membranes--Biotechnology.
    64.     1  Lipids--Biotechnology--Congresses.
    65.     1  Liposomes--Biotechnology.
    66.     ?  Marine biotechnology.
    67.     1  Materials--Biotechnology--Congresses.
    68.     1  Medicinal plants--Biotechnology.
    69.     1  Methylotrophic microorganisms--Biotechnology.
    70.    24  Microbial biotechnology.
    71.     1  Microbial enzymes--Biotechnology.
    72.     1  Microbial metabolites--Biotechnology.
    73.     1  Microbial peptides--Biotechnology.
    74.     1  Microbial polysaccharides--Biotechnology--Congresses.
    75.     2  Microorganisms--Biotechnology--Catalogs and collections.
    76.     2  Molds (Fungi)--Biotechnology.
    77.     1  Monoclonal antibodies--Biotechnology.
    78.     1  Natural gas--Biotechnology--Congresses.
    79.     1  Nerve growth factor--Biotechnology.
    80.     1  Oils and fats--Biotechnology--Congresses.
    81.     4  Oilseed plants--Biotechnology.
    82.     1  Oligosaccharides--Biotechnology--Congresses.
    83.     1  Optical isomers--Biotechnology.
    84.     1  Organic compounds--Biotechnology--Congresses.
    85.     1  Organic solvents--Biotechnology.
    86.     1  Paper mills--Biotechnology--Congresses.
    87.     1  Penicillium--Biotechnology.
    88.     1  Peptides--Biotechnology.
    89.     1  Petroleum--Biotechnology--Congresses.
    90.    17  Pharmaceutical biotechnology.
    91.     1  Phenylalanine--Biotechnology.
    92.     1  Pigments (Biology)--Biotechnology.
    93.    60  Plant biotechnology.
    94.     1  Plant lipids--Biotechnology.
    95.     1  Plant products--Biotechnology.
    96.     1  Plant viruses--Biotechnology.
   100.     1  Plastics--Biotechnology.
   101.     1  Polyethylene glycol--Biotechnology.
   102.     1  Polymers--Biotechnology--Congresses.
   103.     2  Polysaccharides--Biotechnology--Congresses.
   104.     1  Potatoes--Biotechnology.
   105.     1  Proteinase--Biotechnology.
   106.     1  Proteinase--Inhibitors--Biotechnology.
   107.    14  Proteins--Biotechnology.
   108.     1  Pseudomonas--Biotechnology--Congresses.
   109.     2  Rice--Biotechnology.
   110.     1  Saccharomyces--Biotechnology.
   111.     2  Saccharomyces cerevisiae--Biotechnology.
   112.     2  Single cell proteins--Biotechnology.
   113.     1  Soybean--Biotechnology--Economic aspects--United
                 States--Congresses.
   114.     1  Sustainable agriculture--Government policy--Biotechnology.
   115.     1  Synthetic vaccines--Biotechnology.
   116.     1  Tall fescue--Biotechnology.
   117.     1  Thaumatins--Biotechnology.
   118.     2  Tomatoes--Biotechnology.
   119.     6  Trees--Biotechnology.
   120.     1  Tropical crops--Biotechnology--Congresses.
   121.     2  Vaccines--Biotechnology.
   122.     1  Vegetables--Biotechnology.
   123.     1  Vitamins--Biotechnology.
   124.     1  Wheat--Biotechnology.
   125.     1  Wood-pulp--Biotechnology--Congresses.
   126.     2  Woody plants--Biotechnology--Congresses.
   127.     1  Xylanases--Biotechnology--Congresses.
   128.     1  Yeast--Biotechnology.
   129.     7  Yeast fungi--Biotechnology.
 

Some Final Observations
-----------------------

My compression reduced the display from 480 entries to 129.

The key issue is how to handle the headings which contain more than one category of subdivision; the statistics in File 1 show that this is not an uncommon situation. Before I started this project, I thought the solution was to "double post" under each category.

For example, the heading:

Biotechnology--Methodology--Handbooks, manuals, etc.

would appear under:

Biotechnology--SUBDIVIDED BY TOPICAL ASPECT
and
Biotechnology--SUBDIVIDED BY FORM OR TYPE OF MATERIAL

This approach might be best in some OPACs, depending on how their displays are designed. However, in working through my sample, I came to the conclusion that users are best served by "full" compression on the subject heading list; the OPAC would display the subdivision choices only if the user requested display of a "compressed entry." In my opinion, double posting should occur on this second display. (See example screen below.)

In a keyword search, the compression rules will have to deal with a wide variety of subject heading combinations. My rule of thumb was to collapse like headings up to the highest level present on the list. For instance:

Example 1:
Biotechnology

exists as a main heading in the "uncollapsed" list. I pulled together all subdivisions of this main heading in my "compressed" display.

Example 2:
Algae--Biotechnology
Algae--Biotechnology--Congresses

exist on the "uncollapsed" list. My compression moved up one level, to the first subdivision, but did not move all the way up to the main heading.

Example 3:
Biopolymers--Biotechnology--Congresses

does not get collapsed at all, since it exists alone on the "uncompressed" display.
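
The rule of thumb behind the three examples can be sketched in Python. This is only one possible reading of the rule, not the procedure used to produce the display above. The function name `compress_headings`, the dictionary input, and the item counts in the sample are assumptions made for illustration. Counts are simply summed, so a record posted under two headings that collapse together would be counted twice.

```python
from collections import defaultdict

def compress_headings(postings):
    """Collapse each heading to its shortest prefix that is itself on the list.

    postings: dict mapping a full heading string (parts joined by "--")
    to an item count. Returns a dict in the same shape.
    """
    present = {tuple(h.split("--")) for h in postings}
    compressed = defaultdict(int)
    for heading, count in postings.items():
        parts = tuple(heading.split("--"))
        target = parts  # default: the heading stands alone (Example 3)
        # Try the main heading first, then each longer prefix in turn.
        for n in range(1, len(parts)):
            if parts[:n] in present:
                target = parts[:n]
                break
        compressed["--".join(target)] += count
    return dict(compressed)

# Hypothetical counts, chosen only to mirror Examples 1 to 3.
sample = {
    "Biotechnology": 175,
    "Biotechnology--Methodology": 3,
    "Biotechnology--Methodology--Handbooks, manuals, etc.": 1,
    "Algae--Biotechnology": 3,
    "Algae--Biotechnology--Congresses": 1,
    "Biopolymers--Biotechnology--Congresses": 1,
}
result = compress_headings(sample)
for heading in sorted(result):
    print(result[heading], heading)
```

Under this reading, two headings that share a prefix are merged only when the prefix itself appears on the uncollapsed list; whether sibling headings without such a parent should also be merged is left open here.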

If a user asks for the display of the BIOTECHNOLOGY main heading on my "compressed" list, I envision supplying him/her with a second screen along these lines:

--------------------------------------------------------------------------

129 SUBJECT HEADINGS MATCH YOUR ORIGINAL SEARCH

The subject heading you requested has enough material cataloged under it that you have several display choices. Type the letter of your choice.

Line 24 : BIOTECHNOLOGY

           Item Count
 
     a.        634            All items cataloged under this heading.
                                   [matches number on "compressed" list]
 
     b.        175            General works on your subject.
                                   [$a only in subject heading]
 
     c.        250            Broken down by topical aspects.
                                   [$x present in subject heading]
 
     d.        133            Broken down by geographical area.
                                   [$z present in subject heading]
 
     e.        336            Broken down by form or type of material.
                                   [$v present in subject heading]
------------------------------------------------------------------------
[chronological period would be another choice, if any existed in my sample]

If a user chose to display one of the first two choices (A or B), then the user would see an alphabetical list of the items themselves.

If a user chose to display one of the last three choices (C or D or E), then the user would see an alphabetical list of all the subject headings where the applicable subfield code was present.
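
How a heading might be assigned to the display choices, with double posting, can be sketched as follows. The sketch assumes the usual MARC subdivision subfield codes ($x topical, $y chronological, $z geographic, and $v for form as proposed in 95-2). The names `CATEGORY_BY_SUBFIELD` and `display_categories`, and the representation of a heading as a list of (code, text) pairs, are hypothetical.

```python
# Assumed mapping from subdivision subfield code to display choice.
CATEGORY_BY_SUBFIELD = {
    "x": "topical aspect",
    "y": "chronological period",
    "z": "geographical area",
    "v": "form or type of material",
}

def display_categories(heading):
    """Return every display choice a heading would be posted under.

    heading: list of (subfield_code, text) pairs,
    e.g. [("a", "Biotechnology"), ("x", "Methodology")].
    """
    categories = []
    for code, _text in heading:
        category = CATEGORY_BY_SUBFIELD.get(code)
        if category and category not in categories:
            categories.append(category)
    # A heading with no subdivisions ($a only) counts as general works.
    return categories or ["general works"]

handbook = [
    ("a", "Biotechnology"),
    ("x", "Methodology"),
    ("v", "Handbooks, manuals, etc."),
]
print(display_categories(handbook))
print(display_categories([("a", "Biotechnology")]))
```

The handbook heading comes back with two categories, which is the double posting described earlier; a heading with only $a falls under general works.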

Given that we are dealing with a total of 634 catalog records under this main subject heading, and given that the total number of catalog records posted under B, C, D, and E above is 894, I can report a duplication count of 260. In other words, in this example, double posting occurs for roughly 1/3 of the catalog records.
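
The duplication count can be checked with a few lines of Python. The variable names are mine; the figures are the item counts from the sample screen for line 24. The total of 894 is reached by adding choices b through e.

```python
# Item counts copied from the sample screen for line 24 (BIOTECHNOLOGY).
total_records = 634  # choice a: all items under the heading
postings = {"b": 175, "c": 250, "d": 133, "e": 336}

total_postings = sum(postings.values())
duplication = total_postings - total_records

print(total_postings)                          # 894
print(duplication)                             # 260
print(round(duplication / total_records, 2))   # 0.41
print(round(duplication / total_postings, 2))  # 0.29
```

The 260 counts extra postings rather than records, since one record may carry more than one kind of subdivision. It comes to about 41% of the records and about 29% of the postings, so "roughly 1/3" sits between the two.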

As a systems librarian, I am aware that compression and double posting would need to be evaluated from the perspective of computer processing limitations, response time requirements, etc. My goal here is to move people's thinking along on what might be possible. I believe that I played with the data enough that any other "logic" issues would have surfaced, and I did not find anything significant from an analysis perspective, at least in my small sample of 480 headings.

It will be interesting to see what other people's thoughts are on this quick study.

(Note: many thanks to Arlene Taylor and Martha Yee, who both suggested some wording on the above display.)


(saved \jo\file3)