From JOCRAW@macc.wisc.edu Wed Feb 14 15:51:21 1996
In preparation for the MARBI discussion on Proposal 95-2, I took a little
time this weekend to work on the questions raised in the proposal in the
section "Concept-based Display" (that is, a compressed OPAC headings
display). At the bottom of page 5 of the Proposal, there is the following
example and remarks:
----------------------------------------------------------------------------
I felt that this issue could be clarified by some old-fashioned systems
analysis using some real data. In addition, such an analysis might uncover
other problems/issues which should be addressed. Given that the Univ of
Wisconsin-Madison OPAC has the capability to download a subject heading
list which I could then manipulate using a spreadsheet and word processor,
I picked out a common keyword search and set to work. I analyzed the data
statistically and I manipulated the headings by hand to achieve
compression, so that I could show Before/After displays.
This report is divided into three files:
Contents of Dataset
-------------------
(Please think of the stats below as preliminary only; double-checking not
yet done.)
I chose to perform a topical subject heading keyword search on a single
word: BIOTECHNOLOGY. This resulted in an alphabetical list of 480 subject
headings, beginning with "Acremonimum--Biotechnology" and ending with
"Yeast fungi--Biotechnology." The list includes a large number of headings
which begin with the keyword Biotechnology. These 480 subject headings
come from 1055 catalog records.
The dataset includes a mixture of LCSH and MeSH headings and the current
display makes no attempt to differentiate between the two. In addition,
please note that our automated authority control programs have not been
extended to topical subject headings as of yet, so that no references or
scope notes appear and some of the headings need correction.
I could have performed a left-to-right phrase search rather than a keyword
search, thereby limiting my topical subject search to just those headings
beginning with the word BIOTECHNOLOGY. However, I prefer to work with a
"worst case" scenario in order to uncover issues and analyze solutions.
Therefore, I chose the keyword search so that the resulting OPAC display
has more content and complexity.
In my analysis, I assigned each of the 480 subject headings to one or more
of the following categories:
An observation
In an alphabetical, uncompressed display, the display of a main heading and
all its subdivisions can be interrupted by an intervening term (and its
subdivisions). This is the case in my sample dataset. This problem
disappears in a compressed display, as long as the program logic is set up
so that the computer searches through to the very end of the headings under
the main term with which it is working. I see this as a helpful change and
have therefore manipulated my "compressed" display along these lines.
Date: Thu, 2 Feb 1995 08:28:23 -0500
From: Josephine Crawford
Reply to: usmarc@loc.gov
To: Multiple recipients of list
Subject: Compression Study : File 1 of 3
File 1 of 3 Josephine Crawford
re: MARBI Proposal 95-2 Univ of Wisconsin
1/31/95 jocraw@macc.wisc.edu
"Example of such a display:
English literature -- SUBDIVIDED BY CHRONOLOGICAL PERIOD
English literature -- SUBDIVIDED BY FORM OR TYPE OF MATERIAL
English literature -- SUBDIVIDED BY GEOGRAPHIC AREA
It should be noted, however, that with LCSH strings, such an
abbreviated display reflects only the nature of the subdivisions at the
level of the first subdivision. Selecting and viewing English Literature --
SUBDIVIDED BY FORM OR TYPE OF MATERIAL does not retrieve all instances of
form subdivisions in strings that begin with English literature. Strings
with English literature subdivided by geographic or chronological
subdivisions may themselves have additional form subdivisions...."
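
The limitation described in the quoted passage can be illustrated with a
small Python sketch. The headings come from the biotechnology dataset in
this report; the type labels (topical, geographic, form) are hand-assigned
here for illustration only, and the function names are hypothetical, not
part of any OPAC or of the Proposal.

```python
from collections import defaultdict

# (main heading, [(subdivision type, subdivision text), ...])
# Type labels are assigned by hand for this illustration.
HEADINGS = [
    ("Biotechnology", [("form", "Bibliography"), ("form", "Periodicals")]),
    ("Biotechnology", [("topical", "Computer programs")]),
    ("Biotechnology", [("topical", "Databases"),
                       ("geographic", "North America"),
                       ("form", "Directories")]),
    ("Biotechnology", [("geographic", "Latvia")]),
]


def as_string(main, subs):
    """Render a heading in the usual double-dash notation."""
    return "--".join([main] + [text for _, text in subs])


def first_level_summary(headings):
    """Group headings by (main heading, type of FIRST subdivision)."""
    summary = defaultdict(list)
    for main, subs in headings:
        if subs:
            summary[(main, subs[0][0])].append(as_string(main, subs))
    return dict(summary)


def with_form_anywhere(headings):
    """All headings that carry a form subdivision at any level."""
    return [as_string(main, subs) for main, subs in headings
            if any(kind == "form" for kind, _ in subs)]


summary = first_level_summary(HEADINGS)
first_level_form = summary[("Biotechnology", "form")]
missed = [h for h in with_form_anywhere(HEADINGS) if h not in first_level_form]
print(missed)
```

Under these assumptions, selecting the "subdivided by form" line retrieves
only Biotechnology--Bibliography--Periodicals; the Directories heading is
missed because its form subdivision sits at the third level.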
File 1 : Introduction and description of my methodology;
Contents of Dataset (categories, statistics);
An observation.
File 2 : Sample "uncompressed" OPAC display.
(what occurs now)
File 3 : Sample "compressed" OPAC display;
Some final observations.
A) TOPICAL SUBJECT HEADING, NO SUBDIVISIONS
e.g. Agricultural biotechnology
Biotechnology industry
There are only 12 headings of this type but these 12 headings map to
289 catalog records. That is, if a user requests the display of these 12
headings separately, the user would see 289 items divided into twelve
separate sets.
B) ONE OR MORE TOPICAL SUBDIVISIONS PRESENT IN HEADING
e.g. Agricultural Biotechnology--Economic aspects
Biotechnology--Computer programs
Lignocellulose--Biotechnology--Congresses
Marine biotechnology--Research--United States
There are 360 headings of this type, mapping to 640 catalog records.
Of these 360 headings, 82 also have geographic subdivisions and 143 also
have form subdivisions, as in the last two examples above. These latter
statistics may be important in getting a handle on the compression issue
quoted above from the MARBI proposal.
C) ONE OR MORE GEOGRAPHIC SUBDIVISIONS
e.g. Agricultural biotechnology--Kenya
Biotechnology industries--Wisconsin
Biotechnology industries--Wisconsin--Directories
There are 60 subject headings with a geographic subdivision, mapping
to 305 catalog records. In addition, 47 of these are followed by a form
subdivision, as in the Directories example above.
D) ONE OR MORE CHRONOLOGICAL SUBDIVISIONS
There are none of this type in the dataset.
E) ONE OR MORE FORM/GENRE SUBDIVISIONS
e.g. Microbial Biotechnology--Periodicals
Pharmaceutical biotechnology--Congresses
There are 219 headings of this type, mapping to 563 catalog records.
PLEASE NOTE: this category makes up just under half of the headings in
the dataset, and just over half of the linked catalog records. (I
guess I was lucky enough to hit on a search rich with form/genre data.)
F) TWO FORM SUBDIVISIONS IN SAME SUBJECT HEADING
e.g. Biotechnology--Bibliography--Periodicals
Three headings in the dataset have two form subdivisions.
G) ALL THREE TYPES OF SUBDIVISIONS IN THE SAME HEADING
e.g. Biotechnology--Databases--North America--Directories
Plant biotechnology--Research--Japan--Periodicals
There are 19 headings of this type; these may be important for the
compression issue quoted from the MARBI proposal.
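
I assigned the categories by hand, but the rules behind categories A
through G can be written down as a rough Python sketch. It assumes each
heading has already been reduced to a list of subdivision types; that
typed input and the function name are hypothetical, since nothing in the
current display identifies the subdivision type.

```python
def categories(subdivision_types):
    """Return the set of category letters (A-G) for one heading.

    subdivision_types is a list such as ["topical", "geographic", "form"],
    one entry per subdivision, in order. A heading can fall into more than
    one category.
    """
    kinds = list(subdivision_types)
    result = set()
    if not kinds:
        result.add("A")                      # no subdivisions at all
    if "topical" in kinds:
        result.add("B")
    if "geographic" in kinds:
        result.add("C")
    if "chronological" in kinds:
        result.add("D")
    form_count = kinds.count("form")
    if form_count >= 1:
        result.add("E")
    if form_count >= 2:                      # assumption: "two" read as "at least two"
        result.add("F")
    if {"topical", "geographic", "form"} <= set(kinds):
        result.add("G")                      # all three types present
    return result


# Examples from the lists above, with types assigned by hand.
print(sorted(categories([])))                                 # Agricultural biotechnology
print(sorted(categories(["form", "form"])))                   # Biotechnology--Bibliography--Periodicals
print(sorted(categories(["topical", "geographic", "form"])))  # Plant biotechnology--Research--Japan--Periodicals
```

Because the categories overlap, the per-category counts above are not
expected to add up to 480.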
--------------
e.g. Biotechnology--History
Biotechnology industries
Biotechnology industries--Argentina
Biotechnology--Information Services
Biotechnology--Instrumentation
Biotechnology laboratories
Biotechnology--Latvia
[uncompressed display shows interruption of logical sequence]
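
The interruption in this example, and the grouping described under "An
observation", can be sketched in Python. The filing rule (treat the double
dash like a space and ignore case) is an assumption that happens to
reproduce the order shown; the function names and the indented layout are
hypothetical and are not the format of the compressed display in File 3.

```python
HEADINGS = [
    "Biotechnology--History",
    "Biotechnology industries",
    "Biotechnology industries--Argentina",
    "Biotechnology--Information Services",
    "Biotechnology--Instrumentation",
    "Biotechnology laboratories",
    "Biotechnology--Latvia",
]


def filing_key(heading):
    """Assumed filing rule: '--' files like a space, case is ignored."""
    return heading.replace("--", " ").lower()


def compress(headings):
    """Collect every subdivision under its main heading.

    The whole list is scanned, so headings under "Biotechnology" are
    gathered even when another main heading files between them.
    """
    groups = {}
    for heading in sorted(headings, key=filing_key):
        main, _, subdivisions = heading.partition("--")
        groups.setdefault(main, []).append(subdivisions)
    return groups


def render(groups):
    """Print each main heading once, with its subdivisions indented."""
    lines = []
    for main, subs in groups.items():
        lines.append(main)
        lines.extend(f"    --{s}" for s in subs if s)
    return "\n".join(lines)


print(render(compress(HEADINGS)))
```

With this sample, the four Biotechnology subdivisions (History, Information
Services, Instrumentation, Latvia) come out together, followed by
Biotechnology industries and Biotechnology laboratories.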