DESIGN OF COMPUTER ASSISTED TESTING SYSTEM FOR DISTANCE EDUCATION S

Design of a Computer Assisted Testing System
for Distance Education Students

Ms. Robin G. Wingard and Dr. Nick Laudato
Center for Instructional Development & Distance Education
University of Pittsburgh

Ms. Robin G. Wingard, Assistant Director
University External Studies Program
University of Pittsburgh
3804 Forbes Avenue
Pittsburgh, Pennsylvania 15260
Phone: (412) 624-7216
Fax: (412) 624-7213
e-mail: rwi+@pitt.edu

Dr. Nick Laudato, Associate Director
Instructional Technology
University of Pittsburgh
3804 Forbes Avenue
Pittsburgh, Pennsylvania 15260
Phone: (412) 624-4999
Fax: (412) 624-7213
e-mail: laudato+@pitt.edu
URL: http://www.pitt.edu/~laudato

Contents

Background

UESP Testing Procedures

Program Design

Test System Requirements

Alternative Solutions

Program Development

Development Methodology

Early Prototypes

Working Prototype

Pilot Effort

Summary and conclusions

Future Directions

Design of a Computer Assisted Testing System
for Distance Education Students

With the thoughtful application of emerging technologies, distance education initiatives are not only allowing an increasing number of nontraditional students to access educational alternatives, but are introducing new solutions to old problems encountered by both nontraditional and traditional delivery systems. This paper reports on the design and implementation of a Computer Assisted Testing System (CATS) developed to support the University of Pittsburgh External Studies Program (UESP). CATS applies computer technology to address some drawbacks inherent in standard distance education testing procedures and offers educators a tool to derive pedagogical benefits from academic evaluation activities.

Background

Since 1972, Pitt’s University External Studies Program (UESP) has provided nontraditional students with the opportunity to earn credits toward a baccalaureate degree from the University of Pittsburgh. Every term, UESP serves between 1,100 and 1,300 students in approximately 70 sections offered from an inventory of over 130 courses. Students can earn credits in 30 discipline areas or complete one of three undergraduate majors (psychology, history or economics) or one of two areas of concentration (humanities or social studies) entirely through UESP.

The students served by UESP are generally an older population, representing a wide range of backgrounds and circumstances. They typically share an inability to accommodate traditional, campus-based programs of study due to conflicting professional and/or family obligations, geographic location or restricted mobility. In the UESP distance education model, these students complete course requirements independently using self-instructional print materials developed by University faculty. UESP also provides three on-campus Saturday classes (workshops) during the term for each course section. Students progress through the materials and master the course requirements where and when they are best able.

Emerging technological resources, including two-way audio conferencing, interactive television (ITV), and computer-assisted instruction have significantly expanded distance education alternatives beyond self-instructional print materials. Instructors can deliver instruction to different locations, to multiple locations simultaneously, and through computer networks to students’ homes or public computing labs. The administration of academic testing has been less amenable to this kind of flexibility, due primarily to security concerns.

UESP Testing Procedures

The University External Studies Program's standard testing procedures provide flexibility in when and where supervised course exams are administered. Students progress at their own pace through course requirements, taking course-specific supervised exams when they are able and at a testing site convenient to them. Supervised testing sites have been established at each of the University of Pittsburgh’s five campuses and at seven off-campus-testing sites located throughout Western Pennsylvania. Strict testing procedures are enforced at all testing locations to ensure consistent exam security.

Although these procedures support the separation in time and space of the learner from the learning facilitator, they introduce inherent administrative difficulties and restrict the pedagogical benefits derived from the evaluation process itself. The procedures involve mailing completed exams from off-campus testing sites to the Pittsburgh campus, then distributing the exams to the appropriate faculty member on campus for scoring, and finally, mailing feedback to the student. These steps introduce a significant time delay between exam administration and the time students receive feedback on their performance. This time delay not only reduces the relevancy and effectiveness of the feedback for students but also has been found to inhibit continuity in students’ subsequent progress. In addition, reliance on mail services introduces security and reliability risks and incurs considerable expense.

UESP Testing System Information Flow

Other limitations exist as well. The instructional benefits derived from the feedback are limited as a result of the self-paced nature of the University External Studies Program. Because students take exams at a time of their individual choosing (not simultaneously) instructors are unable to return graded exams without jeopardizing exam security. Students receive an exam score and general comments on their performance, but no item-specific substantive feedback. They can only review the particulars of their performance by meeting with the course instructor or an administrator on campus, thus compromising the intended flexibility of the distance education model.

The inability to provide prompt, individualized feedback on each student’s performance is as likely to be experienced in large traditional college classrooms as in distance education programs. However, the problem is compounded in distance education by the defining characteristics of the self-paced program itself, i.e., the need to support and evaluate students’ learning at different times and in different locations.

UESP’s Computer Assisted Testing System (CATS) was conceived to address the inherent disadvantages of traditional paper-and-pencil testing and delivery procedures, and to make academic evaluation and feedback as accessible to the distance learner as the instruction itself. The remainder of this paper describes the design, development, and initial pilot implementation of CATS.

Program Design

In order to assist in articulating the testing system needs and evaluating alternative solutions, the authors formed a design team composed of UESP support staff, a regional campus representative, and faculty members with experience teaching through UESP. The CATS Design Team provided varied perspectives to the planning and design process, and followed the project through to implementation. They performed research on testing and instructional systems and software, identified testing system needs, analyzed their collective research and requirements, and proposed alternative solutions.

Test System Requirements

In order to be successful, the envisioned Computer Assisted Testing System would need to address several fundamental conditions. First and foremost, it must be academically equivalent to the existing testing system. Because of the size and complexity of UESP’s current supervised testing system, CATS would be phased in gradually over several years. Given this need for long-term coexistence, CATS must be completely compatible with the existing testing system. It must be able to operate at all of the University's regional campus testing sites and at community testing sites that are not linked to the University's computing network. In addition, it must be able to administer exams simultaneously at multiple sites.

Based on its analysis, the CATS Design Team identified the following requirements for a computer-assisted testing system:

The testing experience through CATS should be comparable to the traditional, paper-and-pencil testing experience for students. The introduction of a computer into the testing environment could not compromise students' ability to perform well. Research on computer assisted testing presents conflicting opinions on the influence of introducing technology into the testing process and on the influence of specificity, timing and acceptability of feedback on students' attitudes, their anxiety levels and their performances. Burke provides support for the intuitive notion that previous computing experience influences the level of anxiety experienced and suggests that supportive interfaces such as clear instructions, practice time, and the ability to skip questions and correct mistakes can equalize this effect. Others found that, even with practice time, anxiety levels continued to increase but that performances were not affected. Although the literature yields conflicting results, it is clear that CATS must be carefully designed to ensure comparability with the existing testing system.
CATS should have a user-friendly interface to facilitate its use by even the most naive computer users. The distance learning population served by UESP could be assumed to have widely varied computer skills, ranging from those with a high level of expertise to those with no experience at all. Students testing with CATS could not be put at a disadvantage to students who did not test on CATS. They would need the same flexibility when testing on the computer as they would with a paper and pencil exam. Students must be able to move back and forth through the exam freely, to change answers, fill in unanswered questions or simply review their own responses.
CATS should score exams in real time and provide the results to the student immediately upon completion of the exam. Immediate feedback is critical to CATS’s ability to address the limitations of the current manual testing system.
CATS should provide detailed feedback on performance. Students should be given the opportunity to review substantive, item-specific feedback written by the course instructor without jeopardizing the validity of the exam for students testing at a later time. This capability restores the instructional benefits of testing and provides students with a means of evaluating and ultimately adjusting their learning and test taking effectiveness.
CATS software should be "bullet proof." It should virtually never fail, and it should allow the student's work to be recovered in the event of a power outage or system failure.
CATS should provide an extremely high level of exam security. It should completely confine the test taker to a controlled, limited environment, preventing the student from accessing exam results or other exam items. Established supervised testing procedures should be continued. These require that a testing administrator be physically present to perform the following tasks:

Verify the tester's identification;
Initiate the testing session;
Supervise the student during the session, ensuring that the student use no other resources than those expressly permitted by the instructor and that no information about the exam is removed from the testing site; and,
Stop an exam in the event of a suspected academic integrity violation and secure the student's work to that point.

CATS should provide data collection capabilities to allow a wide range of research opportunities relevant to test taking activities.
The process of creating and modifying CATS exams should be relatively fast and easy in order to accommodate both initial exam creation and the anticipated frequent course revisions associated with self-instructional materials. It should be possible to easily import ASCII text and documents formatted in the RTF (Rich Text Format) standard.
Exam items should accommodate varied formats, including multiple choice, true/false, short answer, and essay.
Exam items should be able to contain graphic and audio components. A review of the range of courses routinely offered through UESP suggest numerous additional benefits that could be derived from incorporating multimedia components into exam items. For example, the inclusion of graphic and audio components could greatly enhance the effectiveness of fine arts and music exams.

Alternative Solutions

The CATS Design Team identified several vendors of specialized testing software and acquired and reviewed their product literature. Whenever possible, they requested demo and trial packages, and reviewed them in detail. A graduate student intern tested the trial packages by applying their procedures and tools to the task of converting samples of the existing UESP test materials.

The Design Team was disappointed with the character-based user interfaces on all of specialized testing packages they reviewed. They became convinced that a testing system that utilized a Graphical User Interface (GUI) would be essential to accomplishing the ease-of-use requirements they had articulated. They also found limitations with the software, and serious difficulties in converting existing exam material into each package’s proprietary editor. Consequently, the Design Team turned its attention to the evaluation of general Computer-Assisted Instruction (CAI) packages.

The evaluation of general-purpose CAI packages followed a path similar to the evaluation of the specialized testing packages. The CAI packages capable of generating GUI applications seemed to be optimized for the creation of multimedia applications. Again, they all utilized a proprietary editor to create test items, and all had limited ability to import existing exam items while keeping the format of the item intact. The graduate student assigned to the project again tried each package to experiment with the procedures and tools for building exams, and found the logic of the object oriented development tools as daunting as any programming language.

The biggest disappointment with the general-purpose CAI packages was their inability to generate exams without a significant development effort, both in programming the test application and in creating the test items. This was a critical consideration when attempting to create tests for the 130 courses in the UESP course inventory.

The CATS Design Team rejected the alternatives of purchasing an off-the-shelf testing system and of creating test applications from general-purpose CAI packages, and began an investigation of the process of developing a general test administration program.

Program Development

The authors conceived of the approach of creating a GUI-based Windows application that could present test items, provide the test-taker with an easy-to-use navigation system, and store and score the exam responses. The test items would be common objects that adhere to the OLE (Object Linking and Embedding) 2.0 de facto standard advanced by Microsoft Corporation. They could, therefore, be created using a variety of commonly available desktop productivity tools.

Development Methodology

Concerned about the long time frames associated with application development efforts, the authors chose to use the prototyping techniques associated with the rapid application development (RAD) methodology. This approach differs from the traditional system development life cycle (SDLC) approach in several significant ways. Most importantly, it is an iterative development approach that incorporates ongoing end user involvement in the program design.

Instead of beginning with an exhaustive requirements phase including detailed functional and program specifications, the program developers quickly generate a series of application prototypes that demonstrate the evolving look-and-feel of the desired application, while identifying the technical issues that will need to be resolved and experimenting with alternative solutions. The end users are intimately involved in all phases of development, as the project iterates through design, prototyping, testing, and assessment. Through each iteration, the end product becomes more refined and better matches the intended outcome.

Iterative Development Process

This rapid application development approach is enabled through the use of visual, object-oriented development languages and database management tools. For the CATS development, the authors selected Visual Basic and Access.

Early Prototypes

A proof-of-concept prototype application was developed to illustrate the technical feasibility of the approach. The prototype was developed in Visual Basic, utilizing an OLE control to display exam items created as separate Microsoft Word documents. The items were organized and sequenced through a series of Microsoft Access databases that contain data about each course, exam, and exam item.

The proof-of-concept prototype seemed to adequately address many of the ease-of-use and performance requirements that had been articulated by the Design Team. The next step was to develop a working prototype that would address the security and production concerns. The working prototype was developed and tested under carefully monitored conditions during the summer and fall terms of 1995.

Working Prototype

Exam creation in CATS was designed to be easy and straightforward. Microsoft Word is used to create items with a standard template that defines page size and margins so the test item precisely fits the dimensions allowed within the testing screen. Word allows any graphic or OLE-compatible object to be inserted into the exam item. Style sheets are used to ensure consistency among items in an exam and between different exams. Each test item is stored as a separate document and is indexed in a Microsoft Access database. The process of creating an exam is therefore one of creating the items as Word documents, and updating the Access databases with information about the items. The items themselves are usually cut from existing tests and pasted into the separate Word documents.

CATS is designed to work within the existing UESP supervised testing system. To initiate a CATS session, the testing administrator must log into a CATS machine, select the course and term from the list provided, and then select the specific exam from the resultant exam list. After the exam is selected, control of the computer can be turned over to the student, who completes a short information form and proceeds to take the exam.

CATS uses an authentication routine to protect the course and exam lists, and to prevent the student from backing out of the "Student Information" screen or from exiting the entire program. The student can quit the exam at anytime without invoking the authentication process.

CATS was designed to provide a secure testing environment for the administration of exams. For this reason, the authors decided to develop CATS for the Windows NT environment. Windows NT is Microsoft’s high-end 32-bit desktop operating system designed to comply with the federal government’s C2 security designation. NT can provide security equivalent to mainframe operating systems such as Unix, VMS, and MVS. CATS utilized the following measures to ensure exam security:

The microcomputer used for testing is configured to boot only when a password is supplied. The ability to boot from the diskette drive is also disabled to ensure that the machine boots from the hard drive. The hard drive is formatted as an NTFS (NT File System) partition. This enables file and directory-level security and prevents access from DOS even if a user somehow succeeds in booting the machine from the floppy drive.
The CATS application is written to take control of the machine immediately after Windows NT is successfully launched and the testing administrator logs into the test account. Once it has executed, CATS takes full control of the computer screen and disallows all access to any system process or application. This prevents students from accessing and modifying exam items or test results. Many of the inherent multitasking capabilities of the Windows environment had to be disabled in order to achieve this control.

CATS Information Flow

The microcomputers on which CATS is implemented must be capable of running the Windows NT operating system, requiring an Intel Pentium-class machine with 32 MB of memory. The machines were also configured with 17-inch monitors that are configured to display true colors at 640 by 480 dpi. The machines are attached to an uninterruptable power source (UPS) to prevent loss of data in the event of a power failure. Each CATS machine is attached to the network for the purposed of exam distribution, but is not dependant on the network connection for test taking.

Immediately upon completion of an exam, CATS grades test items and provides the student with feedback. The grading algorithm is simple-if the student’s answer exactly matches the instructor’s answer (after case is changed and superfluous spaces are eliminated), the answer is graded as correct. This works perfectly for objective test items (multiple choice and true/false), but is limited for short answer questions and inapplicable to essay questions. These must be sent to the faculty member for grading. The score provided the student on screen is calculated on the items gradable by the system only.

CATS includes an option, available at the discretion of the instructor, to provide detailed substantive feedback related to each individual item. This option is available immediately after the completion of the exam, and allows the student to review each exam item and compare his or her answer to the instructor’s. An instructor can provide a statement describing what ingredients could constitute an acceptable answer to an essay question, allowing the student to assess how closely the answers matched. For example, on a short answer question, if the student responded "Samuel Clemens," but the instructor designated the answer as "Mark Twain," CATS would not have graded the answer because it was not an exact match, but the student would feel comfortable that the answer matched.

CATS generates a set of three reports for each testing session:

Answer Report: The Answer Report is analogous to a completed paper-and-pencil exam. It shows the student’s response to each item, and the program’s grade. It is distributed to the course instructor for his or her records.

Feedback Report: The Feedback Report provides an addressed exam summary, formatted to fit in a windowed envelope. It is distributed to the instructor, who reviews the program’s grade, adds any relevant comments, and mails it to the student, via standard mail or e-mail.

Exam Protocol: The Exam Protocol provides a detailed record of the exam activity. It records and time-stamps the student’s answer for every test item visited, in sequence, both during the exam and during feedback review. This data is stored after every student action, so it can be used to recover the exam in the event of a power or system failure. The Exam Protocol also provides UESP researchers with a powerful data collection tool to serve as a resource in analyzing test-taking behavior.

CATS also addresses the administrative components of the exam process. Exams can be distributed to remote testing sites over the network, and exam results can be electronically retrieved from the testing machines and electronically distributed to faculty for final grading.

Based on the success of the working prototype, the authors sought and received financial support from Bell Atlantic Corporation for assistance in developing and deploying a production version of CATS. Bell Atlantic support helped to purchase testing machines, software, and some programming support. To create a completely secure system, the production version was developed to run in the Windows NT environment as a 32-bit application.

Pilot Effort

Twenty-nine students took 53 exams on two introductory psychology courses during this early development stage. Students were encouraged to try the system for at least one of the course exams but were not required to do so. Before beginning an actual course exam, they were given the opportunity to take a practice exam. In addition, students were allowed to try CATS through the first four questions on each exam and, if not comfortable, to quit using CATS and ask for the pencil-and-paper version of the exam with no penalty.

No student took advantage of this "release" option. Students' informal responses to the experience have been positive. Even students with no computing experience have reported feeling comfortable using the system. In most cases, students who have taken one exam on CATS, return to take subsequent exams on the computer. Students moved freely through the testing process, reporting no difficulty in returning to items to review or change responses. A preliminary review of the testing protocols revealed that twelve students choose not to take advantage of the review option for a total of sixteen exams. The review option was used for the remaining exams, with most students electing to review only their incorrect responses. Only one student elected to review all test items through all exams.

Summary and Conclusions

Several significant advantages were gained by developing a computer assisted testing system to specifically meet the UESP’s needs. First, CATS can be modified freely and as often as necessary, to meet the evolving needs of a particular course, the program, or the University. CATS can also accommodate alternative needs. For example, CATS is being deployed in two of the University’s schools to administer Math diagnostic exams. After a student has taken the diagnostic exam, the answer file generated from the CATS session is electronically parsed and posted to the University’s student information system, where existing software produces a diagnosis, updates the student’s academic record, and prints a summary for the student and advisor.

CATS has been successful because of its key features. It provides a user-friendly interface and easy navigation tools to the test taker, requiring little or no training to use. It allows the test maker to create exams quickly and easily, using common desktop tools. It allows the test administrator to electronically distribute exams to the testing site and exam results to the instructor, avoiding the time delays, costs, and risks inherent in mailing. In addition, instructors are provided with a vehicle for delivering immediate substantive feedback to students upon completion of the exam, without compromising subsequent exam security. This capability introduces the potential for refining the self-monitoring and regulatory skills required of independent learners.

Finally, the research opportunities facilitated by the data collection capabilities of CATS provide valuable information on students’ test taking activities. Each test session generates a protocol file that tracks and times students’ activities within the testing session, including the student’s use of the review option. This data provides researchers with the information needed to study the relationship between performance, test behavior and use of feedback, as well as the impact of various levels of feedback on students’ subsequent independent learning and test taking behaviors. As the inventory of exams available on CATS is increased and the availability of computer-assisted testing stations is expanded to off-campus sites, faculty interests are expected to provide additional research questions to explore.

Future Directions

Current plans for CATS include enhancing the program to accommodate special testing conditions, deploying it to UESP's regional testing sites, and automating procedures for managing and distributing exam changes. Future enhancements include the provision for automated data encryption to allow exam results to be safely distributed via e-mail, the addition of special directions and/or instructional screens, and the ability to capture student comments at various points in the testing process. Development will also begin on software to facilitate exam administration, automate and control exam distribution, and automatically recover from catastrophes such as power failures or machine crashes.