Assessment of Crew Workload Measurement Methods
Volume I
September 1989
NOTICE
When Government drawings, specifications, or other data are used for any purpose
other than in connection with a definitely related Government procurement operation,
the United States Government thereby incurs no responsibility nor any obligation
whatsoever; and the fact that the government may have formulated, furnished, or in
any way supplied the said drawings, specifications, or other data, is not to be re-
garded by implication or otherwise as in any manner licensing the holder or any
other person or corporation, or conveying any rights or permission to manufacture,
use, or sell any patented invention that may in any way be related thereto.
This report has been reviewed by the Office of Public Affairs (ASD/PA) and is
releasable to the National Technical Information Service (NTIS). At NTIS, it will
be available to the general public, including foreign nations.
This technical report has been reviewed and is approved for publication.
If your address has changed, if you wish to be removed from our mailing list, or
if the addressee is no longer employed by your organization, please notify WRDC/KT,
W-PAFB, OH 45433 to help us maintain a current mailing list.
Copies of this report should not be returned unless return is required by security
considerations, contractual obligations, or notice on a specific document.
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)

1a. Report Security Classification: UNCLASSIFIED
1b. Restrictive Markings: None
3. Distribution/Availability of Report: Approved for public release; distribution unlimited.
Report Number: WRDC-TR-89-7006, Volume I
6a. Performing Organization: Douglas Aircraft Company
7a. Monitoring Organization: Wright Research and Development Center, Cockpit Integration Directorate (WRDC/KT)
8a. Funding/Sponsoring Organization: Federal Aviation Administration (APM 430)
8c. Address: 800 Independence Avenue, Washington, DC 20591
9. Procurement Instrument Identification Number: F33615-86-C-3600
10. Source of Funding Numbers: Program Element 62201F, Project 2403, Task 04, Work Unit Accession Number 50
17. COSATI Codes: Field 05, Group 09
18. Subject Terms: Subjective Measures; Performance Measures; Physiological Measures
19. ABSTRACT
This report summarizes the work conducted as part of an FAA/U.S. Air Force sponsored contract
(F33615-86-C-3600), "The Assessment of Crew Workload Measurement Methods, Techniques, and
Procedures". The primary goal of the contract was to identify assessment techniques which
demonstrate evidence of validity and reliability and are suitable as measures of flightcrew
workload for aircraft certification.
To use a workload assessment technique with confidence for the certification of an aircraft
flightdeck, the validity and reliability of the technique must be well established. Validity
is the capability of the assessment technique to measure the abstract construct it is
proposed to measure. Reliability is the capability of the measure to produce the same result
with repeated testing.
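The test-retest notion of reliability defined above can be illustrated with a short calculation: the same pilots are measured twice under identical conditions, and the two sets of scores are correlated. The sketch below is a modern Python illustration with invented ratings (not data from this contract); it computes a Pearson correlation of the kind reported in the test-retest reliability tables later in this report.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical SWAT-style ratings from six pilots on two replications
# of the same flight scenario (session 1 vs. session 2).
session_1 = [42.0, 55.0, 38.0, 61.0, 47.0, 52.0]
session_2 = [40.0, 58.0, 35.0, 64.0, 45.0, 50.0]

r = pearson_r(session_1, session_2)
print(f"test-retest reliability r = {r:.2f}")  # a high r indicates a repeatable measure
```

A correlation near 1.0 indicates that pilots who rated workload high the first time also rated it high the second time, which is the sense of "same result with repeated testing" used here.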
A comprehensive literature review was conducted to identify workload measures which have an
empirical record of validity and reliability. All candidate workload assessment techniques
The process by which this contract was conducted allows us to make factual
statements regarding the validity and reliability of workload measures.
The findings of validity and reliability for the workload measures tested are
repeatable as demonstrated by the replication of results in the second
simulation study. The method employed in this contract allows for an audit
trail of the process by which an assessment technique is determined to
be valid and reliable. A summary of the steps completed for this contract
includes:
TABLE OF CONTENTS
Section    Title    Page
1.0 INTRODUCTION ............................................................................... 1
1.1 Objectives .............................................................................. 1
1.2 Background .......................................................................... 2
1.3 Scope ................................................................................... 5
1.3.1 Program Constraints ................................................... 5
1.3.2 Design Versus Certification ......................................... 5
1.3.3 Certification Applications ............................................. 6
1.3.4 Consideration of Military Standards ............................. 7
1.3.5 Relative Versus Absolute Measurement ...................... 7
1.4 Currently Used Techniques .................................................... 8
1.5 Overview Volume One ........................................................... 8
2.0 SELECTION CRITERIA FOR WORKLOAD ASSESSMENT
TECHNIQUES ................................................................................... 9
2.1 Validity Criteria ..................................................................... 9
2.2 Reliability Criteria .................................................................... 9
2.3 Applicability Criteria ............................................................. 10
3.0 SUMMARIZE KNOWLEDGE ABOUT WORKLOAD MEASURES ......... 12
3.1 Description of Literature Review ........................................... 12
3.2 Description of Fact Matrix Development ............................... 12
4.0 WORKSHOP ONE (LONG BEACH, CA) ........................................... 15
4.1 Objectives ............................................................................ 15
4.2 Attendees ............................................................................ 15
4.3 Description of Events .......................................................... 15
4.4 Results of Workshop ............................................................. 16
5.0 DEVELOPMENT OF MISSION SCENARIOS ..................... 18
5.1 Selection of Critical Workload Events ...................... 18
5.2 Scenario Description ................................. 18
5.2.1 Weather .................................................................... 19
5.2.2 Flight Environment .................................................... 19
5.2.3 Aircraft Considerations ............................................. 20
5.3 Procedure Used to Categorize Operationally Relevant
Types of Workload with FAR 25.1523 Appendix D ................ 22
6.0 ROLE OF TIME-LINE ANALYSIS ................................................... 30
6.1 Objective .............................................................................. 30
6.2 Description of TLA Task ...................................................... 30
LIST OF FIGURES

Figure    Title    Page
9.2.1.1-12 Bedford Scale Scores (All Flights) ................................................................... 125
9.2.1.2-1 Eyeblink Rate (Nominal Flight) ........................................................................ 128
9.2.1.2-2 Eyeblink Rate (Communication Flight) ............................................................ 128
9.2.1.2-3 Eyeblink Rate (Malfunction Flight) ................................................................... 129
9.2.1.2-4 Eyeblink Rate (All Flights) ............................................................................... 129
9.2.1.2-5 Interbeat Interval Mean Scores (Nominal Flight) ............................................. 131
9.2.1.2-6 Interbeat Interval Mean Scores (Communication Flight) ................................. 131
9.2.1.2-7 Interbeat Interval Mean Scores (Malfunction Flight) ........................................ 132
9.2.1.2-8 Interbeat Interval Mean Scores (All Flights) ..................................................... 132
9.2.1.2-9 Standard Deviation of Interbeat Interval
(Nominal Flight) .............................................................................................. 135
9.2.1.2-10 Standard Deviation of Interbeat Interval
(Communication Flight) ................................................................................ 135
9.2.1.2-11 Standard Deviation of Interbeat Interval
(Malfunction Flight) ........................................................................................ 136
9.2.1.2-12 Standard Deviation of Interbeat Interval
(All Flights) ..................................................................................................... 136
9.2.1.2-13 Spectral Analysis: Blood Pressure Component
(Nominal Flight) .............................................................................................. 139
9.2.1.2-14 Spectral Analysis: Blood Pressure Component
(Communication Flight) ................................................................................. 139
9.2.1.2-15 Spectral Analysis: Blood Pressure Component
(Malfunction Flight) ........................................................................................ 140
9.2.1.2-16 Spectral Analysis: Blood Pressure Component
(All Flights) ..................................................................................................... 140
9.2.1.2-17 Spectral Analysis: Respiration Component
(Nominal Flight) ............................................................................................. 142
9.2.1.2-18 Spectral Analysis: Respiration Component
(Communication Flight) ................................................................................. 142
9.2.1.2-19 Spectral Analysis: Respiration Component
(Malfunction Flight) ........................................................................................ 143
9.2.1.2-20 Spectral Analysis: Respiration Component
(All Flights) ..................................................................................................... 143
9.2.1.3-1 Wheel Control Inputs (Nominal Flight) ............................................................ 146
9.2.1.3-2 Wheel Control Inputs (Communication Flight) ................................................ 146
9.2.1.3-3 Wheel Control Inputs (Malfunction Flight) ....................................................... 147
9.2.1.3-4 Wheel Control Inputs (All Flights) ................................................................... 147
9.2.1.3-5 Column Control Inputs (Nominal Flight) .......................................................... 150
9.2.1.3-6 Column Control Inputs (Communication Flight) .............................................. 150
9.2.1.3-7 Column Control Inputs (Malfunction Flight) .................................................... 151
9.2.1.3-8 Column Control Inputs (All Flights) ................................................................. 151
9.2.1.3-9 Pedal Control Inputs (Nominal Flight) ............................................................. 154
9.2.1.3-10 Pedal Control Inputs (Communication Flight) ................................................. 154
9.2.1.3-11 Pedal Control Inputs (Malfunction Flight) ........................................................ 155
9.2.1.3-12 Pedal Control Inputs (All Flights) ..................................................................... 155
LIST OF TABLES

Table    Title    Page
7.2.2.3-4 Test-Retest Reliability Correlations - Column Control Inputs ......................... 83
7.2.2.3-5 Mean Scores - Pedal Control Inputs ............................................................ 84
7.2.2.3-6 Test-Retest Reliability Correlations - Pedal Control Inputs ........................... 85
7.2.2.3-7 Mean Scores - Secondary Task Response Time ........................................ 87
7.2.2.3-8 Test-Retest Reliability Correlations
Secondary Task Response Time ................................................................. 88
7.2.2.3-9 Mean Scores - Secondary Task Probe Accuracy ........................................ 89
7.2.2.3-10 Test-Retest Reliability Correlations
Secondary Task Probe Accuracy ............................................................... 90
7.3-1 Part-Task Simulation Correlation Matrix ...................................................... 91
7.3-2 Part-Task Simulation Correlation Matrix (Continued) ................................... 92
7.4-1 Part-Task Simulation Principal Component Analysis .................................... 94
7.5-1 Part-Task Analysis Summary ........................................................................ 95
7.5-2 Rank Order of Phases of Flight for Workload Measures
(Part-Task Simulation Low Workload Flight) ............................................... 97
7.5-3 Rank Order of Phases of Flight for Workload Measures
(Part-Task Simulation High Workload Flight) ............................................... 98
9.1.2.1-1 Scenario Differences between Low & High Workload Flights .......................... 106
9.2.1.1-1 Mean Scores - Inflight SWAT .......................................................................... 117
9.2.1.1-2 Test-Retest Reliability Correlations - Inflight SWAT ......................................... 117
9.2.1.1-3 Mean Scores - Postflight SWAT ...................................................................... 121
9.2.1.1-4 Test-Retest Reliability Correlations - Postflight SWAT ..................................... 121
9.2.1.1-5 Mean Scores - Bedford Scale ......................................................................... 126
9.2.1.1-6 Test-Retest Reliability Correlations - Bedford Scale ........................................ 126
9.2.1.2-1 Mean Scores - Eyeblink Rate .......................................................................... 130
PREFACE
This report is the result of 2 years of research sponsored by the USAF and the FAA directed
toward the evaluation of crew workload assessment techniques for aircraft certification. This
study was conducted as a joint effort by the two major U.S. manufacturers of commercial
transport airplanes: Douglas Aircraft Company and Boeing Commercial Airplanes. The
primary purpose of this volume is to report the results of the contract effort. The objective of
this contract was to provide assessment criteria to enable the FAA to evaluate workload
measurement plans for crew size substantiation and workload acceptability during aircraft
certification efforts, not to define a single measure or battery of measures that each
manufacturer must use.
The authors wish to express appreciation to the many pilots from American, Delta, Eastern,
TWA, and United Airlines who participated in this project. Extreme gratitude is also expressed
to Preston Suit (Boeing Commercial Airplanes) for his participation as First Officer and
differences trainer for all testing conducted. Thanks go out to Pat Kullenberg (Douglas Aircraft
Company) for his help in creating and debugging the flight scenarios.
The participation of flight engineers Mike Bortolussi, Hugh Campion, and Doranne VonEnde is
also appreciated. The efforts of the Man-Vehicle Systems Research Facility (MVSRF) of
NASA-Ames Research Center, and their simulation subcontractors Singer-Link and Northrop, in
the preparation of the simulator and their help in conducting the tests are also appreciated.
Thanks to Todd Williams (DAC) for his development of the Fortran programs used to reduce
the simulator data. Gratitude is expressed to Vern Battiste and Sandra Hart of NASA-Ames
Research Center for their participation in the part-task simulation. The efforts of Janet
Camarata (BCA) in the preparation of the final manuscripts are greatly appreciated. Thanks also
go to our contract monitors Harry Britten-Austin and Mike Pharaoh (RAF/USAF), and Peter
Hwoschinsky (FAA) for their help and guidance throughout this program.
Thanks go to the personnel at the Man-Vehicle Systems Research Facility at NASA-Ames
including: Elliott Smith, Rod Ketchum, David Gates, Barry Sullivan, Bob Shipley, Bob Shiner,
and the cast and crew who feed the VAX all the 1's and 0's that make the simulation work.
Finally, the authors wish to thank the following student interns from California State University,
Long Beach, for their help in the data reduction and analyses: David Nixon, Mariel Sipman,
Diana Kargoo, Toni LaFranchi, Terry Knight, Judy Wong, and Peggy Dolan.
1.0 INTRODUCTION
In 1981 the President's task force on crew complement reported on workload
assessment techniques employed in aircraft certification programs. In general, the
workload measures and test domain employed in recent certification programs were
judged to be "state of the art"; however, a number of improvements have been
recommended for future efforts (McLucas, Drinkwater, and Leaf, 1981). The
recommendations include:
The distinction between perceived mental workload and the perceived workload in
general, has not always been easy to make. Although early investigators such as
Cooper and Harper (1969) have distinguished between physical and mental effort,
completely satisfying methods for separating them have not been developed. Physical
workload is computed in terms of the actual movements (e.g., eye and hand movements)
needed to execute a procedure. Mental workload is assumed to occur when a human
operator performs higher order functions such as perception, information processing, or
decision-making. Excessive mental workload will manifest itself in the system by longer
operator processing time, shedding of tasks, increased errors, performance
decrements, and motivational lapses.
TABLE 1.2-1. FEDERAL AVIATION REGULATION REQUIREMENTS
REQUIREMENT
FAR 25 APPENDIX D
Criteria for determining minimum flight crew. The following are considered by the
Agency in determining the minimum flight crew under 25.1523:
a. Basic workload function. The following basic workload functions are considered:
(1) Flight path control
(2) Collision avoidance
(3) Navigation
(4) Communications
(5) Operation and monitoring of aircraft engines and systems
(6) Command decisions
b. Workload factors. The following workload factors are considered significant when
analyzing and demonstrating workload for minimum flight crew determination:
(1) The accessibility, ease, and simplicity of operation of all necessary flight,
power and equipment controls, including emergency fuel shutoff valves,
electrical controls, electronic controls, pressurization system controls, and
engine controls.
(2) The accessibility and conspicuity of all necessary instruments and failure
warning devices such as fire warning, electrical system malfunction, and
other failure or caution indicators. The extent to which such instruments or
devices direct the proper corrective action is also considered.
(3) The number, urgency, and complexity of operating procedures with
particular consideration given to the specific fuel management schedule
imposed by center of gravity, structural and other considerations of an
airworthiness nature, and to the ability of each engine to operate at all times
from a single tank or source which is automatically replenished if fuel is also
stored in other tanks.
(4) The degree and duration of concentrated mental and physical effort
involved in normal operation and in diagnosing and coping with
malfunctions and emergencies.
(5) The extent of required monitoring of the fuel, hydraulic, pressurization,
electrical, electronic, deicing, and other systems while enroute.
(6) The actions requiring a crewmember to be unavailable at his assigned duty
station, including: observation of systems, emergency operation of any
control, and emergencies in any compartment.
(7) The degree of automation provided in the aircraft systems to afford (after
failure or malfunctions) automatic crossover or isolation of difficulties to
minimize the need for flight crew action to guard against loss of hydraulic
or electric power to flight controls or to other essential systems.
1.3 SCOPE
Valid, reliable, and applicable techniques for workload assessment are addressed in this
report. Issues of overload and underload are addressed only to the degree
that those states can be inferred using valid and reliable workload measurement
techniques, but no attempt is made to create those situations during simulation so that
the candidate measures can be evaluated for their ability to identify those states. Fatigue
is another issue related to workload that is not specifically addressed in the simulation
studies conducted to evaluate the candidate workload measures.
1.3.1 PROGRAM CONSTRAINTS
Owing to the large number of possible workload measures which could be reviewed and
evaluated, certain self-imposed limitations were outlined to ensure adequate resources
would be available for a reasonable quality evaluation of the candidate measures. The
following limitations were outlined:
(a) Workload measures were selected from those which already existed. A
candidate measure had to have published evidence of validity and
reliability.
(b) The "best" measures were evaluated from each of three categories:
Subjective, Physiological, and Performance workload assessment
techniques. "Best" was defined as the most valid, reliable, and applicable
to aircraft certification. The number of measures to be evaluated was
limited by the available resources.
(c) Only measures which were suitable for high fidelity simulation or flight test
were evaluated.
(d) Measures were evaluated in a civil transport environment (as opposed to
military environments). The types of task demands addressed in scenario
selection were identified by, but not limited to, the functions and factors in
FAR 25.1523 Appendix D. (The results obtained from scenarios which are
common with military task demands will be valid for military applications).
(e) The issues of underload and fatigue were not examined.
(f) A list of "acceptable" workload measures was developed, where
"acceptable" was defined as a measure which had evidence of both validity
and reliability with representative civil transport workload. An acceptable
measure was presumed to be valid and reliable for only specific types of
workload. The designation of acceptable did not imply it could be
employed in every condition.
Owing to limitations of funding, only the pilot was used as a subject in the current
investigations. Even though the simulation test bed was a three-crew aircraft (Captain,
First Officer, and Flight Engineer), the workload assessment techniques are being
considered for use in certification of newer, two-person aircraft. Newer generations of
aircraft utilizing two crew members usually divide flightdeck responsibilities into pilot
flying (PF) and pilot not flying (PNF). PF responsibilities are primarily aircraft control and
navigation. The PNF is responsible for communication, system monitoring, and any
tasks delegated by the PF. The current investigation manipulated workload so that both
PF and PNF task demands were represented by the pilot.
1.3.2 DESIGN VERSUS CERTIFICATION
actual workload levels experienced during line service. Besides its role as a check on
the final design to assure that workload is acceptable, workload assessment can also be
employed in the design of an aircraft or of one of its subsystems. Caution should be
exercised when applying workload assessment techniques during design due to the lack
of system integration in the to-be-evaluated flight deck. The combination of systems can
produce increases or decreases in workload that were absent during individual system
evaluation. Analytic techniques do enable the designer to make relatively conservative
estimates of what the crew's task demands will be, so that it is virtually certain that
the actual workload experienced by the crew will be acceptable.
There are workload assessment techniques that might be appropriate for use in the
design phase that were not considered in this effort because they are not yet applicable
in the certification flight environment. It is the manufacturers' responsibility to select and
propose appropriate workload assessment techniques for certification.
The nature of the equipment change should determine the workload assessment
method selected for certification. Each specific type design may generate different types
of workload, and therefore, different workload assessment techniques may be required.
Not all measures are equally sensitive to the different types of workload (e.g., physical
vs. mental).
1.3.3 CERTIFICATION APPLICATIONS
The most relevant consideration when evaluating a workload assessment plan is that the
proposed workload techniques are adequate for evaluating the anticipated workload for
the new flight deck.
In the early days of commercial jet aircraft (i.e., B-707 and DC-8), workload was primarily
of a physical nature. When the commercial flight deck of smaller aircraft (less than
200,000 lbs.) was reduced to two crew members (i.e., DC-9 and B-737), the systems
contained a higher degree of integration. The evaluation was centered on the physical
nature of workload. The workload evaluation focused on the question of the two crew
members (one crew member during the case of incapacitation) accomplishing all the
needed tasks for flying the aircraft. The next generation of aircraft (the glass cockpits of
the B-757/767 and MD-88) retained the two-crew-member flightdeck, but the move to
sophisticated Flight Management Systems introduced new levels of mental workload for
evaluation during the workload certification effort. Finally, the latest certification efforts
involve traditionally three-crew aircraft (MD-11 and B-747-400) being stretched for
increased passenger and range capacity, while increasing automation levels and
eliminating the flight engineer.
In the past, commercial aircraft manufacturers have used analytic techniques and non-
structured pilot opinion for workload assessment. Analytic techniques are of particular
value to the aircraft manufacturer since they offer both the potential for identifying and
correcting workload problems early in the design phase when the cost of change is
relatively low, and a tool which can provide data for certification. One disadvantage to
the available analytic techniques is their lack of fidelity in assessing mental effort. With
the current shift of flight deck design placing more mental demands on the flight crew,
workload assessment has taken on a new challenge. The addition of structured
subjective measures to traditional objective analyses can provide information which
validates the analytic and simulation based estimates of physical workload and
enhances estimates of mental workload.
In addition to the consideration that appropriate assessment techniques be applied,
consideration must also be given to valid test methods. A partial listing of common
methodological errors is given below:
(a) Demand characteristics are not controlled (e.g., hints are inadvertently
given to the subject on how to rate the workload as high or low),
(b) No differences should exist in the test scenario between the baseline and
to-be-certificated aircraft, otherwise observed workload differences could
be attributed to changes in the test scenario instead of the aircraft,
(c) Order effects of testing (e.g., learning or fatigue effects) are not controlled.
1.3.4 CONSIDERATION OF MILITARY STANDARDS
A review was made of the military standard entitled: Human Engineering Requirements
for Military Systems, Equipment, and Facilities (MIL-H-46855A) and the draft military
standard entitled: Human Engineering Requirements for Measurement of Operator
Workload. The approach taken in this contract of a literature search, workshops, and
simulation testing provides a methodology appropriate to addressing the issues raised in
the Military Standards documents.
1.3.5 RELATIVE VERSUS ABSOLUTE MEASUREMENT
Workload assessment for certification relies on a relative comparison of workload levels.
Typically workload is compared between the to-be-certificated aircraft and a baseline
aircraft, which has an established record of safe performance and acceptable workload.
It is assumed that the two aircraft are being compared under conditions which are as
similar as possible to insure that any workload differences which occur are due to
differences in the aircraft design and not to other factors. Ifthe new model aircraft has
the same, or lower workload, then it is concluded that the workload is acceptable in the
new model. When performing a relative comparison with a new aircraft design, however,
there may be instances when workload levels exceed the old design. In cases such as
this the increased workload is not necessarily unacceptable, but it may become the
subject of a more in-depth workload analysis. These cases need to be considered on a
case-by-case basis with all of the operational factors taken into consideration when
evaluating the impact of workload increases.
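The relative comparison described in this section can be sketched in code: each pilot flies the same scenario in the baseline and in the to-be-certificated design, and the paired per-pilot differences in workload ratings are examined. This is a modern Python illustration with invented ratings and an invented decision rule, not the contract's actual certification procedure.

```python
from statistics import mean

# Hypothetical per-pilot workload ratings on an identical scenario,
# flown once in a baseline aircraft and once in the new design.
baseline = [48.0, 55.0, 42.0, 60.0, 51.0]
new_model = [45.0, 57.0, 40.0, 58.0, 49.0]

# Paired differences: a positive value means the new design raised workload.
diffs = [n - b for n, b in zip(new_model, baseline)]
mean_diff = mean(diffs)

if mean_diff <= 0:
    verdict = "workload same or lower than baseline: acceptable"
else:
    # An increase is not automatically unacceptable, but per the text
    # it triggers a more detailed, case-by-case workload analysis.
    verdict = "workload increase: flag for in-depth analysis"

print(f"mean paired difference = {mean_diff:+.1f} ({verdict})")
```

Holding the scenario identical across the two aircraft is what lets the difference be attributed to the design rather than to the test conditions, which is the point of the relative method.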
When measuring pilot workload, or any other behavioral measure, it is essential to
consider the variable nature of the data. Behavioral data are best described in terms of
distributions: because individuals bring different skills to the task of flying, a group of
pilots will produce a distribution of workload scores. The "state of the art" of
workload science does not allow for determination of a single score for the purpose of
workload assessment. Pilot to pilot variability in assessing workload is a consideration
which must be kept in mind throughout an aircraft certification effort. No absolute
measure of workload ("Redline" associated with an overload condition) is currently
available for aircraft certification. Considering the scope and magnitude of individual
differences, it would be extremely risky to use a single workload measurement technique
to determine if flight crew workload is acceptable.
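Because workload scores form a distribution across pilots rather than a single number, an evaluation would summarize the spread of that distribution instead of reporting one value. A minimal Python sketch with invented ratings:

```python
from statistics import mean, stdev

# Hypothetical workload ratings for one flight phase from ten pilots.
ratings = [38.0, 52.0, 45.0, 61.0, 49.0, 43.0, 55.0, 47.0, 58.0, 41.0]

m, s = mean(ratings), stdev(ratings)
lo, hi = min(ratings), max(ratings)

# Report the spread, not a single "redline" score: pilot-to-pilot
# variability is part of the result, not noise to be discarded.
print(f"mean={m:.1f}  sd={s:.1f}  range=[{lo:.1f}, {hi:.1f}]")
```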
A number of factors influence the ability to generalize or draw conclusions about
workload levels made in a comparative evaluation. It would not be appropriate to include
a detailed discussion of these factors here, but a partial listing of relevant factors
includes:
(a) Representativeness of subject selection,
(b) Number of subjects tested,
(c) Fidelity of task demands or scenarios,
1.4 CURRENTLY USED TECHNIQUES
Today's list of acceptable workload measures is likely to be out of date 10-15 years from
today. Any list which is fixed and cannot be modified to accommodate the
improvements developed within the workload measurement science could become more
of an obstacle than an aid in certifying the design of a new aircraft. For this reason,
emphasis should be placed on whether the most useful measure was selected for a
particular application, rather than selection of a measure merely because it was familiar
or associated with a prior list.

1.5 OVERVIEW VOLUME ONE
Finally, the methods and results of two simulation tests conducted at the NASA-Ames
Research Center are presented.
2.0 SELECTION CRITERIA FOR WORKLOAD ASSESSMENT TECHNIQUES
In order to evaluate the utility of a workload measure, specific criteria regarding
validity, reliability, and applicability need to be established. The following sections
identify specific issues that should be addressed regarding validity, reliability, and
applicability.
2.1 VALIDITY CRITERIA
Validity is defined as the capacity of an assessment technique to quantitatively evaluate
levels of workload. There are many types of validity, each affecting the ultimate
usefulness and acceptability of a workload measure. To ignore validity when selecting a
workload measure is to employ a measure that carries a greater risk of giving spurious
results, and it may be better to design an aircraft with less information than with
incorrect information. At a minimum, it is proposed that the following types of validity be
addressed during measure selection, testing, and evaluation (Anastasi, 1968):
(a) CONTENT VALIDITY - It is important to determine that the operationally
relevant types of workload are being considered. By focusing on the
important types of workload found in cockpit operations, more confidence
can be obtained that the correct workload assessment techniques will be
selected and employed.
(b) PREDICTIVE VALIDITY - (Also known as Criterion Related Validity) Can
the measure be used in a predictive fashion to determine levels of
workload? This is the most important type of validity for the manufacturer
because it provides a basis for making cost/benefit decisions regarding
system design.
(c) CONSTRUCT VALIDITY - The construct validity of a workload assessment
technique is the extent to which the technique may be said to measure the
theoretical construct of workload. Since workload cannot be directly
observed and exists only as a theoretical construct, it must be
demonstrated that the measure reflects the changes that would be
predicted for the construct of workload. To have confidence in a workload
measure, this connection must be demonstrated whether workload is
defined in terms of task demands or operator variables. Construct validity
is not accomplished in a single experiment or settled "once and for all";
it requires the gradual accumulation of information from a variety of sources.
(d) FACE VALIDITY - Face validity refers to what the assessment technique
superficially appears to measure, not necessarily what it actually
measures. Face validity can become important to how well people use an
assessment technique. If pilots or engineers are asked to use or
administer a workload measurement system that makes little sense to
them, their motivation to follow all the rules is likely to suffer.
2.2 RELIABILITY CRITERIA
Reliability is defined as the capacity of a workload measure to yield similar results with
repeated usage. How consistently does the measure yield the same answer given the
same measurement conditions? The following types of reliability should be addressed
when considering a workload measure for test and evaluation (Anastasi, 1968):
(a) TEST-RETEST RELIABILITY - The most obvious method for determining
the reliability of an assessment method is to repeat the test conditions on a
second occasion, and evaluate discrepancies in the two samples. The
reliability can be quantified by computing the correlation between the two
sets of scores obtained by the same persons on the two administrations of
the workload measure. The resulting reliability coefficient can then be
compared to established standards for any test and thereby be viewed
with some objectivity.
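The test-retest procedure described above reduces to a correlation between paired score lists. A minimal sketch; the scores below are hypothetical placeholders for the two administrations, not data from the study:

```python
# Sketch: quantifying test-retest reliability as the Pearson correlation
# between two administrations of the same workload measure.
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Workload scores from the same pilots on two occasions (hypothetical).
test   = [35, 42, 58, 61, 70, 48, 55, 63]
retest = [38, 40, 55, 65, 68, 50, 52, 66]

reliability = pearson_r(test, retest)
print(f"test-retest reliability coefficient: {reliability:.2f}")
```

The resulting coefficient can then be compared against whatever standard is adopted for the certification effort.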
(f) Equipment constraints include such factors as limited hardware space,
limited panel space, a large distance between pilot and data collection
hardware, potential signal interference, and the inability to change the flight
deck configuration.
3.0 SUMMARIZE KNOWLEDGE ABOUT WORKLOAD MEASURES
In order to determine the optimal subset of workload measures to be tested in simulation,
a literature review was conducted. The intent of the workload literature review was to
identify, collect, organize, and publish a cross referenced index of published articles
which addressed the issue of workload measurement.
3.1 DESCRIPTION OF LITERATURE REVIEW
LITERATURE SEARCH
Using the keyword "workload," an electronic search was conducted of library databases
from McDonnell Douglas, the Boeing Company, NASA, the U.S. Air Force, and
universities in the metropolitan Los Angeles area. Abstract searches generated a list of
document titles which were then collected for evaluation. Only material published after
1978 was evaluated in the present effort.
Criteria were generated to facilitate discrimination of likely documents for detailed review.
The articles were sorted according to the following criteria:
(a) Empirical data from flight test or simulation,
(b) Empirical data from laboratory experimentation,
(c) Review article,
(d) Theoretical article.
SUBJECTIVE
(a) Subjective Workload Assessment Technique (SWAT)
(b) NASA Task Load Index (TLX)
(c) Workload Compensation Interference/Technical Effectiveness scale (WCI/TE)
(d) Modified Cooper-Harper
(e) Interviews
(f) Surveys
(g) "Other" subjective measures
PHYSIOLOGICAL
(a) Body Fluid
(b) Brain Activity
(c) Heart
(d) Lung
(e) Muscle
(f) Skin
(g) Vision
(h) Voice
(i) "Other" physiological
PERFORMANCE
(a) Performance Primary Task: Time
(b) Performance Primary Task: Position
(c) Performance Primary Task: Event
(d) Performance Normal Secondary Task: Time
(e) Performance Normal Secondary Task: Position
(f) Performance Normal Secondary Task: Event
(g) Performance Artificial Secondary Task: Time
(h) Performance Artificial Secondary Task: Position
(i) Performance Artificial Secondary Task: Event
The Fact Matrix is presented as Volume 2 of Workshop One (Biferno and Boucek, 1987).
The fact matrix is organized into three sections. First is a list of the articles contained in
the fact matrix, organized alphabetically by first author's last name. Second is a list of all
the titles contained in the fact matrix, in the order they were collected, with the
corresponding reference numbers. Finally, the cross-references of measure, by Appendix
D function or factor, by reliability and/or validity, are presented in the fact matrices. The
numbers within the matrix cells (Table 3.2-1) are the database reference numbers which
are found in: "Proceedings Of The Assessment Of Crew Workload Measurement
Methods, Techniques And Procedures: Volume II - Library References."
Table 3.2-1. Fact matrix (cell entries are database reference numbers)
4.0 WORKSHOP ONE (LONG BEACH, CA)
4.1 OBJECTIVES
Workshop number one was conducted in order to obtain agreement among workload
experts regarding a candidate list of measures for simulation testing at the NASA-Ames
Research Center. The criterion for selecting the candidate workload measures was
empirical evidence of validity and reliability.
4.2 ATTENDEES
Fifty workload experts attended the two-day workshop held in Long Beach, California.
They came from a wide cross section of scientists who have expertise with workload
measurement. The area of workload measurement expertise was divided into the
familiar domains of subjective ratings, physiological recordings, and performance
measures.
4.3 DESCRIPTION OF EVENTS
On the first day presentations were made by experts regarding "state-of-the-art"
workload measures broken down by domain: Subjective Rating, Physiological
Measures, and Performance Measures. Three experts from each domain made
presentations in order to ensure unbiased representation of the domain.
On the second day the group was divided according to areas of expertise for further
panel discussions. The objective of the panel discussions was to provide an
independent review of the facts concerning the validity and reliability of workload
measures. The attendees were asked to comment on the results of the literature review
and participate by means of discussion. A free exchange of ideas was encouraged and
documented. The questions put before the workshop for all participants to answer were:
(a) Given the workload types being considered for measurement in a transport
environment, which measures are the most valid, reliable, and applicable?
(b) On what evidence do you base your opinion?
Each workshop attendee received a set of the matrices, one matrix for each of the
applicable workload types mentioned in FAR 25.1523 Appendix D. The matrices were
reviewed and modified by the experts. Each workshop participant reviewed and
discussed:
4.4 RESULTS OF WORKSHOP
The final list of candidate measures was influenced by the results of the workshop and
constitutes the workload measures considered the most valid, reliable, and
applicable for certification. Workload measures suggested in the panel discussions,
measures identified by Douglas and Boeing as candidates prior to Workshop One, and
measures actually used in the Part-Task simulation are presented below (Table 4.4-1).
Timeline Analysis was used as an analytic tool in order to make a priori predictions
regarding the task demands imposed on the crew. For a measure to be considered
valid it must be able to discriminate between the levels of low and high workload
operationalized by the task demands.
Table 4.4-1. Candidate workload measures
5.0 DEVELOPMENT OF MISSION SCENARIOS
The mission scenarios used for evaluating a new aircraft flightdeck for workload require
careful consideration in order to manipulate task demands relevant to FAR 25.1523
Appendix D.
During the aircraft certification process the manufacturer provides the FAA with a
description of the flight scenarios for the simulation testing, flight test, or "mini-airline"
operation used for workload evaluation. The scenario description must be detailed
enough to allow discussion of the flightcrew actions that are to be evaluated in flight.
It is during high fidelity simulation, or actual operation, that workload measures provide
information about the workload imposed on the crew in order to confirm that they can
reliably cope when in airline service.
Seven mission scenarios were developed for the simulation testing portion of this
contract effort. Four simulation scenarios were used in the Part-Task simulation testing:
two short (30 minute) segments, a high and a low workload condition of each. Three
simulation scenarios were used in the Full-Mission testing: two short (30 minute)
segments and one long (90-minute) segment; these are reported in the Methods
section of the Full-Mission simulation. Detailed descriptions of these scenarios can be
found in the appropriate simulation design sections, 7.1.2.1 and 9.1.2.1, respectively.
5.1 SELECTION OF CRITICAL WORKLOAD EVENTS
A sampling of normal and non-normal procedures was implemented in the simulation
scenarios to manipulate the task demands of the flight crew in order to vary workload.
The face validity (realism) of the task demand manipulations was verified by flight
operations personnel (from both Douglas and Boeing). Operating considerations such
as weather, routes, weight, and balance were considered representative of actual B-727
operations. Malfunction conditions were selected that would exercise the functions and
factors of FAR 25.1523 Appendix D, and would not be considered improbable events.
Preplanned dispatch-inoperative items (i.e., Autopilot Inop) that could result in added
workload were incorporated in the simulation program.
To determine that the high workload events to be used in the scenarios demonstrated
face validity there was close coordination with flight operations personnel (from Douglas
and Boeing) to ensure the following: malfunctions occurred in the scenarios at a logical
time and phase of flight and required operationally correct responses from the crew;
multiple failures occurred in a logical order and were representative of malfunctions
experienced in airline operations. High workload events were also coordinated in the Air
Traffic Control (ATC) scripts with ATC personnel.
The method of "triggering" the malfunction was planned so that the malfunction occurred
consistently across evaluation flights.
5.2 SCENARIO DESCRIPTION
The following is a list of items essential for thorough scenario description. All the
following items should be considered in the development of scenarios used in a
certification effort.
5.2.1 WEATHER
A consideration for simulation testing is the weather conditions to be experienced during
the evaluation. The weather conditions for the scenario should include both Instrument
Meteorological Conditions (IMC) and Visual Meteorological Conditions (VMC). In
addition, altimeter setting, winds, temperature, ceiling, visibility, and dewpoint need to be
specified. The weather can then be developed for each of the airfields used in the
scenario. Winds aloft, surface observations (ATIS), and terminal forecasts should all be
developed for the simulation test.
5.2.2 FLIGHT ENVIRONMENT
The ATC scripts used in the Full-Mission simulation testing performed at NASA-Ames
(Appendix C) provide an example of the detail that was required for ATC simulation in the
simulation testing environment.
The capability exists for full ATC support at the NASA Ames 727 simulator facility;
therefore, ATC scripts had to be developed. ATC communications required by the
scenario were integrated into the detailed scenario that was built.
During the simulation tests ATC communication, similar to revenue service, included
many calls to 'other' aircraft in the ATC system in order to replicate "real world"
conditions.
In addition, the visual system was programmed to support a percentage of the 'other'
aircraft as visual targets in order to enhance simulation fidelity. The remainder of the
ATC calls to 'other' aircraft did not include visual targets. Ensuring that the ATC calls
occurred at the same point in each flight during the simulation tests was a difficult task.
The ATC personnel were provided operational cues (i.e., measurement window
openings and closings) to aid in their attempt to make calls in a consistent fashion. The
controllers used lights associated with data window 'triggers' to aid in the timing of their
interaction.
5.2.3 AIRCRAFT CONSIDERATIONS
Malfunctions should be planned and coordinated so that they occur logically in the
scenario and require operationally correct responses from the crew. The method of
'triggering' a malfunction should be planned so that the malfunction occurs consistently
across the subject population. Pilot deviation from course or altitude must be
considered when the malfunction is specified.
Dispatching the aircraft with allowable systems inoperative is another technique for
manipulating workload. Autopilots or primary pressurization systems inoperative are
candidate manipulations to vary task demands.
DATA COLLECTION WINDOWS
When collecting data for specific phases of flight during the workload evaluation
scenario, defining the data collection periods (i.e., data windows) is very important. The
method of 'triggering' the opening and closing of the windows must be chosen to cause
the same events to occur in the same measurement window across the subject
population. The windows should be triggered whenever possible by system defined
events such as "flaps up" or the aircraft crossing a certain prescribed altitude during a
profile descent. If operational events cannot be determined so as to preclude individual
variation from flying performance, windows can be triggered by time events; for instance,
a window can be closed 2 minutes after it opens. If a malfunction is planned, the window
should be triggered to open prior to the malfunction.
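The event- and time-based triggering rules above can be sketched as a small helper; the event names and times are hypothetical, not taken from the study scenarios:

```python
# Sketch of the two window-triggering strategies described above:
# open on a system-defined event, then close on either a second event
# or a fixed elapsed time. Event names and times are hypothetical.

def window_bounds(events, open_event, close_event=None, duration_s=None):
    """Return (open_time, close_time) for one data-collection window.

    `events` maps event name -> simulation time in seconds. The window
    closes on `close_event` if given, otherwise `duration_s` after opening.
    """
    t_open = events[open_event]
    if close_event is not None:
        t_close = events[close_event]          # event-based closing
    elif duration_s is not None:
        t_close = t_open + duration_s          # time-based fallback
    else:
        raise ValueError("need a closing event or a duration")
    return t_open, t_close

# Hypothetical event log for one flight.
log = {"flaps_up": 95.0, "cross_10000_ft": 310.0, "throttles_idle": 1500.0}

# Event-triggered opening with a time-based close 60 s later.
print(window_bounds(log, "flaps_up", duration_s=60.0))   # (95.0, 155.0)
```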
DISTRACTION SCENARIOS
In addition to malfunctions, distractions were included in scenarios. When the same pilot
flew the same scenario in test and retest, these distractors helped disguise the fact that
the same scenario was being repeated. Examples of distractions used are shown
below:
*CAPT ACKNOWLEDGES AND NOTES F/O TURNING DE-FOGGER
ON.
These distractions did not occur during data collection windows.
5.3 PROCEDURE USED TO CATEGORIZE OPERATIONALLY RELEVANT
WORKLOAD WITH FAR 25.1523 Appendix D
To ensure that the functions and factors of FAR 25.1523 Appendix D are represented by the
task demands used in testing, a scheme was developed for mapping the functions and
factors onto the tasks required by the mission scenarios. The goal is to evaluate the
specific workload types (FAR 25.1523 Appendix D, functions and factors) with the
appropriate crew complement during realistic operating conditions, including
representative weather, air traffic, and airline operational duties.
The scenarios built for the simulation tests were based on the incorporation of
operationally relevant types of task-demands being placed on the flightcrew. Flight
scenarios were specified to include task-demands which pose a concern for transport
flightcrew workload. An analytical assessment of these task-demands was
accomplished using the Boeing Commercial Airplanes Time-Line Analysis (TLA).
Workload is thought to be a multidimensional construct combining the demands
imposed on the pilot as he attempts to achieve the flight objectives, and the momentary
capacity of the pilot to meet these demands. It is important that the workload evaluation
scenario simulate a multi-faceted environment to provide a representative range of task
demands for the subject pilots. While the tasks performed by the subjects in evaluation
studies should be representative of those performed in actual flight operations, care
should be taken to map the workload experienced in performing the tasks to the
functions and factors of FAR 25.1523 Appendix D.
The task of operationally defining the functions and factors of Appendix D should
include: Specification of the mission segments to be included in the evaluation, and
development and implementation of a categorization scheme to map the specific tasks
of the flightcrew to the functions and factors found in FAR 25.1523 Appendix D. A
description of the process of mapping functions and factors to specific tasks follows.
Scenarios were constructed which described the flightcrew's actions down to the task
and subtask level. A representative task is "tune VHF radio 1 to 123.9." Subtask
description goes a step further such as, "move right hand to VHF radio 1 channel
selector knob", "turn knob counter-clockwise to 123.9," "confirm frequency 123.9 is
visible in display window," "return right hand to rest."
Each workload function and workload factor of FAR 25.1523 Appendix D, was assigned
an operational definition. The operational definitions that were generated and used in
this study are shown in Figures 5.3-1 through 5.3-4. Using these definitions assured
consistency when mapping the functions/factors onto the tasks of the scenarios. The
creation of the operational definitions included discussions with flight operations
personnel from Douglas and Boeing. These rules could be defined differently with little
consequence as long as they were used consistently in the mapping process. In fact,
when two experimenters (one Douglas and one Boeing) independently assigned
function/factor mappings to the steps of the scenarios using the pre-established
definitions, their mappings were nearly identical. For the few differences that did exist,
flight operations personnel were consulted and the differences reconciled.
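The independent-mapping check described above amounts to comparing the two raters' function/factor assignments step by step. A minimal sketch; the step labels and function codes are hypothetical, since the report states only that the mappings were compared:

```python
# Sketch: checking agreement between two experimenters' independent
# function/factor mappings, step by step. Step labels and code sets
# are hypothetical.

def percent_agreement(map_a, map_b):
    """Share of scenario steps assigned identical function/factor sets."""
    steps = map_a.keys() & map_b.keys()
    same = sum(1 for s in steps if map_a[s] == map_b[s])
    return same / len(steps)

rater_a = {"call_gear_up": {"F4", "F5", "F6"}, "tune_vhf1": {"F4"}}
rater_b = {"call_gear_up": {"F4", "F5", "F6"}, "tune_vhf1": {"F4", "F6"}}

print(f"{percent_agreement(rater_a, rater_b):.0%} of steps matched")
```

Steps on which the raters disagree are the ones to take back to flight operations personnel for reconciliation.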
Figure 5.3-1
Figure 5.3-2
Figure 5.3-3
Figure 5.3-4
An example of the function and factor mapping is shown in Figure 5.3-5. As shown, the
functions and factors are not mutually exclusive. In fact, many steps of the scenarios
represent three or more functions and three or more factors. By the pre-defined rules,
the action, "calls, 'gear up"', for example, represents functions 4, 5, and 6
(communications, operations and monitoring of aircraft engines and systems, and
command decisions, respectively) and factors 3 and SA (procedures and
communication, respectively). The level of task specification done for the scenarios can
also be seen in the task listing included in Appendix B. In a 20 second period of time, six
independent tasks are performed, each representing a variety of functions and factors.
The mapping procedure assured that a priori predictions of workload differences could
be made prior to collecting data using the workload assessment techniques in the
simulation at NASA-Ames. Measurement periods within the flight could be identified
where workload levels were high or low. Function and factor tally sheets were prepared
for each of the scenarios indicating frequency counts for the number of occurrences of
each function and factor. An example tally sheet is shown in Figure 5.3-6.
To operationalize the differences between mental and physical workload, each function
and factor was assigned to either mental or physical. No task is purely physical or purely
mental; however, the assumption was made that tasks do lie along a continuum from
physical to mental workload, and therefore, the task could be assigned to the end of the
continuum that more closely described it. This procedure allowed physical and mental
task loadings to be made for each phase of flight, for every scenario. Mental workload
increases with mediational tasks such as perception, cognitive processing, and decision
making. Physical workload increases with gross motor movement. In this manner, it
could be judged which measurement periods contained relatively high levels of mental
workload and which involved predominantly physical workload.
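The mental/physical loading procedure can be sketched directly: assign each function or factor to one end of the continuum, then split each phase-of-flight tally accordingly. The function names, assignments, and counts below are hypothetical:

```python
# Sketch of the mental/physical loading computation: each function or
# factor is assigned to one end of the mental-physical continuum, and
# per-phase tallies are split accordingly. Assignments and counts are
# hypothetical.

CONTINUUM = {
    "command_decisions": "mental",
    "monitoring_systems": "mental",
    "communications": "mental",
    "flight_path_control": "physical",
    "procedures": "physical",
}

def loadings(tally):
    """Split a phase-of-flight tally into mental and physical counts."""
    out = {"mental": 0, "physical": 0}
    for item, count in tally.items():
        out[CONTINUUM[item]] += count
    return out

climb_tally = {"command_decisions": 3, "communications": 5,
               "flight_path_control": 9, "procedures": 4}
print(loadings(climb_tally))  # {'mental': 8, 'physical': 13}
```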
Figure 5.3-5. Example of function and factor mapping
Figure 5.3-6. Example function and factor tally sheet
6.0 ROLE OF TIME-LINE ANALYSIS
The analytical measure used to predict, a priori, levels of workload was Timeline Analysis
(TLA). TLA computes the ratio of time required, that is, execution time, to time available
throughout a mission scenario (Miller, 1976). A serious criticism of TLA is the serial
approach it takes in calculating task execution when it is known that pilots can conduct
multiple actions in parallel. Flaws in the method lead to an over-estimation of workload,
an error on the side of safety which is appropriate when evaluating a new aircraft.
Boeing Commercial Airplane's TLA technique was used in this study because of the
database available for the B-727 aircraft which was the testbed for the simulation studies.
6.1 OBJECTIVE
The timeline analysis was used to identify high and low task demand levels. A detailed
task timeline analysis was performed on the flight scenarios to be used in the simulation
testing at NASA-Ames.
Time based mission scenarios were built describing the crew activity associated with
flying the routes used during testing at NASA Ames. Creation of the scenario is based
upon data derived from mission flight plans, maps, approach and landing charts,
interaction with ATC, aircraft performance data and aircraft operations manuals. Using
this information base, detailed procedures were developed which defined the actions a
crew member must accomplish to successfully complete a mission (Figure 6.2-4).
These procedures establish the basic work time units from which the TLA workload
statistics were derived. Normal as well as high workload procedures were developed for
the routes.
TLA represents workload requirements for various body channels including: Visual,
manual (left, right, and both), verbal, auditory and cognitive. Execution time estimates
are calculated in terms of hand and eye motions used to execute a procedure and the
time in transit required to accomplish these motions (Figure 6.2-5). Manual motion is
calculated from one control to the next along a curve to simulate lifting the hand from
one control, moving it in an arc to the next control and lowering the hand. Dwell times
required to use a specific control or display are selected from stored data tables.
Summing the transit and dwell times for each action produces an estimate of the total
time required by an operator to successfully complete the procedure.
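The core TLA statistic described above, summed transit and dwell time divided by time available, can be sketched as follows; the task times are hypothetical, not values from the Boeing database:

```python
# Sketch of the core TLA statistic: workload for a window is the summed
# transit + dwell time of its tasks divided by the time available.
# Task times are hypothetical, not from the Boeing B-727 database.

def tla_workload(tasks, time_available_s):
    """tasks: list of (transit_s, dwell_s) pairs for one crew member."""
    time_required = sum(transit + dwell for transit, dwell in tasks)
    return time_required / time_available_s

# e.g. three hand/eye actions inside a 20-second window
tasks = [(0.8, 1.5), (0.6, 2.0), (1.0, 3.5)]
ratio = tla_workload(tasks, time_available_s=20.0)
print(f"workload = {ratio:.0%} of time available")
```

A ratio approaching or exceeding 100% flags a window where the serial-execution assumption predicts the crew member is saturated.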
In the geometric data base each device on the flight deck is characterized in terms of its
location, dwell time, and complexity score (Figure 6.2-6). This device complexity score is
based upon the information content of the possible states the device presents (Figure
Figure 6.2-1
Figure 6.2-2
Figure 6.2-3
Figure 6.2-4
Figure 6.2-5
Figure 6.2-6
6.2-7). It serves as the basis for estimating cognitive workload. The procedure
complexity is the sum of the device complexity scores for all steps of the procedure.
Cognitive workload is computed by an algorithm that relates the complexity score to the
number of choices. The algorithm calculates how long it takes a person to act
depending on the number of choices available. Verbal and auditory response times
were derived from data contained in the Index of Electronic Equipment Operability.
Workload estimates are made for each task, procedure, phase of flight, and for the entire
scenario. Analyses were completed only for the captain's tasks.
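The report does not reproduce the cognitive-workload algorithm itself; a standard model relating action time to the number of available choices is the Hick-Hyman law, sketched here with illustrative coefficients that are assumptions, not values from the study:

```python
# A standard model of how choice time grows with the number of
# alternatives is the Hick-Hyman law. The coefficients below are
# illustrative assumptions, not values from the TLA database.
import math

def choice_time_s(n_choices, a=0.2, b=0.15):
    """Hick-Hyman estimate: T = a + b * log2(n + 1) seconds."""
    return a + b * math.log2(n_choices + 1)

for n in (1, 3, 7):
    print(f"{n} choices -> {choice_time_s(n):.2f} s")
```

The logarithmic form captures the qualitative behavior described in the text: each doubling of the device's possible states adds a constant increment to the estimated decision time.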
Figure 6.2-7
7.0 PART-TASK SIMULATION TESTING
The Part-Task simulation was performed at the NASA-Ames Research Center, Moffett
Field, California. To ensure the generalizability of the test results to the real world, the
Part-Task testing environment was as similar to the operational environment as possible
(e.g., the flight deck of the production aircraft during certification flights).
7.1 METHOD
Only Captains were used for the following evaluation. Newer aircraft, with two
crew flightdecks, do not have the wide variety of tasks found in older flightdecks. The
newer generation, two crew aircraft, normally divide the responsibilities so that one crew
member is considered the Pilot Flying (PF) and the other is the Pilot Not Flying (PNF).
The PF controls the aircraft, either manually or using the autopilot, has primary
responsibility for navigation of the aircraft, and monitors the aircraft systems. The
PNF handles communications, is the primary system monitor, and serves as a backup
for the navigation process. The Captain and First Officer exchange these roles readily.
For this reason it was deemed sufficient to instrument and record workload from only the
pilot in the left seat in the following study. No attempt was made to address crew
performance when PF and PNF duties are exchanged; for a thorough discussion see
Orlady (1982).
7.1.1 SUBJECTS
PILOTS
Eighteen Airline Transport Pilots (ATPs) from American, United, Delta, TWA, and
Eastern served as subjects in the experiment. Subjects were all male, ranging in age
from 44 to 58, and were either currently F.A.R. Part 121 qualified as Captain on the
B-727 or had spent 5 years of duty as Captain on the B-727.
7.1.2.1 INDEPENDENT VARIABLE
TEST/RETEST
The method employed in the present study utilized Test/Retest as a means of
determining the reliability of the various workload measures. This meant that the pilots
involved in the study participated on two separate occasions. The period between the
two simulation test periods was at least 10 days, and was as long as 42 days in one
case.
LOW AND HIGH WORKLOAD LEVELS
There were two different levels of workload: low and high. The low workload flight is a
"nominal" flight: no equipment is MEL'd, and the weather is clear with light winds. The
high workload flight contains Instrument Meteorological Conditions (IMC), and the winds
aloft are stronger. Malfunctions are also encountered in the high workload flight: at the
top of the climb segment the number three engine fails, and 3 minutes later the "B"
hydraulic system loses quantity and pressure, resulting in a total failure. The autopilot is
inoperative in the high workload flight as well.
A table is provided that contains a summary of the workload manipulations in order to
aid the reader in understanding the different task demands for the low and high workload
flights (Table 7.1.2.1-1).
The manipulations of workload (low and high) and flight route (San Francisco to Stockton
and Sacramento to San Francisco) were counterbalanced across subjects.
PHASES OF FLIGHT
Seven phases of flight were examined in the simulation test:
(a) Takeoff
(b) Climb
(c) Top of Climb (TOC)
(d) Cruise
(e) Top of Descent (TOD)
(f) Approach
(g) Landing
Each flight contained seven measurement "windows" to assess the seven different
phases of flight. The term window is used to give the idea of a momentary examination
of a portion of a well defined phase of flight. Window and phase of flight are used
synonymously. Window is used when referring to experimental design or measurement
period, while phase of flight is used when discussing results. The events which opened
and closed the windows are listed below:
Table 7.1.2.1-1. Summary of workload manipulations
PHASE OF FLIGHT        OPENING EVENT                  CLOSING EVENT
(a) TAKEOFF            E.P.R. > 1.5                   Flaps 5 degrees
(b) CLIMB              Flaps up                       1 minute later
(c) TOP OF CLIMB       10,000 feet                    2 minutes later
(d) CRUISE             3 minutes after 10,000 feet    2 1/2 minutes later
(e) TOP OF DESCENT     Throttles to idle              5,500 feet
(f) APPROACH           Localizer activation           Outer Marker
(g) TOUCHDOWN          Middle Marker                  1 1/2 minutes later
A graphic is provided to illustrate the flight scenarios pictorially in order to aid the reader
in understanding the measurement windows (Figure 7.1.2.1-1).
SUBJECTIVE RATING
The Subjective Workload Assessment Technique (SWAT), NASA-Task Load Index
(NASA-TLX), and a simple 1-to-20 point overall workload score were used for subjective
workload assessment.
Half of the subjects used the Subjective Workload Assessment Technique (SWAT)
(Figures 7.1.2.2-1 and 7.1.2.2-2) and the other half used the NASA Task Load Index
(TLX) (Figures 7.1.2.2-3 and 7.1.2.2-4) with the 1-to-20 point overall workload scale
appended to the bottom of the NASA-TLX rating page.
In the current paradigm, the subjective rating techniques could not be administered
in flight because they would have contaminated the other measures; since control input
activity was being recorded, it was judged inappropriate to hand the pilot a
paper-and-pencil rating instrument. To assess the subjective workload tools (SWAT and
NASA-TLX), half of the subjects received SWAT and the other half received the
NASA-TLX during post-flight videotape viewing. An attempt was also made to gather
subjective ratings closer in time to the actual flights. Shortly after landing the pilots were
handed a clipboard with four segments of the flight demarcated: (1) takeoff through top
of climb, (2) cruise, (3) top of descent through approach and landing, and (4) an overall
rating for the entire flight. Those results are discussed in another publication (Battiste
and Bortolussi, 1988).
Both SWAT and the NASA-TLX require techniques to customize the event ratings so they
can be combined to yield a single 0-to-100 scale for each measurement window.
Using conjoint measurement techniques, the SWAT 0-to-100 score is placed on an
interval scale. The technique used to customize the NASA-TLX 0-to-100 score does not
yield a truly interval 0-to-100 scale in a statistical sense, but it will be treated as such in
the analyses.
Figure 7.1.2.1-1
[Flight scenario diagram showing the seven measurement windows; figure content not
recoverable from the scanned original.]
Figure 7.1.2.2-1
[SWAT rating scale definitions for the time load, mental effort load, and psychological
stress load dimensions; text not recoverable from the scanned original.]
Figure 7.1.2.2-2
SWAT
27 Different Combinations
[Card-sort figure content not recoverable from the scanned original.]
Figure 7.1.2.2-3
TLX Rating Scale Definitions
Title Endpoints Descriptions
Mental Low/High How much mental and perceptual activity was required
Demand (e.g., thinking, deciding, calculating, remembering,
looking, searching, etc.)? Was the task easy or
demanding, simple or complex, exacting or forgiving?
Physical Low/High How much physical activity was required (e.g., pushing,
Demand pulling, turning, controlling, activating, etc.)? Was the
task easy or demanding, slow or brisk, slack or strenuous,
restful or laborious?
Temporal Low/High How much time pressure did you feel due to the rate
Demand or pace at which the tasks or task elements occurred?
Was the pace slow and leisurely or rapid and frantic?
Effort Low/High How hard did you have to work (mentally and physically)
to accomplish your level of performance?
Figure 7.1.2.2-4
[Overall workload rating scale with endpoints labeled Low and High; remainder not
recoverable from the scanned original.]
SWAT requires a rating of 1, 2, or 3, corresponding to low, medium, or high on each of
the dimensions (time, mental effort, and stress) for each measurement event. Using
conjoint measurement techniques, SWAT ratings can be converted to the 0-to-100
interval score. SWAT requires a sort, from low to high, of the 27 different cards
containing all the possible combinations of time, mental effort, and stress dimensions.
The card sort is completed before the event scoring begins. The card sort provides the
necessary information for the conjoint scaling solution that allows assigning the 0-to-100
score for the event ratings, 1-1-1 through 3-3-3. A given event rating, such as 2-1-3
(medium rating on time, low rating on mental effort, and high rating on stress), can then
be converted into a score that has a value ranging from 0-to-100.
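The conversion from a three-dimension event rating to a single score amounts to an additive lookup once the conjoint scaling solution is in hand. A minimal sketch follows; the part-worth values are hypothetical placeholders, not the scaling solution used in the study:

```python
# Hypothetical additive part-worths for each SWAT dimension level (1/2/3).
# A real conjoint scaling solution would supply these values from the card sort.
TIME_W = {1: 0.0, 2: 20.0, 3: 40.0}      # time load
EFFORT_W = {1: 0.0, 2: 15.0, 3: 30.0}    # mental effort load
STRESS_W = {1: 0.0, 2: 15.0, 3: 30.0}    # psychological stress load

def swat_score(time, effort, stress):
    """Convert a 1/2/3 rating on each dimension to a 0-to-100 score."""
    return TIME_W[time] + EFFORT_W[effort] + STRESS_W[stress]

# The event rating 2-1-3 from the text (medium time, low effort, high stress):
score_213 = swat_score(2, 1, 3)
```

With these placeholder part-worths, a 1-1-1 rating maps to 0 and a 3-3-3 rating to 100, preserving the 0-to-100 range described above.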
The NASA-TLX uses six, 20-point low to high, bipolar scales for mental demand, physical
demand, temporal demand, performance, effort, and frustration. The customizing
portion of the NASA-TLX, the weighting, was applied after the event scoring. The
weighting is done to establish a priority among the dimensions in a quantitative fashion.
The dimensions are ranked in order of importance by a forced-choice paired-comparison
task, and the resulting weights are combined with the event ratings to form the 0-to-100
combined workload score.
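The weighted combination can be sketched as follows. In the standard NASA-TLX procedure the six dimensions yield 15 forced-choice pairings, so the weights sum to 15 and the weighted mean stays on the 0-to-100 scale; the particular weights and ratings below are illustrative, not data from the study, and event ratings are assumed to be already expressed on a 0-to-100 scale:

```python
# Combine six TLX dimension ratings with paired-comparison weights.
def tlx_workload(ratings, weights):
    """ratings/weights: dicts keyed by the six TLX dimensions."""
    assert sum(weights.values()) == 15, "pairwise weights must total 15"
    return sum(weights[d] * ratings[d] for d in ratings) / 15.0

# Illustrative values only:
weights = {"mental": 5, "physical": 1, "temporal": 4,
           "performance": 2, "effort": 2, "frustration": 1}
ratings = {"mental": 80, "physical": 20, "temporal": 70,
           "performance": 40, "effort": 60, "frustration": 30}
combined = tlx_workload(ratings, weights)
```

A dimension chosen in many pairings (here, mental demand) dominates the combined score, which is the intended effect of the weighting step.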
All of the subjective ratings were collected post-flight, utilizing video tape. Each
simulation flight was video taped. The video tape recorded a quad image which
contained: (1) right side profile of Captain, upper left of quad image, (2) left side profile of
First Officer, upper right of quad image, (3) left three-quarter view of Flight Engineer
panel, lower left of quad image, and (4) view forward of flight deck from pedestal, lower
right of quad image. Subjects viewed the video tape at the end of a day's session for the
purpose of making subjective ratings. To demarcate the measurement windows, small
light-emitting diode (LED) lights visible in the videotape were illuminated during the
measurement windows. The LEDs were out of the pilot's field of view during the
actual simulation runs. For the purpose of making subjective ratings the pilots were
asked to attend to their workload while the LEDs were illuminated. When the
measurement window closed, the lights were extinguished, the video tape was stopped
by the experimenter, and the subject was asked to make event ratings. The pilots were
instructed not to refer to previous ratings when making event ratings.
PHYSIOLOGICAL INSTRUMENTATION
Data was collected for horizontal and vertical eye movement, eyeblink rate, heart rate,
and heart rate variability.
The pilots had electrocardiogram (ECG) electrode leads applied to the chest to record
heart rate. Conventional stress-type hospital grade disposable silver-silver chloride
electrodes were used. Since only the peak of the R-wave was relevant, not the complete
ECG wave form morphology, placement was mainly dictated by considerations of
convenience. One lead was placed just above the sternum and another was placed
approximately four centimeters above the waist and ten centimeters to the left of the
sagittal plane.
Electrodes to record the electro-oculogram (EOG), including eyeblinks, were placed in
a conventional manner: active and referent just beyond the outer canthus of the left and
right eyes to record horizontal movements, and active and referent above and below the
left eye to record vertical eye movements and eyeblinks. Pilots were instrumented with
Beckman 11-mm silver-silver chloride mini-cup electrodes, held on by adhesive collars.
Methyl cellulose was used as the electrode cream. Linked mastoid leads served as the
ground.
Resistance readings were checked for all the leads. Electrode leads were allowed to
have a maximum resistance of 30 Kohms.
All the leads to the electrodes were connected to a Grass Instruments isolator electrode
board (Model IMEB2). The connector box was suspended on the pilot's chest by cords
which were tied around his neck and waist. The subjects reported that the isolator box
did not interfere with their flight deck activities.
A 25 foot long Grass Instruments cable led from this box to the array of Grass amplifiers
and power supply, which were in a 19 inch rack cabinet fastened to the back wall of the
simulator cabin. The heart signal was amplified by a Grass model 7P511 AC amplifier
and the two EOG signals were amplified by two model 7P122 DC amplifiers. These three
physiological signals, along with an event marker signal and the audio from a
microphone on the subject's lapel, were all recorded on a Hewlett Packard model 3968-A
8 channel FM tape recorder. The tape speed was 3 3/4" per second.
The tape recorder was under computer control. The computer would start the FM
tape recorder when a measurement window opened, and then stop the recording
when the measurement window closed.
The data collected during the simulations was subsequently played back in the
laboratory for reduction and analysis.
EYEBLINKS
The analog signals for both vertical and horizontal eye movement were printed out on
strip charts, from which raters "scored" the vertical record to determine the number of
eyeblinks (Figure 7.1.2.2-5). Two scorers were used, and objective scoring criteria were
developed to ensure a between-scorer agreement of at least 95%. Determining the time
elapsed between the opening and closing of each window made it possible to compute
eyeblink rate in blinks per minute.
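The blinks-per-minute computation amounts to dividing the scored blink count by the window duration; a minimal sketch with invented values:

```python
# Blink rate in blinks per minute for one measurement window, given the
# scored blink count and the window's open and close times in seconds.
def blink_rate(blink_count, open_s, close_s):
    duration_min = (close_s - open_s) / 60.0
    return blink_count / duration_min

# 27 scored blinks in a 90-second window (1.5 minutes) -> 18 blinks/min
rate = blink_rate(27, open_s=0.0, close_s=90.0)
```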
HEART RATE
Interbeat interval is a measure of the time elapsed between heart beats, whereas heart
rate refers to how many beats occur in a minute. The analog heart signal waveform was
examined for inter-beat interval scoring. A voltage-sensitive Schmitt trigger, connected
to a MINC (Digital Equipment Corp.) minicomputer, was used to digitize the analog
record by timing and recording the durations, in milliseconds, of the intervals between
successive R-waves of the cardiac signal. For each window all the heart inter-beat
intervals (IBIs), in milliseconds, were saved as computer files for later processing
and analysis.
Heart rate, in beats per minute, can be computed from the IBI (in milliseconds) using
the formula (1000 / IBI) * 60. The transformation of IBI to beats per minute is non-linear:
the ordinal position of the phases of flight ranked by workload would remain the same,
but the transformed scores would not correlate perfectly with the IBIs.
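The formula and its non-linearity can be sketched directly: equal steps in IBI produce unequal steps in beats per minute, which is why the two measures cannot correlate perfectly.

```python
# The IBI-to-heart-rate transform from the text: (1000 / IBI_ms) * 60.
def bpm(ibi_ms):
    return (1000.0 / ibi_ms) * 60.0

# Equal 200-ms IBI steps map to unequal bpm steps (100, 75, 60 bpm):
steps = [bpm(600), bpm(800), bpm(1000)]
```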
Occasionally heart data was contaminated by movement artifacts, resulting in IBI
values markedly different from the pattern of IBI values collected in the same
measurement window. A simple software filter excluded IBI values more than 33%
different from the low and high IBI values considered valid within a measurement window.
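One reading of this filter rule can be sketched as follows; the interpretation of the 33% bounds relative to the window's valid low and high IBIs, and all data values, are assumptions for illustration:

```python
# Reject an IBI if it falls more than 33% below the lowest, or 33% above the
# highest, IBI judged valid for the window. Bounds and data are illustrative.
def filter_ibis(ibis, valid_low, valid_high, tolerance=0.33):
    lo = valid_low * (1.0 - tolerance)
    hi = valid_high * (1.0 + tolerance)
    return [ibi for ibi in ibis if lo <= ibi <= hi]

# The 80-ms and 2400-ms values are movement artifacts and are excluded:
kept = filter_ibis([850, 870, 80, 900, 2400], valid_low=800, valid_high=950)
```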
Figure 7.1.2.2-5
[Sample strip-chart record of the eye movement channels used for eyeblink scoring;
content not recoverable from the scanned original.]
HEART RATE VARIABILITY
A variety of methods have been suggested to index heart rate variability. However, as
there is no consensus in the field on any one best method, and since the simple
standard deviation of heart rate has often been used for this purpose, we elected to
use it as the measure of heart rate variability.
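This variability index can be sketched as the standard deviation of the IBIs within a window; whether the population or sample form was used is not stated in the text, so the population form is shown here with invented values:

```python
import statistics

# Heart rate variability indexed as the standard deviation of the IBIs
# (in milliseconds) collected within one measurement window.
def hrv_sd(ibis_ms):
    return statistics.pstdev(ibis_ms)  # population SD; stdev() is the sample form

sd = hrv_sd([800, 820, 780, 800])  # deviations of 0, +20, -20, 0 ms
```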
PERFORMANCE DATA
Data from the simulator was collected during the measurement windows as well. Wheel,
column, pedal, and throttle position data was collected in order to compute control input
activity. In addition, altitude over the outer and middle markers, flight director deviation,
glideslope and localizer deviation, and lateral deviation from runway centerline were
collected in the Approach and Landing windows. All performance data was collected at
a rate of 10 Hertz (Figure 7.1.2.2-6).
Root mean square (RMS) error of flight director deviation, as well as of localizer and
glideslope deviation, could not discriminate between low and high workload, nor did the
measures demonstrate any evidence of reliability. The piloting task in commercial
aviation has large tolerances in the precision required (e.g., plus or minus 300 feet at
altitude). Using RMS tracking error measures as indices of workload would require the
pilots to fly with a level of precision not normally required in revenue service.
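The RMS error computation applied to the 10-Hz deviation samples can be sketched as follows; the sample values are invented:

```python
import math

# Root-mean-square error over a window of sampled deviations from the
# target (flight director, localizer, or glideslope deviation).
def rms_error(deviations):
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

rms = rms_error([3.0, -4.0, 0.0, 5.0])  # mean squared deviation = 12.5
```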
Figure 7.1.2.2-6
[Diagram of the performance data collected from the simulator; content not recoverable
from the scanned original.]
SECONDARY TASK
A secondary task was also implemented to measure pilot spare capacity (spare capacity
is thought to be correlated with workload).
Two measures were derived from the Secondary Task: response time to the positive
probe and probe accuracy (percentage of positive probes responded to correctly). A
constraint was imposed on the time available for subjects to respond: responses that
took longer than 10 seconds were disregarded, on the grounds that the task had not
been executed immediately.
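The two derived measures and the 10-second cutoff can be sketched as follows; the trial layout (a response time or None, plus a correctness flag, per positive probe) and all values are illustrative assumptions:

```python
# Secondary-task measures: mean response time to positive probes and probe
# accuracy, with responses slower than 10 seconds discarded.
CUTOFF_S = 10.0

def probe_measures(trials):
    """trials: list of (response_time_s or None, correct) per positive probe."""
    valid = [(rt, ok) for rt, ok in trials if rt is not None and rt <= CUTOFF_S]
    mean_rt = sum(rt for rt, _ in valid) / len(valid)
    # Accuracy is scored against all positive probes; a discarded (too-slow)
    # response therefore counts against accuracy.
    accuracy = 100.0 * sum(1 for _, ok in valid if ok) / len(trials)
    return mean_rt, accuracy

mean_rt, acc = probe_measures([(1.2, True), (2.4, True), (12.0, True), (3.0, False)])
```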
SIMULATOR
The Man-Vehicle System Research Facility (MVSRF) B-727 six degree of freedom motion
base simulator was used to provide Phase II certified fidelity. The MVSRF has a number
of assets, the most notable of which is the ability to simulate the Air Traffic Control (ATC)
environment. The ATC simulation creates realistic levels of communications workload, a
part of the standard commercial transport environment often overlooked in simulation.
SCENARIO
There were two routes flown each session: Sacramento-to-San Francisco (SMF-SFO)
and San Francisco-to-Stockton (SFO-SCK). Both flights were flown at 11,000 feet
enroute, to ILS approaches and landings.
COUNTER-BALANCE OF CONDITIONS
The presentation of routes was partially counter-balanced across pilots with low and high
workload flights. For half of the subjects, SMF-SFO and SFO-SCK were the low and
high workload flights, respectively. For the other half of the subjects the combination of
flights and workload was reversed: SMF-SFO and SFO-SCK were the high and low workload
flights, respectively.
Pilots flew two scenarios, a high and a low workload flight, on both visits. Pilots were
randomly assigned to either: 1) SFO-SCK (high workload) and SMF-SFO (low workload),
or 2) SFO-SCK (low workload) and SMF-SFO (high workload). Half the subjects flew the
high workload scenario first, the other half of the subjects flew the low workload
condition first. On the retest day pilots flew the exact same scenario, departure-arrival
destinations and workload condition, as the previous session but the scenario order was
reversed.
DAILY SCHEDULE
Subject pilots arrived at the MVSRF facility at 9:00 am. Subjects were greeted by the
experimenter and tape recorded instructions regarding the day's activities were played
for them. Subjects then received "differences training" from Preston Suit (Boeing Flight
Crew Training). The differences training involved a discussion of the configuration of the
cockpit. The differences training was facilitated by using full-size color photographs of
the instrument panels in the simulator. After differences training was completed the pilot
was briefed on the routes and weight & balance of the aircraft for the simulation runs.
Following the differences training, tape-recorded instructions for the SWAT card sort were
played for the pilot. The experimenter provided further clarification on the technique
used for the SWAT card sort. The card sort required 20 to 60 minutes to complete.
The pilot and experimenter then went to lunch.
Following lunch the pilot was instrumented with the physiological equipment and placed
in the simulator cab. The pilot then departed San Francisco International Airport on
runway 28R and spent time flying the aircraft "around the pattern." The pilot was
encouraged to practice steep turns, pull the throttle back on an engine, and generally
get comfortable with the handling qualities of the simulator. The pilot flew an ILS
approach to a touch and go on 28R at SFO. The pilot again flew "around the pattern" to
another ILS approach and landing on 28R at SFO.
The pilot then flew the two test trials. Following the test runs the instrumentation was
removed from the pilot. Tape-recorded instructions on event rating for either SWAT or
the NASA-TLX were then played for the pilot. Then the videotape of the simulation test
runs was played for the pilot in order to obtain subjective event ratings. Only a small
segment (30 seconds) prior to the actual measurement window, and the window itself,
was played for the purpose of making the event ratings.
Finally, the subject pilot was thanked for participating.
7.1.4 STATISTICAL ANALYSES
Even though numerous statistical comparisons are made, each workload
assessment technique is treated as though it were the only dependent variable utilized in
the study. No adjustment was made to the alpha level (probability of a Type I error) for
the various comparisons reported from the simulation effort. This approach, referred to
by Kirk (1982) as a "contrast-wise Type I error rate," may seem liberal. It is justified
because the statistical effects tested were predicted a priori and a stringent Type I error
rate is applied throughout: a nominal alpha level of 0.01 was adopted for determining
significance in all the analyses. Alpha levels between 0.01 and 0.05 were considered
strong trends, while alpha levels between 0.05 and 0.10 were considered trends.
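The labelling rule adopted here can be stated compactly:

```python
# Decision rule from the text: p < .01 significant, .01-.05 a strong trend,
# .05-.10 a trend, otherwise not significant.
def label(p):
    if p < 0.01:
        return "significant"
    if p < 0.05:
        return "strong trend"
    if p < 0.10:
        return "trend"
    return "ns"
```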
Boeing's TLA was utilized to confirm that the manipulations of the pilot's task demands
would, in fact, yield a change in the pilot's workload between the "low" and "high"
workload flights.
It is necessary to establish a decision rule to be able to determine if a given workload
measure has shown validity and reliability. The threshold of the decision rule will most
likely generate an argument as to the appropriateness of the threshold, but the rule is
necessary for a discussion of the "goodness" (validity and reliability) of the measure.
7.1.4.1 VALIDITY ANALYSES
In order to claim validity for any of the assessment techniques it was necessary for the
workload measure to discriminate between the low and high workload conditions. A 2 X
2 X 7 repeated measures ANOVA was performed for each workload measure. The
factors of the 2 X 2 X 7 ANOVA were session (1 or 2), workload level (low or high), and
phase of flight (seven measurement windows), respectively. The means represented in
the graphs are from the 2 X 2 X 7 ANOVAs.
An unfortunate artifact of the repeated measures ANOVA approach is the case-wise
deletion of subjects owing to missing data in any of the measurement windows. If a
subject fails to have data in any of the 28 measurement windows (2 sessions containing
2 flights with 7 measurement windows per flight) for a given workload measure then the
entire subject is deleted from the analysis.
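The case-wise deletion rule can be sketched as follows; the data layout and pilot identifiers are hypothetical:

```python
# A subject is kept only if all 28 windows (2 sessions x 2 flights x 7 windows)
# contain data; None marks a missing measurement window.
N_WINDOWS = 28

def complete_cases(data):
    """data: dict mapping subject id -> dict of window index -> value or None."""
    return {s: w for s, w in data.items()
            if len(w) == N_WINDOWS and all(v is not None for v in w.values())}

data = {
    "pilot_a": {i: 70.0 for i in range(N_WINDOWS)},                      # complete
    "pilot_b": {i: (None if i == 5 else 70.0) for i in range(N_WINDOWS)},  # one gap
}
kept = complete_cases(data)  # pilot_b is dropped entirely
```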
All significant F ratios are reported, and the results of the test for a main effect of
workload will be reported whether there is a statistically significant finding or not.
An a priori prediction was that workload would vary across the phases of flight, for either
the low or the high workload flight. Although a workload measure may not be able to
discriminate among the periods of high workload in a flight (i.e., takeoff, approach, and
landing), a workload measure should be able to discriminate low from high workload
within a flight (e.g., cruise versus landing). Oneway ANOVAs were computed, for both the low and high
workload flights, for each workload assessment technique to determine if the various
phases of flight could be discriminated from one another.
appropriate sensitivity of the assessment technique to the manipulation of levels of
workload.
An interaction of test session (day 1 or 2) with another factor (e.g., workload or phase of
flight) could indicate a number of influences including:
(a) instability of the measure,
(b) practice effect,
(c) adaptation to test conditions.
No systematic attempt was made to describe the nature of the Newman-Keuls range
statistic due to complications arising from the large number of comparisons. Our
approach, simply stated, was that the more significant differences found, the more
discriminable the workload measure was thought to be. (A note to the reader: Even the
results for the various body channels of the TLA disassociate, leaving no clear answer in
terms of a prediction for global workload differences between the different phases of
flight. Therefore a detailed interpretation of phase of flight differences for the various
workload measures is beyond the scope of this project.)
7.1.4.3 RELIABILITY ANALYSES
A test/retest methodology was employed so that the reliability of the various measures
could be assessed.
7.1.4.4 RELIABILITY DECISION CRITERIA
To demonstrate reliability, a workload measure should show significant test/retest
correlations for the various phases of flight. Unfortunately, there is variation in the sample
size for the different workload measures: sample sizes are 9, 16, and 17 for the
subjective, physiological, and performance measures, respectively. The different sample
sizes influence the critical Pearson coefficient necessary for a significant correlation. It
is therefore very difficult to establish a uniform decision criterion regarding the "goodness"
of a workload measure's reliability. A good rule of thumb is that each workload measure
should demonstrate positive correlations that are large (nearer to +1.0 than to 0).
For confidence about inter-rater reliability, a criterion was established that at least half of
the pilots' scores (average test/retest) for the 14 measurement windows should be
significantly correlated with the group means.
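Computed per measurement window across the subject pilots' day-1 and day-2 scores, the test/retest Pearson correlation can be sketched from first principles; the data values below are invented:

```python
import math

# Pearson correlation between day-1 and day-2 scores for one window.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfectly linear test/retest relationship yields r = 1.0:
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```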
7.2 RESULTS
7.2.1 TIMELINE ANALYSIS RESULTS
To compute the percentage of time required, the total time used by a particular channel
(visual, manual, auditory, verbal or cognitive) for the measurement period was divided by
time available.
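The percent-time computation can be sketched directly; the busy time and window duration below are illustrative assumptions, chosen to reproduce the 88% visual figure for the takeoff window in Table 7.2.1-1:

```python
# Percent of the measurement window during which a channel
# (visual, manual, auditory, verbal, or cognitive) is occupied.
def channel_load(busy_s, window_s):
    return 100.0 * busy_s / window_s

# Hypothetical: 79.2 s of visual activity in a 90 s takeoff window -> 88%
visual = channel_load(busy_s=79.2, window_s=90.0)
```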
There was no change in procedures during the takeoff window between the high and low
workload segments in the scenarios built for TLA. Therefore, for all channels, visual,
manual, verbal, auditory and cognitive, the data shown for the takeoff window is identical
for high and low workload.
The data on the visual channel (Table 7.2.1-1) shows that visual demands are higher in
the high workload scenarios starting at the climb window, when the pilot attempts to
engage the autopilot. The higher demand on the visual channel continues through the
top of climb when the engine failure occurred, and cruise when the hydraulic failure
occurred. The visual demands remain higher through descent, approach, and landing
as the pilot is manually flying the aircraft in low visibility for a 15 degree flap landing.
Manual data is computed in TLA for the left hand, right hand, and total manual data (sum
of the two hands). The assumption was made in preparing this data that there are
continuous corrections (about once per second) made with the left hand in the high
workload scenarios, except in the cruise phases when the aircraft is trimmed, then the
correction rate becomes once every 5 seconds. These corrections do not occur in the
low workload scenarios when the autopilot is engaged. The differences in workload
between the high and low scenarios in the manual channel (Table 7.2.1-1) occur starting at
the top of climb with the engine failure. In the high workload scenarios the pilot must
take actions: the engine failure/shutdown checklists are read, rudder is trimmed, and
thrust is adjusted. In cruise, in the low workload scenario, when the autopilot is engaged
there is a very low manual requirement. In the high workload scenario in cruise,
however, the right hand must take actions as a result of the "B" system hydraulics failure.
Descent, approach and landing show a much higher manual requirement in the high
workload scenario as might be expected. The pilot is manually flying the aircraft and
must respond to checklists required with the failure conditions. The manual sum
requirement is over 100% in approach and landing. The pilot is using both right and left
hands at the same time and manual sum reflects this.
The verbal component (Table 7.2.1-1) in these flights was low for the pilot. All
Table 7.2.1-1
Timeline Analysis
(percent time occupied per channel)

LOW WORKLOAD
WINDOW     VISUAL   MANUAL   MANUAL   VERBAL   AUDITORY   COGNITIVE
                    LEFT     RIGHT
TAKEOFF    88       65.5     20       5        16         73
CRUISE     30       0        1        0        1.5        32
[remaining rows not recoverable from the scanned original]

HIGH WORKLOAD
WINDOW     VISUAL   MANUAL   MANUAL   VERBAL   AUDITORY   COGNITIVE
                    LEFT     RIGHT
TAKEOFF    88       65.5     20       5        16         73
[remaining rows not recoverable from the scanned original]
communications with ATC and dispatch were handled by the first officer. Pilot verbal
communications were limited to crew coordination type communications. A small
difference occurred in climb because the autopilot did not engage in the high workload
scenarios. At the top of climb the pilot makes the decision to shut down the engine and
asks the first officer to advise ATC of the failure. In cruise the increase in verbal
workload occurs when the hydraulics failure checklist is called for. Though the percent
of communication in the high and low workload scenarios in descent and approach is
the same, the content is not. In the high workload scenario the pilot is handing off more
tasks as he manually flies the aircraft.
For the auditory channel the significant changes between the low and high workload
flights occur at the top of climb and in the cruise windows when the checklists are called
for after the malfunctions occur. In descent and approach we see a slight reversal due
to the pilot taking more of a command role, as discussed in the verbal data.
The cognitive channel is influenced by the number of indicators looked at by the pilot,
and the complexity both of the indicators and the procedures performed. In other words,
how long it takes the pilot to act is based on the number of choices available. In takeoff
and climb the cognitive workload is the same for the high and low scenarios (Table
7.2.1-1). In climb the cognitive channel is at 100% due to reconfiguration tasks, thrust
management, heading changes, and completion of the required check lists. In addition,
in the high workload scenario, the autopilot fails to engage. At the top of climb (high
workload) the cognitive channel is high because of the engine failure. In the high
workload flight during cruise the hydraulics failure occurs, and the pilot calls for the
appropriate checklist. In descent, approach, and landing there is a higher cognitive load
in the high workload scenarios as a result of manually flying the aircraft with an increase
in system monitoring and more frequent instrument scanning.
The two low workload scenarios were averaged to yield the low workload scores for the
various body channels. Similarly, the two high workload scenarios were averaged to
yield the high workload scores for the various body channels (Table 7.2.1-1). Using
percentages normalizes for the fact that the measurement periods in the simulation
were of varying length (ranging from one minute to over six minutes).
7.2.2 SUMMARY OF RESULTS
The task analysis provides a link between the past certification workload assessment
methods and the workload assessment methods tested in the Part-Task simulation. The
comparison of workload assessment methods provides the opportunity to assess the
validity of the "pilot in the loop" methods against an analytic tool, namely TLA. If a new
workload measure agrees with the task analyses, both probably reflect the same
conditions (same task-demands). If the timeline analysis and the workload measure do
not agree, but the measure is shown to demonstrate validity and reliability, then the
measure may reflect a type of workload not accurately quantified by older techniques.
SWAT, NASA-TLX, the 1-to-20 point Overall Workload Score, and Heart Rate all
demonstrated evidence of validity by discriminating between the workload levels and by
showing a significant interaction of workload and phase of flight.
NASA-TLX, the 1-to-20 point Overall Workload Score, Eyeblink, and Heart Rate all
demonstrated evidence of reliability through significant test/retest correlations. In
addition, at least 50% of the pilots' scores correlated significantly with the group mean for
the workload measure.
7.2.2.1 SUBJECTIVE RATINGS
SWAT, NASA-TLX, and the 1-to-20 point overall workload score demonstrated evidence
for validity by discriminating between low and high workload. In addition, the NASA-TLX
and the 1-to-20 point Overall Workload Score demonstrated evidence of being reliable
measures.
SWAT
To yield the appropriate 0-to-100 scaling solution for the SWAT ratings, the group scaling
solution was used for the SWAT card sort. Kendall's coefficient of concordance
comparing the ranks of the card sorts for the various pilots was 0.7824, greater than the
0.78 recommended for using the group scaling solution.
SWAT discriminated between the low and high workload flights, F(1,8) = 17.20,
MSe = 1313, p < .01 (Figure 7.2.2.1-1 and Table 7.2.2.1-1). A workload by phase of
flight interaction was significant as well, F(6,48) = 9.58, MSe = 156, p < .01.
A strong trend for a main effect of phase of flight discrimination was found,
F(6,48) = 2.93, MSe = 298, p < .02. No main effect was found for a oneway ANOVA
examining phase of flight discrimination for the low workload flight, F(6,48) = 1.55, ns. A
significant main effect was found for a oneway ANOVA examining phase of flight
discrimination in the high workload condition, F(6,48) = 7.55, MSe = 139, p < .01. A
Newman-Keuls range statistic was computed to determine SWAT's ability to discriminate
phase of flight for the high workload flight; 5 out of 21 comparisons were significant.
NASA-TLX
To yield the appropriate 0 to 100 workload scaling solution for the NASA-TLX, each
individual's weighting scores were applied to their event ratings.
NASA-TLX discriminated between the low and high workload flights, F(1,7) = 17.27,
MSe = 436, p < .01 (Figure 7.2.2.1-2 and Table 7.2.2.1-3). A workload by phase of flight
interaction was significant as well, F(6,42) = 4.69, MSe = 126, p < .01.
A main effect for phase of flight discrimination was found, F(6,42) = 4.19, MSe = 84,
p < .01. No main effect was found for a oneway ANOVA examining phase of flight
discrimination for the low workload flight, F(6,42) = 1.75, ns. A significant main effect was
found for a oneway ANOVA examining phase of flight discrimination for the high
workload flight, F(6,48) = 7.14, MSe = 68, p < .01. A Newman-Keuls range statistic was
computed to determine NASA-TLX's ability to discriminate phase of flight conditions for
the high workload flight; 5 out of 21 comparisons were significant.
In evaluating the test/retest reliability of NASA-TLX, 4 significant correlations, out of a
possible 14, were found (Table 7.2.2.1-4).
Examination of inter-rater reliability found that 78% of the subjects' scores correlated
significantly with the group means for the 14 measurement windows.
Figure 7.2.2.1-1
Subjective Workload Assessment Technique (SWAT)
Part Task Simulation
Mean Scores
[Plot of SWAT scores for the low and high workload flights across Takeoff, Climb, Top of
Climb (TOC), Cruise, Top of Descent (TOD), Approach, and Landing; plot not
recoverable from the scanned original.]
Table 7.2.2.1-1
Subjective Workload Assessment Technique (SWAT)
Part Task Simulation Data
Means and Standard Deviations

Window            Low-workload flight     High-workload flight
                  Mean      SD            Mean      SD
Takeoff           13.21     (14.77)       18.41     (16.64)
Climb             17.52     (17.53)       24.59     (19.04)
Top of climb      12.51     (10.67)       49.77     (24.54)
Cruise             7.83     (13.69)       44.04     (22.28)
Top of descent    14.67      (9.47)       29.74     (18.04)
Approach          18.46     (11.56)       34.97     (19.05)
Landing           18.53     (10.84)       33.72     (20.94)
Table 7.2.2.1-2
SWAT
Part Task Simulation
Test-Retest Reliability Correlations

Window            Low        High
Takeoff           0.46       0.83*
Climb             0.44       0.70
Top of climb     -0.01       0.80*
Cruise            0.67       0.46
Top of descent   -0.18       0.41
Approach          0.10       0.69
Landing          -0.0005     0.48

r(7) = .798*
*Significant, p < .01
Figure 7.2.2.1-2
[Plot of NASA-TLX scores for the low and high workload flights across Takeoff, Climb,
Top of Climb (TOC), Cruise, Top of Descent (TOD), Approach, and Landing; plot not
recoverable from the scanned original.]
Table 7.2.2.1-3
[Table content not recoverable from the scanned original.]
Table 7.2.2.1-4
NASA-TLX
Part Task Simulation
Test-Retest Reliability Correlations

Window            Low        High
Takeoff           0.79       0.61
Climb             0.41       0.57
Top of climb      0.21       0.56
Cruise            0.47       0.82*
Top of descent    0.87*      0.51
Approach          0.94*      0.47
Landing           0.61       0.83*

r(7) = 0.798*
*Significant, p < 0.01
1-TO-20 POINT OVERALL WORKLOAD SCORE
The 1-to-20 point workload score was analyzed without any transformation of the event
ratings.
The 1-to-20 overall workload score discriminated between the low and high workload
flights, F(1,8) = 27.14, MSe = 16, p < .01 (Figure 7.2.2.1-3 and Table 7.2.2.1-5). A
workload by phase of flight interaction was significant as well, F(6,48) = 7.41, MSe = 5,
p < .01.
A main effect for phase of flight discrimination was found, F(6,48) = 4.15,
MSe = 4.8, p < .01. No main effect was found for a oneway ANOVA examining phase of
flight discrimination for the low workload flight, F(6,48) = 2.35, ns. A significant main
effect was found for the oneway ANOVA examining phase of flight discrimination for the
high workload flight, F(6,48) = 8.22, MSe = 3, p < .01. A Newman-Keuls range statistic
was computed to determine the 1-to-20 point score's ability to discriminate phase of flight for the high
workload flight; 5 out of 21 comparisons were significant.
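The oneway ANOVAs reported throughout this section treat the pilot as a repeated-measures (blocking) factor. As an illustration of how such an F statistic is formed, here is a minimal sketch under the assumption that the window means are held in a `scores` array of shape (pilots, phases); the array layout and function name are our assumptions, not the report's:

```python
import numpy as np

def rm_anova_f(scores):
    """One-way repeated-measures F: phases of flight as treatment,
    pilots as blocks. scores: array of shape (n_subjects, n_phases).
    Returns (F, df_treatment, df_error)."""
    n, k = scores.shape
    grand = scores.mean()
    ss_treat = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_error = ss_total - ss_treat - ss_subj   # treatment-by-subject residual
    df_treat, df_error = k - 1, (k - 1) * (n - 1)
    return ss_treat / df_treat / (ss_error / df_error), df_treat, df_error
```

With 9 pilots and 7 phases, this yields the F(6, 48) degrees of freedom reported above.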
Figure 7.2.2.1-3
Overall Workload Score
1- to 20-Point
Part Task Simulation Mean Scores
(Overall workload scores plotted for the low- and high-workload flights across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.1-5
""667 17 Mo
65
Table 7.2.2.1-6
Figure 7.2.2.2-1
Eyeblink Rate
(Blinks per Minute)
Part Task Simulation Mean Scores
(Eyeblinks per minute plotted for the low- and high-workload flights across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.2-1
Table 7.2.2.2-2
HEART RATE (INTER-BEAT INTERVAL)
Inter-beat interval discriminated between the low and high workload flights,
F(1,16) = 27.74, MSe = 2763, p < .01 (Figure 7.2.2.2-2 and Table 7.2.2.2-3). A workload
by phase of flight interaction was significant as well, F(6,96) = 5.69, MSe = 446, p < .01.
There was a strong trend for a main effect of session; day 1 had smaller IBIs than day 2,
F(1,16) = 7.81, MSe = 34357, p < .02. This effect suggests that the pilots did not
experience as much workload during the second session. As was mentioned for the same
interaction found for eyeblink, there may be instability of the measure over time. A more
likely explanation for the interaction is that adaptation occurred because the pilot
experienced identical conditions during the retest.
A significant main effect for phase of flight discrimination was found, F(6,96) = 23.32,
MSe = 1279, p < .01. Separate oneway ANOVAs were performed, and significant main
effects were found for phase of flight discrimination for both the low and high workload
flights, F(6,102) = 22.61, MSe = 397, p < .01, and F(6,90) = 15.04, MSe = 454, p < .01,
respectively. Inter-beat interval could discriminate 14 out of 21, and 9 out of 21, phase of
flight comparisons for the low and high workload conditions, respectively.
POWER SPECTRAL ANALYSIS (BLOOD PRESSURE COMPONENT)
The blood pressure component was not able to discriminate between low
and high workload, F(1,15) = 4.75, ns (Figure 7.2.2.2-4 and Table 7.2.2.2-7).
A main effect for phase of flight discrimination was found, F(6,90) = 8.54, MSe = 54,
p < .01. Separate oneway ANOVAs were performed, and significant main effects were
found for phase of flight discrimination for both the low and high workload flights,
F(6,102) = 7.26, MSe = 23, p < .01, and F(6,90) = 3.59, MSe = 23, p < .01, respectively.
Figure 7.2.2.2-2
Inter-Beat Interval (Msec)
Part Task Simulation Mean Scores
(Mean inter-beat interval in msec plotted for the low- and high-workload flights across the phases of flight.)
Table 7.2.2.2-3
Inter-Beat Interval (Msec)
Part Task Simulation Data
Means and Standard Deviations
Table 7.2.2.2-4
Figure 7.2.2.2-3
Inter-Beat Interval
Standard Deviation (Msec)
Part Task Simulation Mean Scores
(Inter-beat interval standard deviation in msec plotted for the low- and high-workload flights across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.2-5
Inter-Beat Interval
Standard Deviation (Msec)
Part Task Simulation Data
Means and Standard Deviations
Window Low-workload flight High-workload flight
Mean SD Mean SD
Takeoff 41.58 (17.05) 39.04 (15.09)
Climb 41.55 (13.90) 43.99 (10.28)
Top of climb 47.27 (16.93) 50.08 (12.83)
Cruise 43.80 (9.73) 49.09 (14.26)
Top of descent 47.03 (11.67) 50.20 (12.15)
Approach 57.16 (50.45) 49.02 (15.93)
Landing 58.54 (13.80) 61.33 (12.82)
Table 7.2.2.2-6
Inter-Beat Interval
Standard Deviation (Msec)
Part Task Simulation
Test-Retest
Reliability Correlations
Window Low High
Takeoff 0.52 0.70*
Climb 0.39 0.24
Top of climb 0.61* 0.31
Cruise 0.16 0.70*
Top of descent 0.68* 0.48
Approach 0.78* 0.52
Landing 0.25 0.16
r(16) = 0.590*
*Significant p < .01
Figure 7.2.2.2-4
Power Spectral Analysis (Blood Pressure Component)
Part Task Simulation Mean Scores
(Blood pressure component scores plotted for the low- and high-workload flights across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.2-7
Power Spectral Analysis (Blood Pressure Component)
Part Task Simulation Data
Means and Standard Deviations
The blood pressure component could discriminate 7 out of 21, and 1 out of 21, phase of
flight comparisons for the low and high workload conditions, respectively.
The blood pressure component demonstrated poor test/retest reliability; only 2 out of 14
possible correlations were significant (Table 7.2.2.2-8).
In assessing inter-rater reliability, it was found that 17% of the subjects' scores correlated
significantly with the means for the measurement windows.
POWER SPECTRAL ANALYSIS (RESPIRATION COMPONENT)
The respiration component was able to discriminate between the low and high workload
conditions, F(1,15) = 9.17, MSe = 10, p < .01 (Figure 7.2.2.2-5 and Table 7.2.2.2-9).
A main effect for phase of flight discrimination was found, F(6,90) = 3.C1, MSe = 16,
p < .01. The respiration component of the power spectral analysis could not
discriminate among the phases of flight for the low workload condition, F(6,102) = 2.19,
ns. A significant main effect for phase of flight discrimination was found for the high
workload flight, F(6,90) = 3.43, MSe = 7, p < .01. Newman-Keuls analyses of the windows
found no ability to discriminate the various phases of flight for either the low or high
workload condition.
Evaluating test/retest reliability for the respiration component found 4 out of a possible
14 correlations significant (Table 7.2.2.2-10).
In assessing inter-rater reliability, it was found that 28% of the subjects' scores correlated
significantly with the means for the measurement windows.
7.2.2.3 PERFORMANCE MEASURES
Control activity during manual flight path control demonstrated a strong trend for
discriminability between low and high levels of workload. Tremendous attrition of control
activity data occurred in the low workload condition due to autopilot usage. During
autoflight, control activity is not considered a measure of workload because the pilot is no
longer in the control loop. Test/retest reliability was high for the control activity
measures.
Table 7.2.2.2-8
Figure 7.2.2.2-5
Power Spectral Analysis (Respiration Component)
Part Task Simulation Mean Scores
(Respiration component scores plotted for the low- and high-workload flights across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.2-9
Power Spectral Analysis (Respiration Component)
Part Task Simulation Data
Means and Standard Deviations
Table 7.2.2.2-10
Figure 7.2.2.3-1
Wheel (Aileron) Control Inputs (per Minute)
Part Task Simulation Mean Scores
(Wheel control inputs per minute plotted for the low- and high-workload flights across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.3-1
Table 7.2.2.3-2
Inter-rater reliability is not available owing to missing data for the low workload conditions
in which the autopilot is used.
For pedal control activity (rudder inputs) there was no discernible difference between the low
and high workload conditions, F(1,1) = 1.01, ns (Figure 7.2.2.3-3 and Table 7.2.2.3-5).
A main effect for phase of flight discrimination was found, F(6,6) = 14.73, MSe = 146,
p < .01. Separate oneway ANOVAs were performed, and significant main effects were
found for phase of flight discrimination for both the low and high workload flights,
F(6,6) = 16.79, MSe = 26, p < .01, and F(6,96) = 100.65, MSe = 27, p < .01, respectively.
Pedal control activity could discriminate 10 out of 21, and 11 out of 21, phase of flight
comparisons for the low and high workload conditions, respectively.
Test-retest reliability for pedal control activity was not as high as for either wheel or column
control activity: 6 significant correlations out of a possible 14 (Table 7.2.2.3-6).
Inter-rater reliability is not available because of missing data for the low workload
conditions in which the autopilot is used.
SECONDARY TASK
There were some implementation problems encountered with the secondary task. A
post hoc examination of the accuracy of probe delivery indicates that ATC personnel did
Figure 7.2.2.3-2
Column (Elevator) Control Inputs (per Minute)
Part Task Simulation Mean Scores
(Column control inputs per minute plotted for the low- and high-workload flights across the phases of flight.)
Table 7.2.2.3-3
Table 7.2.2.3-4
Figure 7.2.2.3-3
Pedal (Rudder) Control Inputs (per Minute)
Part Task Simulation Mean Scores
(Pedal control inputs per minute plotted for the low- and high-workload flights across the phases of flight.)
Table 7.2.2.3-5
Table 7.2.2.3-6
not deliver the probes in a consistent fashion across the different subjects.
SECONDARY TASK (RESPONSE TIME)
Owing to the case-wise deletion of data with missing cell entries, there were no valid
cases left to analyze for testing validity (discriminability of low/high workload or phase of
flight). Even with all the flaws in the data collection, a pattern emerged of faster
response times in the low workload condition (Figure 7.2.2.3-4 and Table 7.2.2.3-7).
A statistical analysis of the reliability measures for response time was muddled by
the discrepant sample sizes due to the flawed probe presentation (Table 7.2.2.3-8).
Similar to response time, the statistical analysis of reliability for probe accuracy was
muddled by the discrepant sample sizes for the different phases of flight (Table
7.2.2.3-10).
7.3 RELATIONSHIP OF WORKLOAD MEASURES
A correlation matrix of all the workload measures was computed with the means for the
14 phases of flight (7 windows each from the low and high workload flights). Caution should
be exercised in extrapolating any relationship between the workload measures from a
correlation matrix constructed in this fashion. The correlation matrix (Tables 7.3-1 and
7.3-2) allows the reader to compare the various measures to determine which are
sensitive to the same changes in task demands.
The correlations (test/retest and inter-rater) presented earlier were constructed in an
entirely different fashion. For each workload score, the test/retest correlations were
constructed individually for each phase of flight by correlating the session one and two
scores for all the pilots. For each workload score, the inter-rater correlations were
computed for each pilot by correlating the average of his session one and two scores for
all 14 phases of flight to the group averages for the 14 phases of flight.
Included in the correlation matrix are the results from the Boeing TLA. The reader
should be aware of the difference between the workload measures collected in the Part-
Task simulation and the values from the Boeing TLA. The workload measures collected
from the pilots represent averages calculated from distributions of scores with
sample sizes ranging from 9 to 18; the mean for each cell is then used for the various
workload measures in the correlation matrix. The Timeline Analysis, on the other hand,
is a micro-motion analysis which yields a single value for each body channel for
the various phases of flight. It is this single value for each TLA body channel which is used,
along with the means for the various workload measures, to compute the correlation
matrix.
7.4 PRINCIPAL COMPONENT ANALYSIS
A Principal Component Analysis was computed on the averages for the 14 phases of
Figure 7.2.2.3-4
Secondary Task
(Response Time)
Part Task Simulation Mean Scores
(Response time in seconds plotted across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.3-7
Table 7.2.2.3-8
Figure 7.2.2.3-5
Secondary Task
(Probe Accuracy - Percent Correct)
Part Task Simulation Mean Scores
(Percent correct plotted for the low- and high-workload flights across the phases of flight: Takeoff, Climb, Top of climb (TOC), Cruise, Top of descent (TOD), Approach, and Landing.)
Table 7.2.2.3-9
Secondary Task
(Probe Accuracy - Percent Correct)
Part Task Simulation Data
Means and Standard Deviations
Window Low-workload flight High-workload flight
Mean SD Mean SD
Takeoff 60% (55%) 75% (45%)
Climb 41% (44%) 46% (49%)
Top of climb 61% (34%) 41% (36%)
Cruise 81% (23%) 55% (37%)
Top of descent 84% (15%) 77% (18%)
Approach 67% (27%) 54% (30%)
Landing 82% (39%) 44% (51%)
H"It7 P, R3c
89
Table 7.2.2.3-10
Table 7.3-1
Table 7.3-2
flight (7 each from the low and high workload conditions) in order to determine common
"underlying" dimensions among the workload measures.
Principal Component Analysis (PCA) is a form of Factor Analysis that maximizes the variance
accounted for in the solution. PCA sorts the measures into factors by maximizing
the amount of variation that can be explained by each factor, and then calculates a
"loading" for each variable on the respective factors. The loading is a score between
+1 and -1, similar to a correlation coefficient, where the absolute value indicates the
strength of the loading and the sign (plus or minus) indicates the direction of the
relationship.
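As an illustration, unrotated principal-component loadings of this kind can be obtained from the eigendecomposition of the correlation matrix of the measures; a minimal sketch, assuming a `data` array with one row per phase of flight and one column per workload measure (the function name is ours):

```python
import numpy as np

def pca_loadings(data, n_factors=2):
    """Unrotated principal-component loadings: eigenvectors of the
    correlation matrix scaled by the square root of their eigenvalues,
    so each loading lies between -1 and +1 like a correlation."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

# Loadings below |0.25| would then be suppressed for readability,
# as in Table 7.4-1.
```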
The "naming" of the factors which emerge from any Factor Analysis can be the subject of
much debate. The investigators in this project developed the following labels:
Caution should be exercised when evaluating the PCA table because the variables
included in the analysis have an effect on the factors and the loadings. If variables were
left out of the analysis, it is likely that the factor loading scores would change for the
remaining variables.
The PCA table that follows shows which measures load on common dimensions (Table
7.4-1). Factor loadings of less than plus or minus 0.25 are deleted to ease reading of
the PCA table.
7.5 DISCUSSION
The intent of our program is to examine existing workload measures to determine the
validity and reliability of the application of these measures to new aircraft certification
under FAR 25.1523. In pursuit of the stated goal each workload measure has been
treated as if it were the only measure being examined in this project. There are two
reasons why we are using this approach. First, it was never our goal to develop a
battery of workload measures to be utilized in commercial aircraft certification. It was our
intent to subject each of the candidate workload measures to the rigors of full fidelity
simulation testing in order to examine the constructs of validity and reliability. Second,
the measures are examined individually because this is conceptually the only practical
manner to assimilate all the data presented.
In order to provide quick-look summaries of the data, two tables have been compiled.
The first table summarizes the empirical findings of validity and reliability (Table 7.5-1).
Table 7.4-1
Table 7.5-1
The second table presents the rank ordering of the phases of flight for each workload
measure (Tables 7.5-2 and 7.5-3).
7.5.1 DISCUSSION OF VALIDITY AND RELIABILITY RESULTS
SUBJECTIVE
The subjective measures demonstrated validity and reliability, although SWAT was weaker
than the other two measures in terms of reliability. The authors feel that the low
overhead, in terms of implementation and data reduction, of subjective measures adds to
the attractiveness of utilizing these types of measures in an aircraft certification effort.
The Pilot Subjective Evaluation (PSE) technique developed by Boeing could not be
utilized in the current study due to the baseline-versus-new-aircraft comparison inherent
in the PSE. Although the criteria of the present study could not be brought to bear on
the PSE, its dividend of identifying the specific subsystems influenced by the
functions and factors of FAR 25.1523 Appendix D is of enormous benefit in an aircraft
certification effort (Fadden, 1982; Fadden and Ruggerio, 1987).
PHYSIOLOGICAL
The physiological measures were perhaps the most disappointing in terms of the ability
to discriminate levels of workload in a reliable manner.
Horizontal EOG was collected in order to analyze eye movement. Eye movement per
unit of time may be a useful index of workload, but resource limitations did not allow for a
careful reduction of the horizontal EOG data, so no analyses were conducted. Eyeblink,
although reliable, could not discriminate the different workload conditions. The lack of a
main effect for eyeblink may be due to the different visual task demands in the phases of
flight inherent in commercial transport aircraft operations. Certain phases of flight can be
characterized by a great deal of chart reading, others by the intense scan of instruments
on an approach, yet others by quick scans at various system instruments (both on the
forward instrument panel and the flight engineer's panel) to diagnose system
malfunctions.
Inter-beat interval (mean Heart Rate) was a fairly robust measure in terms of validity and
reliability. We are still concerned about the ability to tease arousal and workload apart
when using a measure such as Heart Rate. Because of the sensitivity of Heart Rate to
arousal, we feel the same pattern of results found for Heart Rate might also be found
for an observer riding in the cockpit jumpseat. No attempt is being made to
impugn the utility of Heart Rate as a workload measure;
rather, a word of caution is being advanced.
Heart Rate Variability (the standard deviation of the inter-beat interval) was not able to
discriminate between the low and high workload conditions. Test/retest reliability was
good, but the utility of a reliable measure which cannot discriminate between low and high
workload is questionable.
The Blood Pressure Component of the power spectral analysis could not discriminate
between the low and high workload conditions. A few of the test/retest reliability
correlations were significant, but of no real value since the measure did not demonstrate
discriminability.
Table 7.5-2
Table 7.5-3
An anomalous finding for the Respiration Component of the power spectral analysis is
hard to reconcile. The a priori prediction for the respiration component was that as
workload goes up, the power in the respiration portion of the spectrum should go down.
The finding that the respiration component instead increased with increasing workload warrants
careful examination of the measure in the next simulation effort.
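For readers unfamiliar with the measure, the band-power computation underlying such spectral components can be sketched as follows; the band limits shown are illustrative assumptions on our part, not the values used in the study:

```python
import numpy as np
from scipy import signal

def band_power(ibi_series, fs, band):
    """Power of an evenly resampled inter-beat-interval series within a
    frequency band (Hz), estimated from a Welch periodogram."""
    f, pxx = signal.welch(ibi_series - np.mean(ibi_series), fs=fs, nperseg=256)
    mask = (f >= band[0]) & (f <= band[1])
    return pxx[mask].sum() * (f[1] - f[0])   # rectangular integration

# Illustrative bands: blood-pressure-related activity near 0.1 Hz,
# respiration-related activity near typical breathing rates.
BP_BAND = (0.07, 0.14)
RESP_BAND = (0.2, 0.4)
```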
PERFORMANCE
The discriminability and reliability of the wheel (aileron) and column (elevator) control
input activity performance variables were very good. The aircraft state variables
collected and analyzed for the Part-Task simulation included:
(a) RMS flight director error in the lateral and vertical direction,
(b) Throttle activity,
(c) Altitude error over the middle and inner marker (during the approach and
landing),
(d) Lateral deviation from runway centerline (during the approach and
landing),
(e) Localizer and glideslope deviation for the #1 and #2 receivers.
Only the control input activities demonstrated any validity by discriminating between the
low and high workload flights.
The lack of validity and reliability results for the Secondary Task does not necessarily
mean it is a poor index of workload; rather, it points out the unsuitable nature of the
measure for assessing workload in commercial transport operations. In the
implementation of the Secondary Task in the Part-Task simulation, a great deal of
secondary variance was generated for Response Time because commercial
pilots rarely keep a hand on the PTT switch on the yoke. Therefore, longer reaction
times were found for relatively low periods of workload because the pilot literally had to
reach for the PTT switch to complete a response. In a military aircraft, particularly one
designed with a HOTAS (hands on throttle and stick) concept, the implementation of this
sort of secondary task would not be subject to the source of secondary variance found
in the Part-Task simulation.
7.5.2 DISCUSSION OF CORRELATION MATRIX
The subjective measures demonstrate a high inter-correlation, indicating that they are
influenced by the same changes in task demands.
The Blood Pressure component's high correlation with control input activity is what would
be expected if both measures are tapping some sort of physical component of workload.
A summary of the most notable findings from the correlation matrix include:
(a) The subjective ratings are significantly inter-correlated (r=0.97 being the
smallest correlation).
(b) The negative correlation of inter-beat interval and various indices of
physical workload (e.g., column & pedal control) is significant.
(c) A number of correlations approach significance between IBI and wheel
control inputs, SWAT, NASA-TLX, the 1-to-20 point overall workload score,
and the visual, manual left, and manual right channels from the TLA.
(d) Some findings are unexpected from the examination of the correlation
matrix. The respiration component of the Power Spectral analysis is
positively correlated with the subjective ratings and control inputs.
8.0 WORKSHOP TWO (SEATTLE, WA)
8.1 OBJECTIVES
Workshop number two was conducted in order to present the results from the Part-Task
simulation, and obtain recommendations for improvements for the Full-Mission
simulation.
Participants were asked to comment on the fidelity of the simulation scenarios and the
applicability of the candidate workload measures to aircraft certification. The Full-
Mission simulation test plan was presented for review. Audience comments were
reviewed and revisions to the test plan were incorporated prior to the Full-Mission
simulation.
8.2 ATTENDEES
Eighty attendees were drawn from a wide cross-section of operational personnel and
potential workload measurement users. University scientists from the first workshop
were invited to attend to help assess the scientific quality of the test design, the results,
and the appropriateness of the conclusions drawn from the data. The remainder of the
attendees came from the aerospace industry, government regulatory agencies, the
military workload community, and NASA.
8.4 RESULTS
Aside from presenting the results from the Part-task simulation, the most important
reason for holding Workshop Two was to gather comments and suggestions for
improvements on the simulation testing for the Full-mission simulation. The most
important topics are discussed below:
IMPLEMENTATION OF A WORKLOAD MEASURE
The feasibility of using a decision-tree-based workload scale, in addition to the other
subjective measures taken, was discussed. The possibility of using the "Modified Cooper-
Harper" or the "Bedford Scale" was addressed. The principal motivation was to use a
subjective rating technique with empirical evidence of validity and reliability. The 1-to-20
point overall workload scale used in the Part-Task simulation was an extension of the
NASA-TLX methodology, and did not possess the empirical record of the other subjective
rating methods.
IN-FLIGHT RATING
A number of suggestions were made regarding an In-Flight subjective measure to
provide opportunity for a comparison of In-Flight versus Post-Flight subjective
assessment.
The question of which rating scale should be employed in the simulation was discussed.
Taking SWAT scores verbally, using verbal probes, was judged the most feasible approach.
Although it is feasible to administer any subjective rating verbally during flight, SWAT was
deemed the easiest because it has the smallest rating scale (3 points: low, medium, and
high). The Bedford and Modified Cooper-Harper scales both contain 10-point rating
scales, while the NASA-TLX contains 20-point rating scales.
The question of when the ratings would be taken In-Flight was also raised. If the ratings
were requested at the end of the measurement windows, and the measurement window
was specifically demarcated, then pilots might modify their flying performance, or in
some fashion affect the other workload measures being collected. Demarcation of the
measurement window had to be avoided otherwise the pilot might change his
performance (i.e., try harder) at specific times because he knows he is being measured.
It was suggested that measurements be taken at variable intervals, both during and
outside of measurement windows. The problem with "random" probing is the lack of
empirical comparison available for data collected outside of the measurement
windows. It was decided to collect data at the end of the measurement window. The
problem with "instantaneous" probing at the end of the window is that the event
rating reflects only the workload at that particular moment and not the workload of
the whole measurement period (i.e., the measurement window). The final decision to
probe for an "instantaneous" rating was based on not wanting the In-Flight probe to
contaminate the other workload measures being collected concurrently.
SECONDARY-TASK ADMINISTRATION
It was decided to eliminate the secondary task from the Full-mission simulation. The
design of the secondary task was discussed at length. Flight operations personnel
questioned the implementation of the secondary task in the flight test portion of
certification. It was determined, however, that the secondary task may be valuable even
if it can only be used in simulation. Even though it was handicapped during the part-task
testing by the problems that were encountered, response time showed a trend for
workload discriminability.
Too many problems were encountered in the Part-task simulation with the secondary
task. The push-to-talk (PTT) switch used by the pilot to respond to the positive probes
blanked ATC with the switch closure. The feasibility of using a different switch was
discussed. Several other types of "response to a probe" task were suggested (e.g.,
squawking different transponder codes) for use as a secondary task measure. The
suggested tasks lacked the requirements of both positive and negative probes, and a
sufficient number of probes to gather a good base of data. Since data is collected on
the Captain only, the task must also be one that is normally performed by the Captain.
Other aircraft environments (e.g., tactical) can more readily adapt a secondary task
methodology. The reaction to ground threats using voice activated counter-measures
has shown a great deal of promise (Vidulich and Bortolussi, 1988).
USE OF AUTOPILOT
For the Full-mission simulation it was decided to dispatch the aircraft with the autopilot
INOP for all the flights in order to be able to interpret the control input activity measures.
In Part-Task simulation testing, the use of the autopilot in the low workload condition was
left to the pilot's discretion, and thus, its use became inconsistent across the subject
population. It was felt that tighter control on the use of autopilot (or preventing its use
altogether) in Full-Mission testing should be required.
INCAPACITATION
In order to manipulate the FAR 25.1523 Appendix D factors of crew member unavailable
at crew station and incapacitation, a feigned incapacitation on the part of the confederate
First Officer was discussed. Discussion centered around the possibility of a negative
reaction by the subject pilot. It might not be immediately obvious that the First Officer's
incapacitation was part of the simulation scenario, and the pilot might interrupt the
simulation to seek aid for the stricken crew member.
In order to selectively manipulate the Captain's task demands similar to the increased
loading of a First Officer's incapacitation it was decided that on one flight the Captain
would be required to tune and talk on the command radio.
WORKLOAD MEASURES TO BE USED IN THE FULL-MISSION SIMULATION
Based on inputs (discussed in the previous section) from the attendees at Workshop
Two, the following measures were selected for the Full-Mission simulation tests to be
conducted at NASA-Ames:
SUBJECTIVE
SWAT (In-flight)
SWAT (Post-flight)
Bedford (Post-flight)
PHYSIOLOGICAL
Primary Task
Control Inputs
Wheel
Column
Pedals
Secondary Task
None
9.0 FULL-MISSION SIMULATION TESTING
Again the simulation was performed at the NASA-Ames Research Center, Moffett Field,
California. Air-Traffic Control was again simulated to ensure that the fidelity was as close
to real-world conditions as possible.
9.1 METHOD
Similar to the Part-task simulation tests, only Captains were used for the simulation.
Again, no attempt was made to address crew performance from a cockpit resource
management viewpoint.
9.1.1 SUBJECTS
PILOTS
Sixteen Airline Transport Pilots (ATP), from American, United, TWA, and Eastern,
served as subjects in the experiment. Subjects were all male, ranging in age from 44 to
58. Subjects were either currently FAR Part 121 qualified as Captain on the B-727 or had
spent 5 years of duty as Captain on the B-727.
FLIGHT CREW
Two confederates participated in the simulation study as the First Officer and Flight
Engineer, respectively. Preston Suit, Flight Crew Training at Boeing Commercial
Airplanes, served as the First Officer in the study. Preston also gave the briefing on
differences training and the routes to be flown in the study. Doranne VonEnde and
Hugh Campion, both qualified flight engineers, served as Flight Engineers in the study.
Both the First Officer and Flight Engineer were cognizant of the workload manipulations
a priori, and attempted to give each pilot similar treatment during the simulation.
9.1.2 EXPERIMENTAL DESIGN
The factors that drove the design included:
No effect was found for route flown during the Part-Task simulation. In the present study
there was no attempt made to counter-balance route (SFO-SCK, SMF-SFO, & LAX-SFO-
OAK-SMF) and workload conditions (Nominal, Communication, and Malfunction).
The SFO-SCK flight is a "Nominal" workload flight. The autopilot is MEL, as it is for all the
flights, but otherwise the conditions are ideal. An ILS approach and landing occurs at
SCK.
The SMF-SFO flight is the "Communication" flight. In order to simulate the duties of the
Pilot Flying (PF) in a two crew cockpit in which the PNF is occupied (or out of the cockpit
area) with other duties, the PF must tune and talk on the command radio from takeoff
through landing. Other than the "communication" manipulation the copilot and flight
engineer perform their normal flight duties. An ILS approach and landing occurs at SFO.
The LAX-(SFO)-(OAK)-SMF flight is the "Malfunction" flight. During the flight (between
windows 3 and 4) the crew receives a message (company contact using SELCAL) that
SFO has closed because of a power failure, and they are requested to divert to OAK.
The weather at OAK is marginal, going below minimums as the crew approaches the
middle marker. At decision height the runway is not visible and a missed approach is
executed. Passing 2,000 feet on the missed approach the number one engine fails. One
minute after passing 6,800 feet, during the climb to 7,000 feet, the "A" hydraulic system
loses pressure and quantity until complete failure ensues. The crew continues the climb
to their enroute altitude of 7,000 feet for the remainder of the flight to the alternate, SMF.
The crew continues the flight to an ILS approach and landing in good weather at SMF.
Table 9.1.2.1-1 summarizes the workload manipulations to aid the reader in
understanding the different task demands of the low and high workload flights.
PHASES OF FLIGHT
Seven phases of flight were examined in the simulation test.
(a) Takeoff
(b) Climb
(c) Top of Climb (TOC)
(d) Cruise
(e) Top of Descent (TOD)
(f) Approach
(g) Landing (or Missed Approach)
Eleven phases of flight were examined in the malfunction flight, the same seven as listed
above plus the four segments listed below.
Table 9.1.2.1-1
[Table of measurement windows giving, for each window, its opening event and closing event; body not recoverable.]
Some changes were made to the length of the windows based upon the testing
experience from the Part-Task simulation. Many of the windows were made a common
length (2 minutes) to facilitate data reduction. The most notable change was the
shortening of the landing window. In the Part-Task simulation the Landing window (1
minute 30 seconds) often found the pilot "sitting" on the runway for 45 seconds to 1
minute, waiting for the measurement period to end. In the Full-Mission simulation the
landing window was shortened (30 seconds), and more accurately reflected only the
activities involved with landing the aircraft.
SIMULATION SCHEDULE
Again a test/retest method was used to evaluate the reliability of the workload measures.
The period between the two simulation test sessions was at least three months for every
subject tested, and much longer in many cases. Pilots flew three scenarios, a Nominal,
Communication, and Malfunction flight on both visits. In the present study no attempt
Figure 9.1.2.1-1
Figure 9.1.2.1-2
was made to counter-balance route and workload conditions (Nominal, Communication,
and Malfunction).
9.1.2.2 DEPENDENT VARIABLES
SUBJECTIVE RATING
SWAT (In-Flight and Post-Flight) and the Bedford rating scale (Figure 9.1.2.2-1) were
used in the Full-Mission simulation.
The In-Flight implementation of SWAT was accomplished by ATC probing the pilot for
event ratings at the end of a measurement period. An example of a probe is, "TWA 241,
give us your TIME, EFFORT, and STRESS rating." Pilots were instructed to give event
ratings for the workload they were experiencing at the moment they were probed. When
comparing the In-Flight ratings of SWAT to other workload measures it should be kept in
mind that In-Flight SWAT ratings reflect an instantaneous assessment whereas the other
measures reflect workload for the entire measurement period.
The Post-Flight subjective ratings, SWAT and Bedford, were collected using the same
method as was employed in the Part-Task simulation. A video tape of the window was
played for the pilot after all the simulation runs for the session were completed. When the
measurement window was over, the tape was stopped and the pilot made their ratings,
first SWAT and then Bedford. When subjects had completed their rating of a single
measurement window they were instructed to turn the page of their rating booklet, and
asked not to refer to previous ratings.
PHYSIOLOGICAL INSTRUMENTATION
The physiological workload measures collected were the same as those used in the
Part-task simulation: Eyeblink rate, Interbeat Interval, Standard Deviation of Interbeat
Interval, Power Spectral Analysis (Blood Pressure and Respiration Component).
PERFORMANCE DATA
The performance measures collected were the same as those used in the Part-task
simulation: control input activity for the wheel, column, and rudder pedals. The
secondary task was dropped from the Full-mission simulation. All performance data was
collected at a rate of 10 Hertz.
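The report does not state how "control input activity" was reduced from the 10-Hertz recordings. As a hedged sketch only, the following computes one plausible summary, RMS deflection per 2-minute measurement window; the function name, the RMS choice, and the synthetic data are assumptions, not the report's documented method.

```python
import numpy as np

SAMPLE_RATE_HZ = 10  # performance data were recorded at 10 Hertz

def control_input_activity(deflections, window_s=120):
    """Summarize control-input activity (here: RMS deflection) over
    fixed-length measurement windows.

    deflections : 1-D array of wheel/column/pedal positions at 10 Hz.
    window_s    : window length in seconds (most windows were 2 minutes).
    """
    samples_per_window = window_s * SAMPLE_RATE_HZ
    n_windows = len(deflections) // samples_per_window
    activity = []
    for w in range(n_windows):
        chunk = deflections[w * samples_per_window:(w + 1) * samples_per_window]
        activity.append(np.sqrt(np.mean(np.square(chunk))))  # RMS per window
    return np.array(activity)

# Example: 4 minutes of synthetic wheel data -> two 2-minute windows
rng = np.random.default_rng(0)
wheel = rng.normal(0.0, 1.0, size=4 * 60 * SAMPLE_RATE_HZ)
print(control_input_activity(wheel))  # two RMS values near 1.0
```

Standard deviation of deflection, or counts of reversals, would be equally defensible reductions of the same 10-Hz stream.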
9.1.3 PROCEDURES AND EQUIPMENT
SIMULATOR
The Man-Vehicle System Research Facility's B-727 simulator was again used as the
testbed for the following simulation study.
SCENARIO
There were three routes flown for the Full-Mission simulation. The routes were
Sacramento to San Francisco (SMF-SFO), San Francisco to Stockton (SFO-SCK), and
Los Angeles diverted from San Francisco to a missed approach at Oakland finally
landing in Sacramento (LAX-SFO-OAK-SMF). The two short-hop flights, SMF-SFO and
SFO-SCK, were flown at 11,000 feet enroute to the destination, for an ILS approach and
landing. The long flight (LAX-SFO-OAK-SMF) is flown at 31,000 feet. En route to San
Figure 9.1.2.2-1
Francisco there is a diversion to Oakland, where a missed approach is executed (owing
to weather), followed by a diversion to a landing at the alternate, Sacramento.
Subjects received the same order of routes for session one and two. The presentation
of flights was counter-balanced across pilots and is further discussed in the test order
section. Subjects receive the same order of routes during testing for session one and
two. The presentation of flights was counter-balanced across pilots.
Case-wise deletion remains an artifact of the repeated measures ANOVA: any pilot
with missing data in any cell of the analysis is dropped from the ANOVA.
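The case-wise deletion rule described above can be sketched as follows; the pilot IDs and scores are invented purely for illustration.

```python
import math

# scores[pilot][flight] = mean score for that cell; nan marks missing data
scores = {
    1: {"NOM": 20.0, "COM": 35.0, "MAL": 50.0},
    2: {"NOM": 15.0, "COM": math.nan, "MAL": 45.0},
    3: {"NOM": 25.0, "COM": 30.0, "MAL": 55.0},
}

def casewise_delete(scores):
    """Keep only pilots with a complete set of cells (no NaN anywhere);
    a pilot missing any single cell is dropped from the ANOVA entirely."""
    return {p: cells for p, cells in scores.items()
            if not any(math.isnan(v) for v in cells.values())}

print(sorted(casewise_delete(scores)))  # pilot 2 is dropped -> [1, 3]
```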
All significant F ratios are reported, and the results of the test for a main effect of
workload are reported whether there is a statistically significant finding or not.
The same approach was taken in examining the discriminability of phase of flight by a
workload measure as was done in the Part-Task simulation. Utilizing a paired-comparison
approach, the various phases of flight were compared to one another using
the Newman-Keuls range statistic.
9.1.4.2 VALIDITY DECISION CRITERIA
For an assessment technique to demonstrate validity it should find a main effect for
workload type. Similar to a main effect of workload, an interaction of workload and
phase of flight should be demonstrated to provide evidence of validity for a workload
measure.
Similar to the Part-Task simulation, no systematic attempt was made to describe the
nature of the Newman-Keuls range statistic, due to complications arising from the large
number of comparisons. Our approach, simply stated, was that the more significant
differences found, the more discriminable the workload measure was thought to be.
9.1.4.3 RELIABILITY ANALYSES
Test/retest was assessed using the same method as the Part-Task simulation. For a
given workload measure, each pilot's scores (session 1 and 2) are paired, then a
correlation is computed using all the pilots' scores for each measurement window. The
Nominal and Communication flights will yield seven correlation coefficients each,
corresponding to the seven measurement windows. The Malfunction flight will yield
eleven correlation coefficients corresponding to the eleven measurement windows.
Similar to the Part-Task simulation, the correlation coefficients for test/retest reliability are
based on as many data points as are available. Case-wise deletion is not performed on
pilots with missing data.
Inter-rater reliability was assessed in the same fashion as the Part-Task simulation. For a
given workload measure, the 25 scores (7 windows from the Nominal & Communication and
11 windows from the Malfunction flight) for the pilot (average session 1 and 2 scores) are
paired with the group mean for the respective measurement windows to yield individual
correlation coefficients. Inter-rater reliability is then expressed as a percentage of the
pilots that show a significant correlation with the group mean.
9.1.4.4 RELIABILITY DECISION CRITERIA
Similar to the question of reliability criteria brought up in the Part-Task simulation, the
exact criterion to determine whether or not a workload measure is reliable can be subject
to much debate. When a large number of statistics are computed, the Type I error rate is
inflated in proportion to the number of coefficients computed. Again, a comparison-wise
Type I error rate is adopted when examining the correlation coefficients because of the
hypotheses advanced (namely that a measure ought to provide the same pattern of
results with repeated application).
In the present study it was decided that 20% of the coefficients, 5 out of a possible 25
correlations, ought to be significant for the workload metric to be considered reliable.
There is nothing magical about the 20% figure, nor is any consideration given to the
phase of flight or workload conditions (Nominal, Communication, or Malfunction) in
which the significant correlations are found.
Another index of reliability is the correlation of each pilot's scores (average test/retest) to
the group mean. In order to have confidence in this index, at least half of the subjects
should correlate significantly with the group mean.
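The significance cutoffs appearing in the reliability tables (r(14) = 0.623 and r(9) = 0.735) are consistent with two-tailed critical values of Pearson's r at p = .01 derived from the t-distribution. A sketch of that derivation, assuming a two-tailed test:

```python
from scipy.stats import t

def critical_r(df, alpha=0.01):
    """Two-tailed critical value of Pearson's r at significance alpha,
    via the t-distribution: r = t / sqrt(t^2 + df), with df = n - 2."""
    t_crit = t.ppf(1.0 - alpha / 2.0, df)
    return t_crit / (t_crit ** 2 + df) ** 0.5

# With 16 pilots and complete data, df = 16 - 2 = 14
print(round(critical_r(14), 3))  # 0.623, matching the tables
print(round(critical_r(9), 3))   # 0.735, matching the eyeblink table
```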
9.2 RESULTS
9.2.1 SUMMARY OF RESULTS
SWAT, the Bedford scale, Heart Rate (with qualification), and Control Inputs (Wheel,
Column, & Pedal) demonstrated evidence of validity by discriminating workload, as well
as finding a significant interaction of workload and phase of flight. Heart Rate did not find
a significant main effect of workload, although a strong trend did exist for being able to
discriminate among the levels of workload.
SWAT, the Bedford scale, Eyeblink, Heart Rate, the Blood Pressure Component of the
Power Spectral Analysis, and Control Inputs (Wheel, Column, & Pedal) demonstrated
evidence of reliability in that at least five, out of a possible 25, test/retest correlations
were significant. The above listed measures also demonstrated inter-rater reliability in
that at least 50% of the pilots' scores correlated significantly with the group means for the
respective workload conditions.
9.2.1.1 SUBJECTIVE MEASURES
SWAT (In-Flight)
To yield the appropriate 0-to-100 scaling solution for the SWAT ratings, prototype
solutions were applied to the appropriate subjects. The Kendall's coefficient of concordance (0.7256)
for the individuals' card sorts was less than the recommended .78 for using the group
scaling solution, so prototype solutions were used instead. Two primary dimensions
were found from the prototype analysis: Stress and Time. One subject was found to
have an Effort prototype, but was switched to the next greatest prototype tendency,
namely Stress. Of the 16 subjects in the analysis, 9 found Stress to be the most
important dimension in the card sort, while the remaining 7 found Time to be the most
important dimension.
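Kendall's coefficient of concordance (W) quantifies agreement among the pilots' card sorts. A sketch of the untied-ranks formula follows; SWAT's card sort uses 27 cards, and five items appear here only to keep the example small.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for agreement among judges.

    rankings: list of per-judge rank lists (each a permutation of 1..n).
    Returns W in [0, 1]; 1 means perfect agreement. Ties are not
    handled in this sketch.
    """
    k = len(rankings)            # number of judges (here, pilots)
    n = len(rankings[0])         # number of items (SWAT uses 27 cards)
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = k * (n + 1) / 2.0
    s = sum((tot - mean_total) ** 2 for tot in totals)
    return 12.0 * s / (k ** 2 * n * (n ** 2 - 1))

# Three judges in near-perfect agreement over five items
ranks = [[1, 2, 3, 4, 5],
         [1, 2, 3, 4, 5],
         [2, 1, 3, 4, 5]]
print(round(kendalls_w(ranks), 3))  # close to 1.0
```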
SWAT (In-Flight) could not discriminate among the three workload flights, F < 1,
although a significant workload by phase of flight interaction was found, F(12,168)=5.11,
(MSe=192, p<.01) (Figures 9.2.1.1-1 to 9.2.1.1-4 and Table 9.2.1.1-1).
A comparison of the Nominal-Malfunction flights found a significant interaction of
workload by phase of flight, F(6,84)=6.49, (MSe=206, p<.01). A comparison of the
Communication-Malfunction flights found a significant interaction of workload by phase
of flight, F(6,84)=7.32, (MSe=191, p<.01).
A main effect for the phase of flight comparison was found for all three types of workload
flights, F(6,84)=25.59, (MSe=344, p<.01). A oneway ANOVA found a main effect for
phase of flight significant for the Nominal workload flight, F(6,90)=10.98, (MSe=129,
p<.01). A Newman-Keuls range statistic was computed to determine SWAT's ability to
discriminate various phases of flight from one another in the Nominal workload flight; 9
Figure 9.2.1.1-1
In-Flight SWAT
Full Mission Simulation
Nominal Flight SFO - SCK
[Plot of mean SWAT scores ± 1 standard deviation by phase of flight: Takeoff, Climb, Top of Climb (TOC), Cruise, Top of Descent (TOD), Approach, Landing.]
Figure 9.2.1.1-2
In-Flight SWAT
Full Mission Simulation
Communications Flight SMF - SFO
[Plot of mean SWAT scores ± 1 standard deviation by phase of flight.]
Figure 9.2.1.1-3
In-Flight SWAT
Full Mission Simulation
Malfunction Flight LAX - (SFO) - (OAK) - SMF
[Plot of mean SWAT scores ± 1 standard deviation by phase of flight, including the missed approach, No. 1 engine failure, and "A" system hydraulic failure windows.]
Figure 9.2.1.1-4
In-Flight SWAT
Full Mission Simulation
All Flights
[Plot of mean SWAT scores by phase of flight for the Nominal, Communication, and Malfunction flights.]
Table 9.2.1.1-1
Table 9.2.1.1-2
SWAT (In-Flight)
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flight
Takeoff 0.50 0.44 0.68*
Climb 0.82* 0.36 0.74*
Top of climb 0.50 0.39 -0.07
Cruise 0.05 0.72* 0.40
Top of descent 0.52 0.78* 0.46
Approach 0.60 0.76* 0.48
Landing or missed approach 0.49 0.58 r(13) = 0.79*
No. 1 engine failure r(13) = 0.40
"A" system hydraulics failure 0.60
Approach 0.53
Landing r(13) = 0.63
* Significant, p < 0.01 (critical value r(14) = 0.623 unless otherwise noted)
out of 21 comparisons were significant. A oneway ANOVA found a main effect for phase
of flight significant for the Communication workload flight, F(6,90)=9.89, (MSe=84,
p<.01). The ability to discriminate phase of flight conditions for the Communication flight
found 6 out of 21 comparisons significant. A oneway ANOVA found a main effect for
phase of flight significant for the Malfunction workload flight, F(10,130)=22.47,
(MSe=166, p<.01). The ability to discriminate phase of flight conditions for the
Malfunction flight found 30 out of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilots' ratings for session one and
two for the separate measurement windows (Table 9.2.1.1-2). For the nominal flight
there was one significant correlation out of a possible seven. For the communication and
malfunction flights there were 3 out of 7, and 3 out of 11, significant correlations,
respectively.
In assessing inter-rater reliability, it was found that 94% of the subjects' scores were
significantly correlated with the group means for the 25 measurement windows.
SWAT (Post-Flight)
The same scaling solution that was used to determine the 0-to-100 workload scores for the In-
Flight event ratings was used for the Post-Flight event ratings. The same pre-flight card
sort determined the prototype solutions for the two groups: Time and Stress.
SWAT (Post-Flight) found a main effect among the three workload flights, F(2,28)=17.21,
(MSe=252, p<.01). In addition, a significant workload by phase of flight interaction was
found, F(12,168)=11.52, (MSe=206, p<.01) (Figures 9.2.1.1-5 to 9.2.1.1-8 and Table
9.2.1.1-3).
A comparison of the Nominal-Malfunction flights found a significant main effect of
workload and an interaction of workload by phase of flight, F(1,14)=26.50, (MSe=310,
p<.01) and F(6,84)=13.32, (MSe=271, p<.01), respectively. A comparison of the
Communication-Malfunction flights found a significant main effect of workload and an
interaction of workload by phase of flight, F(1,14)=13.13, (MSe=218, p<.01) and
F(6,84)=12.68, (MSe=247, p<.01), respectively.
A main effect for the phase of flight comparison was found for all three types of workload
flights, F(6,84)=25.81, (MSe=375, p<.01). A oneway ANOVA found a main effect for
phase of flight significant for the Nominal workload flight, F(6,90)=11.52, (MSe=129,
p<.01). A Newman-Keuls range statistic was computed to determine SWAT's ability to
discriminate various phases of flight from one another in the Nominal workload flight; 7
out of 21 comparisons were significant. A oneway ANOVA found a main effect for phase
of flight significant for the Communication workload flight, F(6,90)=7.24, (MSe=121,
p<.01). The ability to discriminate phase of flight conditions for the Communication flight
found 7 out of 21 comparisons significant. A oneway ANOVA found a main effect for
phase of flight significant for the Malfunction workload flight, F(10,130)=28.32,
(MSe=192, p<.01). The ability to discriminate phase of flight conditions for the
Malfunction flight found 33 out of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilot's ratings for session one and
two for the separate measurement windows (Table 9.2.1.1-4). For the nominal flight
there were two significant correlations out of a possible seven. For the communication
and malfunction flights there were 3 out of 7, and 5 out of 11, significant correlations,
respectively.
Figure 9.2.1.1-5
Post-Flight SWAT
Full Mission Simulation
Nominal Flight SFO - SCK
[Plot of mean SWAT scores ± 1 standard deviation by phase of flight.]
Figure 9.2.1.1-6
Post-Flight SWAT
Full Mission Simulation
Communications Flight SMF - SFO
[Plot of mean SWAT scores ± 1 standard deviation by phase of flight.]
Figure 9.2.1.1-7
Post-Flight SWAT
Full Mission Simulation
Malfunction Flight LAX - (SFO) - (OAK) - SMF
[Plot of mean SWAT scores ± 1 standard deviation by phase of flight, including the missed approach, No. 1 engine failure, and "A" system hydraulic failure windows.]
Figure 9.2.1.1-8
Post-Flight SWAT
Full Mission Simulation
All Flights
[Plot of mean SWAT scores by phase of flight for the Nominal, Communication, and Malfunction flights.]
Table 9.2.1.1-3
Table 9.2.1.1-4
SWAT (Post-Flight)
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flight
Takeoff 0.74* 0.44 0.42
Climb 0.59 0.82* 0.41
Top of climb 0.43 -0.05 0.71*
Cruise -0.08 0.31 0.66*
Top of descent 0.15 0.65* 0.29
Approach 0.39 0.60 0.66*
Landing or missed approach 0.81* 0.66* r(13) = 0.75*
No. 1 engine failure 0.44
"A" system hydraulics failure 0.53
Approach 0.55
Landing 0.69*
* Significant, p < 0.01 (critical value r(14) = 0.623 unless otherwise noted)
In assessing inter-rater reliability, it was found that 94% of the subjects' scores were
significantly correlated with the group means for the 25 measurement windows.
BEDFORD RATING
No transformation, from event rating to workload score, is necessary for the Bedford
rating. Pilots made ratings from 1 to 10, low to high, with half ratings (e.g., 3 1/2)
allowed.
Bedford ratings found a main effect for workload, F(2,28)=12.55, (MSe=1, p<.01). In
addition, a significant workload flight by phase of flight interaction was found,
F(12,168)=10.57, (MSe=0.57, p<.01) (Figures 9.2.1.1-9 to 9.2.1.1-12 and Table 9.2.1.1-
5).
A comparison of the Nominal-Communication flights found a significant main effect of
workload type, F(1,15)=8.84, (MSe=0.75, p<.01). A comparison of the Nominal-
Malfunction flights found a main effect of workload and an interaction of workload by
phase of flight, F(1,14)=17.96, (MSe=1.4, p<.01) and F(6,84)=18.60, (MSe=0.5,
p<.01), respectively. A comparison of the Communication-Malfunction flights found a significant main
effect of workload and an interaction of workload by phase of flight, F(1,14)=8.71,
(MSe=0.86, p<.01) and F(6,84)=10.99, (MSe=0.75, p<.01), respectively.
A main effect for phase of flight discrimination was found, F(6,84)=27.78, (MSe=0.98,
p<.01). A oneway ANOVA found a main effect for phase of flight discrimination for the
Nominal workload flight, F(6,90)=15.82, (MSe=0.16, p<.01). A Newman-Keuls range
statistic was computed to determine the ability of the Bedford ratings to discriminate
various phases of flight from one another in the Nominal workload flight; 11 out of 21
comparisons were significant. A oneway ANOVA found a main effect for phase of flight
discrimination for the Communication workload flight, F(6,90)=5.82, (MSe=0.37,
p<.01). The ability to discriminate phase of flight conditions for the Communication
flight found 6 out of 21 comparisons significant. A oneway ANOVA found a main effect
for phase of flight discrimination for the Malfunction workload flight, F(10,130)=25.58,
(MSe=0.59, p<.01). The ability to discriminate phase of flight conditions for the
Malfunction flight found 35 out of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilots' ratings for session one and
two for the separate measurement windows (Table 9.2.1.1-6). For the Nominal flight
there were two significant correlations out of a possible seven. For the Communication
and Malfunction flights there were 1 out of 7, and 1 out of 11, significant correlations,
respectively.
In assessing inter-rater reliability, it was found that 94% of the subjects' scores were
significantly correlated with the group means for the 25 measurement windows.
9.2.1.2 PHYSIOLOGICAL MEASURES
Eyeblink rate and heart rate, measured by inter-beat interval, both demonstrate evidence
of validity by discriminating between the different types of workload. Eyeblinks, heart
rate, and the blood pressure component of the power spectral analysis all demonstrate
evidence for test/retest and inter-rater reliability.
EYEBLINK
Using the same scoring protocol as the Part-Task simulation, a strip chart printout of the
vertical electro-occulogram analog signal was analyzed for eyeblinks. The number of
eyeblinks per minute was then computed to determine eyeblink rate.
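The report's blinks were scored by hand from strip charts. As an illustrative stand-in only, an automated count might use threshold crossings on the EOG trace; the threshold, refractory gap, and synthetic data below are all assumptions.

```python
import numpy as np

def blink_rate(eog, fs, threshold, min_gap_s=0.2):
    """Estimate eyeblinks per minute from a vertical EOG trace.

    Counts upward threshold crossings, ignoring crossings closer together
    than min_gap_s so one blink is not counted twice.
    """
    above = eog > threshold
    onsets = np.flatnonzero(~above[:-1] & above[1:]) + 1  # rising edges
    blinks, last = 0, -np.inf
    for i in onsets:
        if (i - last) / fs >= min_gap_s:
            blinks += 1
            last = i
    minutes = len(eog) / fs / 60.0
    return blinks / minutes

# One minute of flat signal with three synthetic blink deflections
fs = 100
eog = np.zeros(60 * fs)
for sec in (10, 25, 40):                  # blink times in seconds
    eog[sec * fs:sec * fs + 15] = 200.0   # ~150 ms, 200 microvolt spike
print(blink_rate(eog, fs, threshold=100.0))  # 3 blinks per minute
```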
Figure 9.2.1.1-9
Bedford Rating
Full Mission Simulation
Nominal Flight SFO - SCK
[Plot of mean Bedford ratings ± 1 standard deviation by phase of flight.]
Figure 9.2.1.1-10
Bedford Rating
Full Mission Simulation
Communication Flight SMF - SFO
[Plot of mean Bedford ratings ± 1 standard deviation by phase of flight.]
Figure 9.2.1.1-11
Bedford Rating
Full Mission Simulation
Malfunction Flight LAX - (SFO) - (OAK) - SMF
[Plot of mean Bedford ratings ± 1 standard deviation by phase of flight, including the missed approach, No. 1 engine failure, and "A" system hydraulic failure windows.]
Figure 9.2.1.1-12
Bedford Rating
Full Mission Simulation
All Flights
[Plot of mean Bedford ratings by phase of flight for the Nominal, Communication, and Malfunction flights.]
Table 9.2.1.1-5
Bedford Ratings
Full Mission Simulation Data
Means and Standard Deviations
Window Nominal Communication Malfunction
Mean SD Mean SD Mean SD
Takeoff 3.0 (0.85) 3.3 (1.18) 2.7 (0.76)
Climb 2.5 (0.58) 2.5 (0.59) 2.6 (0.62)
Top of climb 2.1 (0.56) 2.4 (0.80) 1.8 (0.59)
Cruise 1.9 (0.55) 2.4 (0.99) 3.0 (0.85)
Top of descent 2.7 (0.56) 2.8 (0.60) 3.0 (0.99)
Approach 2.7 (0.62) 2.9 (0.72) 3.6 (0.83)
Landing or M/A 2.9 (0.72) 3.2 (0.74) 4.7 (1.10)
No. 1 engine failure 5.1 (1.65)
"A" hydraulic failure 4.6 (1.46)
Approach 3.6 (1.26)
Landing 3.6 (0.86)
Table 9.2.1.1-6
Bedford Rating
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flight
Takeoff 0.45 0.64* 0.61
Climb 0.64* 0.51 0.31
Top of climb 0.40 0.01 0.18
Cruise 0.26 0.15 0.59
Top of descent 0.17 0.42 0.55
Approach 0.58 0.44 0.29
Landing or missed approach 0.66* 0.04 r(13) = 0.16
No. 1 engine failure 0.63*
"A" system hydraulics failure 0.50
Approach 0.62
Landing r(13) = 0.26
* Significant, p < 0.01 (critical value r(14) = 0.623 unless otherwise noted)
Blink rate found a strong trend for a main effect among the three workload flights,
F(2,28)=5.59, (MSe=33, p<.012). In addition, a workload by phase of flight interaction
was found, F(12,120)=4.59, (MSe=21, p<.01) (Figures 9.2.1.2-1 to 9.2.1.2-4 and Table
9.2.1.2-1).
A comparison of the Nominal-Malfunction flights found an interaction of workload by
phase of flight, F(6,60)=7.10, (MSe=19, p<.01). A comparison of the Communication-
Malfunction flights found a main effect of workload and an interaction of workload by
phase of flight, F(1,10)=10.63, (MSe=34, p<.01) and F(6,60)=5.82, (MSe=20, p<.01),
respectively.
A main effect for phase of flight discrimination was found, F(6,84)=5.21, (MSe=39,
p<.01). A oneway ANOVA found a strong trend for a main effect for phase of flight
discrimination for the Communication workload flight, F(6,60)=2.44, (MSe=11, p<.04).
There was no ability to discriminate phase of flight conditions for the Communication
flight. A oneway ANOVA found a main effect for phase of flight discrimination for the
Malfunction workload flight, F(10,100)=6.36, (MSe=22, p<.01). The ability to
discriminate phase of flight conditions for the Malfunction flight found 11 out of 55
comparisons significant.
Test-retest reliability was computed by comparing the pilots' scores for session one and
two for the separate measurement windows (Table 9.2.1.2-2). For the Nominal flight
there were four significant correlations out of a possible seven. For the Communication
and Malfunction flights there were 4 out of 7, and 6 out of 11, significant correlations,
respectively.
In assessing inter-rater reliability, it was found that 73% of the subjects' scores were
significantly correlated with the group means for the 25 measurement windows.
HEART RATE (INTER-BEAT INTERVAL)
Heart rate was calculated using the same method as in the Part-Task simulation, the R-R
interbeat interval.
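A sketch of reducing ECG R-wave times to a mean inter-beat interval for one measurement window; the convention for assigning intervals to windows is an assumption, since the report does not specify one.

```python
import numpy as np

def mean_ibi_ms(r_peak_times_s, window_start_s, window_end_s):
    """Mean R-R inter-beat interval (ms) within one measurement window.

    r_peak_times_s: times of successive ECG R-waves, in seconds.
    Each interval is assigned to the window containing its second R-wave
    (one of several reasonable conventions).
    """
    peaks = np.asarray(r_peak_times_s)
    ibis = np.diff(peaks) * 1000.0   # R-R intervals in milliseconds
    ends = peaks[1:]                 # time at which each interval ends
    mask = (ends >= window_start_s) & (ends < window_end_s)
    return float(np.mean(ibis[mask]))

# Steady 0.7 s heart period (~86 beats/min) over a 10-second window
peaks = np.arange(0.0, 12.0, 0.7)
print(mean_ibi_ms(peaks, 0.0, 10.0))  # ~700 ms
```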
Inter-beat interval showed a strong trend for a main effect for discriminating among the
three workload flights, F(2,28)=2.98, (MSe=9024, p<.07). In addition, a significant
workload by phase of flight interaction was found, F(12,168)=4.03, (MSe=465, p<.01)
(Figures 9.2.1.2-5 to 9.2.1.2-8 and Table 9.2.1.2-3).
In addition, as was found for the Part-Task simulation, a strong trend for a main effect of
session (day 1 faster than day 2) was found, F(1,14)=8.30, (MSe=56079, p<.01). The slowing of
the heart rate is thought to reflect a "learning effect" from the test to the retest portion of
the study. A comparison of the Nominal-Communication flights found a main effect of
session (day 1 faster than day 2), F(1,14)=10.03, (MSe=37251, p<.01). Strong trends
for session main effects were found for separate comparisons of the Nominal-
Malfunction and Communication-Malfunction flights, F(1,14)=6.87, (MSe=38558,
p<.02) and F(1,14)=7.88, (MSe=37779, p<.01), respectively.
A comparison of the Nominal-Malfunction flights found a strong trend for a main effect of
workload and an interaction of workload by phase of flight, F(1,14)=5.83, (MSe=6126,
p<.03) and F(6,84)=4.38, (MSe=527, p<.01), respectively. A comparison of the
Communication-Malfunction flights found a strong trend for a main effect of workload
and an interaction of workload by phase of flight, F(1,14)=4.16, (MSe=10609, p<.06)
and F(6,84)=6.34, (MSe=394, p<.01), respectively.
Figure 9.2.1.2-1
Eyeblink Rate (Blinks per Minute)
Full Mission Simulation
Nominal Flight SFO - SCK
[Plot of mean blink rates ± 1 standard deviation by phase of flight.]
Figure 9.2.1.2-2
Eyeblink Rate (Blinks per Minute)
Full Mission Simulation
Communications Flight SMF - SFO
[Plot of mean blink rates ± 1 standard deviation by phase of flight.]
Figure 9.2.1.2-3
Eyeblink Rate (Blinks per Minute)
Full Mission Simulation
Malfunction Flight LAX - (SFO) - (OAK) - SMF
[Plot of mean blink rates ± 1 standard deviation by phase of flight, including the missed approach, No. 1 engine failure, and "A" system hydraulic failure windows.]
Figure 9.2.1.2-4
Eyeblink Rate (Blinks per Minute)
Full Mission Simulation
All Flights
[Plot of mean blink rates by phase of flight for the Nominal, Communication, and Malfunction flights.]
Table 9.2.1.2-1
Eyeblink Rate (Blinks per Minute)
Full Mission Simulation Data
Means and Standard Deviations
Window Nominal Communication Malfunction
Mean SD Mean SD Mean SD
Takeoff 13.6 (6.0) 16.3 (7.3) 16.4 (7.7)
Climb 12.2 (10.2) 14.3 (9.2) 10.4 (7.2)
Top of climb 13.4 (9.8) 12.3 (6.4) 13.2 (8.4)
Cruise 14.3 (8.2) 16.3 (9.3) 15.6 (7.3)
Top of descent 11.8 (7.9) 13.8 (8.0) 14.2 (7.7)
Approach 12.2 (7.4) 12.6 (7.4) 8.3 (5.9)
Landing or M/A 13.4 (5.8) 14.4 (9.0) 6.6 (4.9)
No. 1 engine failure 11.2 (5.3)
"A" hydraulic failure 18.0 (10.6)
Approach 14.4 (10.3)
Landing 15.5 (12.6)
Table 9.2.1.2-2
Eyeblink Rate
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flight
Takeoff 0.74* 0.82* 0.78*
Climb 0.73 0.62 0.96*
Top of climb 0.91* 0.71 0.77*
Cruise 0.94* 0.82* 0.62
Top of descent 0.88* 0.90* 0.78*
Approach 0.01 0.83* 0.94*
Landing or missed approach 0.31 0.59 0.62
No. 1 engine failure 0.52
"A" system hydraulics failure 0.77*
Approach 0.41
Landing 0.29
* Significant, p < .01 (critical value r(9) = 0.735)
Figure 9.2.1.2-5
Inter-Beat Interval (Msec)
Full Mission Simulation
Nominal Flight SFO - SCK
[Plot of mean inter-beat intervals by phase of flight.]
Figure 9.2.1.2-6
Figure 9.2.1.2-7
Inter-Beat Interval (Msec)
Full Mission Simulation
[Plot of mean inter-beat intervals by phase of flight, including the malfunction windows.]
Figure 9.2.1.2-8
Table 9.2.1.2-3
Table 9.2.1.2-4
"133
A main effect for phase of flight discrimination was found, F(6,84)=25.04, (MSe=2889,
p<.01). A oneway ANOVA found a main effect for phase of flight discrimination for the
Nominal workload flight, F(6,84)=17.09, (MSe=633, p<.01). A Newman-Keuls range
statistic was computed to determine the ability of inter-beat interval to discriminate various phases of
flight from one another in the Nominal workload flight; 6 out of 21 comparisons were
significant. A oneway ANOVA found a main effect for phase of flight discrimination for
the Communication workload flight, F(6,84)=21.36, (MSe=504, p<.01). The ability to
discriminate phase of flight conditions for the Communication flight found 11 out of 21
comparisons significant. A oneway ANOVA found a main effect for phase of flight
discrimination for the Malfunction workload flight, F(10,140)=21.79, (MSe=725, p<.01).
The ability to discriminate phase of flight conditions for the Malfunction flight found 23 out
of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilots' scores for session one and
two for the separate measurement windows (Table 9.2.1.2-4). Seven, out of seven
possible, correlations were found to be significant for the Nominal and Communication
flights. Eleven, out of the eleven possible, correlations were found to be significant for
the Malfunction flight.
In assessing inter-rater reliability, it was found that 87% of the subjects' scores were significantly correlated with means for the 25 measurement windows.
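The test-retest computation amounts to a Pearson correlation between session-one and session-two scores for each measurement window, compared against a critical value such as the r(9)=.735 cited in the tables. A minimal sketch, with hypothetical session scores:

```python
# Pearson correlation for one measurement window, session 1 vs. session 2.
# The scores are hypothetical; the critical value is taken from the report.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

session1 = [620, 645, 660, 700, 680, 655, 630]   # one pilot per entry (invented)
session2 = [615, 650, 658, 690, 685, 660, 625]
r = pearson_r(session1, session2)
critical = 0.735   # the report's critical value for r(9), p < .01
print(round(r, 3), r > critical)
```

Windows whose correlation exceeds the critical value are the ones starred in the test-retest tables.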
HEART RATE VARIABILITY (IBI STANDARD DEVIATION)
Heart rate variability is the standard deviation of the R-R interbeat interval for a given
measurement period.
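The computation is a simple one; a sketch with hypothetical IBI values follows. The report does not state whether the population (n) or sample (n-1) form of the standard deviation was used, so the population form is assumed here.

```python
# Heart rate variability as the standard deviation of R-R inter-beat
# intervals (IBIs) within one measurement window. IBI values are invented;
# the population (divide-by-n) form is an assumption.

def ibi_standard_deviation(ibis_msec):
    n = len(ibis_msec)
    mean = sum(ibis_msec) / n
    variance = sum((x - mean) ** 2 for x in ibis_msec) / n
    return variance ** 0.5

window = [640, 655, 630, 662, 648, 671, 659, 635]   # msec
print(round(ibi_standard_deviation(window), 2))
```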
The standard deviation of IBI found no main effect among the three workload flights, F(2,28)=1.10, ns (Figures 9.2.1.2-9 to 9.2.1.2-12 and Table 9.2.1.2-5).
A significant session by phase of flight interaction was found, F(6,84)=8.64, (MSe=159, p<.01), indicating instability of the measure's sensitivity to different task demands over time. A comparison of the Nominal-Communication flights found an interaction of session by phase of flight, F(6,84)=5.94, (MSe=130, p<.01). A comparison of the Nominal-Malfunction flights found an interaction of session by phase of flight, F(6,84)=7.74, (MSe=143, p<.01). A comparison of the Communication-Malfunction flights found an interaction of session by phase of flight, F(6,84)=6.81, (MSe=140, p<.01).
A main effect for phase of flight discrimination was found, F(6,84)=3.76, (MSe=274, p<.01). A oneway ANOVA found a main effect for phase of flight discrimination for the Communication workload flight, F(6,84)=3.92, (MSe=82, p<.04). The ability to discriminate phase of flight conditions for the Communication flight found 1 out of 21 comparisons significant. A oneway ANOVA found a main effect for phase of flight discrimination for the Malfunction workload flight, F(10,140)=3.45, (MSe=114, p<.01). The ability to discriminate phase of flight conditions for the Malfunction flight found 3 out of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilots' scores for sessions one and two for the separate measurement windows (Table 9.2.1.2-6). No significant correlations were found for the Nominal or Communication flights. For the Malfunction flight there was 1 significant correlation out of a possible 11.
In assessing inter-rater reliability, it was found that 33% of the subjects' scores were
Figure 9.2.1.2-9
[Figure: Inter-Beat Interval Standard Deviation (Msec), Full Mission Simulation, Nominal Flight SFO - SCK; mean scores ±1 standard deviation by phase of flight.]
Figure 9.2.1.2-10
[Figure: Inter-Beat Interval Standard Deviation (Msec), Full Mission Simulation, Communication Flight SMF - SFO; mean scores ±1 standard deviation by phase of flight.]
Figure 9.2.1.2-11
[Figure: Inter-Beat Interval Standard Deviation (Msec), Full Mission Simulation, Malfunction Flight LAX - (SFO) - (OAK) - SMF; mean scores ±1 standard deviation by phase of flight, including the malfunction events.]
Figure 9.2.1.2-12
[Figure: Inter-Beat Interval Standard Deviation (Msec), Full Mission Simulation, All Flights; mean scores for the Nominal, Communication, and Malfunction flights by phase of flight.]
Table 9.2.1.2-5
Inter-Beat Interval
Standard Deviation (Msec)
Full Mission Simulation Data
Means and Standard Deviations
Window Nominal Communication Malfunction
Mean SD Mean SD Mean SD
Takeoff 41.93 (14.73) 40.27 (18.07) 39.03 (13.91)
Climb 47.87 (17.52) 44.70 (15.52) 46.93 (19.50)
Top of climb 48.87 (15.77) 48.60 (15.13) 45.40 (13.80)
Cruise 50.70 (18.41) 51.90 (18.28) 49.47 (20.32)
Top of descent 45.73 (12.42) 49.03 (14.58) 45.27 (16.63)
Approach 51.53 (17.99) 46.90 (16.64) 44.47 (15.50)
Landing or M/A 50.63 (17.18) 54.33 (21.94) 45.83 (13.07)
No. 1 engine failure 53.53 (15.79)
"A"hydraulic failure 54.13 (23.71)
Approach 49.33 (15.45)
Landing 56.80 (16.65)
Table 9.2.1.2-6
Inter-Beat Interval
Standard Deviation (Msec)
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flighl
Takeoff 0.01 0.46 0.51
Climb -0.04 0.32 0.31
Top of climb 0.21 0.28 0.21
Cruise 0.15 0.48 0.69*
Top of descent -0.07 0.27 0.62
Approach 0.25 0.17 0.51
Landing or 0.17 0.38 0.15
missed approach
No. 1 engine failure 0.35
"A"system hydraulics 0.60
failure
Approach 0.58
Landing -0.09
r(13)=.641  *Significant p<.01
significantly correlated with means for the 25 measurement windows.
POWER SPECTRAL ANALYSIS
Power spectral analyses were computed for the blood pressure (0.05 to 0.15 Hz) and respiration (0.20 to 0.40 Hz) components using a fast Fourier transform of the inter-beat interval information.
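A hedged sketch of the band-power computation follows. The report does not give the resampling rate or window length, so those are assumptions here, and the IBI signal is synthetic rather than the study's data.

```python
# Power spectral analysis of an IBI series: FFT power summed in the
# blood pressure (0.05-0.15 Hz) and respiration (0.20-0.40 Hz) bands.
# Sampling rate, window length, and the signal itself are illustrative.
import numpy as np

def band_power(signal, fs, lo, hi):
    """Sum FFT power of `signal` (sampled at fs Hz) between lo and hi Hz."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return spectrum[(freqs >= lo) & (freqs <= hi)].sum()

fs = 4.0                           # assumed 4 Hz evenly resampled IBI series
t = np.arange(0, 60, 1.0 / fs)     # one-minute measurement window
# Synthetic IBI: a 0.10 Hz blood-pressure wave plus a 0.30 Hz respiration wave
ibi = 650 + 20 * np.sin(2 * np.pi * 0.10 * t) + 10 * np.sin(2 * np.pi * 0.30 * t)
bp = band_power(ibi, fs, 0.05, 0.15)
resp = band_power(ibi, fs, 0.20, 0.40)
print(bp > resp)   # the larger 0.10 Hz component carries more power
```

The relative power in the two bands is what the blood pressure and respiration components of this section track across measurement windows.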
Test-retest reliability was computed by comparing the pilots' scores for sessions one and two for the separate measurement windows (Table 9.2.1.2-8). No significant correlations were found for the Nominal flight. For the Communication and Malfunction flights there were 2 out of a possible 7 and 3 out of a possible 11 significant correlations, respectively.
In assessing inter-rater reliability, it was found that 80% of the subjects' scores were significantly correlated with means for the 25 measurement windows.
RESPIRATION COMPONENT
The respiration component found no main effect among the three workload flights, F < 1 (Figures 9.2.1.2-17 to 9.2.1.2-20 and Table 9.2.1.2-9).
A main effect for phase of flight discrimination was found, F(6,84)=6.81, (MSe=0.07, p<.01). A oneway ANOVA found a strong trend for a main effect for phase of flight discrimination for the Nominal workload flight, F(6,84)=2.51, (MSe=0.036, p<.03). A oneway ANOVA found a main effect for phase of flight discrimination for the Communication workload flight, F(6,84)=3.03, (MSe=0.06, p<.01). A oneway ANOVA found a main effect for phase of flight discrimination for the Malfunction workload flight, F(10,140)=2.99, (MSe=0.01, p<.01). Although the separate oneway ANOVAs found a
Figure 9.2.1.2-13
Figure 9.2.1.2-14
Figure 9.2.1.2-15
[Figure: power spectral analysis plotted by phase of flight for the Malfunction flight, including the malfunction events.]
Figure 9.2.1.2-16
Table 9.2.1.2-7
Table 9.2.1.2-8
Power Spectral Analysis
(Blood Pressure Component)
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flight
Takeoff 0.54 0.93* 0.96*
Climb 0.28 0.18 0.61
Top of climb 0.59 0.16 0.44
Cruise 0.51 0.38 0.86*
Top of descent -0.18 0.59 0.48
Approach 0.17 0.66* 0.21
Landing or 0.32 0.50 0.19
missed approach
No. 1 engine failure 0.53
"A"system hydraulics 0.69*
failure
Approach 0.43
Landing 0.48
r(13)=.641  *Significant p<.01
Figure 9.2.1.2-17
Figure 9.2.1.2-18
[Figure: power spectral analysis, respiration component (Msec²), Full Mission Simulation; mean scores ±1 standard deviation by phase of flight (Takeoff through Landing).]
Figure 9.2.1.2-19
[Figure: power spectral analysis, respiration component (Msec²); mean scores ±1 standard deviation by phase of flight for the Malfunction flight, including the malfunction events.]
Figure 9.2.1.2-20
[Figure: power spectral analysis, respiration component (Msec²), all flights (Nominal, Communication, Malfunction) by phase of flight.]
Table 9.2.1.2-9
Table 9.2.1.2-10
significant main effect for phase of flight discrimination, the Newman-Keuls range statistic could not discriminate any of the phases of flight from one another within the individual flights.
Test-retest reliability was computed by comparing the pilots' scores for sessions one and two for the separate measurement windows (Table 9.2.1.2-10). No significant correlations were found for the Nominal, Communication, or Malfunction flight.
In assessing inter-rater reliability, it was found that 47% of the subjects' scores were significantly correlated with means for the 25 measurement windows.
9.2.1.3 PERFORMANCE MEASURES
Control input activity was measured by calculating flight control position changes over time. The position of the flight controls was sampled at a rate of 10 Hz. If the position of a flight control (i.e., wheel, column, or pedals) moved more than 2.5% of the total throw available, the input activity counter was incremented. Each control input activity index is computed over time, so that control activity is expressed in inputs per minute. Following normal convention, wheel input activity controls roll, column input activity controls pitch, and pedal input activity controls yaw.
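The counting scheme above can be sketched as follows. The report does not specify how the counter re-arms after an input is registered, so re-arming at the newly counted position is an assumption, and the position trace is invented.

```python
# Control input activity: positions sampled at 10 Hz; an input is counted
# when the control moves more than 2.5% of total throw; activity is
# expressed in inputs per minute. The re-arming rule and data are assumptions.

def inputs_per_minute(positions, total_throw, sample_hz=10, threshold=0.025):
    count = 0
    reference = positions[0]
    for pos in positions[1:]:
        if abs(pos - reference) > threshold * total_throw:
            count += 1
            reference = pos          # re-arm the counter at the new position
    # inputs / (duration in minutes), written to avoid float division error
    return count * 60.0 * sample_hz / len(positions)

# 6 seconds (60 samples) of hypothetical wheel position; total throw 100 units
wheel = [0] * 20 + [3] * 20 + [0] * 20   # one deflection out and back
print(inputs_per_minute(wheel, 100.0))
```

The two 3-unit excursions each exceed the 2.5-unit threshold, giving two counted inputs over a tenth of a minute, i.e., 20 inputs per minute.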
Figure 9.2.1.3-1
[Figure: wheel control inputs per minute by phase of flight (Takeoff through Landing).]
Figure 9.2.1.3-2
[Figure: wheel control inputs per minute; mean scores ±1 standard deviation by phase of flight.]
Figure 9.2.1.3-3
[Figure: wheel control inputs per minute by phase of flight for the Malfunction flight, including the malfunction events.]
Figure 9.2.1.3-4
[Figure: Wheel (Aileron) Control Inputs (per Minute), Full Mission Simulation, All Flights; mean scores for the Nominal, Communication, and Malfunction flights by phase of flight.]
Table 9.2.1.3-1
Table 9.2.1.3-2
of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilots' scores for sessions one and two for the separate measurement windows (Table 9.2.1.3-2). For the Nominal flight there were three significant correlations out of a possible seven. For the Communication and Malfunction flights there were 2 out of 7 and 5 out of 11 significant correlations, respectively.
In assessing inter-rater reliability, it was found that all of the subjects' scores (100%) were significantly correlated with means for the 25 measurement windows.
COLUMN (ELEVATOR) CONTROL INPUTS
Column control input activity found a main effect among the three workload flights, F(2,30)=20.17, (MSe=27, p<.01). A workload by phase of flight interaction was found as well, F(12,180)=18.88, (MSe=27, p<.01) (Figures 9.2.1.3-5 to 9.2.1.3-8 and Table 9.2.1.3-3).
A comparison of the Nominal-Communication flights found a main effect of workload and an interaction of workload by phase of flight, F(1,15)=27.60, (MSe=24, p<.01) and F(6,90)=23.27, (MSe=26, p<.01), respectively. A comparison of the Nominal-Malfunction flights found a significant main effect of workload and an interaction of workload by phase of flight, F(1,15)=33.39, (MSe=29, p<.01) and F(6,90)=28.04, (MSe=31, p<.01), respectively. A comparison of the Communication-Malfunction flights found a strong trend for an interaction of workload by phase of flight, F(6,90)=2.58, (MSe=24, p<.05).
A main effect for phase of flight discrimination was found, F(6,90)=169.83, (MSe=103, p<.01). A oneway ANOVA found a main effect for phase of flight discrimination for the Nominal workload flight, F(6,90)=209.36, (MSe=24, p<.01). A Newman-Keuls range statistic was computed to determine the measure's ability to discriminate the various phases of flight from one another in the Nominal workload flight; 10 out of 21 comparisons were significant. A oneway ANOVA found a main effect for phase of flight discrimination for the Communication workload flight, F(6,90)=99.34, (MSe=22, p<.01). The ability to discriminate phase of flight conditions for the Communication flight found 10 out of 21 comparisons significant. A oneway ANOVA found a main effect for phase of flight discrimination for the Malfunction workload flight, F(10,150)=79.92, (MSe=45, p<.01). The ability to discriminate phase of flight conditions for the Malfunction flight found 22 out of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilots' scores for sessions one and two for the separate measurement windows (Table 9.2.1.3-4). For the Nominal and Communication flights none of the measurement windows showed a significant correlation. For the Malfunction flight there were 3, out of 11 possible, significant correlations.
In assessing inter-rater reliability, it was found that all of the subjects' scores (100%) were significantly correlated with means for the 25 measurement windows.
PEDAL (RUDDER) CONTROL INPUTS
Pedal control input activity found a main effect among the three workload flights,
F(2,30) = 26.43, (MSe = 15, p < .01). A workload by phase of flight interaction was found
Figure 9.2.1.3-5
[Figure: Column (Elevator) Control Inputs (per Minute), Full Mission Simulation, Nominal Flight SFO - SCK; mean scores ±1 standard deviation by phase of flight.]
Figure 9.2.1.3-6
[Figure: column control inputs per minute by phase of flight (Takeoff through Landing).]
Figure 9.2.1.3-7
[Figure: column control inputs per minute; mean scores ±1 standard deviation by phase of flight for the Malfunction flight, including the malfunction events.]
Figure 9.2.1.3-8
[Figure: column control inputs per minute, all flights (Nominal, Communication, Malfunction) by phase of flight.]
Table 9.2.1.3-3
Column (Elevator) Control Input
Full Mission Simulation Data
Means and Standard Deviations
Window Nominal Communication Malfunction
Mean SD Mean SD Mean SD
Takeoff 8.1 (4.4) 7.9 (3.5) 6.7 (2.9)
Climb 2.3 (2.7) 2.0 (1.7) 1.9 (2.1)
Top of climb 3.5 (3.0) 2.4 (2.0) 0.8 (1.0)
Cruise 0.7 (1.0) 1.2 (1.6) 0.3 (0.4)
Top of descent 0.8 (0.6) 1.0 (1.1) 0.3 (0.4)
Approach 4.7 (3.8) 4.8 (5.4) 8.4 (6.6)
Landing or M/A 50.0 (12.1) 33.6 (11.3) 31.0 (14.1)
No. 1 engine failure 10.0 (5.3)
"A"hydraulic failure 3.0 (2.0)
Approach 8.1 (6.3)
Landing 47.7 (17.4)
Table 9.2.1.3-4
Column (Elevator) Control Inputs
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flight
Takeoff 0.54 0.53 0.41
Climb 0.39 0.28 0.75*
Top of climb 0.33 0.03 0.32
Cruise -0.19 0.03 -0.31
Top of descent -0.11 0.17 0.66*
Approach 0.28 0.52 0.72*
Landing or 0.01 -0.52 0.43
missed approach
No. 1 engine failure 0.55
"A" system hydraulics 0.38
failure r(14) - 0.623*
Approach * Significant p K 0.01 0.44
Landing 0.31 '"
152
as well, F(12,180)=12.59, (MSe=17, p<.01) (Figures 9.2.1.3-9 to 9.2.1.3-12 and Table 9.2.1.3-5).
A comparison of the Nominal-Communication flights found strong trends for both a main effect of workload and an interaction of workload by phase of flight, F(1,15)=7.26, (MSe=12, p<.05) and F(6,90)=2.84, (MSe=17, p<.05), respectively. A comparison of the Nominal-Malfunction flights found a significant main effect of workload and an interaction of workload by phase of flight, F(1,15)=41.13, (MSe=19, p<.01) and F(6,90)=16.46, (MSe=22, p<.01), respectively. A comparison of the Communication-Malfunction flights found a main effect of workload and an interaction of workload by phase of flight, F(1,15)=23.70, (MSe=14, p<.01) and F(6,90)=19.07, (MSe=12, p<.01), respectively.
A main effect for phase of flight discrimination was found, F(6,90)=37.96, (MSe=106, p<.01). A oneway ANOVA found a main effect for phase of flight discrimination for the Nominal workload flight, F(6,90)=30.63, (MSe=36, p<.01). A Newman-Keuls range statistic was computed to determine the measure's ability to discriminate the various phases of flight from one another in the Nominal workload flight; 11 out of 21 comparisons were significant. A oneway ANOVA found a main effect for phase of flight discrimination for the Communication workload flight, F(6,90)=32.19, (MSe=22, p<.01). The ability to discriminate phase of flight conditions for the Communication flight found 10 out of 21 comparisons significant. A oneway ANOVA found a main effect for phase of flight discrimination for the Malfunction workload flight, F(10,150)=13.93, (MSe=27, p<.01). The ability to discriminate phase of flight conditions for the Malfunction flight found 18 out of 55 comparisons significant.
Test-retest reliability was computed by comparing the pilots' scores for sessions one and two for the separate measurement windows (Table 9.2.1.3-6). For the Nominal flight there were two significant correlations out of seven. For the Communication flight none of the measurement windows showed a significant correlation. For the Malfunction flight there were 3, out of 11 possible, significant correlations.
In assessing inter-rater reliability, it was found that all of the subjects' scores (100%) were significantly correlated with means for the 25 measurement windows.
Another Principal Component Analysis (PCA) was performed on the data from the Full-Mission simulation (Table 9.4-1). The interpretation of the factor loadings in any sort of factor analysis should be done with caution.
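A PCA of this kind can be sketched as an eigendecomposition of the correlation matrix over the workload measures. Everything below is hypothetical: the measure names, the latent structure, and the scores are invented to show the mechanics, not to reproduce the study's loadings.

```python
# Illustrative PCA over four hypothetical workload measures: two driven by
# a "physical" latent factor and two by a "mental" latent factor.
import numpy as np

rng = np.random.default_rng(0)
n = 30                                    # pseudo "measurement windows"
physical = rng.normal(size=n)             # latent physical-workload factor
mental = rng.normal(size=n)               # latent mental-workload factor
data = np.column_stack([
    physical + 0.3 * rng.normal(size=n),  # e.g., wheel inputs (invented)
    physical + 0.3 * rng.normal(size=n),  # e.g., column inputs (invented)
    mental + 0.3 * rng.normal(size=n),    # e.g., SWAT rating (invented)
    mental + 0.3 * rng.normal(size=n),    # e.g., Bedford rating (invented)
])
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)            # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                  # sort descending
variance_explained = eigvals[order] / eigvals.sum()
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
print(variance_explained.round(2))
```

Sorting components by variance explained and reading the loading matrix column by column mirrors the rearranged factor loading table discussed below, and shows why small later factors warrant the caution noted above.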
Figure 9.2.1.3-9
[Figure: pedal control inputs per minute; mean scores ±1 standard deviation by phase of flight.]
Figure 9.2.1.3-10
[Figure: pedal control inputs per minute by phase of flight (Takeoff through Landing).]
Figure 9.2.1.3-11
[Figure: pedal control inputs per minute by phase of flight for the Malfunction flight, including the malfunction events.]
Figure 9.2.1.3-12
[Figure: pedal control inputs per minute, all flights by phase of flight.]
Table 9.2.1.3-5
Pedal (Rudder) Control Input
Full Mission Simulation Data
Means and Standard Deviations
Window Nominal Communication Malfunction
Mean SD Mean SD Mean SD
Takeoff 19.70 (10.2) 15.20 (8.1) 13.50 (8.1)
Climb 0.10 (0.3) 0.00 (0.0) 0.00 (0.0)
Top of climb 0.03 (0.1) 0.00 (0.0) 0.00 (0.0)
Cruise 0.00 (0.0) 0.00 (0.0) 0.00 (0.0)
Top of descent 0.00 (0.0) 0.03 (0.1) 0.03 (0.1)
Approach 0.10 (0.4) 0.20 (0.2) 0.10 (0.4)
Landing or M/A 13.70 (12.9) 11.90 (10.2) 1.60 (5.0)
No. 1 engine failure 2.20 (2.6)
"A"hydraulic failure 0.05 (0.1)
Approach 0.80 (1.7)
Landing 11.30 (16.0)
Table 9.2.1.3-6
Pedal (Rudder) Control Inputs
Full Mission Simulation
Test-Retest
Reliability Correlations
Window Nominal flight Communication flight Malfunction flight
Takeoff 0.57 0.42 0.59
Climb 0 0 0
Top of climb 0 0 0
Cruise 0 0 0
Top of descent 0 0 0
Approach 0.73* 0 1.0*
Landing or 0.66* -0.26 0.77*
missed approach
No. 1 engine failure 0.84*
"A" system hydraulics -0.10
failure
Approach 0.57
Landing —
r(14)=.623  *Significant p<.01
Table 9.3-1
[Factor loading matrix not legible in this copy.]
The above factor loading matrix has been rearranged so that the columns appear in decreasing order of variance explained by the factors. The rows have been rearranged so that, for each successive factor, loadings greater than 0.500 appear first. Loadings less than 0.25 have been blanked.
Our interpretation of the common underlying factors can be summarized as follows:
(a) Measures thought to reflect physical workload load highest on Factor 1 in
the PCA,
(b) Measures reflecting mental workload load highest on Factor 2. Factors 3
and 4 do not account for much of the variance in the PCA,
(c) It should be noted that Wheel and Column control inputs load on Factor 3
in addition to Heart Rate Variability.
9.5 DISCUSSION
In addition to the criteria of validity and reliability, this contract effort has used replication to lend further confidence to the interpretation of the results. The Full-Mission simulation utilized the lessons learned from the Part-Task simulation in order to once again subject the candidate workload measures to rigorous empirical scrutiny.
As was done with the Part-Task simulation, summaries of the results for the workload
measures are provided. The first table summarizes the empirical findings of validity and
reliability (Table 9.5-1). The second table presents the rank ordering of the phases of
flight for each workload measure (Table 9.5-2).
9.5.1 DISCUSSION OF VALIDITY AND RELIABILITY RESULTS
To summarize the results of two comprehensive simulation studies in a few pages is difficult. Volume I was written with the intent of providing as many of the empirical results from the simulation studies as possible. It is an overwhelming task to attempt to integrate the results in a manner that would state a clear "winner" as to which is the best workload measure available. Again, we suggest that the reader examine the results for the studies one workload measure at a time. The studies were designed in such a fashion that each measure could be considered as if it were the only dependent variable in the simulation effort. In this fashion a reader can evaluate a given workload measure for validity and reliability, and determine if the results were replicated from the Part-Task to the Full-Mission simulation.
SUBJECTIVE
The Full-Mission simulation examined SWAT both In-Flight and Post-Flight to determine if there was a difference due to probe timing. Although no main effect of probe timing was found (In-Flight versus Post-Flight), a significant interaction of probe timing and workload flight (Nominal, Communication, and Malfunction) was found. The overall pattern of the probe timing difference is that In-Flight ratings tend to be lower than Post-Flight ratings until there are periods of high workload (malfunctions, for example), when the trend reverses. The reader is directed to a paper by Corwin (1989) for a more detailed discussion of the In-Flight versus Post-Flight results from the Full-Mission simulation.
Both SWAT (In-Flight and Post-Flight) and the Bedford rating scale demonstrated evidence of validity and reliability. The Bedford scale marked the first time a single, unidimensional scale had been used in isolation in either of the simulation studies. (The 1-to-20 point Overall Workload Scale was used in conjunction with the NASA-Task Load Index in the Part-Task simulation.)
The Bedford rating scale, similar to the Modified Cooper-Harper or McDonnell Workload
scale, requires no data reduction as do SWAT or the NASA-TLX. The elimination of the
overhead associated with the "customization" of either SWAT or the NASA-TLX makes
Table 9.5-1
[Table 9.5-1: summary of the empirical validity and reliability findings for the workload measures; not legible in this copy.]
Table 9.5-2
[Table 9.5-2: rank ordering of the phases of flight for each workload measure; not legible in this copy.]
the interpretation of the data a much more straightforward exercise. The dividend of the subjective techniques based on multiple bipolar ratings (SWAT and the NASA-TLX) is the ability to examine the underlying causes of workload fluctuation through the individual bipolar ratings. The examination of the underlying influences on workload may or may not be better addressed in an aircraft certification effort by the Pilot Subjective Evaluation (PSE) technique (Fadden, 1982; Ruggiero and Fadden, 1987). No empirical evidence exists comparing the PSE with SWAT or the NASA-TLX. Yet the PSE directly addresses the Functions and Factors of FAR 25.1523 Appendix D, while both SWAT and the NASA-TLX address generic underlying factors.
PHYSIOLOGICAL
Eyeblink rate was found to be a reliable measure, but it could not discriminate among the various workload conditions. Again the reader is reminded of the different visual tasks required for piloting a commercial transport aircraft, some of which may cause a decrease in eyeblink rate while others cause an increase. This may be the reason discriminability of the workload conditions was not found. Eyeblink rate may be of more utility in examining workload in aircraft environments that do not require as much reading and head turning to scan system instruments, such as tactical fighter aircraft.
Mean Heart Rate again demonstrated phenomenal reliability: all 25 test/retest correlation coefficients were significant. Unfortunately, mean Heart Rate could not discriminate among the three workload flights (Nominal, Communication, and Malfunction). The lack of discriminability was surprising. In an attempt to understand it, a careful post hoc examination was made of the changes in Heart Rate within the measurement windows (Metalis et al., 1989). It appears that Heart Rate, whether influenced by arousal and/or workload, is extremely sensitive to shifts in task demands in an acute fashion. That is, the phasic changes associated with Heart Rate occur quite quickly and then return to a baseline quickly. The interesting point to note is that the length of the measurement period is important when considering workload. A long measurement window may have changes in Heart Rate due to increases in task demands mixed together with periods of rather low workload (low task demands).
Heart Rate Variability again demonstrated poor discriminability among the various workload conditions. Heart Rate Variability did not demonstrate evidence of reliability either.
The Blood Pressure and Respiration components of the Power Spectral Analysis did not
demonstrate discriminability in the Full-Mission simulation. The Blood Pressure
component did however demonstrate reliability, but a reliable measure that cannot
discriminate among different workload levels does not have much utility.
The anomalous finding for the Respiration component in the Part-Task simulation, an increase in the Respiration component with increasing workload, was not replicated in the Full-Mission simulation.
PERFORMANCE
Since both workload measures which accompanied the Secondary Task were abandoned, only the control input activity measures were examined in the Full-Mission simulation. As was demonstrated in the Part-Task simulation, control input activity for the Wheel and Column demonstrated both validity and reliability. As was also seen in the Part-Task simulation, Pedal activity is extremely low when maneuvering at altitude. Even with the low values for some of the phases of flight, Pedal activity could discriminate between the workload flights (Nominal, Communication, and Malfunction). The low values in some measurement windows had a detrimental influence on the reliability coefficients for those phases of flight.
9.5.2 DISCUSSION OF CORRELATION MATRIX
The most notable relationship change from the Part-Task to the Full-Mission simulation is the significant correlation of Heart Rate (IBI) with most of the other measures (SWAT in-flight and post-flight, Bedford, Wheel and Column control inputs, and both components from the power spectral analysis: Blood Pressure and Respiration).
The three subjective rating methods are highly inter-correlated. An interesting finding is that the highest correlation is between the two post-flight measures, SWAT (post-flight) and Bedford (r=0.98). The correlation of SWAT in-flight and post-flight is smaller (r=0.94) than the correlation between the two post-flight measures.
The anomaly of a positive correlation of the respiration component (power spectral
analysis) to other workload measures in the Part-Task simulation was not replicated in
the Full-Mission simulation.
9.6 GENERAL DISCUSSION
The contract effort has attempted to identify suitable workload assessment techniques for aircraft certification by reviewing the literature (Fact Matrices), seeking consensus among experts (two Workshops), and empirically testing the candidate measures (two simulation studies).
The experiments we do today, if successful, will need
replication and cross-validation at other times under other
conditions before they can become an established part of
science, before they can be theoretically interpreted with
confidence. (Campbell and Stanley, 1963)
The empirical requirements for sound research put forth by Campbell and Stanley have
been complied with in the simulation tests conducted as part of this contract. A
test/retest methodology was employed in the simulation tests to determine the reliability
of each measure under identical test conditions. In order to replicate and cross-validate
the results two simulation tests were conducted. The second simulation test included
more severe workload manipulations (e.g., Missed Approach; "A," as opposed to "B,"
hydraulics system failure; enroute diversion), as well as simply replicating the conditions
of the first simulation test.
10.0 CONTENTS OF VOLUME TWO
Perhaps the most important product of this contract is Volume Two of the Final Report. A series of guidelines was developed to aid in the evaluation of workload assessment certification programs for commercial transport aircraft.
The guidelines were compiled based on the data and experience obtained in this research contract from the literature review, part-task simulation testing, full-mission simulation testing, and the two workshops. These guidelines include:
(a) Evaluation criteria for assessment techniques,
(b) Workload assessment techniques guidelines,
(c) Guidelines for task scenario development.
It is the purpose of Volume II of the final report to present specific guidelines and
recommendations for evaluating workload certification plans. No attempt is being made
to provide a "cookbook" for the generation of an aircraft workload certification plan. An
emphasis is placed upon the transient nature of workload assessment. In a few years,
many of the current state-of-the-art workload measures may become obsolete. The
contents of Volume II are designed to allow for the evaluation of current, and yet to be
developed, workload assessment techniques.
Volume Two contains guidelines and recommendations for evaluating the validity,
reliability, and applicability of proposed workload certification plans. The specific areas
addressed for evaluation criteria include:
(a) Validity,
(b) Reliability,
(c) Applicability.
The workload assessment techniques are broken down by domain area:
(a) Subjective,
(b) Performance,
(c) Physiological,
(d) Analytical.
Advantages and liabilities of the techniques employed in the simulation studies (as
reported in Volume One) are discussed. Previous work reported by others using the
various assessment techniques is documented for the reader as well. Additionally,
data-derived evidence is given for the recommendation of specific valid, reliable, and
applicable workload measures.
Finally, the process of evaluating the scenario description is itemized.
11.0 RECOMMENDATIONS FOR FUTURE WORK
The present work does not presume to have put forth the definitive empirical
investigation of workload. What the present work has accomplished is a rigorous testing
of the validity and reliability of existing workload measures in a commercial transport
aircraft environment. It is hoped that this work will serve as a reference to personnel who
must evaluate the adequacy of candidate workload measures in certification plans for
FAR 25 type approval.
Follow-on research should identify the points (minimum and maximum) for each
measure where crew performance breaks down. Peak workloads have attracted a fair
amount of attention due to ever-increasing demands being imposed on flight crews.
Current design practices, however, have resulted in reduced crew workload in virtually
every segment of flight. There is now a need to explore low workload. In
commercial aviation, underload (long periods of low task demands) should be further
explored to identify the effect it can have on crew performance. If the crew habituates to
lower task demands (experiences a reduction of available capacity) due to underload,
moderate levels of task difficulty can suddenly be experienced as high workload.
Another pressing issue is to develop a technique which will allow the results of workload
assessment from different pilots to be evaluated on a common scale. The development
of an "absolute" workload scale may not be so much a question of workload
methodology as it is psychometrics.
Finally, the influence of workload on performance needs to be addressed in a definitive
fashion. Intuitively it is easy to understand how periods of high workload can cause
performance to drop off. The subtle influence of workload on performance and error
rate is perhaps the behavioral link which stimulates the interest in workload in an aviation
environment.
REFERENCES
Metalis, S. A., Biferno, M. A., and Corwin, W. H. (1989), "Is Heart Rate a Valid, Reliable,
and Applicable Index of Pilot Workload in Commercial Transport Aircraft?"
Proceedings of the Fifth International Aviation Psychology Symposium.
Columbus, Ohio.
Miller, K. M. (1976), "Timeline Analysis Program (TLA-1), final report," Boeing Document
D6-42377-5, Prepared for National Aeronautics and Space Administration,
Langley Research Center (NASA-CR-144942).
Mulder, G. (1979), "Sinus Arrhythmia and Mental Workload," In (Ed. Moray, N.) Mental
Workload: Its Theory and Measurement. Plenum Press, New York.
Orlady, H. W., (1982), "Flight Crew Performance When Pilot Flying and Pilot Not Flying
Duties are Exchanged," (NAS2-10060, Contractor Report 166433), Ames
Research Center: Moffett Field, CA.
Ruggiero, F. T., and Fadden, D. M. (1987), "Pilot Subjective Evaluation of Workload
During a Flight Test Certification Program," Proceedings of the AGARD
Conference on Methods to Assess Workload, (AGARD-CPP-282).
Vidulich, M., and Bortolussi, M. (1989), "A Dissociation of Objective and Subjective
Workload Measures," Proceedings of the Thirty-Second Annual Meeting of the
Human Factors Society, 1471-1476.
APPENDIX A
DISPATCH RELEASE PACKETS
DISPATCH RELEASES
SFO-SCK
FLIGHT PATH WEATHER
Takeoff SFO 28R SFO CLR 20 59/45 2505 995
Quiet 9 Departure SCK CLR 20 62/40 2910+ 15 990
REBAS Intersection SMF CLR 15 60/52 1905 998
Direct Modesto VOR
Landing SCK 29R
ROUTE DATA WEIGHT & BALANCE
Distance 106 NM OEW 101,600 lbs.
Trip Fuel 5,600 lbs. Payload 23,400 lbs.
Altitude 11,000 ft. ZFW 125,000 lbs.
ETE 0 + 27 Fuel 14,000 lbs.
Alternate SMF TOGW 139,000 lbs.
Distance 57 NM CG 24%
Fuel 2,500 lbs. Est. LGW 133,400 lbs.
ETE 0 + 14
Holding (30 Min.) 3,700 lbs.
Total Fuel Req. 11,800 lbs.
Fuel on board 14,000 lbs.
Reserves 2,200 lbs.
NOTAMS MEL
SCK RWY 29L Closed Autopilot Inop.
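The weight-and-balance and fuel figures in a dispatch release are internally consistent arithmetic. The following minimal sketch (not part of the report; figures taken from the SFO-SCK release above) cross-checks them:

```python
# Cross-check of the SFO-SCK dispatch release arithmetic (all weights in lbs).
oew, payload = 101_600, 23_400                       # operating empty wt., payload
fuel_on_board = 14_000
trip_fuel, alt_fuel, holding = 5_600, 2_500, 3_700   # trip, alternate, 30-min hold

zfw = oew + payload              # zero-fuel weight
togw = zfw + fuel_on_board       # takeoff gross weight
est_lgw = togw - trip_fuel       # estimated landing gross weight
total_req = trip_fuel + alt_fuel + holding
reserves = fuel_on_board - total_req

print(zfw, togw, est_lgw, total_req, reserves)
# → 125000 139000 133400 11800 2200  (matches the release)
```

The same checks apply to the other releases in this appendix; each TOGW, estimated LGW, and reserve figure follows from the quantities listed above it.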
SMF-SFO
FLIGHT PATH WEATHER
Takeoff SMF 16 SMF CLR 15 60/52 1905 998
Direct SAC VOR SFO CLR 20 59/45 2505 995
RISTI 2 Arrival OAK CLR 10H 58/48 CLM 990
CEDES Intersection
Landing SFO 28L
ROUTE DATA WEIGHT & BALANCE
ETE 0+ 10
Holding (30 Min.) 3,700 lbs.
Total Fuel Req. 11,000 lbs.
Fuel on board 14,000 lbs.
Reserves 3,000 lbs.
NOTAMS MEL
None Autopilot Inop.
LAX-(SFO)-(OAK)-SMF
FLIGHT PATH WEATHER
Takeoff LAX 25R
Ventura 9 Departure LAX CLR 15 60/40 CLM 990
RZS Transition SFO CLR 20 59/45 2505 995
J-501 BSR SMF CLR 15 60/52 1905 998
BSR profile descent
SFO
ROUTE DATA WEIGHT & BALANCE
Distance 290 NM OEW 101,600 lbs.
Trip Fuel 11,000 lbs. Payload 23,400 lbs.
Altitude FL310 ZFW 125,000 lbs.
ETE 0 + 55 Fuel 25,000 lbs.
Alternate SMF TOGW 150,000 lbs.
Distance 85 NM CG 21%
Fuel 4,500 lbs. Est. LGW 139,000 lbs.
ETE 0+ 10
Holding (30 Min.) 3,700 lbs.
Total Fuel Req. 19,200 lbs.
Fuel on board 25,000 lbs.
Reserves 5,800 lbs.
NOTAMS MEL
None Autopilot Inop.
APPENDIX B
FULL-MISSION SIMULATION EXAMPLE SCENARIO
UPDATE 10/02/87
SUNDAY, JANUARY 25, 1987 TWILIGHT
START LEG 1850
HH:MM:SS CAPTAIN
RUNWAY 24R.
00:00:15 TAKEOFF
*DEPRESSES TOPS OF RUDDER PEDALS WITH FEET.
*HEARS BRAKE LEVER RELEASE.
*SENSES AIRPLANE START TO ROLL.
*PLACES RIGHT HAND ON THRUST LEVERS.
*PLACES LEFT HAND ON OUTBOARD GRIP OF CONTROL
WHEEL.
*ADVANCES THRUST LEVERS FOR INITIAL ACCELERATION AND
ALLOWS EVEN ENGINE SPOOLUP.
*LOOKS AT EPR INDICATORS FOR EVEN ENGINE
ACCELERATION.
*CONTINUES THRUST LEVERS TO APPROXIMATE TAKEOFF
SETTING.
*CHECKS EPR INDICATORS FOR TAKEOFF BUG SETTING.
*KEEPS RIGHT HAND ON THRUST LEVERS AS F/O ADJUSTS.
*LOOKS THROUGH LEFT FRONT WINDOW ALONG RUNWAY
CENTERLINE.
*STEERS AIRPLANE ALONG RUNWAY CENTERLINE
WITH RUDDER PEDALS.
*MAINTAINS LIGHT FORWARD PRESSURE ON COLUMN.
*KEEPS WINGS LEVEL.
*HEARS F/O, "80 KNOTS."
*LOOKS AT AIRSPEED DISPLAY.
*SAYS, "CHECK."
*CONTINUES LOOKING OUT FORWARD WINDOW.
00:00:38 ROTATION
*HEARS F/O, "V ONE."
*MOVES RIGHT HAND FROM THRUST LEVER TO
CONTROL WHEEL.
*HEARS F/O, "ROTATE."
*LOOKS AT AIRSPEED.
(SEES CORRELATION WITH F/O REPORT).
*BEGINS TO APPLY BACK FORCE ON CONTROL WHEEL.
*ROTATES TO LIFTOFF ATTITUDE.
*COMPLETES ROTATION TO DESIRED ATTITUDE WITH
REFERENCE TO ATTITUDE INDICATOR.
*SENSES LIFT OFF.
*CHECKS ALTIMETER AND RATE OF CLIMB INDICATOR
FOR POSITIVE RATE OF CLIMB.
*HEARS F/O, "V TWO."
*HEARS F/O, "POSITIVE RATE OF CLIMB."
*CONFIRMS INCREASING ALTITUDE AND RATE OF CLIMB.
00:01:51 LEVEL AT 2500 FEET
*LEVELS AIRPLANE AT 2500 FEET USING ADI,
ALTIMETER AND VERTICAL SPEED INDICATOR.
*ADJUSTS THRUST TO MAINTAIN REQUIRED AIRSPEED.
*SETS ELEVATOR TRIM.
00:02:12 FLAPS 2
*CHECKS FLAP INDICATOR AT 2.
*CHECKS AIRSPEED AT 190 AND ACCELERATING.
*CALLS, "FLAPS UP."
*CONTINUES PITCH ADJUSTMENT TO MAINTAIN 500
TO 1000 FPM.
*SETS ELEVATOR TRIM
00:02:26 PASSING LOS ANGELES VORTAC RADIAL 300.
*HEARS LAX TWR, " 103, CONTACT LAX DEPARTURE
ON 125.2."
*SEES F/O RETUNE RADIO.
*HEARS F/E CALL DEPARTURE.
*HEARS LAX DEP, " 103, PROCEED DIRECT GORMAN,
CLIMB TO FL310."
*HEARS F/O, " 103, DIRECT GORMAN, FL310."
*COMMANDS F/O, "TUNE GORMAN ON #1 NAV RADIO."
*STARTS RIGHT TURN AND SETS 322 ON COURSE SELECTOR.
*SEES AIRPLANE ROLL INTO A RIGHT TURN.
00:02:35 FLAPS UP
*CHECKS FLAP INDICATOR AT 0.
*CHECKS AIRSPEED AT 210 KNOTS AND ACCELERATING.
*CHECKS LEADING EDGE FLAPS TRANSIT (AMBER) LIGHT OUT
PRIOR TO EXCEEDING 210 KNOTS.
*CHECKS FLAP INDICATOR AT UP.
*MOVES THROTTLES UNTIL 3 EPRS SHOW CORRECT SETTING
FOR CLIMB POWER.
*HEARS F/O, "FLAPS UP."
*ADJUSTS PITCH ATTITUDE TO ACCELERATE TO 250 KNOTS.
*SETS ELEVATOR TRIM.
00:03:40 AFTER TAKEOFF CHECKLIST
*CALLS "AFTER TAKEOFF CHECKLIST."
*HEARS F/E, "IGNITION, OFF".
*HEARS F/E, "NO SMOKING AND SEAT BELT, OFF."
*HEARS F/E, "ANTI-ICE CLOSED, GEAR UP AND OFF, FLAPS
UP, NO LIGHTS."
*HEARS F/E, "AUTO PACK TRIP SWITCH, CUT OUT."
*HEARS F/E, "HYDRAULICS, PRESSURE AND QUANTITY
NORMAL."
*HEARS F/E, "PRESSURIZATION, CHECKED AND SET".
*HEARS F/E, "AFTER TAKEOFF CHECKLIST COMPLETE."
00:04:40 CALLS DISPATCH
*TELLS F/O, "GIVE DISPATCH TIMES."
*PRESSES NO.1 VHF RECEIVER SWITCH ON.
*HEARS F/O, " 103, SAN FRANCISCO, 103 PUSH
BACK XXXX, OFF AT XXXX, ESTIMATE SAN FRANCISCO XXX."
*PRESSES NO.1 VHF RECEIVER SWITCH OFF.
00:04:45 ENTERING ICING CONDITIONS
*OBSERVES AIRCRAFT APPROACHING CLOUDS.
*OBSERVES TAT BELOW 10 DEGREES C.
*TURNS ON ENGINE IGNITION AND DIRECTS F/O
TURN ON ENGINE ANTI-ICE.
*OBSERVES EPR DROP ALL 3 ENGINES.
*OBSERVES STABLE ENGINE OPERATION.
ON NUMBER 3. SWITCHING AIR SOURCE TO NUMBER 2."
*CAPT SAYS, "ROGER, LET IT COOL AND ATTEMPT A RESET.
COMPLETE THE CHECKLIST."
*F/E ACKNOWLEDGES.
NAV RADIO
*TURNS TO NEW COURSE OF 304 DEGREES
*SEES VOR/LOC LIGHT IS GREEN
*TELLS F/O, "SET EPR BUGS TO CRUISE EPR."
*RETARDS THRUST LEVERS SLIGHTLY.
ALTERNATE REMAINS SMF. OAKLAND WEATHER IS 200 FT
SCATTERED 400 FT OVERCAST, VISIBILITY ONE MILE IN FOG,
TEMPERATURE 55, DEWPOINT 54, ALTIMETER 29.86.
*HEARS F/O, "UNDERSTAND, WE'LL DIVERT TO OAK."
*SAYS, "WE'RE DIVERTING TO OAKLAND DUE TO
SAN FRANCISCO POWER FAILURE. ESTIMATE ARRIVAL IN
OAKLAND AT
*HEARS FLIGHT ATTENDANT, "OK, THANKS, I'LL SPREAD THE
WORD."
*ROTATES MICROPHONE SELECTOR TO VHF2 POSITION.
*HEARS FLIGHT ATTENDANT, "FORWARD."
*SAYS, "STARTING DESCENT. ESTIMATING OAKLAND -----
*ROTATES MICROPHONE SELECTOR TO VHF 2 POSITION.
*DECIDES MODERATE TURBULENCE IS POSSIBLE
*CALLS, "SEAT BELT AND START SWITCHES ON."
*HEARS F/O, "SWITCHES ON".
AT 10,000."
*HEARS OAK APP, "103, RADAR CONTACT, MAINTAIN
10,000."
*HEARS F/O, " 103, ROGER."
*"RADAR ALTIMETER", F/O "CHECKED".
*"FLIGHT INSTRUMENTS, FDS, AND RADIOS", F/O "SET AND
CROSS CHECKED".
*"GO-AROUND EPR AND VREF", F/O "BUGS SET."
*HEARS F/E, "FUEL, SET FOR LANDING."
*HEARS F/E, "HYDRAULICS, PRESSURE AND QUANTITIES
NORMAL".
*HEARS F/E, "PRESSURIZATION AND COOLING DOORS, SET."
*HEARS F/E, "CIRCUIT BREAKERS, CHECKED".
*HEARS F/E, "DESCENT APPROACH CHECKLIST COMPLETE."
00:52:25 FLAPS 2
*CALLS, "FLAPS 2."
*HEARS F/O, "FLAPS 2."
*HEARS GEAR WARNING HORN.
*F/E PULLS HORN CUTOUT SWITCH.
*SETS ELEVATOR TRIM.
*SEES FLAP INDICATOR AT FLAPS 2 AND ANNUNCIATOR LIGHT
GREEN.
*ADJUSTS PITCH ATTITUDE AND SLOWS AIRPLANE TO
190 KNOTS.
00:52:45 FLAPS 5
*SEES AIRSPEED IS AT 190 KNOTS.
*CALLS, "FLAPS 5."
*HEARS F/O, "FLAPS 5."
*SETS ELEVATOR TRIM.
*SEES FLAP INDICATOR AT FLAPS 5 AND ANNUNCIATOR
LIGHT GREEN.
*SLOWS AIRPLANE TO 160 KNOTS.
00:53:00 LEVEL OFF AT 2500 FEET
*SEES ALTITUDE ALERT LIGHT EXTINGUISH.
*LEVELS AIRPLANE AT 2500 FEET USING ADI, ALTIMETER AND
VERTICAL SPEED INDICATOR.
*PLACES FD ALTITUDE HOLD SWITCH ON.
*ADVANCES THRUST TO MAINTAIN 150 KNOTS.
*SETS ELEVATOR TRIM.
00:53:15 FLAPS 15
*SEES AIRSPEED IS AT 160 KNOTS.
*CALLS, "FLAPS 15."
*HEARS F/O, "FLAPS 15."
*SETS ELEVATOR TRIM.
*SEES FLAP INDICATOR AT FLAPS 15 AND ANNUNCIATOR
LIGHT GREEN.
*SLOWS AIRPLANE TO 150 KNOTS.
00:53:30 ILS APPROACH CLEARANCE
*HEARS APPROACH, " 103, CLEARED TO INTERCEPT
LOCALIZER, MAINTAIN 2500 UNTIL ESTABLISHED
INBOUND. CLEARED FOR ILS 29, CONTACT OAKLAND TOWER
127.2 AT THE MARKER."
*HEARS F/O, " 103, TOWER 127.2 AT THE MARKER."
00:54:00 LOCALIZER ALIVE
*HEARS F/O, "LOCALIZER ALIVE."
*SEES LOCALIZER MOVING TOWARD CENTER OF HSI SCALE.
00:54:05 STARTS TURN TO INTERCEPT LOCALIZER
*SETS HEADING CURSOR TO 293 DEGREES.
00:55:15 SETS MISSED APPROACH DATA
*REVIEWS MISSED APPROACH PROCEDURE ON APPROACH
CHART.
*SETS HEADING CURSOR ON 260 DEGREES.
*TELLS F/O, "ALTITUDE SELECT 4000."
*HEARS F/O, "4000."
GENERATOR."
*CAPT DIRECTS F/O TO DECLARE AN EMERGENCY
*F/O DECLARES EMERGENCY AND REQUESTS RADAR
VECTOR DIRECT TO SMF
01:01:30 LEVEL 4000 FEET, FLAP RETRACTION COMPLETE
*SEES ALTITUDE ALERT LIGHT EXTINGUISH
*LEVELS AIRPLANE AT 4000 FEET USING ADI, ALTIMETER AND
VERTICAL SPEED INDICATOR.
*HEARS F/O, "FLAPS UP".
01:03:00 INTERCEPT 110 DEGREE COURSE SAUSALITO VORTAC
*SEES OUTBOUND HEADING FROM SAUSALITO VORTAC IS
290 DEG."
*SETS HEADING AND COURSE CURSORS ON HSI FOR NEW
COURSE OF 290 DEGREE.
*F/E CALLS, "STANDBY RUDDER SWITCH, ON".
*CAPT TURNS STANDBY RUDDER SWITCH ON.
*F/E CALLS, "SYSTEM A PUMP SWITCH, OFF".
*CAPT ACKNOWLEDGES.
*F/E CALLS, "AUTOPILOT ELEVATOR SERVO SYSTEM B".
*CAPT PLACES AUTOPILOT ELEVATOR SERVO TO B POSITION.
*F/E CALLS, "SYSTEM A FLUID SHUTOFF SWITCH, CLOSE".
*F/E READS, "REVIEW SYSTEM "A" LOSS ADVISORY ITEMS."
*CAPT SAYS, "CONTINUE".
*F/E READS, "CHECK WEATHER, CROSSWIND LIMIT 19 KNOTS.
OBSERVE YAW DAMPER LIMITATION."
*F/E READS, "PLAN MANUAL GEAR AND ALTERNATE FLAP
EXTENSION."
*F/E READS, "PLAN FLAPS 15 LANDING, USE VREF+ 15 KNOTS.
OUTBOARD AND GROUND SPOILERS, NOSE WHEEL STEERING
IS INOPERATIVE."
*F/E READS, "DO NOT OPEN GROUND INTERCONNECT AFTER
LANDING."
*CAPTAIN ACKNOWLEDGES.
01:18:45 CLEARANCE TO 2600 FEET
*HEARS CENTER, " 103, DESCEND AND MAINTAIN
2600 FEET".
*HEARS F/O, "BOEING 727, OUT OF SEVEN FOR 2600."
*SEES F/O SET 2600 FEET IN ALTITUDE SELECT WINDOW.
*TELLS F/O TO HAVE CENTER REQUEST THAT SMF HAVE
EMERGENCY EQUIPMENT STANDING BY DUE TO ENGINE AND
HYDRAULIC FAILURE, AND TO RELAY NUMBER OF
PASSENGERS ABOARD AND TOTAL FUEL ON BOARD.
01:18:50 DESCENT
*CAPT CALLS, "DESCENT AND APPROACH CHECKLIST FOR
"A" SYSTEM FAILURE DOWN TO FLAP EXTENSION, AND
REVIEW ONE ENGINE INOPERATIVE DESCENT AND
APPROACH CHECKLIST."
*F/E CALLS, "PRESSURIZATION AND COOLING DOORS, SET."
*F/E CALLS, "SEAT BELT, ON"
*F/E CALLS, "ANTI-ICE"
*CAPT RESPONDS, "NOT REQUIRED".
*F/E CALLS, "ALTIMETERS, SET AND CROSS CHECKED".
*CAPT AND F/O RESPOND, "SET AND CROSS CHECKED".
*F/E CALLS, "FLIGHT INSTRUMENTS, RADIOS, AND FDS,
SET AND CROSS CHECKED".
*CAPT AND F/O RESPOND, "SET AND CROSS CHECKED".
*F/E CALLS, "FLAP INHIBIT SWITCH, INHIBIT".
*CAPT OBSERVES F/O PUT SWITCH TO INHIBIT
*F/E CALLS, "GO-AROUND EPR AND VREF + 15, SET BUGS".
*CAPT SETS HIS AIRSPEED BUG
*F/E CALLS, "FUEL, SET FOR LANDING".
*F/E CALLS, "CIRCUIT BREAKER, CHECK".
*F/E CALLS, "ALTERNATE FLAP MASTER SWITCH, ON".
*CAPT OBSERVES F/O TURN SWITCH ON.
*F/E CALLS, "FLAP LEVER, 15".
*CAPT OBSERVES F/O PLACE FLAP HANDLE TO 15
*F/E CALLS, "COMPLETE TO FLAP AND GEAR EXTENSION.
THE ONLY ITEM OF INTEREST ON THE ENGINE INOP
CHECKLIST IS TURNING THE RIGHT PACK SWITCH OFF.
I'LL DO IT NOW. WE SHOULD USE FLAPS 15 FOR LANDING."
*CAPT, "ROGER, TURN IT OFF, ROGER, FLAPS 15
FOR LANDING".
*F/E CALLS, "STANDBY FOR MANUAL GEAR EXTENSION"
*CAPT, "STANDBY".
01:22:20 LOCALIZER CAPTURE
*F/O CALLS, "LOCALIZER ALIVE."
*SEES LOCALIZER IN HSI AND ADI IS CENTERED.
*SEES SPEED IS AT 150 KNOTS.
*CAPT REDUCES POWER TO SLOW TO VREF + 15
*CAPT TRIMS RUDDER AND STABILIZER
*F/O CALLS, "GLIDE SLOPE ALIVE".
01:25:00 TOUCHDOWN AND ROLLOUT
*SEES TOUCHDOWN
*THRUST LEVERS IDLE.
*HEARS SPEEDBRAKE LEVER MOVE TO AFT POSITION.
*SENSES AUTO BRAKES ARE WORKING.
*SEES GROUND SPEED IS DECREASING.
*CONTROLS AIRPLANE ALONG RUNWAY CENTERLINE AND
LOWERS NOSE GEAR TO RUNWAY SURFACE.
*RAISES REVERSE LEVERS TO INTERLOCK STOP.
*FEELS INTERLOCK RELEASE.
*MOVES REVERSE LEVERS TO AFT LIMIT STOP.
*HEARS F/O, "70 KNOTS."
*MOVES REVERSE LEVERS FORWARD AND DOWN OUT OF
REVERSE THRUST.
*SEES ENGINES AT IDLE.
*PRESSES FEET ON TOP OF BRAKE PEDALS.
*CONTINUES FINAL TAXI MANUALLY.
*HEARS TOWER, " 103, HOLD ON THE RUNWAY,
STAY WITH ME."
APPENDIX C
FULL-MISSION SIMULATION ATC SCRIPTS
FREQUENCIES:
OTHER TRAFFIC:
WEATHER:
SMF ATIS: INFORMATION ALPHA. CLEAR, VISIBILITY 15, WIND 190/15,
TEMPERATURE 60, DEW POINT 52, ALTIMETER 29.98. LANDING AND
DEPARTING RUNWAY 16.
SFO ATIS: INFORMATION BRAVO. CLEAR, VISIBILITY 20, WIND 250/5,
TEMPERATURE 59, DEW POINT 45, ALTIMETER 29.95. LANDING AND
DEPARTING RUNWAYS 28.
MALFUNCTIONS:
XX352 IS AT TAKEOFF END OF RUNWAY 16. THEY HAVE A CLEARANCE TO SFO VIA
DIRECT SAC, RISTI 2 ARRIVAL, CEDES, SFO. MAINTAIN 11,000. DEPARTURE
CONTROL FREQUENCY 124.5, SQUAWK 6512.
CLOSE - FLAPS FIVE
OPEN - FLAPS UP
WINDOW TWO: CLOSE - ONE MINUTE
LATER
SAC DEP: XX352, LEAVING 10,000, DO NOT EXCEED 300 KTS FOR
SPACING
CLOSE WINDOW TWO XX352 GIVE TIME EFFORT STRESS
OAK CENTER: DL334, CLEARED TO SFO VIA THE RISTI 2 ARRIVAL, MAINTAIN
11,000
DL334: CLEARED VIA THE RISTI 2, MAINTAIN 11,000, DL334
OAK CENTER: DL334 TRAFFIC, 2 O'CLOCK, 6 MILES, VFR 9,500
DL334: DL334, LOOKING
CESSNA 37U: OAKLAND CENTER, THIS IS CESSNA 37U, WE'RE VFR AT 7,500,
REQUESTING ADVISORIES TO FAT
TW789: ROGER, BAY ON 134.5, TW789
OAK CENTER: N56M, TRAFFIC (XX) O'CLOCK, (XX) MILES,
(DIRECTION), ABOVE YOU
N56M: IN SIGHT, 56M
OAK CENTER: PS751, DESCEND AND MAINTAIN 11,000, SFO ALTIMETER 29.95
PS751: DOWN TO 11,000, ALTIMETER 29.95, PS751
OAK CENTER: NAVY441, DESCEND AND MAINTAIN 12,000, NGZ ALT. 29.93
OAK CENTER: UA388, OAKLAND CENTER, ROGER, CLEARED TO SFO, VIA RISTI
2 ARRIVAL, MAINTAIN 11,000
UA388: TO SFO, VIA THE RISTI 2, MAINTAIN 11,000, UA388
OAK CENTER: XX352, CONTACT BAY APPROACH ON 134.5
BAY APP: DL334, (XX) MILES FROM BRIJJ, MAINTAIN 3,000 UNTIL
ESTABLISHED ON THE LOCALIZER, CLEARED FOR ILS RUNWAY
28R APPROACH. CONTACT THE TOWER AT BRIJJ
OPEN - LOCALIZER ALIVE
WINDOW SIX:
CLOSE - PASSING LOM
BAY APP: XX352 CONTACT TOWER 120.5
(after XX352 calls)
SCRIPT 2 - SFO to SCK
FREQUENCIES:
OTHER TRAFFIC:
WEATHER:
XX247 IS AT TAKEOFF END OF 28L. THEY HAVE A CLEARANCE TO SCK VIA QUIET 8
TO REBAS, MOD, SIMMS, SCK. MAINTAIN 11,000. DEPARTURE FREQUENCY 120.9
SQUAWK 3642.
-----------------------------------------------------------------------------------------------
SFO TOWER: XX247, CONTACT BAY DEPARTURE (120.9)
(after XX247 calls)
OPEN - FLAPS UP
WINDOW TWO:
CLOSE - ONE MINUTE LATER
PA454: BAY, THIS IS PA454 OFF OAK, OUT OF 1,000 FOR FL230
BAY DEP: PA454, THIS IS BAY DEPARTURE ROGER, RADAR CONTACT
CP422: ROGER
N24X: IN SIGHT, THANK YOU, 24X
CP422: SWITCHING
BAY DEP: XX247, CONTACT SCK APPROACH ON 125.1
SCK APP: XX247 SCK APPROACH, ROGER
BARON22B: SCK APPROACH, THIS IS BARON 22B, JUST OFF LODI, OUT OF
700 CLIMBING, REQUESTING ADVISORIES TO FAT
SCK APP: BARON22B, ROGER, SQUAWK 4133
N22B: SQUAWKING 4133, 22B
SCK APP: BARON22B, RADAR CONTACT 6 NORTHEAST LODI, SCK ALT.
29.90
BARON22B: ROGER, 29.90
SCK APP: MAC12378, CONTACT TRAVIS APPROACH, 289.1
SCK APP: AF70287, CONTACT CASTLE APPROACH, 257.9
SCK APP: CITATION 37L, CLIMB AND MAINTAIN FL230
CITATION37L: CLIMB AND MAINTAIN FL230, CITATION 37L
SCK APP: CITATION 37L, CONTACT OAKLAND CENTER, 124.2
CITATION37U: OAKLAND CENTER ON 124.2, CITATION 37U
SCK APP: CHEROKEE 2370J, CLIMB AND MAINTAIN 5,000
N2370J: OUT OF 2,700 FOR 5,000, CHEROKEE 70J
C0225: THAT'S AFFIRM WE HAVE FOXTROT
SCK APP: XX247, DEPART MOD HEADING 320 AND INTERCEPT THE 29R
LOCALIZER
N905T: IN SIGHT, 905T
SCK APP: XX247, 6 MILES FROM JOTLY, CLEARED FOR ILS 29R
APPROACH
XX247: CLEARED FOR THE APPROACH, XX247
CLOSE WINDOW SIX XX247 GIVE TIME EFFORT STRESS
C0225: SCK TOWER, C0225 WITH YOU JUST OUTSIDE THE MARKER
SCRIPT 3 - LAX to SFO to OAK to SMF
FREQUENCIES:
WEATHER:
LAX ATIS: INFORMATION DELTA. CLEAR, VISIBILITY 15, WIND CALM,
TEMPERATURE 60, DEW POINT 40, ALTIMETER 29.90. LANDING 24L,
DEPARTING 24R.
SFO ATIS: INFORMATION ECHO. MEASURED CEILING 500 OVERCAST, VISIBILITY
1 FOG, WIND 280/5, TEMPERATURE 59, DEW POINT 55, ALTIMETER
29.87. LANDING AND DEPARTING RUNWAYS 28.
MALFUNCTIONS:
NO AUTOPILOT
15 MINUTES INTO FLIGHT - "A" SYSTEM WARNING LIGHT ACTIVATED
PASSING OAK LMM and 1,500 ft. - #1 ENGINE SHUT DOWN
AFTER MISSED APPROACH AT OAK and LEAVING 6,800 ft. - "A" SYSTEM FAILURE
XX103 IS AT TAKEOFF END OF 24R. THEY HAVE A CLEARANCE TO SFO VIA
THE VENTURA 9 DEPARTURE DIRECT RZS J501 BIG SUR, BIG SUR PROFILE
DESCENT TO SFO. MAINTAIN FL230, EXPECT FL310 5 MINUTES AFTER
DEPARTURE. DEPARTURE FREQUENCY IS 125.2, SQUAWK 3254.
OPEN - FLAPS UP
WINDOW TWO:
CLOSE - ONE MINUTE LATER
LAX DEP: XX103, TURN RIGHT HEADING 280, RECEIVING VTU PROCEED
DIRECT. CLIMB AND MAINTAIN FL230
LAX DEP: XX108, FLY HEADING 280, JOIN J501, RESUME YOUR OWN
NAVIGATION CLIMB AND MAINTAIN FL230, CONTACT LAX
CENTER ON 135.5
XX108: JOIN J501, FL230, CHANGING, XX108
TW344: DEPARTURE, THIS IS TW344 OUT OF 700 FOR 3,000
LAX DEP: TW344 LAX DEPARTURE, RADAR CONTACT. CLIMB AND
MAINTAIN 15,000
TW344: 15,000, TW344
UA122: LAX DEPARTURE, UA122 WITH YOU OUT OF 800 FOR 3,000
LAX DEP: UA122 LAX DEPARTURE, RADAR CONTACT, TURN RIGHT
HEADING 010 VECTORS PMD, CLIMB AND MAINTAIN 15,000
UA122: RIGHT TO 010 AND UP TO 15,000, UA122
LAX CENTER: XX103, TRAFFIC (XX) O'CLOCK, (XX) MILES, SOUTHEAST BOUND
BELOW YOU
LAX CENTER: XX103, CLIMB AND MAINTAIN FL310, CONTACT LAX CENTER ON
125.65
LAX CENTER: AL260, DESCEND AND MAINTAIN 15,000, THE BUR ALTIMETER
29.89, CONTACT LAX CENTER ON 135.5
AL260: 15,000 AND LAX ON 135.5, AL260
LAX CENTER: XX108, SAY YOUR MACH NUMBER
XX108: .80
LAX CENTER: ROGER
LAX CENTER: ASPEN 72, VERIFY ALTITUDE
LAX CENTER: DACO 11, TURN LEFT HEADING 090, DIRECT PMD WHEN
RECEIVING
LAX CENTER: AA600, DESCEND AND MAINTAIN FL250, CLEARED DIRECT FLW,
DIRECT LHS
AA600: DOWN TO 250, DIRECT FLW DIRECT LHS, THANK YOU, AA600
LAX CENTER: PS1282, CLEAR OF TRAFFIC
PS1282: THANK YOU
LAX CENTER: XX108, CONTACT OAK CENTER ON 133.7
XX108: 133.7, GOOD NIGHT
(company will call XX103 via SELCAL and tell them to divert to OAK due to a power
failure at SFO) (1 to 1 1/2 min. after 3 closes)
LAX CENTER: XX103 CONTACT OAK CENTER ON 133.7
OAK CENTER: XX103, ROGER, CLEARED TO THE OAK AIRPORT VIA THE SFO
RUNWAY 28 BSR PROFILE DESCENT. DEPART MENLO HEADING
340 AND INTERCEPT THE ILS RUNWAY 29 FINAL APPROACH
COURSE. MAINTAIN FL 310 FOR NOW
OPEN - THROTTLES AT IDLE
WINDOW FIVE:
CLOSE - TWO MINUTES LATER
BAY APP: XX108, DESCEND AND MAINTAIN 4,000
XX108: ROGER, DOWN TO 4,000, XX108
BAY APP: XX108, TURN LEFT HEADING 310 AND INTERCEPT THE
LOCALIZER
XX108: 310 FOR THE INTERCEPT XX108
BAY APP: XX108, DESCEND AND MAINTAIN 3,000
XX108: 3,000, XX108
BAY APP: PS1492, REDUCE SPEED TO 180 KTS
PS1492: BACK TO 180, PS1492
BAY APP: WPM34 CONTACT BAY APPROACH ON 346.0
BAY APP: XX103, DESCEND AND MAINTAIN 4,000
BAY APP: XX108, (XX) MILES FROM MARCE, MAINTAIN 2,500 UNTIL
ESTABLISHED ON THE LOCALIZER, CLEARED FOR THE ILS 29
APPROACH
BAY APP: XX103, (XX) MILES FROM MARCE, MAINTAIN 2,500 UNTIL
ESTABLISHED ON THE LOCALIZER, CLEARED FOR THE ILS 29
APPROACH
BAY APP: XX108, CONTACT TOWER ON 127.2
XX108: CHANGING
N693X: BAY, THIS IS N693X, WE'VE DECIDED TO GO TO SJC
N693X: HEADING 100, ROGER
BAY APP: N693X CLIMB AND MAINTAIN 5,000
OAK TOWER: XX103, THE RVR IS LESS THAN 1,000, REQUEST INTENTIONS
CLOSE WINDOW SEVEN XX103 GIVE TIME EFFORT STRESS
OAK CENTER: XX103, OAK CENTER, ROGER
OAK CENTER: XX108, CONTACT SAC APPROACH ON 125.6
XX108: ROGER, 125.6, SO LONG
OAK CENTER: PS1254, CLIMB AND MAINTAIN FL 230
PS1254: 230, 1254
OAK CENTER: P1670, CONTACT THE CENTER ON 135.45
P1670: 135.45, 670
SAC APP: XX103, TURN LEFT HEADING XXX AND INTERCEPT THE 34
LOCALIZER DESCEND AND MAINTAIN 3,000
SAC APP: N498J, TRAFFIC NO LONGER A FACTOR, TURN RIGHT HEADING
140, PROCEED DIRECT MOD, RESUME YOUR OWN NAVIGATION
N498J: DIRECT MOD, THANK YOU, N498J
SAC APP: XX103, (XX) MILES FROM LANEE, MAINTAIN 3,000 UNTIL
ESTABLISHED ON THE LOCALIZER, CLEARED FOR THE ILS 34
APPROACH
SAC APP: XX108, CONTACT SMF TOWER ON 125.7
XX108: CHANGING
SAC APP: N498J, CLIMB AND MAINTAIN FL 230
UA553: CHANGING
(after XX103 lands)