
Advances in Health Sciences Education 9: 47–60, 2004.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.

Assessing the Written Communication Skills of Medical School Graduates

JOHN R. BOULET*, THOMAS A. REBBECCHI, ELIZABETH C. DENTON, DANETTE W. MCKINLEY and GERALD P. WHELAN
Research and Evaluation, Educational Commission for Foreign Medical Graduates (ECFMG), 3624 Market Street, Philadelphia, PA 19104-2685, USA (*author for correspondence, e-mail: jboulet@ecfmg.org)

Abstract. The ECFMG Clinical Skills Assessment (CSA) was developed to evaluate whether graduates of international medical schools (IMGs) are ready to enter graduate training programs in the United States. The patient note (PN) exercise, conducted after a 15-minute interview with a standardized patient (SP), is specifically used to assess a candidate's ability to summarize and synthesize the data collected. On a yearly basis, approximately 75,000 patient notes are reviewed and scored by physician raters. Recent changes to the PN scoring rubric, combined with enhancements to quality assurance procedures, mandate that additional evidence be provided to support the intended use of PN scores. The purpose of this study was to further investigate the psychometric adequacy of PN scores. Generalizability analyses suggest that while variability in PN ratings can be attributed to the choice of rater, candidate scores are reproducible over the 10-encounter CSA. The relationship of PN scores with other related ability measures and select candidate characteristics provides additional evidence to support the validity of the written exercise.

Key words: certification tests, reliability, standardized patient, validity, written communication

Introduction

The ability of physicians to communicate with patients and other health professionals, both verbally and in writing, is a fundamental skill. Recently, the role of doctor-patient communication skills in patient care has been highlighted in the literature, emphasized in training programs, and studied extensively (Rollnick et al., 2002; Shapiro, 1999; van Dalen et al., 2002). However, while the verbal communication between physician and patient has been linked to treatment compliance, satisfaction, and medical outcomes (Williams et al., 1998), comparatively little research has focused on written communication and its role in the management of patients. Charting errors, including the provision of inaccurate information and documentation inconsistencies, can not only have a negative impact on health care outcomes, but can also be an important factor in any malpractice litigation. Furthermore, illegible medical records can lead to added financial costs and inferior patient care (Weber, 2002). Since progress notes are used to document medical history and physical examination, and eventually to implement treatment regimens, it is imperative that the recorded information is readable, comprehensive, and intelligible.

While there are many ways for health professionals, including physicians, to document the information gathered during patient or client encounters, use of the SOAP (subjective, objective, assessment, plan) format is presently common (Grace-Farfaglia and Rosow, 1995; Larimore and Jordan, 1995; Sleszynski et al., 1999). Within this framework, physicians document what the patient told them (subjective: chief complaint, history of present illness, past medical history), what they saw in the examination (objective: significant positive and negative physical findings), the assessment (problem list, diagnoses), and the plan (treatment, further diagnostic tests).
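For illustration, a minimal note in SOAP format might read as follows. This is a hypothetical example constructed here; it is not drawn from the assessment or from any of the cited studies.

```
S: 54-year-old man with 2 hours of substernal chest pressure radiating to the left arm;
   history of hypertension; no prior cardiac events; denies fever or cough.
O: BP 162/94, HR 96; diaphoretic; lungs clear to auscultation; heart regular, no murmurs.
A: Probable acute coronary syndrome; poorly controlled hypertension.
P: ECG, serial cardiac enzymes, chest X-ray, aspirin; admit for monitoring.
```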
The information can be written on paper, transcribed through dictation, or entered via keyboard or hand-held device. The strength of the SOAP format is its ease of use and its general adoption and acceptance by a wide variety of health care practitioners (e.g., clinical nutritionists, dietitians, chiropractors, occupational therapists, allopathic/osteopathic physicians).

Although the documentation of clinical activities is important, especially for physicians, the proper assessment of charting skills can be difficult, time-consuming, and expensive. Expert review of charts is problematic in that the "true" nature of the patient complaint is often unknown, making the evaluation of accuracy highly subjective. Unfortunately, without secondary patient interviews, the correctness and truthfulness of the physician's documentation cannot be assessed with great precision (Cradock et al., 2001). Standardized assessments, using either real or simulated patients, can, however, overcome this problem. Here, the patient complaint is fixed, and the physician must document a known medical history and associated physical symptoms. As a result, inaccurate or spurious information can be easily spotted. Furthermore, when more than one individual is being assessed, meaningful score comparisons can be made. These benefits, combined with the overall importance of written communication skills in medical practice, suggest that the development and validation of standardized assessment methods is essential.

A number of studies have been undertaken to develop methods to evaluate the documentation skills of medical students, residents and physicians (Boulet et al., 1998b; Crossley et al., 2001; Howell et al., 2000). Many of these assessments have used some form of post-encounter exercise that is part of a larger standardized patient (SP) assessment. Typically, the physician gathers data from the patient (e.g., medical history, physical examination results) and summarizes these findings, often including a differential diagnosis, on a structured assessment form. Scoring these exercises, while somewhat dependent on the assessment form, is generally straightforward. Where the patient condition is known, as would be the case for an examination that utilizes standardized patients, one of the simplest methods is to review patient notes for content. For example, based on the SOAP classification, experts can readily compile a list of keywords or phrases that, given the patient history and presenting complaint, would be expected to appear in a well-documented note. Several investigators have used some form of analytic method for measuring achievement (e.g., keyword matching, as sketched below) and found that not only physicians but also raters with other medical backgrounds (e.g., nurses, billing clerks) could provide scores that were sufficiently reproducible (Friedman et al., 1997). Unfortunately, while analytic scoring can produce reliable scores, it is difficult, if not impossible, to assess important written communication skills such as organization and clarity using this method.
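As a concrete illustration of the keyword-matching approach, consider the minimal sketch below. The case keywords, matching rule, and function name are illustrative assumptions of ours, not the procedure of ECFMG or of the cited studies. Note that naive string matching credits "no fever" only when that exact phrase appears, missing the paraphrase "denies fever"; limitations of this kind help motivate the holistic alternative discussed next.

```python
# Minimal sketch of analytic (keyword-matching) patient note scoring.
# The keyword list and matching rule are illustrative assumptions only.

EXPECTED_FINDINGS = [
    # Findings an expert panel might expect in a well-documented note
    # for a hypothetical chest-pain case.
    "substernal chest pain", "radiation to left arm", "diaphoresis",
    "shortness of breath", "no fever", "hypertension",
]

def analytic_score(note_text: str, keywords=EXPECTED_FINDINGS) -> float:
    """Return the proportion of expected findings documented in the note."""
    text = note_text.lower()
    hits = sum(1 for kw in keywords if kw in text)
    return hits / len(keywords)

note = "Pt reports substernal chest pain with radiation to left arm; denies fever."
print(f"{analytic_score(note):.2f}")  # 0.33: only 2 of 6 expected findings matched
```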
In addition, analytic scores are typically based solely on the content of the note, ignoring superfluous or erroneous information and potentially egregious actions. From a score validity perspective, the use of holistic scoring is therefore often preferred (Boulet et al., 2000; Slater and Boulet, 2001). Here, trained experts read the summaries and provide ratings based not only on medical content but also on traits such as interpretability, logic and thought processes. Provided that the scoring rubrics are well-defined, and the raters have sufficient knowledge of the content domain, scores based on holistic rating can be both reliable and valid (Regehr et al., 1998).

Since the pioneering work of Harden and Gleeson (1979), the use of Objective Structured Clinical Examinations (OSCEs) and standardized patient assessments has increased dramatically. These types of examinations are now commonly used by licensing and certification bodies to assess the professional competencies of examinees. The Educational Commission for Foreign Medical Graduates (ECFMG) administers a clinical skills assessment (CSA) as part of the certification requirements for graduates of international medical schools. Certification, which includes the assessment of clinical skills, is meant to ensure that graduates of medical schools outside of the United States and Canada are ready to enter graduate medical education programs in the United States. The Medical Council of Canada (MCC) also evaluates clinical skills as part of the licensure requirements for physicians wishing to practice in Canada (Medical Council of Canada, 2002). Finally, the National Board of Medical Examiners (NBME) has administered SP cases to thousands of examinees over the past decade in preparation for the inclusion of a clinical skills component in the United States Medical Licensing Examination (Friedman et al., 1999; Hallock, 2002). While the psychometric adequacy of these types of assessments has been studied extensively, validation is an ongoing process, requiring regular accumulation of evidence to support the use and interpretation of test scores. This is especially true if adjustments or modifications are made to the content, administration or scoring of any embedded exercises.

Purpose

The post-encounter patient note (PN) exercise has been a fundamental component of the ECFMG Clinical Skills Assessment since its inception in 1998. This exercise allows candidates to summarize and interpret the data collected in a clinical encounter with a standardized patient. Although analytic scoring was attempted in earlier field trials, the initial implementation of CSA utilized holistic ratings by trained physician raters. Originally, the ratings were provided on a 1 to 4 scale, ranging from unacceptable to superior. While valid and moderately reproducible scores could be obtained using this rubric, the analysis of quality assurance (QA) data (inter- and intra-rater statistical summaries) suggested that modifications to the scoring rubric, rater training, and feedback mechanisms could provide for more reliable scores. Furthermore, based on focus group sessions, the raters indicated that, at a broad level, it was easier to differentiate performance along three levels (i.e., unacceptable, superior, or somewhere in between) than four. Based on this feedback, and a number of pilot studies, the patient note scoring rubric was modified.
All patient note raters (PNRs) were retrained, and the new rubric was implemented for live scoring at the end of March 2001. The specific purpose of this investigation was to examine the psychometric properties of patient note scores obtained using the modified scoring anchors.

Methods

MEASUREMENT TOOLS

Clinical skills assessment (CSA). The Educational Commission for Foreign Medical Graduates is responsible for the certification of graduates of international medical schools (IMGs), those individuals who attended medical school outside the United States or Canada, who wish to enter graduate training programs in the United States. The CSA, unlike the other certification requirements, is performance based. Candidates must demonstrate their clinical skills in a high-fidelity simulated environment. This is accomplished by training people to portray patients with common clinical conditions. These individuals, known as standardized patients (SPs), can consistently and accurately model the complaints, histories, and mannerisms of real patients. Most SPs employed by ECFMG are trained to portray more than one patient. Similarly, most cases used in the examination can be performed by more than one SP. Candidates must evaluate a series of 10 SPs, interacting as they would with actual patients. They are instructed to gather relevant patient data, perform focused physical examinations as needed, and summarize the data in the form of a clinical note. Scores for data gathering (DG) are based on case-specific checklists completed by the SPs following the 15-minute encounter. These checklists include questions that should be asked and physical examination maneuvers that should be performed. Following the patient encounter, the candidates are required to summarize their findings in a patient note (PN). The DG and PN component scores are combined to form an Integrated Clinical Encounter (ICE) composite score. The SPs also evaluate the communication skills of the candidates. Interpersonal skills (IPS) are evaluated along four dimensions: interviewing and collecting information, counseling and delivering information, personal manner, and rapport. Spoken English proficiency (ENG) is also evaluated in every encounter. These evaluations are combined to form a doctor-patient communication composite (COM).

Patient note scoring rubric. Patient note raters are specifically trained to provide scores for individual clinical encounters (cases). Typically, each rater is qualified to rate several cases. To generate note scores, each PN is rated by a single rater who is individually trained for that case. Although multiple ratings of individual notes are often done for quality assurance reasons, only a single rating of each note is used to produce a total PN score. The notes are distributed so that a minimum of three different raters provides ratings for each candidate's set of notes. On average, between 8 and 9 different raters provide scores for each individual candidate (mean = 8.6, min = 6, max = 10). Candidates typically produce 10 notes, one for each encounter, over the course of the assessment. Three performance levels (unacceptable, acceptable, and superior) are defined on the scoring rubric. In addition, these performance levels are described for each of the patient note components (medical history and physical examination, differential diagnosis, diagnostic work-up) (see Appendix A). To use this modified scoring rubric, the rater must first determine, based on the written descriptions, whether the note is unacceptable, acceptable or superior. This judgement is based on a holistic overview of the adequacy of the candidate's written information in each section of the note. Once this initial performance level is determined, the rater must then settle on an appropriate gradation within the preliminary category. For example, if a note is first judged to be acceptable, the rater can assign a score from 4 (just above unacceptable) to 6 (closer to superior). As a result, patient note scores can range from 1 (clearly unacceptable) to 9 (clearly superior).
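In effect, the two-step judgement maps a performance level plus a within-level gradation onto the 1–9 scale. A minimal sketch of that mapping follows; the function and variable names are ours, but the band boundaries follow the rubric in Appendix A.

```python
# Two-step holistic rating: pick a performance level, then a gradation
# (0 = low, 1 = middle, 2 = high) within that level. Names are illustrative.

LEVEL_BANDS = {
    "unacceptable": (1, 2, 3),
    "acceptable": (4, 5, 6),
    "superior": (7, 8, 9),
}

def pn_score(level: str, gradation: int) -> int:
    """Map a performance level and within-level gradation to a 1-9 PN score."""
    if level not in LEVEL_BANDS or gradation not in (0, 1, 2):
        raise ValueError("level must name a band; gradation must be 0, 1, or 2")
    return LEVEL_BANDS[level][gradation]

print(pn_score("acceptable", 0))  # 4: just above unacceptable
print(pn_score("acceptable", 2))  # 6: closer to superior
```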
Post-CSA questionnaire. All candidates are asked to provide information regarding the logistics of exam administration, prior medical training, and medical school characteristics. Over 98% of the candidates complete the questionnaire. The candidates are encouraged, but not required, to identify themselves. Over 95% of the candidates supply their unique identifier, allowing CSA performance data to be linked to individual survey responses.

Criterion variables. Performance on the United States Medical Licensing Examination (USMLE) Steps 1 and 2, and on the Test of English as a Foreign Language (TOEFL), was used as a criterion measure. Step 1 (basic science) assesses whether a candidate can understand and apply important concepts of the sciences basic to the practice of medicine. Step 2 (clinical science) assesses whether a candidate can apply the medical knowledge and understanding of clinical science necessary for patient care. The TOEFL measures the ability of nonnative speakers of English to use and understand North American English.

SAMPLE

Candidates. There were 7,375 CSA administrations between April 2001 and March 2002. Within this cohort, there were 6,225 first-time takers. To eliminate potential confounding due to repeat testing, only first-time takers were used in the analyses. The majority of the candidates were male (58.3%). Based on citizenship at medical school, the largest candidate cohorts were the United States (23.5%), India (20.6%), and Pakistan (6.7%). Asians constituted 43.6% of the first-time administrations, followed by Whites (33.2%). Almost one quarter of the sample (n = 1,458) consisted of US citizens who attended medical schools outside of the United States, Canada and Puerto Rico. For the majority of candidates (76.0%), English was not their native language. The average age of the study population was 33.3 years (min = 22.5, max = 67.0, SD = 5.3).

Patient notes. For first-time CSA administrations there were 61,497 PN ratings (see Note 1).

Standardized patients. Approximately 56% of the encounters were with female standardized patients. The distribution of SP encounters by self-declared ethnicity was as follows: Asian (0.5%), Black (52.2%), Hispanic (9.5%), Indian (0.5%) and White (37.4%).

Cases. There were 101 different cases used in the one-year time frame. The cases were fairly evenly distributed based on categorizations of primary reason for visit (Abdominal, 19.9%; Chest, 19.7%; Constitutional, 20.0%; Neurological, 19.7%; Miscellaneous, 20.7%), acuity (Acute, 31.6%; Sub-Acute, 36.8%; Chronic, 31.6%) and patient age category (18–44, 36.6%; 45–64, 39.7%; 65+, 23.8%).

Patient note raters. In the time period studied, 43 PN raters provided scores. All patient note raters (PNRs) are physicians.
They must be licensed to practice medicine, be certified by a specialty board of the American Board of Medical Specialties (ABMS) or the American Osteopathic Association (AOA), and have experience in medical education. The majority of the raters were male (60%). Just over 50% of the raters indicated that internal medicine was their primary specialty. The next largest cohorts were emergency medicine (13%) and family medicine (13%).

ANALYSIS

Generalizability theory (Brennan, 2001) was used to estimate variance components and provide measures of the reproducibility of PN scores. The design was persons by raters nested in cases (p × [raters: cases]). Here, the notes for each examinee are rated individually by a set of up to 10 different raters, with each rater specifically assigned to a particular case or cases. Although each individual's score is obtained from multiple raters, each note is rated only once for operational scoring purposes. Pearson correlations were used to quantify the strength of the relationships between PN scores and both internal (other CSA component scores) and external performance measures. Mean PN scores, stratified by various candidate background variables, were calculated. Effect sizes (Prentice and Miller, 1998) are provided as a measure of the degree and meaningfulness of any group-based differences. The SAS system software was used for all analyses (SAS Institute, 1989).

Results

DESCRIPTIVE STATISTICS

The mean PN rating, over the 61,497 notes, was 5.4 (SD = 1.2, min = 1, max = 9). The mean candidate PN score, averaged over all encounters taken, was 5.4 (SD = 0.6, min = 3.0, max = 7.4).

GENERALIZABILITY ANALYSES

Variance components for patient note ratings are presented in Table I.

Table I. Estimated variance components for patient note ratings (all first-time takers)

                          Estimate    % of total variance
  Person                    0.24            16.1
  Case                      0.05             3.4
  Rater(Case)               0.21            14.3
  Error                     0.97            66.2

  Generalizability (ρ²)     0.71 (0.31)
  Dependability (Φ)         0.65 (0.36)

  Values in parentheses are standard errors of measurement.

The relatively small estimate of variance attributable to the cases indicates that individual tasks (i.e., summarizing the information from the patient encounter) tend to be of relatively equal difficulty. The non-zero Rater(Case) [rater nested in case] variance component suggests that case mean scores can fluctuate as a function of which PNR provides the ratings. The generalizability coefficient (ρ²), which does not take into account variations in case difficulty or rater stringency, was 0.71, indicating moderate consistency in PN ratings over cases. The dependability coefficient was 0.65 (SEM = 0.36).
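The coefficients in Table I can be recovered from the variance components using the standard D-study formulas for a p × (raters: cases) design with one rating per note (Brennan, 2001). The sketch below is ours, not the authors' code; the small discrepancies from the published values (0.66 vs. 0.65; 0.35 vs. 0.36) presumably reflect rounding of the reported components.

```python
# D-study computation of generalizability (rho^2) and dependability (phi)
# from the Table I variance components, for 10 cases with 1 rating each.
from math import sqrt

var_person, var_case, var_rater_in_case, var_error = 0.24, 0.05, 0.21, 0.97
n_cases, n_raters = 10, 1

# Relative error: only person-linked interaction/error terms contribute.
rel_err = var_error / (n_cases * n_raters)
# Absolute error: case and rater-within-case main effects also contribute.
abs_err = rel_err + var_case / n_cases + var_rater_in_case / (n_cases * n_raters)

rho2 = var_person / (var_person + rel_err)  # generalizability coefficient
phi = var_person / (var_person + abs_err)   # dependability coefficient
print(f"rho^2 = {rho2:.2f}, SEM = {sqrt(rel_err):.2f}")  # rho^2 = 0.71, SEM = 0.31
print(f"phi   = {phi:.2f}, SEM = {sqrt(abs_err):.2f}")   # phi = 0.66, SEM = 0.35
```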
CORRELATIONAL STUDIES

Investigating the associations between PN scores and external variables can provide information regarding the degree to which these relationships are consistent with the construct underlying the proposed test interpretations. Similar information can be obtained by investigating the internal structure of the test scores. Correlations of PN scores with external and internal (other CSA components) variables are presented in Table II.

Table II. Relationship of PN scores with external and internal variables

                                                     Correlation
  External variables
    USMLE Step 1 (basic science)                        0.22
    USMLE Step 1 (# attempts)                          –0.15
    USMLE Step 2 (clinical science)                     0.28
    USMLE Step 2 (# attempts)                          –0.19
    Test of English as a Foreign Language (TOEFL)       0.32
  Internal variables
    Spoken English proficiency (ENG)                    0.29
    Interpersonal skills (IPS)                          0.39
    Doctor-patient communication (COM)                  0.40
    Data gathering (DG)                                 0.51

The correlations provided in Table II indicate that the ability to interpret and synthesize the data gathered in the patient encounter is related to overall medical ability. These correlations are not corrected for attenuation and therefore underestimate what the relationships would be if measurement error were not present. The moderate correlations between the PN score and USMLE Step 1 (basic science) and Step 2 (clinical science) scores suggest that, while somewhat different abilities are being measured, candidates with stronger basic and clinical science backgrounds are better able to summarize the information gathered in the patient encounter. The negative correlations with the number of USMLE Step attempts also support this premise, in that individuals who fail the basic science or clinical science examinations, and have to repeat these tests, would tend to be of lower ability. The moderate correlation between TOEFL scores and PN scores was expected in that the TOEFL contains a mandatory essay and, therefore, there should be some overlap in the constructs being measured. Here, lower writing abilities are associated with lower PN scores.

The correlations between the other CSA component scores and the PN score were also informative. There was 26% shared variance (0.51²) between data gathering and PN scores. If relevant information cannot be solicited from the patient, the task of synthesizing and interpreting the data would certainly be more difficult, and average scores would be expected to be lower. Doctor-patient communication ratings, including the spoken English and interpersonal elements, were moderately related to PN scores.

The comparison of PN scores by group membership variables can also provide information to support the interpretation of test scores. Mean performance on the PN exercise by select candidate cohorts is presented in Table III. As expected, candidates with English as a native language obtained significantly higher PN scores than those without. Since English speaking and writing abilities are related, and organization and quality of information are two of the traits measured via the PN exercise, one would anticipate that native speakers would be better able to summarize the data gathered in the encounter. Although candidates cannot fail CSA based solely on PN performance, those individuals who did not meet standards obtained significantly lower scores.

Table III. Comparison of mean PN scores for select candidate cohorts

                                      Yes     No    Effect size
  English as a native language       5.53    5.34      0.32
  US citizen at medical school       5.44    5.37      0.12
  Pass CSA                           5.51    4.72      1.50

POST-CSA QUESTIONNAIRE

The candidates were specifically asked whether the 10-minute time limit was adequate for the patient note exercise. Over 76% of the candidates indicated that the time was sufficient. A comparison of mean PN scores by relevant background variables is presented in Table IV.

Table IV. Comparison of candidate PN scores by background variables

                                                         Yes     No    Effect size
  Patient note time was sufficient                      5.41    5.34      0.12
  Introductory medical charting course                  5.39    5.39      0.00
  Previous experience with standardized patients        5.54    5.36      0.31
  Language of instruction at medical school (English)   5.48    5.23      0.42
  Specialized clinical skills course                    5.36    5.40     –0.07
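The effect sizes in Tables III and IV are standardized mean differences. The exact formulation is not spelled out in the text (Prentice and Miller (1998) address interpretation rather than computation), but assuming the conventional pooled-variance form:

```latex
% Standardized mean difference (Cohen's d); formulation assumed, not stated in the paper.
d = \frac{\bar{X}_{\text{yes}} - \bar{X}_{\text{no}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
```

As a rough plausibility check, the native-English contrast in Table III implies a pooled standard deviation of (5.53 − 5.34)/0.32 ≈ 0.59, consistent with the reported candidate-level SD of 0.6.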
Candidates who had had previous experience with standardized patients, or who had attended medical schools where the language of instruction was English, scored significantly higher on the PN exercise. Interestingly, candidates who claimed to have taken a specialized clinical skills course performed less well than those who did not.

Discussion

All clinical skills evaluations, regardless of purpose, structure, content or scoring, must be designed so that reliable and valid assessment scores and/or decisions can be obtained. One part of the validation process involves gathering examinee data and investigating whether scoring patterns and data associations make sense. Although numerous studies have been done to investigate the psychometric properties of CSA component scores (Ayers and Boulet, 2001; Boulet et al., 1998a, 2001, 2002), recent changes to the PN scoring rubric and rater training regimen mandate that additional data be collected and analyzed. While the changes to the scoring rubric were primarily based on rater feedback, and were tested in pilot studies, there is still a need to collect new evidence to support the use of the PN exercise and to ensure that potential sources of irrelevant variance are not compromising test score interpretations.

The generalizability analysis indicated that moderately reproducible PN scores could be obtained using the existing training methods and 9-point rating rubric. The values obtained were comparable to those reported elsewhere (Boulet et al., 1998b), and acceptable for this type of performance assessment. The non-zero rater(case) component suggests that the choice of rater (for a given case) may have some impact on candidate scores. That is, the relative stringency, or leniency, of raters for a particular case is not equivalent. Nevertheless, the PNs (n = 10) for a given candidate are distributed to an average of more than eight different raters. Therefore, any so-called "hawk" and "dove" effects would tend to cancel out. It should also be noted that CSA pass/fail decisions are not based on the PN alone, but on a composite that also includes the data gathering scores. As a result, measurement error is further minimized. The reproducibility of the PN scores could be enhanced through changes in the rating design (e.g., multiple ratings of each note) or through score adjustments based on some form of rating scale analysis (Australian Council for Educational Research, 1998). However, an inspection of the variance components suggests that, consistent with previous research (Swanson and Norcini, 1989; van der Vleuten et al., 1991), reproducibility gains would best be achieved by increasing the number of tasks (PNs), not the number of raters per given task (see the projection sketched below). Currently, ECFMG is addressing potential measurement concerns through enhanced quality assurance procedures, including inter- and intra-rater score comparisons, additional rater training, benchmark note comparisons, and regular rater feedback.
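The tasks-versus-raters point can be made concrete with a small D-study projection. Because each note receives a single operational rating, the residual component in Table I (0.97) confounds person-by-case and rater-linked error; the 0.70/0.27 split used below is an assumed value chosen purely for illustration, in line with the cited finding that case specificity dominates.

```python
# Projected generalizability when adding cases versus adding ratings per note.
# The split of the 0.97 residual into person-by-case (0.70) and rater-linked
# (0.27) parts is assumed for illustration; the operational single-rating
# design cannot separate these terms.

VAR_PERSON = 0.24
VAR_PC, VAR_PRC = 0.70, 0.27  # assumed decomposition of the 0.97 residual

def rho2(n_cases: int, n_raters: int) -> float:
    """Projected generalizability for n_cases cases and n_raters ratings per note."""
    rel_err = VAR_PC / n_cases + VAR_PRC / (n_cases * n_raters)
    return VAR_PERSON / (VAR_PERSON + rel_err)

print(f"10 cases, 1 rating per note:  {rho2(10, 1):.2f}")  # 0.71 (operational design)
print(f"10 cases, 2 ratings per note: {rho2(10, 2):.2f}")  # 0.74: doubling raters
print(f"20 cases, 1 rating per note:  {rho2(20, 1):.2f}")  # 0.83: doubling cases
```

Under any split in which the person-by-case term dominates, doubling the number of cases raises the projected coefficient considerably more than doubling the number of ratings per note.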
Obtaining evidence for the validity of PN scores can take many forms. Analyses of the relationship of PN scores to variables external to the assessment, combined with analyses exploring the internal structure of the test, can both be used to make judgments concerning validity (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). The moderate correlations between PN scores and both basic science and clinical science examination scores were expected. For the PN exercise, candidates are required to document the pertinent positive and negative findings, provide a list of plausible differential diagnoses, and generate an initial diagnostic management plan. The ability to apply knowledge and understanding of key concepts of basic biomedical science (USMLE Step 1) and of the clinical science considered essential for the provision of care under supervision (USMLE Step 2) should be positively related to performance on these PN tasks. Likewise, since the TOEFL requires listening, structure, writing, and reading skills, one would expect that candidates with higher scores would be better able to record, interpret and organize information for the PN exercise. The moderate correlations between PN scores and the various other CSA component scores suggest some overlap in, or interrelationships amongst, the traits being measured. Given that it would be very difficult to write a proper note if there were flaws in the data gathering process, the relatively strong correlation between PN and data gathering scores was predictable. Overall, both the external and internal test score relationships provide additional evidence for the validity of the PN scores.

Responses to the post-CSA questionnaire provided additional data to support the validity of PN ratings. Candidates who perceived the time limits to be adequate performed significantly better on the task. It would be expected that these individuals would be more proficient at synthesizing the information from the encounter and summarizing it on the note. Unfortunately, the perceived adequacy of the time limit may simply reflect comfort with the task, not necessarily actual time pressures. Therefore, additional studies are needed to investigate the relationship between the actual time spent composing the clinical summaries and the resultant scores. Candidates who completed their medical training in English outperformed candidates who had been instructed in a language other than English. Here, it would be expected that the writing skills of those candidates who were taught in English would be better than those of candidates who were not, resulting in higher PN scores. Finally, those candidates who had had previous SP experience scored significantly higher on the PN exercise. Familiarity with the task of interviewing simulated patients would be expected to diminish anxiety and subsequently lead to improved data gathering. If this added information finds its way onto the patient note, scores would be expected to increase. Interestingly, candidates who claimed to have taken a specialized clinical skills training course actually performed less well on the PN exercise than those who did not. While there are a number of possible explanations for this seemingly contradictory result, it is likely that those candidates who chose to take a course were initially of lower ability than those who did not. In fact, candidates who took a specialized course had significantly lower USMLE Step and TOEFL scores, and significantly more USMLE Step attempts. Overall, the direction and moderate strength of the relationships between candidate PN scores and variables related to prior training provide additional evidence to support the validity of the written exercise.

The PN exercise is an important part of the assessment of the readiness of medical school graduates to enter graduate training programs.
The ability to interpret and synthesize medical information, and to present these data in a reasoned and rational manner, is certainly a skill required for effective medical practice. As such, it is important that medical school graduates be assessed in this domain. Furthermore, given that skill deficiencies in this area can have profound consequences, both for the candidate in a high-stakes testing situation and for the patient in real-life medical encounters, it is imperative that any resultant proficiency measures be reproducible and valid. Although some improvements to the assessment process could be entertained, the ECFMG CSA PN exercise currently provides an effective means of assessing the written communication skills of medical school graduates.

Appendix A

Patient Note Scoring Rubric

Medical history and physical examination
  Unacceptable (1–3):
  • Note is disorganized with elements inappropriately interspersed, information is ambiguous, legibility is poor
  • Information is inaccurate, spurious or rote, such that a medical reader would have difficulty grasping the nature of the case
  • Significant positives or negatives are omitted, or inappropriately detailed findings obscure key elements
  Acceptable (4–6):
  • Organization of note shows a generally ordered approach to the case, with little ambiguity and acceptable legibility
  • Contains some inaccurate, spurious, or rote information, but a medical reader would be able to understand the nature of the case; provides enough information for adequate patient treatment
  • Significant positives or negatives are included but may be inappropriately detailed
  Superior (7–9):
  • Organization of note reflects an ordered approach to the case; information is clear and legible
  • Information is accurate and relevant and presented in a way that makes the nature of the case clear to a medical reader
  • Significant positive and negative elements of history and physical are recognized as such by inclusion in the note

Differential diagnosis
  Unacceptable (1–3):
  • Differential diagnosis is inconsistent with findings, or supporting findings are lacking
  Acceptable (4–6):
  • Differential diagnosis is generally consistent with findings, but some supporting findings may be lacking
  Superior (7–9):
  • Findings are correctly interpreted, as reflected in a list of reasonable differential diagnoses

Diagnostic work-up
  Unacceptable (1–3):
  • Proposed work-up is inconsistent with the diagnoses being entertained, or is rote ("shotgunning"), or is oblivious to any cost containment
  Acceptable (4–6):
  • Proposed work-up is consistent with the diagnoses, but may be rote or oblivious to cost containment
  Superior (7–9):
  • Proposed work-up is consistent with the diagnoses being entertained, reasonably specific to the case, and reflects awareness of cost containment

Descriptors of Traits Measured in the Patient Note

Organization: Clear portrayal of the patient problem; order of assessment and plan are reasonable.
Quality of information: Information is presented with appropriate detail and includes significant positive and negative elements of history and physical.
Interpretation of data: Correct interpretation of the data gathered is reflected in reasonable differential diagnoses.
Egregious/dangerous actions: Avoids diagnostic management plans that could result in harm or in expensive, non-indicated diagnostic tests.
Legibility: Easily read with little effort required.

Note

1. Although each CSA administration involves 10 patient encounters, there are a few cases (e.g., pre-employment physical examination) that do not require a written summary.
Therefore, for 6,225 candidates, there will be fewer than 62,250 PNs.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: Author.
Australian Council for Educational Research (1998). ACER ConQuest: Generalized Item Response Modelling Software. Melbourne, Australia: Author.
Ayers, W.R. & Boulet, J.R. (2001). Establishing the validity of test score inferences: Performance of 4th-year U.S. medical students on the ECFMG Clinical Skills Assessment. Teaching and Learning in Medicine 13(4): 214–220.
Boulet, J.R., Ben David, M.F., Ziv, A., Burdick, W.P., Curtis, M., Peitzman, S. et al. (1998a). Using standardized patients to assess the interpersonal skills of physicians. Academic Medicine 73(10 Suppl.): 94–96.
Boulet, J.R., Friedman Ben-David, M., Hambleton, R.K., Burdick, W.P., Ziv, A. & Gary, N.E. (1998b). An investigation of the sources of measurement error in the post-encounter written scores from standardized patient examinations. Advances in Health Sciences Education: Theory and Practice 3: 89–100.
Boulet, J.R., Friedman Ben-David, M., Ziv, A., Burdick, W.P. & Gary, N.E. (2000). The use of holistic scoring for post-encounter written exercises. In D. Melnick (ed.), Proceedings of the Eighth Ottawa Conference on Medical Education and Assessment (pp. 254–260). Philadelphia, PA: National Board of Medical Examiners.
Boulet, J.R., McKinley, D.W., Norcini, J.J. & Whelan, G.P. (2002). Assessing the comparability of standardized patient and physician evaluations of clinical skills. Advances in Health Sciences Education: Theory and Practice 7: 85–97.
Boulet, J.R., van Zanten, M., McKinley, D.W. & Gary, N.E. (2001). Evaluating the spoken English proficiency of graduates of foreign medical schools. Medical Education 35(8): 767–773.
Brennan, R.L. (2001). Generalizability Theory. New York: Springer-Verlag.
Cradock, J., Young, A.S. & Sullivan, G. (2001). The accuracy of medical record documentation in schizophrenia. Journal of Behavioral Health Services & Research 28(4): 456–465.
Crossley, G.M., Howe, A., Newble, D., Jolly, B. & Davies, H.A. (2001). Sheffield Assessment Instrument for Letters (SAIL): Performance assessment using outpatient letters. Medical Education 35(12): 1115–1124.
Friedman Ben-David, M.F., Boulet, J.R., Burdick, W.P., Ziv, A., Hambleton, R.K. & Gary, N.E. (1997). Issues of validity and reliability concerning who scores the post-encounter patient progress note. Academic Medicine 72(10 Suppl. 1): 79–81.
Friedman Ben-David, M.F., Klass, D.J., Boulet, J., De Champlain, A., King, A.M., Pohl, H.S. et al. (1999). The performance of foreign medical graduates on the National Board of Medical Examiners (NBME) standardized patient examination prototype: A collaborative study of the NBME and the Educational Commission for Foreign Medical Graduates (ECFMG). Medical Education 33(6): 439–446.
Grace-Farfaglia, P. & Rosow, P. (1995). Automating clinical dietetics documentation. Journal of the American Dietetic Association 95(6): 687–690.
Hallock, J.A. (2002). ECFMG and the challenges facing international medical graduates. Reporter 11(8): 2–3. Washington, DC: Association of American Medical Colleges.
Harden, R.M. & Gleeson, F.A. (1979). Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education 13(1): 41–54.
Howell, J., Chisholm, C., Clark, A. & Spillane, L. (2000). Emergency medicine resident documentation: Results of the 1999 American Board of Emergency Medicine in-training examination survey. Academic Emergency Medicine 7(10): 1135–1138.
Larimore, W.L. & Jordan, E.V. (1995). SOAP to SNOCAMP: Improving the medical record format. Journal of Family Practice 41(4): 393–398.
Medical Council of Canada (2002). Qualifying Examination Part II, Information Pamphlet. Ottawa, Canada: Author.
Prentice, D.A. & Miller, D.T. (1998). When small effects are impressive. In A.E. Kazdin (ed.), Methodological Issues and Strategies in Clinical Research (pp. 163–173). Washington, DC: American Psychological Association.
Regehr, G., MacRae, H., Reznick, R.K. & Szalay, D. (1998). Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Academic Medicine 73(9): 993–997.
Rollnick, S., Kinnersley, P. & Butler, C. (2002). Context-bound communication skills training: Development of a new method. Medical Education 36(4): 377–383.
SAS Institute, Inc. (1989). SAS/STAT User's Guide: Version 6 (4th ed.). Cary, NC: Author.
Shapiro, J. (1999). Correlates of family-oriented physician communications. Family Practice 16(3): 294–300.
Slater, S.C. & Boulet, J.R. (2001). Predicting holistic ratings of written performance assessments from analytic scoring. Advances in Health Sciences Education: Theory and Practice 6(2): 103–119.
Sleszynski, S.L., Glonek, T. & Kuchera, W.A. (1999). Standardized medical record: A new outpatient osteopathic SOAP note form: Validation of a standardized office form against physician's progress notes. Journal of the American Osteopathic Association 99(10): 516–529.
Swanson, D.B. & Norcini, J.J. (1989). Factors influencing reproducibility of tests using standardized patients. Teaching and Learning in Medicine 1(3): 158–166.
van Dalen, J., Kerkhofs, E., Verwijnen, G.M., Knippenberg-Van Den Berg, B.W., van Den Hout, H.A., Scherpbier, A.J. et al. (2002). Predicting communication skills with a paper-and-pencil test. Medical Education 36(2): 148–153.
van der Vleuten, C.P., Norman, G.R. & De Graaff, E. (1991). Pitfalls in the pursuit of objectivity: Issues of reliability. Medical Education 25(2): 110–118.
Weber, D.O. (2002). Charting a course toward legible medical records: Perfect paperwork can mean financial savings, better patient care. Physician Executive 28(1): 8–13.
Williams, S., Weinman, J. & Dale, J. (1998). Doctor-patient communication and patient satisfaction: A review. Family Practice 15(5): 480–492.