Correlation between clinicians-assigned weights to findings and their diagnostic odd ratio; case of congestive heart failure
© The Author(s). 2016
Received: 22 May 2016
Accepted: 18 September 2016
Published: 23 September 2016
Incorrect estimation of pretest probability and misinterpretation of test results can distort post-test probability in medical decision making. The aim of this study was to evaluate how physicians weigh the findings of congestive heart failure (CHF) and how well their estimates correlate with the findings’ diagnostic odds ratio (DOR).
The participants were asked to answer a questionnaire based on a scenario of a patient with dyspnea. Eighteen findings in 3 categories (history, examination and radiographic findings) were listed along both a column and a row to form a matrix. The respondents had to compare each finding in the column with all other findings in the row and mark the boxes below the row findings that had lesser weight than the finding in the column. The weight of each finding was taken as the total number of “marked boxes” in front of that finding. The DOR of each finding was calculated from its positive and negative likelihood ratios (LRs) based on current best evidence. Findings were ranked in the order of their DOR, and this ranking was compared with the ranking in the order of participants-assigned weights. We examined the correlation between the average weights assigned by physicians and the DOR of the findings. In subgroup analysis, correlations between the average weights assigned by physicians and the DOR of history, examination and radiographic findings were examined.
Seventy-five physicians completed the questionnaire. The correlation between the ranking by findings’ DOR and the ranking by clinicians-assigned weights was significant (r = 0.64, p-value = 0.005). In contrast, correlations between participants-assigned weights and the DOR of history, examination and radiographic findings were positive but non-significant (r = 0.181, p-value = 0.7; r = 0.343, p-value = 0.506; and r = 0.219, p-value = 0.723, respectively).
Our results show that although the correlation between clinicians-assigned weights and the DOR of all findings taken together was significant, correlations between clinicians-assigned weights within the individual categories of findings and their DOR were not significant. Re-emphasizing probabilistic reasoning through the use of LRs can make pretest probability estimation and interpretation of test results more objective and would ultimately result in more precise and homogeneous post-test probabilities.
Keywords: Congestive heart failure, Evidence-based medicine, Likelihood ratio
Diagnostic errors are multifactorial and can be categorized into 3 types: (1) patient-related errors, due to atypical manifestation of disease; (2) health service-related errors, due to defects in health services; and (3) cognitive or physician-related errors, due to mistakes in data collection, lack of knowledge or impaired reasoning [1–3]. To decrease cognitive errors, probabilistic diagnostic reasoning based on Bayes’ theorem, also called the threshold approach, was integrated into evidence-based medicine (EBM) to provide the best and most relevant approach for clinical practice. According to the threshold approach, the diagnostic process has 3 steps: estimating the pretest probability, applying the likelihood ratio (LR) of tests and estimating the post-test probability. The initial estimate of a disease’s probability, called the pretest probability, is combined with the LR of diagnostic tests; the outcome is the post-test probability. An error in estimating the pretest probability or in using the LR of tests leads to an error in the post-test probability and in subsequent decisions. Overestimating the pretest probability results in unnecessary or invasive treatments, and underestimating it leads to misuse of diagnostic tests or failure to treat the patient [6, 7]. Estimation of pretest probability depends on the clinician’s intuition and judgment about the disease and on the disease’s prevalence [8–13]. The first component makes the estimation of pretest probability subjective, which leads to heterogeneity and controversy over the resulting post-test probability [6, 13].
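The three steps of the threshold approach can be illustrated with the odds form of Bayes’ theorem. The following is a minimal sketch and not part of the original study; the function name and the example values (pretest probabilities of 0.30 and 0.60, an LR of 4.0) are hypothetical and chosen only to show how a difference in the pretest estimate carries through to the post-test probability.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Combine a pretest probability with a likelihood ratio (odds form of Bayes' theorem)."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    # Step 1: express the pretest probability as odds.
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    # Step 2: apply the likelihood ratio of the finding or test.
    post_test_odds = pretest_odds * likelihood_ratio
    # Step 3: convert the post-test odds back to a probability.
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical illustration: the same finding (LR = 4.0) applied to two
# different pretest estimates made by two clinicians.
low_estimate = post_test_probability(0.30, 4.0)
high_estimate = post_test_probability(0.60, 4.0)
print(f"post-test probabilities: {low_estimate:.3f} vs {high_estimate:.3f}")
```

With the same LR, the two clinicians arrive at different post-test probabilities solely because their pretest estimates differ, which is the source of heterogeneity described above.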
Besides studies that have pointed to the importance of the second step (using the LR of tests) in probabilistic reasoning, several studies have pointed to the importance of the first step, pretest probability estimation, among physicians, residents and medical students [9, 12, 14, 15] and to their difficulties in estimating both the pretest probability and the likelihood ratio of tests [6, 7, 16–18]. In these studies, clinicians were given a scenario and asked to estimate the likelihood of several differential diagnoses. This approach can detect discrepancies in the estimation of the pretest probability of disease but does not address the details and underlying causes.
Owing to the high prevalence and burden of congestive heart failure (CHF), the case of a patient with CHF was chosen. The aim of this study was to evaluate how experts and newly graduated physicians estimate the weight of clinical and radiographic findings when approaching a patient suspected of having CHF and how well their estimates correlate with the findings’ actual weight based on their LRs.
Study population and design
This cross-sectional study was conducted during the international CHF symposium, which was held for general practitioners, internists, cardiologists and pulmonologists in Tehran by Shahid Beheshti University of Medical Sciences. Participants were asked to complete a pre-prepared questionnaire based on their estimation of the weight of findings in the diagnosis of CHF.
Clinicians-assigned weights to each clinical finding
The questionnaire had two parts. The first part included questions regarding the participants’ specialty, academic affiliation and number of years working as a clinician. The second part consisted of a scenario of a patient with dyspnea who was admitted to the emergency department to be evaluated for CHF, and a matrix with 18 findings in 3 categories: 1. History findings: history of heart failure, myocardial infarction, coronary artery disease, paroxysmal nocturnal dyspnea, orthopnea, edema, dyspnea on exertion. 2. Examination findings: third heart sound, jugular venous distension, abdominojugular reflux, rales, any murmur, lower-extremity edema. 3. Radiographic findings: pulmonary venous congestion, interstitial edema, alveolar edema, cardiomegaly and pleural effusions.
The matrix was a table used to determine the relative priority of findings. All the findings were listed in the first column as reference findings, and the same findings were repeated along a row as comparative findings. A box for a mark was provided below each finding of the row, in front of each reference finding. The participants had to mark the boxes below those findings that had lesser weight than the reference finding; the reference finding therefore had greater weight than the marked findings in the row and lesser weight than the findings with empty boxes. The weight of each reference finding was calculated as the total number of “marked boxes” in front of that finding.
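The scoring rule described above (weight = number of marked boxes in the row of a reference finding) can be sketched as follows. This is an illustrative sketch only, not the authors’ software; the function names, the three-finding subset and the marks of the two respondents are hypothetical.

```python
from typing import Dict, List


def finding_weights(findings: List[str], marks: List[List[bool]]) -> Dict[str, int]:
    """Return the weight of each reference finding for one respondent.

    marks[i][j] is True when the box in the row of reference finding i,
    below comparative finding j, was marked (j judged to have lesser weight).
    """
    n = len(findings)
    if len(marks) != n or any(len(row) != n for row in marks):
        raise ValueError("marks must be an n x n matrix matching the findings")
    # The diagonal is skipped because a finding is not compared with itself.
    return {
        findings[i]: sum(1 for j in range(n) if j != i and marks[i][j])
        for i in range(n)
    }


def average_weights(per_respondent: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each finding's weight over all respondents."""
    if not per_respondent:
        raise ValueError("at least one respondent is required")
    return {
        name: sum(w[name] for w in per_respondent) / len(per_respondent)
        for name in per_respondent[0]
    }


# Hypothetical answers of two respondents for a three-finding subset.
subset = ["Third heart sound", "Orthopnea", "Any murmur"]
respondent_1 = [[False, True, True], [False, False, True], [False, False, False]]
respondent_2 = [[False, False, True], [True, False, True], [False, False, False]]

weights_1 = finding_weights(subset, respondent_1)
weights_2 = finding_weights(subset, respondent_2)
mean_weights = average_weights([weights_1, weights_2])
print(mean_weights)
```

In the full questionnaire the matrix would be 18 × 18, so the weight of a finding could range from 0 to 17.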
We checked the accuracy and precision of all the questionnaires in terms of the weighing of the findings. For example, if in a completed questionnaire finding A had greater weight than finding B and finding B had greater weight than finding C, then finding A had to have greater weight than finding C.
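The A, B, C example above amounts to a transitivity check on each completed matrix. The sketch below shows one possible way to automate such a check; it is an assumption about how it could be done, not the procedure the authors used, and the function name and example matrices are hypothetical. Here marks[a][b] being True means that finding a was judged to have greater weight than finding b.

```python
from itertools import permutations
from typing import List, Tuple


def intransitive_triples(marks: List[List[bool]]) -> List[Tuple[int, int, int]]:
    """List every (a, b, c) where a outweighs b and b outweighs c, but a does not outweigh c."""
    n = len(marks)
    return [
        (a, b, c)
        for a, b, c in permutations(range(n), 3)
        if marks[a][b] and marks[b][c] and not marks[a][c]
    ]


# Hypothetical consistent answers: A > B, A > C, B > C.
consistent = [[False, True, True], [False, False, True], [False, False, False]]
# Hypothetical inconsistent answers forming a cycle: A > B, B > C, C > A.
cyclic = [[False, True, False], [False, False, True], [True, False, False]]

consistent_violations = intransitive_triples(consistent)
cyclic_violations = intransitive_triples(cyclic)
print(len(consistent_violations), len(cyclic_violations))
```

A questionnaire with an empty list of violations satisfies the consistency rule; any reported triple points to the specific comparisons that contradict each other.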
[Matrix illustration not recovered; surviving labels: History of heart failure, Coronary artery disease, Dyspnea on exertion, Third heart sound, Jugular venous distension, Pulmonary venous congestion]
Sensitivity, specificity and likelihood ratio of each clinical finding
We obtained the positive and negative LRs (LR+ and LR-, respectively), sensitivity and specificity of each finding from current best evidence. The Medline database was searched through the PubMed search engine using the following key words: “left sided heart failure”[Mesh] OR “congestive heart failure”[Mesh] combined with “diagnostic accuracy” OR “physical examination” OR “medical history taking” OR “sensitivity and specificity” OR “Bayes Theorem” to identify potentially relevant articles. A systematic review published in JAMA in 2005 reported the sensitivity, specificity and LRs of each finding [19, 20]. These reported characteristics were used as the reference against which clinicians-estimated weights were compared.
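The DOR of each finding was derived from its LR+ and LR-. The sketch below uses the standard definitions (LR+ = sensitivity / (1 − specificity), LR- = (1 − sensitivity) / specificity, DOR = LR+ / LR-); the function names and the example sensitivity and specificity are hypothetical and are not values from the cited review.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a finding from its sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """Return the diagnostic odds ratio, defined as LR+ divided by LR-."""
    if lr_negative <= 0.0:
        raise ValueError("LR- must be positive")
    return lr_positive / lr_negative


# Hypothetical finding with sensitivity 0.80 and specificity 0.90.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
dor = diagnostic_odds_ratio(lr_pos, lr_neg)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

Because the DOR condenses LR+ and LR- into one number, it allows the findings to be placed in a single ranking that can be compared with the ranking by clinicians-assigned weights.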
We analyzed the correlation between the average weights assigned by physicians and the DOR of the findings. In subgroup analysis, correlations between the average weights assigned by physicians and the DOR of history findings, examination findings and radiographic findings were examined separately. Correlations between the average weights assigned by faculty members and non-faculty members, specialists and subspecialists, and expert and novice physicians were analyzed separately using independent sample t-tests. (An expert was defined as a physician with more than 6 years of clinical experience; a novice was defined as a physician with 6 years of clinical experience or less.)
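The rank correlation between average weights and DOR can be sketched in plain Python as Spearman’s rho (the coefficient named in the tables), computed as the Pearson correlation of the ranks with ties given their mean rank. This is an illustrative sketch, not the authors’ analysis script; the function names and the five pairs of values are hypothetical, and no p-value is computed.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("rho is undefined when one variable is constant")
    return covariance / sqrt(var_x * var_y)


# Hypothetical average weights and DORs for five findings.
example_weights = [12.1, 9.4, 7.0, 10.3, 3.2]
example_dor = [8.0, 3.5, 4.0, 2.0, 1.5]
rho = spearman_rho(example_weights, example_dor)
print(f"Spearman's rho = {rho:.2f}")
```

With only five to seven findings per category, as in the subgroup analyses, even a moderately large rho may fail to reach statistical significance.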
Seventy-five completed questionnaires out of 200 were returned (37.5 %). Thirty-six (48 %) of the 75 respondents were experts and 39 (52 %) were novices; 68 (90 %) were specialists and seven were subspecialists; 27 (36 %) were faculty members and 48 (64 %) were non-faculty members.
Findings of CHF and their characteristics, ranking in the order of findings’ calculated DOR and in the order of the average of clinicians-assigned weights [table body not recovered; surviving row labels: Coronary artery disease, Paroxysmal nocturnal dyspnea, Dyspnea on exertion, Third heart sound, Jugular venous distension, Pulmonary venous congestion]
History of coronary artery disease and history of myocardial infarction were ranked as the 2 least important findings by participants, whereas according to the ranking by DOR, the 2 least important findings were history of coronary artery disease and dyspnea on exertion (Table 1).
Correlation between weights assigned by different groups of physicians to the findings of CHF and DOR of different categories of findings [table body not recovered; columns: Categories of findings, Groups of physicians, Correlation coefficient (Spearman’s rho)]
Correlation between the weights assigned by different groups of physicians to the different categories of findings of CHF
Categories of findings | Expert with novice (correlation coefficient, p-value) | Faculty member with non-faculty member (correlation coefficient, p-value)
History | ρ* = 0.937, P-value = 0.002 | ρ* = 0.827, P-value = 0.023
Examination | ρ* = 0.830, P-value = 0.041 | ρ* = 0.296, P-value = 0.569
Radiographic | ρ* = 0.727, P-value = 0.164 | ρ* = 0.589, P-value = 0.296
Significant correlations were found between the weights assigned by experts and novices to the history and examination findings and between the weights assigned by faculty members and non-faculty members to the history findings.
While managing a patient, physicians seek further findings in light of the previous ones. When all the findings are presented simultaneously, clinicians compare each finding with all the findings of all groups (history, examination and radiographic findings). Prioritizing findings in this situation would be easier than when clinicians face a particular group of findings and must compare the findings within that specific group. Accordingly, in our subgroup analysis, correlations between clinicians-assigned weights and the DOR of the different categories of findings were not statistically significant. These results suggest that clinicians’ estimates of the weight of findings within each group are not correlated with their actual value based on the findings’ LRs. This lack of correlation could be interpreted as mismanagement of the different steps of the diagnostic process based on Bayes’ theorem, including pretest probability estimation and test interpretation, which could end in an incorrect estimate of post-test probability. These results are in accordance with previous studies in this area [6, 10, 12, 16–18, 22].
In this study, estimating the weight of radiographic findings corresponded to the second step of the threshold approach in the diagnostic process. Given the importance of this part of diagnosis [23–25], incorrect estimation of the weight of radiographic findings and the lack of correlation between clinicians-estimated weights and the DOR of these findings distort the estimation of post-test probability. This step is also affected by the estimation of the primary (pretest) probability, which is somewhat subjective and more complicated because, according to Bayes’ theorem, it depends on the prevalence of disease in the clinical setting and on clinicians’ intuition about the patient. It has been shown that physicians’ estimates of primary probability are usually inexact and vary greatly among physicians. This variation has been demonstrated across different levels of expertise and different medical conditions [9, 12, 14, 15, 26]. Those studies focused on the estimation of the primary probability of a particular disease as the main issue, whereas the present study focused on an underlying cause of incorrect estimation of pretest probability and indicates that physicians are not able to weigh clinical findings accurately according to their category and their position in the diagnostic process.
Surprisingly, there was a significant correlation between the weights assigned by experts and novices to the history and examination findings, whereas the correlation between the weights assigned by these two groups to the radiographic findings was not statistically significant. However, the weights assigned by these groups of clinicians were not correlated with the DOR of the different groups of findings (Table 3). Allen et al. reported that experts, compared with novices, are more successful in generating hypotheses, choosing appropriate references and resolving inconsistencies as a result of their past knowledge and experience, but these abilities indicate their superiority in making differential diagnoses. If these abilities are not combined with updated evidence, they may cause errors in estimating the actual weight of findings. As a result, expert and novice clinicians make the same mistake in ruling a diagnosis in or out, which amounts to a mistake in the estimation of pretest probability; likewise, one study showed that experience did not decrease the variance of primary probability estimation.
Another result of this study concerned the differences between faculty members and non-faculty members. Although there was no significant correlation between the weights assigned by faculty members or non-faculty members and the DOR of the different groups of findings, correlations between the weights assigned by faculty members and the DOR of history and examination findings were stronger than those for non-faculty members (r(FM) = 0.419 vs. r(NFM) = -0.102 and r(FM) = 0.758 vs. r(NFM) = -0.251, respectively). On the other hand, the correlation between weights assigned by non-faculty members and the DOR of the radiographic findings was stronger than that for faculty members (r(NFM) = 0.616 vs. r(FM) = 0.236). Although the sample size was small, it can be hypothesized that, owing to the educational curriculum in teaching hospitals, faculty members emphasize the value of history taking and physical examination findings when training medical students. Non-faculty members do not have this role, and since most of their working time is spent in private offices, they depend mostly on laboratory and radiographic findings and may overlook the value of history and examination findings in their practice.
According to our study, there was no difference between experts and novices in assigning weight to clinical findings, but this is not sufficient reason to ignore other differences between experts and novices; for example, experts try to use several diagnostic approaches, such as pattern recognition, in their diagnosis [27, 28]. The response rate to our questionnaire was 37.5 %, which decreases the power of the study. Participants’ unfamiliarity with the unusual design of this questionnaire might be a cause. The negative or non-significant correlations found for different groups of clinicians and different categories of findings may be due to the small sample size of each group of clinicians. It is not clear whether the results of this study can be generalized to other clinical conditions or other specialties.
We concluded that although the correlation between clinicians-assigned weights and the DOR of all findings taken together was significantly positive, the correlation between the weights assigned by clinicians to the different categories of findings of CHF and their DOR was not statistically significant. Re-emphasizing probabilistic reasoning through the use of LRs of clinical and para-clinical findings can make pretest probability estimation and interpretation of test results more objective and ultimately result in more precise and homogeneous post-test probabilities.
In conclusion, there was a significant correlation between the weights assigned by experts and novices to the history and examination findings and between those assigned by faculty members and non-faculty members to the history findings.
There was also a significant positive correlation between the weights that clinicians assigned to the findings of CHF and the DOR of the findings. This means that clinicians were able to rank the findings in an acceptable order when they were presented simultaneously, whereas the situation in practice is completely different.
Our survey had some limitations that deserve comment. The findings were not categorized, so the clinicians did not have the opportunity to use their EBM skills. Using clinical scenarios, such as the one used in our questionnaire, might lead experts to rely on prior textbook knowledge instead of the best current evidence or their clinical experience.
TP: True positive, TN: true negative, FP: false positive, FN: false negative, PPV: positive predictive value, NPV: negative predictive value, Sen: sensitivity, Spe: specificity
FM: faculty member
NFM: non-faculty member
CHF: Congestive heart failure
DOR: Diagnostic odds ratio
NPV: Negative predictive value
PPV: Positive predictive value
No funding body.
Availability of data and materials
Available on request.
AS Study design, Data analysis and interpretation, Quality control of data and tables, Manuscript editing, Manuscript review. FSF Manuscript preparation, Manuscript editing, Manuscript review. AK Study design, Data analysis and interpretation. FSK Data acquisition, Quality control of data and tables, Statistical analysis, Manuscript preparation, Manuscript editing, Manuscript review. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med. 2005;165(13):1493.
2. Graber M, Gordon R, Franklin N. Reducing diagnostic errors in medicine: what’s the goal? Acad Med. 2002;77(10):981–92.
3. Croskerry P. The importance of cognitive errors in diagnosis and strategies to minimize them. Acad Med. 2003;78(8):775–80.
4. Elstein AS, Schwarz A. Evidence base of clinical diagnosis: Clinical problem solving and diagnostic decision making: selective review of the cognitive literature. BMJ Br Med J. 2002;324(7339):729.
5. Pauker S, Kassirer J. Clinical decision analysis by personal computer. Arch Intern Med. 1981;141(13):1831.
6. Houben P, van der Weijden T, Winkens B, Winkens R, Grol R. Pretest expectations strongly influence interpretation of abnormal laboratory results and further management. BMC Fam Pract. 2010;11(1):13.
7. Moayyeri A, Soltani A. Towards evidence-based diagnosis in developing countries: the use of likelihood ratios for robust quick diagnosis. Ann Saudi Med. 2006;26(3):211.
8. Dolan JG, Bordley DR, Mushlin AI. An evaluation of clinicians’ subjective prior probability estimates. Med Decis Making. 1986;6(4):216–23.
9. Cahan A, Gilon D, Manor O, Paltiel O. Probabilistic reasoning and clinical decision-making: do doctors overestimate diagnostic probabilities? QJM. 2003;96(10):763–9.
10. Attia JR, Nair BR, Sibbritt DW, Ewald BD, Paget NS, Wellard RF, et al. Generating pre-test probabilities: a neglected area in clinical decision making. Med J Aust. 2004;180(9):449–54.
11. Chambers DW, Mirchel R, Lundergan W. An investigation of dentists’ and dental students’ estimates of diagnostic probabilities. J Am Dent Assoc. 2010;141(6):656–66.
12. Phelps MA, Levitt MA. Pretest probability estimates: a pitfall to the clinical utility of evidence-based medicine? Acad Emerg Med. 2004;11(6):692–4.
13. Allen VG, Arocha JF, Patel VL. Evaluating evidence against diagnostic hypotheses in clinical decision making by students, residents and physicians. Int J Med Inform. 1998;51(2):91–105.
14. Guyatt G, Bass E, Brill-Edwards P, Holbrook A, Jaeschke R, Elizabeth Juniper M, et al. Users’ Guides to the medical literature: III. How to use an article about a diagnostic test: I B. What are the results and will they help me in caring for my patients? J Am Med Assoc. 1994;271(9):703–7.
15. Cahan A, Gilon D, Manor O, Paltiel O. Clinical experience did not reduce the variance in physicians’ estimates of pretest probability in a cross-sectional survey. J Clin Epidemiol. 2005;58(11):1211–6.
16. Bianchi MT, Alexander BM, Cash SS. Incorporating uncertainty into medical decision making: an approach to unexpected test results. Med Decis Making. 2009;29(1):116–24.
17. Gul N, Quadri M. The clinical diagnostic reasoning process determining the use of endoscopy in diagnosing peptic ulcer disease. J Coll Physicians Surg Pak. 2011;21(9):548.
18. Berner ES, Graber ML. Overconfidence as a cause of diagnostic error in medicine. Am J Med. 2008;121(5):S2–S23.
19. Straus SE, Richardson WS, Glasziou P, Haynes RB. Evidence-based medicine: how to practice and teach EBM. 2005.
20. Zehtabchi S, Brandler ES. Does this patient have congestive heart failure? Ann Emerg Med. 2008;51(1):87–90.
21. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56(11):1129–35.
22. Graber MA, Bergus G, Dawson JD, Wood GB, Levy BT, Levin I. Effect of a patient’s psychiatric history on physicians’ estimation of probability of disease. J Gen Intern Med. 2000;15(3):204–6.
23. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ. 2004;329(7458):168–9.
24. Grimes DA, Schulz KF. Refining clinical diagnosis with likelihood ratios. Lancet. 2005;365(9469):1500–5.
25. Akobeng AK. Understanding diagnostic tests 2: likelihood ratios, pre‐and post‐test probabilities and their use in clinical practice. Acta Paediatr. 2007;96(4):487–91.
26. Mitchell AM, Garvey JL, Chandra A, Diercks D, Pollack CV, Kline JA. Prospective multicenter study of quantitative pretest probability assessment to exclude acute coronary syndrome for patients evaluated in emergency department chest pain units. Ann Emerg Med. 2006;47(5):447. e1.
27. Lyman GH, Balducci L. Overestimation of test effects in clinical judgment. J Cancer Educ. 1993;8(4):297–307.
28. Brooks LR, Norman GR, Allen SW. Role of specific similarity in a medical diagnostic task. J Exp Psychol Gen. 1991;120(3):278.