ClinicalBench


ClinicalBench is an end-to-end, multi-departmental clinical diagnostic evaluation benchmark for comprehensively evaluating the clinical diagnostic capabilities of LLMs. It is built from real cases covering 24 departments and 150 diseases, and consists of 8 clinical diagnostic tasks. We ensure that ClinicalBench is free of data leakage. We evaluate the clinical diagnostic capabilities of LLMs along two dimensions: the task dimension measures each model's performance on the different tasks, while the department dimension measures how each model's performance varies across medical specialties.
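The two evaluation dimensions can be illustrated with a minimal sketch. The record layout, the department and task names, and the scores below are all hypothetical and are not taken from ClinicalBench; the sketch only shows how the same per-case scores might be averaged once by task and once by department.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-case results: (department, task, score in [0, 1]).
# Names and values are illustrative only, not ClinicalBench data.
results = [
    ("cardiology", "department_guide", 1.0),
    ("cardiology", "preliminary_diagnosis", 0.5),
    ("neurology", "department_guide", 0.0),
    ("neurology", "preliminary_diagnosis", 1.0),
]


def aggregate(records, key_index):
    """Average the scores of all records that share the same key."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[key_index]].append(record[2])
    return {key: mean(scores) for key, scores in buckets.items()}


by_department = aggregate(results, 0)  # department dimension
by_task = aggregate(results, 1)        # task dimension
```

Grouping the same records by two different keys keeps both views consistent with each other, since they are derived from one set of per-case scores.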



Columns: Model, Institution, Department Guide, Clinical Diagnosis, Imaging Diagnosis, Overall
Sub-columns: Acc, DIFR, PD, DB, DD, FD, PT, TP, DWR, CDR, Acceptability, Avg.

ClinicalAgent


We propose ClinicalAgent, an end-to-end clinical agent aligned with real-world multi-departmental clinical diagnostic practices. ClinicalAgent covers the entire process from the moment a patient enters the clinic until the patient is discharged, in six key steps: 1) department guide; 2) preliminary consultation; 3) laboratory examination; 4) imaging examination; 5) final consultation; 6) medical treatment.
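The six-step flow can be sketched as an ordered pipeline. This is a minimal illustration under stated assumptions, not ClinicalAgent's actual implementation: the `Step` enum, the `run_pipeline` function, the handler signature, and the case fields are all hypothetical names introduced here.

```python
from enum import IntEnum


class Step(IntEnum):
    """The six steps named in the text, in order (identifiers are hypothetical)."""
    DEPARTMENT_GUIDE = 1
    PRELIMINARY_CONSULTATION = 2
    LABORATORY_EXAMINATION = 3
    IMAGING_EXAMINATION = 4
    FINAL_CONSULTATION = 5
    MEDICAL_TREATMENT = 6


def run_pipeline(case, handlers):
    """Pass a case record through every step in order.

    `handlers` maps a Step to a function that takes the record and returns
    an updated record; steps without a handler leave the record unchanged.
    """
    record = dict(case)
    visited = []
    for step in Step:  # enum members iterate in definition order
        handler = handlers.get(step, lambda r: r)
        record = handler(record)
        visited.append(step.name)
    return record, visited


# Usage with a single illustrative handler for the first step.
handlers = {
    Step.DEPARTMENT_GUIDE: lambda r: {**r, "department": "cardiology"},
}
record, visited = run_pipeline({"complaint": "chest pain"}, handlers)
```

Keeping the steps in an enum makes the order explicit, and each step can be replaced or stubbed independently when testing a single stage.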



Columns: Model, Institution, Automatic Score, Human Score, GPT-4o Score
Sub-columns: DWR, CDR, Acceptability, Avg.

ClinicalMetrics


We propose four novel metrics to precisely measure the effectiveness of LLMs in department guidance and their clinical diagnostic capabilities. These metrics are Department Win Rate (DWR), Department Instruction Following Rate (DIFR), Comprehensive Diagnostic Accuracy (CDA), and Acceptability.