ClinicalBench

ClinicalBench is an end-to-end, multi-departmental clinical diagnostic benchmark for comprehensively evaluating the clinical diagnostic capabilities of LLMs. It is built from real cases covering 24 departments and 150 diseases, and it consists of 8 clinical diagnostic tasks. We ensure that ClinicalBench is free of data leakage. We evaluate LLMs along two dimensions: the task dimension measures each model's performance on the different tasks, while the department dimension measures how each model's performance varies across medical specialties.

Model	Institution	Department Guide		Clinical Diagnosis						Imaging Diagnosis	Overall
Model	Institution	Acc	DIFR	PD	DB	DD	FD	PT	TP	Imaging Diagnosis	DWR	CDR	Acceptability	Avg.
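
The two evaluation dimensions described above amount to grouping the same per-case scores in two different ways. The following is a minimal sketch of that idea, not the benchmark's actual scoring code: the record layout, the model name, the department names, and the score values are all hypothetical, and the task labels PD and TP are borrowed from the table header purely as identifiers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-case results: (model, department, task, score in [0, 1]).
# None of these values come from ClinicalBench; they only illustrate the grouping.
results = [
    ("model-a", "cardiology", "PD", 0.80),
    ("model-a", "cardiology", "TP", 0.60),
    ("model-a", "neurology", "PD", 0.70),
    ("model-a", "neurology", "TP", 0.50),
]


def aggregate(rows, key_index):
    """Average the scores per (model, key), where key is the column at key_index."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row[0], row[key_index])].append(row[3])
    return {key: mean(scores) for key, scores in buckets.items()}


# Task dimension: one average per (model, task), pooled over departments.
by_task = aggregate(results, key_index=2)

# Department dimension: one average per (model, department), pooled over tasks.
by_department = aggregate(results, key_index=1)
```

Comparing `by_task` with `by_department` for one model shows whether a weak overall score comes from a particular task or from a particular specialty.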

ClinicalAgent

We propose ClinicalAgent, an end-to-end clinical agent aligned with real-world multi-departmental clinical diagnostic practice. ClinicalAgent covers the entire process from the moment a patient enters the clinic until the patient is discharged, in six key steps: 1) department guide; 2) preliminary consultation; 3) laboratory examination; 4) imaging examination; 5) final consultation; 6) medical treatment.

Model	Institution	Automatic Score				Human Score	GPT-4o Score
Model	Institution	DWR	CDR	Acceptability	Avg.	Human Score	GPT-4o Score
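
The six steps listed above form a fixed sequence, which can be sketched as an ordered pipeline. This is only an illustration under stated assumptions, not ClinicalAgent's implementation: the `Stage` enum, the `run_pipeline` function, the handler signature, and the example patient record are all hypothetical, and in a real agent each handler would wrap a model call.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The six steps named in the text, in the order a patient passes through them."""
    DEPARTMENT_GUIDE = 1
    PRELIMINARY_CONSULTATION = 2
    LABORATORY_EXAMINATION = 3
    IMAGING_EXAMINATION = 4
    FINAL_CONSULTATION = 5
    MEDICAL_TREATMENT = 6


def run_pipeline(case, handlers):
    """Run every stage in order and return the final record plus a trace.

    `handlers` maps a Stage to a function that takes the current record (a dict)
    and returns an updated record. Stages without a handler are passed through
    unchanged but still recorded in the trace.
    """
    record = dict(case)
    trace = []
    for stage in Stage:  # Enum iteration follows definition order.
        handler = handlers.get(stage)
        if handler is not None:
            record = handler(record)
        trace.append(stage.name)
    return record, trace


# Hypothetical handler: a stand-in for the model choosing a department.
handlers = {
    Stage.DEPARTMENT_GUIDE: lambda r: {**r, "department": "cardiology"},
}
record, trace = run_pipeline({"complaint": "chest pain"}, handlers)
```

Keeping the stages in an enum makes the ordering explicit, so a later stage such as final consultation can rely on fields written by the earlier ones.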

ClinicalMetrics

We propose four novel metrics to precisely measure the effectiveness of LLMs in department guidance and their clinical diagnostic capabilities. These metrics are Department Win Rate (DWR), Department Instruction Following Rate (DIFR), Comprehensive Diagnostic Accuracy (CDA), and Acceptability.