
CodeScope

An Execution-based Multilingual Multitask Multidimensional Benchmark
for Evaluating LLMs on Code Understanding and Generation
(2023)

 


Introduction


CodeScope is an execution-based, multilingual, multi-task, multi-dimensional benchmark for comprehensively evaluating the capabilities of LLMs on coding tasks. It covers 43 programming languages and 8 coding tasks, and it assesses the coding performance of LLMs along three dimensions (perspectives): difficulty, efficiency, and length.
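The exact metrics are defined in the paper; as a rough illustration of what "execution-based" means, the sketch below scores a generated solution by running it against test cases and comparing its stdout with the expected output. Everything here (the function names, the sample format, the scoring rule) is a hypothetical stand-in, not CodeScope's actual evaluation harness.

import os
import subprocess
import sys
import tempfile

def run_with_input(source_path, stdin_data, timeout=5):
    """Run a Python solution file with the given stdin; return its stdout, or None on timeout."""
    try:
        result = subprocess.run(
            [sys.executable, source_path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout.strip()
    except subprocess.TimeoutExpired:
        return None

def execution_accuracy(generated_code, test_cases):
    """Fraction of test cases whose actual output matches the expected output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        passed = sum(
            run_with_input(path, case["input"]) == case["output"].strip()
            for case in test_cases
        )
    finally:
        os.remove(path)
    return passed / len(test_cases)

# Hypothetical sample: one generated solution plus its test cases.
sample = {
    "generated_code": "print(sum(int(x) for x in input().split()))",
    "test_cases": [
        {"input": "1 2 3", "output": "6"},
        {"input": "10 20", "output": "30"},
    ],
}
print(execution_accuracy(sample["generated_code"], sample["test_cases"]))  # 1.0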


[Task statistics table: Category, Task, Detailed Result, #Languages, #Test Samples, Avg. #Tokens/Sample]

Leaderboard


[Leaderboard table: Ranking, Model, Organization, CodeScope (Understanding), CodeScope (Generation), CodeScope (Overall)]

Citation


@misc{yan2023codescope,
      title={CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation},
      author={Weixiang Yan and Haitian Liu and Yunkun Wang and Yunzhe Li and Qian Chen and Wen Wang and Tingyu Lin and Weishan Zhao and Li Zhu and Shuiguang Deng and Hari Sundaram},
      year={2023},
      eprint={2311.08588},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Submission


Once you have built an LLM whose performance meets your expectations, you can submit your test results for official reporting. To have your model's evaluation results included on the CodeScope leaderboard, please submit the following information:

  1. Evaluation results: your LLM's results on each task of the CodeScope benchmark, computed with the evaluation metrics described in our paper.
  2. Organization: the name of the individual or team's organization, as it should appear on the leaderboard.
  3. Model information: the name of your LLM, as it should appear on the leaderboard.
  4. Paper information: if the LLM comes from published work, the paper's title and URL, which will appear on the leaderboard.

Email the above information to yanweixiang.ywx@gmail.com and we will respond within 72 hours.

Contact Us


Have any questions about CodeScope? Please contact us at yanweixiang.ywx@gmail.com or open an issue on GitHub.