Large model evaluation platform OpenCompass


Introduction to OpenCompass

OpenCompass is a one-stop platform for large model evaluation. Its main features are as follows:

  • Open source and reproducible : provides a fair, open and reproducible evaluation scheme for large models

  • Comprehensive capability dimensions : covers five capability dimensions with 50+ datasets and about 300,000 questions, giving a comprehensive picture of model capabilities

  • Rich model support : already supports 20+ HuggingFace and API models

  • Distributed and efficient evaluation : a single command handles task partitioning and distributed evaluation, so a full evaluation of a 100-billion-parameter model can finish within a few hours

  • Diverse evaluation paradigms : supports zero-shot, few-shot, and chain-of-thought evaluation, combined with standard or conversational prompt templates, to elicit the best performance from each model

  • Flexible extension : Want to add new models or datasets? Want to customize a more advanced task-partitioning strategy, or even plug in a new cluster management system? Everything in OpenCompass is easily extensible!
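To make the three evaluation paradigms above concrete, here is a minimal sketch of how the prompts differ. It is not OpenCompass's template API: the function `build_prompt`, its parameters, and the "Question:/Answer:" layout are hypothetical names chosen only for illustration.

```python
from typing import Sequence, Tuple


def build_prompt(question: str,
                 examples: Sequence[Tuple[str, str]] = (),
                 chain_of_thought: bool = False) -> str:
    """Assemble a plain-text prompt (illustrative only, not OpenCompass code).

    - zero-shot: no examples, only the question
    - few-shot: solved (question, answer) pairs are placed before the question
    - chain-of-thought: a cue asking for step-by-step reasoning is appended
    """
    parts = []
    # Few-shot: each solved example becomes its own block.
    for example_question, example_answer in examples:
        parts.append(f"Question: {example_question}\nAnswer: {example_answer}")
    # Chain-of-thought: a common cue phrase, assumed here for illustration.
    suffix = " Let's think step by step." if chain_of_thought else ""
    parts.append(f"Question: {question}\nAnswer:{suffix}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("What is 2 + 3?"))
    print("---")
    print(build_prompt("What is 2 + 3?", examples=[("What is 1 + 1?", "2")]))
    print("---")
    print(build_prompt("What is 2 + 3?", chain_of_thought=True))
```

In a real evaluation the same question set would be run under each prompt style, so that scores can be compared across paradigms.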

Leaderboard

We will continue to publish performance rankings for open-source models and API models; see the OpenCompass Leaderboard. If you would like your model to be included in the evaluation, please send the address of the model repository or a standard API endpoint by email to [email protected].

Dataset support

Model support

Install

The quick installation steps are shown below. Some third-party features may need additional steps to work properly. For detailed steps, please refer to the installation guide.

conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/InternLM/opencompass opencompass
cd opencompass
pip install -e .
# Download the datasets into data/
wget https://github.com/InternLM/opencompass/releases/download/0.1.0/OpenCompassData.zip
unzip OpenCompassData.zip
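After unzipping, it can be useful to confirm that the datasets landed where expected. The sketch below assumes, based on the comment in the commands above, that the archive unpacks into a `data/` directory under the current working directory; the helper name `list_dataset_dirs` is hypothetical and not part of OpenCompass.

```python
from pathlib import Path
from typing import List


def list_dataset_dirs(root: str = "data") -> List[str]:
    """Return the sorted names of the sub-directories found under `root`.

    Raises FileNotFoundError if `root` is missing, which usually means the
    archive was unpacked somewhere else or the download failed.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Expected a dataset directory at: {root_path}")
    return sorted(p.name for p in root_path.iterdir() if p.is_dir())


if __name__ == "__main__":
    try:
        names = list_dataset_dirs()
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"Found {len(names)} dataset folders under data/")
```

Run it from the root of the cloned repository, the same directory in which `unzip` was executed.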

Evaluation

Please read the quick start to learn how to run an evaluation task.

https://opencompass.readthedocs.io/zh_CN/latest/get_started.html#id2


Origin blog.csdn.net/yanqianglifei/article/details/131849761