Zeno AI Evaluation Platform

GPT MT Benchmark

cabreraalex · 20.2k · 4

TruthfulQA

a13x · 817 · 8

TruthfulQA (https://arxiv.org/abs/2109.07958) task in the Open-LLM-Leaderboard.

MMLU

a13x · 14k · 8

MMLU (https://arxiv.org/abs/2009.03300) tasks in the Open-LLM-Leaderboard.

HellaSwag

a13x · 10k · 8

HellaSwag (https://arxiv.org/abs/1905.07830) task in the Open-LLM-Leaderboard.