Welcome to Zeno
11
TruthfulQA
a13x
TruthfulQA (https://arxiv.org/abs/2109.07958) task in the Open LLM Leaderboard.
9
HellaSwag
a13x
HellaSwag (https://arxiv.org/abs/1905.07830) task in the Open LLM Leaderboard.
9
MMLU
a13x
MMLU (https://arxiv.org/abs/2009.03300) tasks in the Open LLM Leaderboard.
9
What does the OpenLLM Leaderboard measure?
a13x
An investigation of the Open LLM Leaderboard and why you should double-check before using the top-ranked model.
8
GPT MT Benchmark Report
cabreraalex
Explore how LLMs compare to dedicated translation models, particularly for low-resource languages.
7
Exploring the WebArena Agent Environment
cabreraalex
6
Web Arena
cabreraalex
5
ARC
a13x
ARC (https://arxiv.org/abs/1803.05457) task in the Open LLM Leaderboard.
5
DiffusionDB
cabreraalex
Explore 2 million images generated by Stable Diffusion. From the DiffusionDB dataset: https://poloclub.github.io/diffusiondb/
5
Audio Transcription Accents
cabreraalex
Analysis of OpenAI's Whisper transcription models across speakers of different demographic groups.
5
Gemini MMLU
a13x
5
Audio Transcription Report
cabreraalex
Analysis of OpenAI's Whisper models across demographic groups.
4
Whisper Audio Transcription Comparison
cabreraalex
Test of audio transcription
4
Flores Translation Evaluation
aashiqmuhamed
4
Gemini BBH
a13x
4
What's in the Updated OpenLLM Leaderboard?
cabreraalex
Exploration of the three new tasks in the Hugging Face Open LLM Leaderboard.
3
Gemini Evaluation - HumanEval
gneubig
Evaluation of Gemini-Pro, GPT-4, GPT-3.5, and Mixtral on the HumanEval dataset.
3
Gemini Evaluation - MawpsMultiArith
sakter
Evaluation of Gemini, GPT-4, and Mixtral on the MawpsMultiArith dataset.
3
Gemini Evaluation - MMLU
zichunyu
Evaluation of Gemini-Pro, GPT-4, GPT-3.5, and Mixtral on the MMLU dataset.