Welcome to Zeno
Learn about Zeno, or sign in or sign up to create and view your projects and reports.
GPT MT Benchmark
cabreraalex
TruthfulQA
a13x
TruthfulQA (https://arxiv.org/abs/2109.07958) task in the Open-LLM-Leaderboard.
MMLU
a13x
MMLU (https://arxiv.org/abs/2009.03300) tasks in the Open-LLM-Leaderboard.
HellaSwag
a13x
HellaSwag (https://arxiv.org/abs/1905.07830) task in the Open-LLM-Leaderboard.
What does the OpenLLM Leaderboard measure?
a13x
An investigation of the Open LLM Leaderboard and why you should double-check before using the top-ranked model.
GPT MT Benchmark Report
cabreraalex
Explore how LLMs compare to dedicated language translation models, particularly for low-resourced languages.
Exploring the WebArena Agent Environment
cabreraalex
Web Arena
cabreraalex
ARC
a13x
ARC (https://arxiv.org/abs/1803.05457) task in the Open-LLM-Leaderboard.
DiffusionDB
cabreraalex
Explore 2 million images generated by Stable Diffusion. From the DiffusionDB dataset: https://poloclub.github.io/diffusiondb/
Audio Transcription Accents
cabreraalex
Analysis of OpenAI's Whisper transcription models across speakers of different demographic groups.
Gemini MMLU
a13x
Audio Transcription Report
cabreraalex
Analysis of OpenAI's Whisper models across demographic groups.
Whisper Audio Transcription Comparison
cabreraalex
Test of audio transcription
Flores Translation Evaluation
aashiqmuhamed
Gemini BBH
a13x
What's in the Updated OpenLLM Leaderboard?
cabreraalex
Exploration of the three new tasks in the Hugging Face OpenLLM Leaderboard.
Gemini Evaluation - HumanEval
gneubig
Evaluation of Gemini-Pro, GPT-4, GPT-3.5, and Mixtral on the HumanEval dataset
Gemini Evaluation - MawpsMultiArith
sakter
Evaluation of Gemini, GPT-4, and Mixtral on the MawpsMultiArith dataset
Gemini Evaluation - MMLU
zichunyu
Evaluation of Gemini-Pro, GPT-4, GPT-3.5, and Mixtral on the MMLU dataset