Welcome to Zeno
Learn about Zeno, or sign in or sign up to create and view your projects and reports.
8
TruthfulQA
a13x
TruthfulQA (https://arxiv.org/abs/2109.07958) task in the Open-LLM-Leaderboard.
8
What does the OpenLLM Leaderboard measure?
a13x
An investigation of the Open LLM Leaderboard and why you should double-check before using the top-ranked model.
8
GPT MT Benchmark Report
cabreraalex
Explore how LLMs compare to dedicated language translation models, particularly for low-resource languages.
7
HellaSwag
a13x
HellaSwag (https://arxiv.org/abs/1905.07830) task in the Open-LLM-Leaderboard.
6
Web Arena
cabreraalex
6
Exploring the WebArena Agent Environment
cabreraalex
5
MMLU
a13x
MMLU (https://arxiv.org/abs/2009.03300) tasks in the Open-LLM-Leaderboard.
5
DiffusionDB
cabreraalex
Explore 2 million images generated by Stable Diffusion. From the DiffusionDB dataset: https://poloclub.github.io/diffusiondb/
5
Audio Transcription Accents
cabreraalex
Analysis of OpenAI's Whisper transcription models across speakers of different demographic groups.
5
Audio Transcription Report
cabreraalex
Analysis of OpenAI's Whisper models across demographic groups.
4
ARC
a13x
ARC (https://arxiv.org/abs/1803.05457) task in the Open-LLM-Leaderboard.
3
What's in the Updated OpenLLM Leaderboard?
cabreraalex
Exploration of the three new tasks in the HuggingFace OpenLLM Leaderboard.
2
InstructPix2Pix Africa
skhanuja
2
BLIP-2 Llama Plug-and-play Japan
skhanuja
2
HADR_Tweets_MY
xkoh
2
Vicuna-7b_CustomerServiceDataset_ContextLengths
aman
2
HADR_Tweets_TL
xkoh
2
Winoground CLIP Analysis
skhanuja
2
Winoground Baseline Analysis
skhanuja