Welcome to Zeno

Learn about Zeno, or sign in or sign up to create and view your projects and reports.

GPT MT Benchmark

cabreraalex

20.2k
4

TruthfulQA

a13x

817
8

TruthfulQA (https://arxiv.org/abs/2109.07958) task in the Open-LLM-Leaderboard.

What does the OpenLLM Leaderboard measure?

a13x

21

An investigation of the Open LLM Leaderboard and why you should double-check before using the top-ranked model.

GPT MT Benchmark Report

cabreraalex

14

Explore how LLMs compare to dedicated machine translation models, particularly for low-resource languages.

HellaSwag

a13x

10k
8

HellaSwag (https://arxiv.org/abs/1905.07830) task in the Open-LLM-Leaderboard.

Web Arena

cabreraalex

100
2

Exploring the WebArena Agent Environment

cabreraalex

8

MMLU

a13x

14k
8

MMLU (https://arxiv.org/abs/2009.03300) tasks in the Open-LLM-Leaderboard.

DiffusionDB

cabreraalex

2M
1

Explore 2 million images generated by Stable Diffusion. From the DiffusionDB dataset: https://poloclub.github.io/diffusiondb/

Audio Transcription Accents

cabreraalex

2.1k
5

Analysis of OpenAI's Whisper transcription models across speakers of different demographic groups.

Audio Transcription Report

cabreraalex

14

Analysis of OpenAI's Whisper models across demographic groups.

ARC

a13x

1.2k
8

ARC (https://arxiv.org/abs/1803.05457) task in the Open-LLM-Leaderboard.

What's in the Updated OpenLLM Leaderboard?

cabreraalex

20

Exploration of the three new tasks in the HuggingFace OpenLLM Leaderboard.

InstructPix2Pix Africa

skhanuja

91
1

BLIP-2 Llama Plug-and-play Japan

skhanuja

304
0

HADR_Tweets_MY

xkoh

4.1k
13

Vicuna-7b_CustomerServiceDataset_ContextLengths

aman

33.7k
7

HADR_Tweets_TL

xkoh

1.4k
10

Winoground CLIP Analysis

skhanuja

800
0

Winoground Baseline Analysis

skhanuja

800
0