Welcome to Zeno

Learn about Zeno, or sign in or sign up to create and view your projects and reports.

GPT MT Benchmark

cabreraalex

20.2k
4

TruthfulQA

a13x

817
8

TruthfulQA (https://arxiv.org/abs/2109.07958) task in the Open-LLM-Leaderboard.

MMLU

a13x

14k
8

MMLU (https://arxiv.org/abs/2009.03300) tasks in the Open-LLM-Leaderboard.

HellaSwag

a13x

10k
8

HellaSwag (https://arxiv.org/abs/1905.07830) task in the Open-LLM-Leaderboard.

What does the OpenLLM Leaderboard measure?

a13x

21

An investigation of the Open LLM Leaderboard and why you should double-check before using the top-ranked model.

GPT MT Benchmark Report

cabreraalex

14

Explore how LLMs compare to dedicated language translation models, particularly for low-resourced languages.

Exploring the WebArena Agent Environment

cabreraalex

8

Web Arena

cabreraalex

100
2

ARC

a13x

1.2k
8

ARC (https://arxiv.org/abs/1803.05457) task in the Open-LLM-Leaderboard.

Audio Transcription Accents

cabreraalex

2.1k
5

Analysis of OpenAI's Whisper transcription models across speakers of different demographic groups.

DiffusionDB

cabreraalex

2M
1

Explore 2 million images generated by Stable Diffusion. From the DiffusionDB dataset: https://poloclub.github.io/diffusiondb/

Gemini MMLU

a13x

13

Audio Transcription Report

cabreraalex

14

Analysis of OpenAI's Whisper models across demographic groups.

Whisper Audio Transcription Comparison

cabreraalex

2.1k
6

Test of audio transcription

Flores Translation Evaluation

aashiqmuhamed

20.2k
10

Gemini BBH

a13x

15

What's in the Updated OpenLLM Leaderboard?

cabreraalex

20

Exploration of the three new tasks in the HuggingFace OpenLLM Leaderboard.

GSM8k OpenLLM Leaderboard

cabreraalex

1.3k
1

GSM8k task in the Open-LLM-Leaderboard (https://arxiv.org/abs/2110.14168).

Gemini Evaluation - MawpsMultiArith

sakter

600
4

Evaluation of Gemini, GPT-4, and Mixtral on the MawpsMultiArith dataset.

Gemini Evaluation - MMLU

zichunyu

14k
9

Evaluation of Gemini-Pro, GPT-4, GPT-3.5, and Mixtral on the MMLU dataset.