Welcome to Zeno

Learn about Zeno, or sign in or sign up to create and view your projects and reports.

GPT MT Benchmark

cabreraalex

20.2k
4

TruthfulQA

a13x

817
8

TruthfulQA (https://arxiv.org/abs/2109.07958) task in the Open-LLM-Leaderboard.

What does the OpenLLM Leaderboard measure?

a13x

21

An investigation of the Open LLM Leaderboard and why you should double-check before using the top-ranked model.

GPT MT Benchmark Report

cabreraalex

14

Explore how LLMs compare to dedicated language translation models, particularly for low-resource languages.

MMLU

a13x

14k
8

MMLU (https://arxiv.org/abs/2009.03300) tasks in the Open-LLM-Leaderboard.

HellaSwag

a13x

10k
8

HellaSwag (https://arxiv.org/abs/1905.07830) task in the Open-LLM-Leaderboard.

Web Arena

cabreraalex

100
2

Exploring the WebArena Agent Environment

cabreraalex

8

Audio Transcription Accents

cabreraalex

2.1k
5

Analysis of OpenAI's Whisper transcription models across speakers of different demographic groups.

DiffusionDB

cabreraalex

2M
1

Explore 2 million images generated by Stable Diffusion. From the DiffusionDB dataset: https://poloclub.github.io/diffusiondb/

Gemini MMLU

a13x

13

Audio Transcription Report

cabreraalex

14

Analysis of OpenAI's Whisper models across demographic groups.

ARC

a13x

1.2k
8

ARC (https://arxiv.org/abs/1803.05457) task in the Open-LLM-Leaderboard.

Gemini BBH

a13x

15

What's in the Updated OpenLLM Leaderboard?

cabreraalex

20

Exploration of the three new tasks in the HuggingFace OpenLLM Leaderboard.

Whisper Audio Transcription Comparison

cabreraalex

2.1k
6

Test of audio transcription

Gemini Evaluation - HumanEval

gneubig

164
4

Evaluation of Gemini-Pro, GPT-4, GPT-3.5, and Mixtral on the HumanEval dataset

Gemini Evaluation - MawpsMultiArith

sakter

600
4

Evaluation of Gemini, GPT-4, and Mixtral on the MawpsMultiArith dataset

Flores Translation Evaluation

aashiqmuhamed

20.2k
10

Gemini Webarena

a13x

14