Welcome to Zeno

GPT MT Benchmark

cabreraalex

20.2k
4

TruthfulQA

a13x

817
8

TruthfulQA (https://arxiv.org/abs/2109.07958) task from the Open LLM Leaderboard.

What does the OpenLLM Leaderboard measure?

a13x

21

An investigation of the Open LLM Leaderboard and why you should double-check before using the top-ranked model.

GPT MT Benchmark Report

cabreraalex

14

Explore how LLMs compare to dedicated language translation models, particularly for low-resourced languages.

HellaSwag

a13x

10k
8

HellaSwag (https://arxiv.org/abs/1905.07830) task from the Open LLM Leaderboard.

MMLU

a13x

14k
8

MMLU (https://arxiv.org/abs/2009.03300) tasks from the Open LLM Leaderboard.

Web Arena

cabreraalex

100
2

Exploring the WebArena Agent Environment

cabreraalex

8

DiffusionDB

cabreraalex

2M
1

Explore 2 million images generated by Stable Diffusion, from the DiffusionDB dataset (https://poloclub.github.io/diffusiondb/).

Audio Transcription Accents

cabreraalex

2.1k
5

Analysis of OpenAI's Whisper transcription models across speakers of different demographic groups.

Gemini MMLU

a13x

13

Audio Transcription Report

cabreraalex

14

Analysis of OpenAI's Whisper models across demographic groups.

ARC

a13x

1.2k
8

ARC (https://arxiv.org/abs/1803.05457) task from the Open LLM Leaderboard.

Gemini BBH

a13x

15

What's in the Updated OpenLLM Leaderboard?

cabreraalex

20

Exploration of the three new tasks in the Hugging Face Open LLM Leaderboard.

Whisper Audio Transcription Comparison

cabreraalex

2.1k
6

Test of audio transcription

Gemini Evaluation - HumanEval

gneubig

164
4

Evaluation of Gemini-Pro, GPT-4, GPT-3.5, and Mixtral on the HumanEval dataset.

Gemini Evaluation - MawpsMultiArith

sakter

600
4

Evaluation of Gemini, GPT-4, and Mixtral on the MawpsMultiArith dataset.

Flores Translation Evaluation

aashiqmuhamed

20.2k
10

Gemini Webarena

a13x

14