Gemini Benchmark - Code
In this report, we compare the Pass@1 of Gemini Pro to that of GPT 3.5 Turbo and GPT 4 Turbo on two code generation tasks, HumanEval and ODEX. We present overall performance, performance by gold solution length, performance by library used, and a case study. If you want to examine the individual examples in more detail, you can click over to the corresponding Zeno projects for HumanEval and ODEX.
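For reference, Pass@1 is the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021), evaluated at k = 1: the expected fraction of problems for which a sampled solution passes all unit tests. The sketch below is a minimal illustration of how the metric is computed, not the evaluation harness used for this report; the `results` list is a hypothetical placeholder.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021).

    n: total samples generated for a problem
    c: number of those samples that pass the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem (n, c) pairs: one sample per problem here,
# so pass@1 reduces to the fraction of problems solved.
results = [(1, 1), (1, 0), (1, 1)]
pass_at_1 = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"Pass@1: {pass_at_1:.3f}")
```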
First, from the overall results shown in the figures below, we can see that Gemini Pro achieves a lower Pass@1 than GPT 3.5 Turbo and a much lower Pass@1 than GPT 4 Turbo on both tasks. These results indicate that Gemini's code generation capabilities still have room for improvement.