At Predibase, we recently conducted 700+ fine-tuning experiments to benchmark the performance of popular open-source LLMs across 30 tasks and compared their results to GPT-4.

The fine-tuned open-source models beat GPT-4 85% of the time.

You can see the results here: https://predibase.com/fine-tuning-index.

The site has a series of interactive charts and a link to our arXiv paper.