The best language models for digital products

The Trustbit LLM Leaderboards

LLM recommendations from Trustbit's Data & AI experts

The monthly LLM Leaderboards help you find the best Large Language Model for digital product development.

Based on real benchmark data from our own software products, we re-evaluate the performance of different LLMs each month on specific challenges. We examine categories such as document processing, CRM integration, external integration, marketing support, and code generation.

Rely on us to take your projects to the next level!

Benchmarks for July 2024

This month you can expect the following insights & highlights:

  • Codestral-Mamba 7B - new efficient LLM architecture that achieves surprisingly good results

  • GPT-4o Mini - affordable, lightweight model. The best in its class!

  • Mistral Nemo 12B - decent downloadable model in its class, designed for quantization (compression)

  • Mistral Large 123B v2 - local model that reaches the level of GPT-4 Turbo v3 and Gemini Pro 1.5. It would be the best local model if it weren't for Meta Llama 3.1.

  • Meta Llama 3.1 - a series of models with a permissive license that set new records in our benchmark.

The benchmark categories in detail

These categories describe the capabilities evaluated in the Trustbit LLM Leaderboard:

  • How well can the model work with large documents and knowledge bases?

  • How well does the model support work with product catalogs and marketplaces?

  • Can the model easily interact with external APIs, services and plugins?

  • How well can the model support marketing activities, e.g. brainstorming, idea generation and text generation?

  • How well can the model reason and draw conclusions in a given context?

  • Can the model generate code and help with programming?

  • The estimated cost of running the workload. For cloud-based models, we calculate the cost according to the provider's pricing. For on-premises models, we estimate the cost based on GPU requirements for each model, GPU rental cost, model speed, and operational overhead.

  • The "Speed" column indicates the estimated speed of the model in requests per second (without batching). The higher the speed, the better.
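The on-premises cost estimate described above can be sketched as a simple calculation. The function and all numbers below are illustrative assumptions, not Trustbit's actual methodology or figures:

```python
# Hypothetical sketch: estimating on-prem serving cost per 1,000 requests
# from GPU rental price, GPU count, model speed, and an overhead factor.
# All parameter values are illustrative assumptions.

def on_prem_cost_per_1k_requests(
    gpu_rental_per_hour: float,    # rental price of one GPU in USD/hour
    gpus_required: int,            # GPUs needed to host the model
    requests_per_second: float,    # measured model speed (without batching)
    overhead_factor: float = 1.2,  # operational overhead (ops, idle time)
) -> float:
    """Cost in USD of serving 1,000 requests at the given speed."""
    seconds_per_1k = 1000 / requests_per_second
    hours_per_1k = seconds_per_1k / 3600
    return hours_per_1k * gpu_rental_per_hour * gpus_required * overhead_factor

# Example: 2 GPUs at $2.50/hour each, 1.5 requests/second, 20% overhead
cost = on_prem_cost_per_1k_requests(2.50, 2, 1.5)
print(round(cost, 4))
```

The same structure makes the trade-off visible: a faster model (higher requests per second) directly lowers the cost per request, which is why speed appears alongside cost in the leaderboard.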

LLM PERFORMANCE DEEP DIVE

Batching strategies for optimal LLM performance

In this series, our Innovation & Machine Learning expert Rinat Abdullin explores how to use batching strategies to maximize the performance of Large Language Models (LLMs), increasing efficiency and quality in various applications.

More business value through the use of ChatGPT and Co.

Learn how Trustbit deploys Large Language Models in enterprises, what to consider and why our customers strongly benefit from our partnerships in this context.

Would you like to learn more about the use of ChatGPT and Co.?

Then we look forward to hearing from you.

christoph.hasenzagl@trustbit.tech

+43 664 88454881