The best language models for digital products
The Trustbit LLM Leaderboards
LLM recommendations from Trustbit's Data & AI experts
The monthly LLM Leaderboards help you find the best Large Language Model for digital product development.
Based on real benchmark data from our own software products, we re-evaluate each month how well different LLMs handle specific challenges. We examine categories such as document processing, CRM integration, external integration, marketing support, and code generation.
Rely on us to take your projects to the next level!
Benchmarks for July 2024
This month you can expect the following insights & highlights:
Codestral-Mamba 7B - new efficient LLM architecture that achieves surprisingly good results
GPT-4o Mini - affordable, lightweight model. The best in its class!
Mistral Nemo 12B - decent downloadable model in its class, designed for quantization (compression)
Mistral Large 123B v2 - local model that reaches the level of GPT-4 Turbo v3 and Gemini Pro 1.5. It would be the best local model if it weren't for Meta Llama 3.1:
Meta Llama 3.1 - a series of models with a permissive license that set new records in our benchmark.
The benchmark categories in detail
These categories describe the capabilities of the Trustbit LLM Leaderboard
- How well can the model work with large documents and knowledge bases?
- How well does the model support work with product catalogs and marketplaces?
- Can the model easily interact with external APIs, services and plugins?
- How well can the model support marketing activities, e.g. brainstorming, idea generation and text generation?
- How well can the model reason and draw conclusions in a given context?
- Can the model generate code and help with programming?
- The estimated cost of running the workload. For cloud-based models, we calculate the cost according to the provider's pricing. For on-premises models, we estimate the cost based on GPU requirements for each model, GPU rental cost, model speed, and operational overhead.
- The "Speed" column indicates the estimated speed of the model in requests per second (without batching). The higher the speed, the better.
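To illustrate the on-premises cost estimate described above, here is a minimal sketch in Python. The function name, the overhead factor, and the example numbers are illustrative assumptions, not Trustbit's actual formula:

```python
# Hypothetical sketch of an on-premises cost-per-request estimate.
# Combines GPU rental cost, model speed, and an assumed operational overhead.

def on_prem_cost_per_request(
    gpu_rent_per_hour: float,      # rental cost of the required GPU(s), USD/hour
    requests_per_second: float,    # model speed without batching
    overhead_factor: float = 1.2,  # assumed 20% operational overhead
) -> float:
    """Estimate the USD cost of a single request for a locally hosted model."""
    requests_per_hour = requests_per_second * 3600
    return gpu_rent_per_hour / requests_per_hour * overhead_factor

# Example: a GPU setup rented at $4/hour serving 0.5 requests/second
cost = on_prem_cost_per_request(gpu_rent_per_hour=4.0, requests_per_second=0.5)
```

Cloud-based models skip this calculation entirely: their per-request cost follows directly from the provider's published token pricing.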
Curious about how the scores have evolved? Here you can find links to all previously published leaderboards.
LLM PERFORMANCE DEEP DIVE
Batching strategies for optimal LLM performance
In this series, our Innovation & Machine Learning expert Rinat Abdullin explores how to use batching strategies to maximize the performance of Large Language Models (LLMs), increasing efficiency and quality in various applications.
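The core idea of batching can be shown in a short, generic sketch: group prompts so that a single model call processes several at once instead of one per request. This is an illustrative outline, not code from the series; `run_model` is a hypothetical stand-in for any inference call that accepts batched input:

```python
# Generic request batching: split prompts into fixed-size batches and
# send each batch to the model in one call.

from typing import Callable, List


def batched(items: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of prompts into consecutive fixed-size batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_prompts(
    prompts: List[str],
    run_model: Callable[[List[str]], List[str]],  # hypothetical batched inference call
    batch_size: int = 8,
) -> List[str]:
    """Run all prompts through the model, one call per batch."""
    results: List[str] = []
    for batch in batched(prompts, batch_size):
        results.extend(run_model(batch))  # one call handles the whole batch
    return results
```

Because GPU inference amortizes fixed per-call overhead across the batch, larger batches typically raise throughput, at the price of higher latency for individual requests.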
More business value through the use of ChatGPT and Co.
Learn how Trustbit deploys Large Language Models in enterprises, what to consider, and why our customers benefit greatly from partnering with us in this area.