November 2023

Benchmarks for ChatGPT & Co:

These November benchmarks evaluate GPT-4 Turbo and the latest GPT-3.5, and introduce Mistral 7B OpenChat-3.5.

Trustbit Leaderboard November 2023

The Trustbit benchmarks evaluate the models in terms of their suitability for digital product development. The higher the score, the better.

☁️ - Cloud models with proprietary license
✅ - Open source models that can be run locally without restrictions
🦙 - Local models with Llama license

| Model | Code | CRM | Docs | Integrate | Marketing | Reason | Final 🏆 | Cost | Speed |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4 v1/0314 ☁️ | 85 | 88 | 95 | 52 | 88 | 50 | 76 | 7.18 € | 0.77 rps |
| GPT-4 Turbo v3/1106-preview ☁️ | 54 | 75 | 98 | 52 | 88 | 62 | 71 | 2.52 € | 0.66 rps |
| GPT-3.5 v2/0613 ☁️ | 62 | 79 | 76 | 75 | 81 | 48 | 70 | 0.35 € | 0.96 rps |
| GPT-3.5 v3/1106 ☁️ | 56 | 68 | 71 | 63 | 78 | 59 | 66 | 0.24 € | 2.33 rps |
| GPT-3.5-instruct 0914 ☁️ | 51 | 90 | 69 | 60 | 88 | 32 | 65 | 0.36 € | 2.35 rps |
| GPT-3.5 v1/0301 ☁️ | 38 | 75 | 67 | 67 | 82 | 38 | 61 | 0.36 € | 1.76 rps |
| Mistral 7B OpenChat-3.5 f16 ✅ | 53 | 72 | 72 | 49 | 88 | 31 | 61 | 0.59 € | 1.85 rps |
| Llama2 70B Hermes b8 🦙 | 48 | 76 | 46 | 76 | 62 | 36 | 58 | 13.10 € | 0.13 rps |
| Mistral 7B Instruct f16 ✅ | 36 | 68 | 68 | 44 | 74 | 36 | 54 | 0.68 € | 1.60 rps |
| Mistral 7B OpenOrca f16 ✅ | 42 | 57 | 76 | 21 | 78 | 26 | 50 | 0.55 € | 1.98 rps |
| Llama2 13B Hermes b8 🦙 | 39 | 20 | 29 | 61 | 60 | 43 | 42 | 5.71 € | 0.19 rps |
| Llama2 70B chat b4 🦙 | 13 | 51 | 53 | 29 | 64 | 27 | 40 | 4.06 € | 0.27 rps |
| Llama2 13B Hermes f16 🦙 | 32 | 15 | 30 | 51 | 56 | 43 | 38 | 0.57 € | 1.93 rps |
| Llama2 13B Vicuna-1.5 f16 🦙 | 36 | 25 | 27 | 18 | 77 | 43 | 38 | 0.78 € | 1.39 rps |
| Llama2 70B chat b8 🦙 | 1 | 53 | 34 | 27 | 71 | 27 | 36 | 10.24 € | 0.16 rps |
| Llama2 13B Puffin b8 🦙 | 22 | 9 | 34 | 31 | 56 | 39 | 32 | 8.29 € | 0.13 rps |
| Llama2 13B chat f16 🦙 | 0 | 38 | 15 | 30 | 75 | 8 | 27 | 0.64 € | 1.71 rps |
| Mistral 7B Zephyr-β f16 ✅ | 23 | 34 | 27 | 44 | 29 | 4 | 27 | 0.60 € | 1.81 rps |
| Llama2 13B chat b8 🦙 | 0 | 38 | 8 | 30 | 75 | 8 | 26 | 4.01 € | 0.27 rps |
| Llama2 7B chat f16 🦙 | 0 | 33 | 14 | 27 | 50 | 20 | 24 | 0.65 € | 1.67 rps |
| Mistral 7B f16 ✅ | 8 | 4 | 20 | 42 | 52 | 12 | 23 | 1.05 € | 1.04 rps |
| Llama2 13B Puffin f16 🦙 | 14 | 9 | 9 | 5 | 54 | 19 | 18 | 1.71 € | 0.64 rps |
| Llama2 7B f16 🦙 | 0 | 0 | 4 | 2 | 28 | 4 | 6 | 1.13 € | 0.97 rps |
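
The leaderboard does not state how the final score is derived from the six category scores. For the rows checked here, it sits close to their unweighted mean, so the following Python sketch tests that reading on four rows copied from the leaderboard. The aggregation rule and the helper names `category_mean` and `deviations` are assumptions made for illustration, not part of the Trustbit benchmark itself.

```python
from statistics import mean

# Four rows copied from the leaderboard:
# six category scores (code, crm, docs, integrate, marketing, reason),
# followed by the reported final score.
ROWS = {
    "GPT-4 v1/0314": ([85, 88, 95, 52, 88, 50], 76),
    "GPT-4 Turbo v3/1106-preview": ([54, 75, 98, 52, 88, 62], 71),
    "GPT-3.5 v2/0613": ([62, 79, 76, 75, 81, 48], 70),
    "Mistral 7B OpenChat-3.5 f16": ([53, 72, 72, 49, 88, 31], 61),
}


def category_mean(scores):
    """Unweighted mean of the category scores.

    ASSUMPTION: the source does not confirm this is how 'final' is computed.
    """
    return mean(scores)


def deviations(rows):
    """Map each model to (mean of its category scores) - (reported final score)."""
    return {
        name: category_mean(scores) - final
        for name, (scores, final) in rows.items()
    }


if __name__ == "__main__":
    for name, diff in deviations(ROWS).items():
        print(f"{name}: {diff:+.2f}")
```

Small differences are expected if the published category scores were rounded before display, so a deviation under one point is consistent with the assumption but does not prove it.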