November 2023
Benchmarks for ChatGPT & Co:
The November benchmarks evaluate GPT-4 Turbo and the latest GPT-3.5 release, and introduce Mistral 7B OpenChat-3.5.
Trustbit Leaderboard November 2023
The Trustbit benchmarks evaluate the models in terms of their suitability for digital product development. The higher the score, the better.
☁️ - Cloud models with proprietary license
✅ - Open source models that can be run locally without restrictions
🦙 - Local models with Llama license
model | code | crm | docs | integrate | marketing | reason | final 🏆 | Cost | Speed |
---|---|---|---|---|---|---|---|---|---|
GPT-4 v1/0314 ☁️ | 85 | 88 | 95 | 52 | 88 | 50 | 76 | 7.18 € | 0.77 rps |
GPT-4 Turbo v3/1106-preview ☁️ | 54 | 75 | 98 | 52 | 88 | 62 | 71 | 2.52 € | 0.66 rps |
GPT-3.5 v2/0613 ☁️ | 62 | 79 | 76 | 75 | 81 | 48 | 70 | 0.35 € | 0.96 rps |
GPT-3.5 v3/1106 ☁️ | 56 | 68 | 71 | 63 | 78 | 59 | 66 | 0.24 € | 2.33 rps |
GPT-3.5-instruct 0914 ☁️ | 51 | 90 | 69 | 60 | 88 | 32 | 65 | 0.36 € | 2.35 rps |
GPT-3.5 v1/0301 ☁️ | 38 | 75 | 67 | 67 | 82 | 38 | 61 | 0.36 € | 1.76 rps |
Mistral 7B OpenChat-3.5 f16 ✅ | 53 | 72 | 72 | 49 | 88 | 31 | 61 | 0.59 € | 1.85 rps |
Llama2 70B Hermes b8 🦙 | 48 | 76 | 46 | 76 | 62 | 36 | 58 | 13.10 € | 0.13 rps |
Mistral 7B Instruct f16 ✅ | 36 | 68 | 68 | 44 | 74 | 36 | 54 | 0.68 € | 1.60 rps |
Mistral 7B OpenOrca f16 ✅ | 42 | 57 | 76 | 21 | 78 | 26 | 50 | 0.55 € | 1.98 rps |
Llama2 13B Hermes b8 🦙 | 39 | 20 | 29 | 61 | 60 | 43 | 42 | 5.71 € | 0.19 rps |
Llama2 70B chat b4 🦙 | 13 | 51 | 53 | 29 | 64 | 27 | 40 | 4.06 € | 0.27 rps |
Llama2 13B Hermes f16 🦙 | 32 | 15 | 30 | 51 | 56 | 43 | 38 | 0.57 € | 1.93 rps |
Llama2 13B Vicuna-1.5 f16 🦙 | 36 | 25 | 27 | 18 | 77 | 43 | 38 | 0.78 € | 1.39 rps |
Llama2 70B chat b8 🦙 | 1 | 53 | 34 | 27 | 71 | 27 | 36 | 10.24 € | 0.16 rps |
Llama2 13B Puffin b8 🦙 | 22 | 9 | 34 | 31 | 56 | 39 | 32 | 8.29 € | 0.13 rps |
Llama2 13B chat f16 🦙 | 0 | 38 | 15 | 30 | 75 | 8 | 27 | 0.64 € | 1.71 rps |
Mistral 7B Zephyr-β f16 ✅ | 23 | 34 | 27 | 44 | 29 | 4 | 27 | 0.60 € | 1.81 rps |
Llama2 13B chat b8 🦙 | 0 | 38 | 8 | 30 | 75 | 8 | 26 | 4.01 € | 0.27 rps |
Llama2 7B chat f16 🦙 | 0 | 33 | 14 | 27 | 50 | 20 | 24 | 0.65 € | 1.67 rps |
Mistral 7B f16 ✅ | 8 | 4 | 20 | 42 | 52 | 12 | 23 | 1.05 € | 1.04 rps |
Llama2 13B Puffin f16 🦙 | 14 | 9 | 9 | 5 | 54 | 19 | 18 | 1.71 € | 0.64 rps |
Llama2 7B f16 🦙 | 0 | 0 | 4 | 2 | 28 | 4 | 6 | 1.13 € | 0.97 rps |
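
The leaderboard does not say how the `final` column is derived. As a quick sanity check when reading the table, the sketch below compares the published `final` value with the unweighted mean of the six category scores for a handful of rows copied from above. Treating `final` as a plain average is an assumption made here for illustration, not something Trustbit states, and the helper name `mean_gap` is hypothetical.

```python
from statistics import mean

# Rows copied from the leaderboard above:
# (model, [code, crm, docs, integrate, marketing, reason], published final score)
ROWS = [
    ("GPT-4 v1/0314", [85, 88, 95, 52, 88, 50], 76),
    ("GPT-4 Turbo v3/1106-preview", [54, 75, 98, 52, 88, 62], 71),
    ("Mistral 7B OpenChat-3.5 f16", [53, 72, 72, 49, 88, 31], 61),
    ("Llama2 70B Hermes b8", [48, 76, 46, 76, 62, 36], 58),
    ("Llama2 7B f16", [0, 0, 4, 2, 28, 4], 6),
]


def mean_gap(scores, final):
    """Absolute difference between the unweighted category mean and the published final score.

    Assumption: 'final' is compared against a simple average; the real formula is not given.
    """
    return abs(mean(scores) - final)


for model, scores, final in ROWS:
    gap = mean_gap(scores, final)
    print(f"{model}: mean={mean(scores):.2f} final={final} gap={gap:.2f}")
```

For these five rows the gap stays below one point, which is consistent with an average taken over unrounded category scores, though it does not prove that this is the formula used.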