August 2023
Benchmarks for ChatGPT & Co:
Updated monthly: The Trustbit LLM Leaderboard compares large language models such as ChatGPT so that you can evaluate their suitability for use in product development.
Trustbit Leaderboard
model | code | crm | docs | integrate | marketing | reason | final |
---|---|---|---|---|---|---|---|
OpenAI GPT4 v2-0613 💰 | 85 | 94 | 100 | 67 | 88 | 60 | 82 |
OpenAI GPT4 v1-0314 💰 | 76 | 97 | 89 | 67 | 75 | 76 | 80 |
Claude v1 💰 | 62 | 77 | 69 | 58 | 88 | 61 | 69 |
OpenAI GPT3.5 v2-0613 💰 | 49 | 77 | 84 | 83 | 84 | 39 | 69 |
Open Models | 46 | 62 | 62 | 100 | 84 | 22 | 63 |
Llama2 13B Nous Hermes q5_K_M ✅ | 46 | 62 | 62 | 100 | 56 | 21 | 58 |
Claude v2 💰 | 38 | 58 | 41 | 67 | 82 | 51 | 56 |
Claude v1 instant 💰 | 72 | 54 | 47 | 67 | 55 | 17 | 52 |
Vicuna v1.1 13B q4_1 | 30 | 45 | 57 | 83 | 71 | 19 | 51 |
Vicuna v1.1 13B q8_0 | 31 | 45 | 52 | 42 | 84 | 16 | 45 |
Vicuna v1.3 13B q5_1 | 36 | 51 | 47 | 50 | 61 | 19 | 44 |
Vicuna v1.1 13B q5_1 | 31 | 45 | 42 | 33 | 84 | 18 | 42 |
Puffin v1.3 13B q5_K_M ✅ | 28 | 48 | 53 | 33 | 25 | 22 | 35 |
Wizard Vicuna 13B Unlocked q5_K_M | 22 | 39 | 53 | 33 | 56 | 0 | 34 |
Llama2 13B Guanaco q5_1 ✅ | 19 | 42 | 62 | 17 | 38 | 0 | 30 |
Llama 7B q8_0 | 25 | 30 | 28 | 25 | 50 | 0 | 26 |
Llama 13B q5_1 | 34 | 9 | 38 | 17 | 44 | 9 | 25 |
Llama2 7B chat ✅ | 7 | 33 | 11 | 17 | 62 | 14 | 24 |
Llama2 7B chat Unlocked q8_0 ✅ | 14 | 33 | 33 | 33 | 25 | 0 | 23 |
Llama2 13B chat q8_0 ✅ | 7 | 33 | 17 | 0 | 66 | 11 | 22 |
Open Llama 7B instruct q8_0 | 16 | 17 | 38 | 17 | 22 | 14 | 21 |
Llama 13B q2_K | 0 | 5 | 47 | 33 | 25 | 0 | 19 |
Llama2 7B ✅ | 18 | 0 | 0 | 0 | 0 | 0 | 3 |
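
The page does not say how the `final` column is derived. As a rough sanity check, and purely as an assumption, the sketch below treats it as the rounded arithmetic mean of the six category columns and compares the result with the published value for three rows copied from the table. The helper name `mean_score` is hypothetical and is not part of any Trustbit tooling.

```python
# Assumption: "final" is the rounded mean of the six category scores.
# The source page does not state this; the check below only tests consistency.

# Category scores (code, crm, docs, integrate, marketing, reason) from the table.
rows = {
    "OpenAI GPT4 v2-0613": [85, 94, 100, 67, 88, 60],
    "Claude v2": [38, 58, 41, 67, 82, 51],
    "Llama 13B q2_K": [0, 5, 47, 33, 25, 0],
}

# "final" values as published in the table.
published_final = {
    "OpenAI GPT4 v2-0613": 82,
    "Claude v2": 56,
    "Llama 13B q2_K": 19,
}


def mean_score(scores):
    """Return the arithmetic mean of the scores, rounded to an integer."""
    return round(sum(scores) / len(scores))


for model, scores in rows.items():
    computed = mean_score(scores)
    note = "matches" if computed == published_final[model] else "differs"
    print(f"{model}: computed {computed}, published {published_final[model]} ({note})")
```

The first two rows match the published values. The last row comes out at 18 against a published 19, so the leaderboard may average unrounded category scores or use a different aggregation; treat the mean only as an approximation.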