August 2023
Benchmarks for ChatGPT & Co:
Updated monthly: the Trustbit LLM Leaderboard offers an up-to-date comparison of Large Language Models such as ChatGPT and others, to help you assess their suitability for use in product development.
Trustbit Leaderboard
model | code | crm | docs | integrate | marketing | reason | final |
---|---|---|---|---|---|---|---|
OpenAI GPT4 v2-0613 💰 | 85 | 94 | 100 | 67 | 88 | 60 | 82 |
OpenAI GPT4 v1-0314 💰 | 76 | 97 | 89 | 67 | 75 | 76 | 80 |
Claude v1 💰 | 62 | 77 | 69 | 58 | 88 | 61 | 69 |
OpenAI GPT3.5 v2-0613 💰 | 49 | 77 | 84 | 83 | 84 | 39 | 69 |
Open Models | 46 | 62 | 62 | 100 | 84 | 22 | 63 |
Llama2 13B Nous Hermes q5_K_M ✅ | 46 | 62 | 62 | 100 | 56 | 21 | 58 |
Claude v2 💰 | 38 | 58 | 41 | 67 | 82 | 51 | 56 |
Claude v1 instant 💰 | 72 | 54 | 47 | 67 | 55 | 17 | 52 |
Vicuna v1.1 13B q4_1 | 30 | 45 | 57 | 83 | 71 | 19 | 51 |
Vicuna v1.1 13B q8_0 | 31 | 45 | 52 | 42 | 84 | 16 | 45 |
Vicuna v1.3 13B q5_1 | 36 | 51 | 47 | 50 | 61 | 19 | 44 |
Vicuna v1.1 13B q5_1 | 31 | 45 | 42 | 33 | 84 | 18 | 42 |
Puffin v1.3 13B q5_K_M ✅ | 28 | 48 | 53 | 33 | 25 | 22 | 35 |
Wizard Vicuna 13B Unlocked q5_K_M | 22 | 39 | 53 | 33 | 56 | 0 | 34 |
Llama2 13B Guanaco q5_1 ✅ | 19 | 42 | 62 | 17 | 38 | 0 | 30 |
Llama 7B q8_0 | 25 | 30 | 28 | 25 | 50 | 0 | 26 |
Llama 13B q5_1 | 34 | 9 | 38 | 17 | 44 | 9 | 25 |
Llama2 7B chat ✅ | 7 | 33 | 11 | 17 | 62 | 14 | 24 |
Llama2 7B chat Unlocked q8_0 ✅ | 14 | 33 | 33 | 33 | 25 | 0 | 23 |
Llama2 13B chat q8_0 ✅ | 7 | 33 | 17 | 0 | 66 | 11 | 22 |
Open Llama 7B instruct q8_0 | 16 | 17 | 38 | 17 | 22 | 14 | 21 |
Llama 13B q2_K | 0 | 5 | 47 | 33 | 25 | 0 | 19 |
Llama2 7B ✅ | 18 | 0 | 0 | 0 | 0 | 0 | 3 |
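
The page does not state how the `final` column is derived. The displayed values are consistent, for nearly all rows, with an unweighted mean of the six category scores rounded to the nearest integer. The sketch below checks that assumption on three rows copied from the table; the function name `final_score` and the averaging rule itself are illustrative assumptions, not the benchmark's documented method.

```python
# Category scores (code, crm, docs, integrate, marketing, reason)
# copied from three rows of the table above.
rows = {
    "OpenAI GPT4 v2-0613": [85, 94, 100, 67, 88, 60],
    "Claude v1": [62, 77, 69, 58, 88, 61],
    "Llama2 7B": [18, 0, 0, 0, 0, 0],
}


def final_score(scores):
    """Assumed rule: unweighted mean of the category scores, rounded."""
    return round(sum(scores) / len(scores))


for model, scores in rows.items():
    print(f"{model}: {final_score(scores)}")
```

Because the table shows rounded category scores, a result recomputed this way can differ by one point from the published `final` value (the Llama 13B q2_K row, for example, averages to about 18.3 but is listed as 19).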