This is the R&D group at Kiki AI.
We have released several fine-tuned models based on Llama3.2 and Qwen2.5, listed in the evaluation table below.
These models are optimized for Vietnamese language understanding and generation tasks, such as reading comprehension, information extraction, question answering, and summarization.
Evaluation Result Summary:
We evaluated our fine-tuned models on the VMLU benchmark provided by https://vmlu.ai, as well as on ViSquad, ViDrop, and ViDialog.
| Model | VMLU | ViSquad | ViDrop | ViDialog |
|---|---|---|---|---|
| Llama3.2-1B-Instruct | 37.6 | 70.1 | 29.6 | 33.9 |
| Llama3.2-3B-Instruct | 47.6 | 90.3 | 63.5 | 50.8 |
| Qwen2.5-0.5B-Instruct | 39.1 | 62.5 | 31.5 | 28.0 |
| Qwen2.5-1.5B-Instruct | 48.6 | 86.7 | 54.5 | 39.8 |
| Qwen2.5-3B-Instruct | 52.9 | 88.3 | 72.4 | 54.4 |
| **Our fine-tuned models** | | | | |
| Llama3.2-1B-Instruct-KAI | 50.5 (+12.9) | 88.4 (+18.3) | 71.1 (+41.5) | 50.9 (+17.0) |
| Llama3.2-3B-Instruct-KAI | 58.1 (+10.5) | 93.5 (+3.2) | 81.4 (+17.9) | 67.3 (+16.5) |
| Qwen2.5-0.5B-Instruct-KAI | 49.7 (+10.6) | 87.3 (+24.8) | 62.3 (+30.8) | 39.0 (+11.0) |
| Qwen2.5-1.5B-Instruct-KAI | 57.5 (+8.9) | 93.3 (+6.6) | 76.0 (+21.5) | 54.6 (+14.8) |
| Qwen2.5-3B-Instruct-KAI | 63.5 (+10.6) | 94.2 (+5.9) | 80.9 (+8.5) | 68.5 (+14.1) |
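The gains in parentheses are simply the fine-tuned score minus the corresponding baseline score on each benchmark. A minimal sketch recomputing one row (Llama3.2-1B-Instruct) to show the arithmetic:

```python
# Scores copied from the table above for the Llama3.2-1B-Instruct pair.
baseline = {"VMLU": 37.6, "ViSquad": 70.1, "ViDrop": 29.6, "ViDialog": 33.9}
finetuned = {"VMLU": 50.5, "ViSquad": 88.4, "ViDrop": 71.1, "ViDialog": 50.9}

# Gain = fine-tuned score - baseline score, rounded to one decimal place.
gains = {k: round(finetuned[k] - baseline[k], 1) for k in baseline}
print(gains)  # {'VMLU': 12.9, 'ViSquad': 18.3, 'ViDrop': 41.5, 'ViDialog': 17.0}
```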
Additionally, we have evaluated these models on the ArenaHard benchmark (CohereForAI).
For quickstart usage details, please refer to the documentation for each model.
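As a rough illustration of what such a quickstart looks like, here is a minimal inference sketch using the Hugging Face `transformers` chat API. The repo id `kiki-ai/Qwen2.5-3B-Instruct-KAI` and the system prompt are hypothetical placeholders, not the official ones; consult each model's documentation for the actual identifiers.

```python
def build_messages(question: str) -> list:
    # Wrap a Vietnamese question in the chat format instruct models expect.
    # The system prompt here is an illustrative assumption.
    return [
        {"role": "system", "content": "Bạn là một trợ lý AI hữu ích."},  # "You are a helpful AI assistant."
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "kiki-ai/Qwen2.5-3B-Instruct-KAI"  # hypothetical repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = build_messages("Thủ đô của Việt Nam là gì?")  # "What is the capital of Vietnam?"
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```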