We are the R&D Group at Kiki AI.

We have released several fine-tuned models based on Llama 3.2 and Qwen 2.5, including:

- Llama3.2-1B-Instruct-KAI
- Llama3.2-3B-Instruct-KAI
- Qwen2.5-0.5B-Instruct-KAI
- Qwen2.5-1.5B-Instruct-KAI
- Qwen2.5-3B-Instruct-KAI

These models are optimized for Vietnamese language understanding and generation tasks, such as reading comprehension, information extraction, question answering and summarization.

Evaluation Results Summary:

We evaluated our fine-tuned models on the VMLU benchmark provided by https://vmlu.ai, as well as on the ViSquad, ViDrop, and ViDialog benchmarks.

| Model | VMLU | ViSquad | ViDrop | ViDialog |
|---|---|---|---|---|
| Llama3.2-1B-Instruct | 37.6 | 70.1 | 29.6 | 33.9 |
| Llama3.2-3B-Instruct | 47.6 | 90.3 | 63.5 | 50.8 |
| Qwen2.5-0.5B-Instruct | 39.1 | 62.5 | 31.5 | 28.0 |
| Qwen2.5-1.5B-Instruct | 48.6 | 86.7 | 54.5 | 39.8 |
| Qwen2.5-3B-Instruct | 52.9 | 88.3 | 72.4 | 54.4 |

Our fine-tuned models (gains over the corresponding base models in parentheses):

| Model | VMLU | ViSquad | ViDrop | ViDialog |
|---|---|---|---|---|
| Llama3.2-1B-Instruct-KAI | 50.5 (+12.9) | 88.4 (+18.3) | 71.1 (+41.5) | 50.9 (+17.0) |
| Llama3.2-3B-Instruct-KAI | 58.1 (+10.5) | 93.5 (+3.2) | 81.4 (+17.9) | 67.3 (+16.5) |
| Qwen2.5-0.5B-Instruct-KAI | 49.7 (+10.6) | 87.3 (+24.8) | 62.3 (+30.8) | 39.0 (+11.0) |
| Qwen2.5-1.5B-Instruct-KAI | 57.5 (+8.9) | 93.3 (+6.6) | 76.0 (+21.5) | 54.6 (+14.8) |
| Qwen2.5-3B-Instruct-KAI | 63.5 (+10.6) | 94.2 (+5.9) | 80.9 (+8.5) | 68.5 (+14.1) |

Additionally, we have evaluated these models on the ArenaHard (CohereForAI) benchmark.

For quickstart usage details, please refer to the documentation for each model.
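
As a quick illustration, here is a minimal usage sketch with Hugging Face transformers. The repository ID and the prompt below are placeholder assumptions; refer to each model's card for the actual repository ID, chat template, and recommended generation settings.

```python
# Minimal sketch: load one of the fine-tuned models and run a Vietnamese
# question-answering prompt. The repo ID is a hypothetical placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kiki-ai/Qwen2.5-3B-Instruct-KAI"  # placeholder, not a confirmed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example Vietnamese prompt ("What is the capital of Vietnam?"),
# formatted with the model's chat template.
messages = [
    {"role": "user", "content": "Thủ đô của Việt Nam là gì?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```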