Results are presented for only one niche LLM, doubao-1.5.

**Fix the LaaJ (LLM-as-a-Judge) model to GPT-4.1-mini**, and try the following candidates as base LLMs:

- Qwen3: 8b, 14b, 30b-a3b
- Gemini: 2.0-flash-lite, 2.5-flash-lite
- repeats: run all five models once first, then keep two base models (one from Qwen and one from Gemini) and repeat those three times
- datasets: first try LM-SYS-100
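
The plan above can be sketched as a small run matrix. This is only an illustrative sketch: the `Run` record, the `build_plan` helper, and the two models kept for repetition are hypothetical, since the plan does not yet say which Qwen and Gemini models are retained. It also assumes "three times" means three runs in total, counting the initial pass.

```python
from dataclasses import dataclass

JUDGE_MODEL = "GPT-4.1-mini"  # fixed LaaJ model, per the plan
DATASET = "LM-SYS-100"        # first dataset to try, per the plan

# Candidate base LLMs listed in the plan (five in total).
BASE_MODELS = {
    "Qwen3": ["8b", "14b", "30b-a3b"],
    "Gemini": ["2.0-flash-lite", "2.5-flash-lite"],
}


@dataclass(frozen=True)
class Run:
    """One evaluation run (hypothetical record, for illustration only)."""
    family: str
    variant: str
    judge: str
    dataset: str
    repeat: int  # 1-based repeat index


def build_plan(kept, total_repeats=3):
    """Build the list of runs.

    kept maps family -> variant retained for repeated runs.
    Assumption: total_repeats counts the initial pass as repeat 1.
    """
    plan = []
    # Phase 1: a single pass over all base models.
    for family, variants in BASE_MODELS.items():
        for variant in variants:
            plan.append(Run(family, variant, JUDGE_MODEL, DATASET, 1))
    # Phase 2: extra repeats for the kept models only.
    for family, variant in kept.items():
        if variant not in BASE_MODELS.get(family, []):
            raise ValueError(f"unknown base model: {family} {variant}")
        for repeat in range(2, total_repeats + 1):
            plan.append(Run(family, variant, JUDGE_MODEL, DATASET, repeat))
    return plan


# Hypothetical choice of the two kept models; the plan leaves this open.
plan = build_plan({"Qwen3": "8b", "Gemini": "2.5-flash-lite"})
for run in plan:
    print(run)
```

Under these assumptions the plan holds five first-pass runs plus two extra repeats for each kept model. If "three times" is meant as three additional runs, passing `total_repeats=4` would cover that reading.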