Difficulty-Aware Adaptive Reasoning for Vietnamese VQA with GPT-OSS
We propose a difficulty-aware adaptive reasoning framework for Vietnamese Visual Question Answering (VQA). Gemini 2.5 produces dense captions that supply visual context to a downstream LLM (GPT-OSS, Qwen3, or DeepSeek), and a router scales inference compute to the estimated difficulty of each question. On ViVQA-X, the framework achieves competitive BLEU@4, ROUGE-L, and METEOR scores while keeping the inference budget low.
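To make the routing idea concrete, the following is a minimal sketch of how a difficulty-aware router might map a question to an inference budget. It is an illustration only. The abstract does not say how difficulty is estimated, so the cue phrases, thresholds, token limits, and the names `HARD_CUES`, `Budget`, `estimate_difficulty`, and `route` are all assumptions rather than the framework's actual design.

```python
from dataclasses import dataclass

# Hypothetical Vietnamese cue phrases that often signal multi-step reasoning
# ("why", "how", "how many"). These are assumed, not taken from the paper.
HARD_CUES = ("tại sao", "vì sao", "như thế nào", "bao nhiêu")


@dataclass(frozen=True)
class Budget:
    """Inference budget handed to the downstream LLM (assumed fields)."""
    effort: str       # reasoning-effort label, e.g. "low", "medium", "high"
    max_tokens: int   # cap on generated tokens


# Assumed mapping from difficulty tier to budget; the values are placeholders.
BUDGETS = {
    "easy": Budget("low", 128),
    "medium": Budget("medium", 512),
    "hard": Budget("high", 2048),
}


def estimate_difficulty(question: str, caption: str) -> str:
    """Score a question with simple heuristics and return a difficulty tier."""
    q = question.lower()
    score = 0
    if any(cue in q for cue in HARD_CUES):
        score += 2  # reasoning-style question
    if len(q.split()) > 12:
        score += 1  # long question
    if len(caption.split()) > 80:
        score += 1  # dense caption, i.e. a visually complex scene
    if score >= 3:
        return "hard"
    if score >= 1:
        return "medium"
    return "easy"


def route(question: str, caption: str) -> Budget:
    """Pick the inference budget for a question given its dense caption."""
    return BUDGETS[estimate_difficulty(question, caption)]
```

In this sketch, a short factual question such as "Đây là con gì?" paired with a short caption falls into the lowest tier, while a "tại sao" question is given a larger budget. A learned difficulty classifier could replace the heuristic without changing the `route` interface.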