INDOTABVQA: A New Benchmark for Multilingual Table VQA
INDOTABVQA pushes the boundaries of cross-lingual Table Visual Question Answering (VQA). With support for four languages including Bahasa Indonesia, the benchmark reveals striking performance gaps in current Vision-Language Models.
In the field of machine learning, the latest benchmark to watch is INDOTABVQA. This ambitious dataset evaluates cross-lingual Table Visual Question Answering (VQA) on real-world document images. Developed with a focus on Bahasa Indonesia, it includes 1,593 document images featuring various table styles.
Why INDOTABVQA Matters
INDOTABVQA isn't just another dataset. It offers a diverse linguistic challenge with question-answer sets spanning four languages: Bahasa Indonesia, English, Hindi, and Arabic. This diversity allows Vision-Language Models (VLMs) to be assessed in both monolingual and cross-lingual contexts. The real kicker is its potential to highlight performance discrepancies in VLMs, especially in languages that don't get much spotlight.
The Performance Gaps
Leading VLMs, including the open-source Qwen2.5-VL, Gemma-3, and LLaMA-3.2 as well as the proprietary GPT-4o, were put to the test. The findings weren't exactly flattering: these models exhibited substantial performance gaps, particularly on complex table structures and low-resource languages. Strip away the marketing and you get a clear picture: we're not there yet.
Fine-Tuning: A Step Forward
Fine-tuning showed promise. A compact 3-billion-parameter model and a LoRA-finetuned 7-billion-parameter model improved accuracy by 11.6% and 17.8%, respectively, showing that even modest models benefit substantially from task-specific adaptation. Notably, adding explicit table region coordinates as input boosted performance by a further 4-7%. This highlights the value of spatial priors in table-based reasoning.
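The article doesn't specify how those table region coordinates are fed to the model. One common approach is to serialize the table's bounding box into the text prompt alongside the question; the sketch below illustrates that idea. The `build_prompt` helper and the `<box>` tag format are illustrative assumptions, not the benchmark's actual encoding.

```python
from typing import Optional, Tuple


def build_prompt(question: str,
                 table_bbox: Optional[Tuple[int, int, int, int]] = None) -> str:
    """Build a table-VQA text prompt, optionally injecting the table's
    pixel bounding box (x0, y0, x1, y1) as an explicit spatial prior.

    NOTE: the <box> tag format is a hypothetical encoding for
    illustration; real VLMs each use their own coordinate scheme.
    """
    parts = []
    if table_bbox is not None:
        x0, y0, x1, y1 = table_bbox
        # Tell the model where the table sits in the image.
        parts.append(f"Table region: <box>{x0},{y0},{x1},{y1}</box>")
    parts.append(f"Question: {question}")
    parts.append("Answer:")
    return "\n".join(parts)


# Same question, with and without the spatial prior.
print(build_prompt("Berapa total pendapatan pada 2023?"))
print(build_prompt("Berapa total pendapatan pada 2023?", (48, 120, 980, 640)))
```

The intuition behind the reported 4-7% gain is that the model no longer has to locate the table itself before reasoning over it.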
Implications for Underrepresented Regions
INDOTABVQA isn't just a technical feat. It's a significant step for underrepresented regions and languages in AI research. Language-diverse and domain-specific datasets like these can propel advancements in document understanding. But are we doing enough to support low-resource languages in AI?
INDOTABVQA is more than a benchmark. It's a call to action for developing models that truly understand diverse languages and structures. As VLMs evolve, they'll need to rise to such challenges, not just in popular languages but everywhere.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
GPT: Generative Pre-trained Transformer.
LLaMA: Meta's family of open-weight large language models.