Our text-to-SQL model ranks #1 in accuracy on the BIRD benchmarks.
Early on, we noticed that many text-to-SQL benchmarks didn’t represent a real-world business environment. Our jobs as data scientists often involved messy data, convoluted multi-part JOINs, and confusing WHERE clauses that filter nested JSON. Some well-known benchmarks focused mostly on a single table or small sets of JOINs. They lacked "ecological validity": the evaluation setting did not match the real-world context it was meant to reflect.
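To make the contrast concrete, here is a small illustrative sketch that runs both kinds of query against an in-memory SQLite database from Python. The schema, table names, and data are entirely hypothetical; they are not taken from BIRD or from any real database. The JSON filtering assumes your Python build's SQLite includes the JSON functions (such as json_extract), which most current builds do.

```python
import sqlite3

# Hypothetical schema (not from BIRD): customers, their orders, and order
# events whose payload is stored as nested JSON text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     total REAL);
CREATE TABLE order_events (id INTEGER PRIMARY KEY,
                           order_id INTEGER REFERENCES orders(id),
                           payload TEXT);

INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
INSERT INTO orders VALUES (10, 1, 120.0), (11, 2, 80.0), (12, 1, 45.5);
INSERT INTO order_events VALUES
  (100, 10, '{"status": "shipped", "meta": {"channel": "web"}}'),
  (101, 11, '{"status": "shipped", "meta": {"channel": "store"}}'),
  (102, 12, '{"status": "returned", "meta": {"channel": "web"}}');
""")

# The kind of question a single-table benchmark tends to ask.
simple_sql = "SELECT COUNT(*) FROM orders WHERE total > 50"

# A more business-like question: web-channel revenue from shipped orders,
# per region. It needs a multi-part JOIN plus a WHERE clause that reaches
# into nested JSON via json_extract.
messy_sql = """
SELECT c.region, SUM(o.total) AS web_revenue
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
JOIN order_events AS e ON e.order_id = o.id
WHERE json_extract(e.payload, '$.status') = 'shipped'
  AND json_extract(e.payload, '$.meta.channel') = 'web'
GROUP BY c.region
ORDER BY c.region
"""

print(conn.execute(simple_sql).fetchone())  # (2,)
print(conn.execute(messy_sql).fetchall())   # [('EU', 120.0)]
```

Even in this toy version, answering the second question correctly means knowing which tables to chain together and where the relevant fields live inside the JSON payload, which is exactly the kind of reasoning single-table benchmarks rarely exercise.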
The BIRD benchmarks are designed to be more representative of real-world SQL. Developed by researchers from the University of Hong Kong, Tsinghua University, MIT CSAIL, the University of Illinois at Urbana-Champaign, and elsewhere, BIRD includes phenomena such as large tables with many values, and it covers data from industries ranging from entertainment to healthcare. We think it is a much better representation of corporate databases.
We are pleased to have achieved this result, and thank the BIRD team for running the benchmark.