Evaluating LLM Spatial Grounding: A 100-City Audit of 7,000+ Restaurant Recommendations vs. Google Places for Ground Truth [R]

We evaluated the spatial grounding capabilities of ChatGPT, Gemini, and Perplexity (API) by asking each for restaurant recommendations across 100 US cities and 5 cuisine types. Using the Google Places API as ground truth, we measured hallucination rates, "permanently closed" retrieval errors, and distance-from-center accuracy. We combined these measurements into a per-city City IQ Score.

Key Findings

Chicago ranked #1: the models were most accurate overall for Chicago restaurants (City IQ = 89).
Staleness: ~600 recommendations were for businesses that have since closed, a clear sign of training data latency.
Spatial drift: 1,078 picks were in the wrong city entirely.
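
One way to flag a "wrong city" pick is a great-circle distance test between the recommended place and the city center. The sketch below is illustrative only and is not taken from the report: the 40 km cutoff and the names `haversine_km` and `is_wrong_city` are assumptions, and the coordinates would come from whatever ground-truth lookup is used.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_wrong_city(place, city_center, max_km=40.0):
    """Flag a pick whose location is farther than max_km from the city center.

    place and city_center are (lat, lon) tuples; max_km is a hypothetical cutoff.
    """
    return haversine_km(place[0], place[1], city_center[0], city_center[1]) > max_km


chicago = (41.8781, -87.6298)
milwaukee = (43.0389, -87.9065)
print(is_wrong_city(milwaukee, chicago))  # a Milwaukee result returned for a Chicago query
```

A fixed radius is crude, since metro areas vary widely in size; a per-city radius or a boundary polygon would be a reasonable refinement.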

Methodology

City IQ is a 100-point composite computed per city across all verified recommendations: Existence Rate (30 pts), Cuisine Accuracy (20 pts), Independence Rate (20 pts), Bayesian Quality (20 pts), and Location Accuracy (10 pts). Bayesian scoring was used for top picks (Google rating weighted by review count vs. the dataset mean). It is interesting to see what a machine recommends for food, along with how accurate and how frequent those recommendations are.
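
A minimal sketch of how such a composite could be computed. Only the point weights come from the post; everything else is an assumption: each component is taken to be a rate in [0, 1], the Bayesian score uses the common "shrink toward the mean" average, and `prior_weight` and the function names are hypothetical rather than the report's actual values.

```python
# Point weights as listed in the post (sum to 100).
WEIGHTS = {
    "existence": 30,
    "cuisine": 20,
    "independence": 20,
    "quality": 20,
    "location": 10,
}


def bayesian_rating(rating, n_reviews, dataset_mean, prior_weight=50):
    """Shrink a rating toward the dataset mean when review counts are low.

    prior_weight is an assumed tuning constant: the number of reviews at which
    the place's own rating and the dataset mean count equally.
    """
    return (n_reviews * rating + prior_weight * dataset_mean) / (n_reviews + prior_weight)


def city_iq(rates):
    """Combine per-city component rates (each assumed to be in [0, 1]) into a 0-100 score."""
    missing = WEIGHTS.keys() - rates.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * rates[name] for name in WEIGHTS)


# Hypothetical inputs, not figures from the dataset.
quality = bayesian_rating(rating=4.6, n_reviews=120, dataset_mean=4.2) / 5.0  # scale to [0, 1]
score = city_iq({
    "existence": 0.92,
    "cuisine": 0.88,
    "independence": 0.75,
    "quality": quality,
    "location": 0.95,
})
print(round(score, 1))
```

With this form, a place with few reviews scores close to the dataset mean, so a 5.0 rating from three reviews does not outrank a 4.7 from thousands.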

Full Report & Dataset:
https://aiagentsbuzz.com/research/ai-restaurant-recommendations.html

submitted by /u/ubunt2