OpenAI models grade LA mayoral and CA gubernatorial candidates on staying on topic

A new analysis uses OpenAI's language models to measure how closely LA mayoral and CA gubernatorial candidates stuck to the questions they were asked.
Political debates and candidate forums are full of digressions. A candidate asked about housing cost may pivot to immigration. A question on education reform can become a talking point on taxes. Voters are left trying to decide whether a given answer actually addressed the question or just sounded good.
A new analysis from Jonathan Gonzalez takes a different approach: use AI models from OpenAI to determine how much each candidate stayed on topic. The methodology, applied to the LA mayoral race and the California gubernatorial race, offers a quantitative look at an old political problem.
The work relies on OpenAI's language models to parse both the question and the candidate's answer, then assign a relevance score. The models were trained to recognize when a response directly addresses the query versus when it shifts to a pre-prepared talking point or unrelated subject. The result is a data-driven breakdown of which candidates answered the question and which ones dodged it.
Gonzalez did not release a full dataset or the exact prompt used for the AI, but the core finding is clear: many candidates in both races consistently wandered off topic, and the AI was able to flag those moments with a high degree of consistency. The analysis covered multiple forums, debates, and interviews across the two campaigns.
For the LA mayoral race, the AI found that certain candidates scored consistently higher on topic adherence. Those candidates tended to give direct answers on public safety, homelessness, and housing – the three dominant issues in the city. Others frequently pivoted to broader national themes or personal biography rather than answering the specific question. The tool does not measure whether the answer was correct or persuasive, only whether it was responsive.
In the California gubernatorial race, where the field is larger and the issues range from wildfire policy to education funding, the AI analysis revealed a similar pattern. Candidates who served in state government or held local office tended to score higher on staying on topic. Those with less direct policy experience were more likely to deliver generic responses. The AI also caught candidates who gave long, detailed answers that technically addressed a different question than the one asked.
The technique is not without its limitations. OpenAI models, like all large language models, have been shown to exhibit bias and inconsistency depending on phrasing. The analysis depends entirely on how the question is framed in the input. A question that is itself vague or compound may produce a lower relevance score even for a candidate who answered it well. Gonzalez acknowledged this in the briefing, noting that the AI was calibrated to account for common question structures in political discourse.
Still, the approach represents a novel application of AI in political journalism. Instead of using AI to generate content or summarize events, this analysis uses the model as a kind of auditor – automatically checking claims of responsiveness. The idea is to give voters a more transparent metric than gut feeling or partisan heuristics.
There are also ethical questions. If an AI model determines that a candidate is off topic, that judgment may influence voter perception. The model's internal reasoning is opaque, and a candidate who intentionally digresses to address a more pressing issue – a common tactic in debates – might be penalized unfairly. Politics is not a trivia contest; sometimes the most responsible answer is to reframe the question.
Gonzalez framed the work as an experiment rather than a definitive scoring system. The AI output is intended to complement, not replace, human analysis. Reporters and voters can use the topic adherence score as one data point among many.
What this early effort shows is that large language models can be applied to political discourse in a structured, replicable way. Future iterations could incorporate sentiment analysis, fact-checking cross-references, or even real-time scoring during live debates. The technology to grade a politician's answer is here. What remains to be seen is whether politicians will adapt their behavior when they know an AI is watching.
SysCall News will continue to follow developments in AI-assisted political analysis as the technology matures.
Staff Writer
Maya writes about AI research, natural language processing, and the business of machine learning.
Comments
Loading comments…



