Internal Press · February 2026 · KVA Research · Academic Paper
By Maryam Fooladi and Federico Bottino
7 pages · 6 min read
Beyond Sentiment: Why Traditional NLP Fails Political News — and How LLMs Can Bridge the Gap
When you run sentiment analysis on political news, 70% of it comes back labelled “neutral.” Not because the content is neutral — but because the tool was never designed to see what matters. A new KVA research paper identifies the problem, names it, and proposes a way forward.
The invisible majority
Sentiment analysis is one of the most widely used computational tools in social science research. Its appeal is obvious: feed in text, get back a polarity score — positive, neutral, or negative. Transformer-based models like RoBERTa have pushed classification accuracy to impressive levels on standard benchmarks. But what happens when you point these models at the kind of text that social scientists actually care about?
The answer, according to a new comparative study from the KVA research team, is surprisingly stark. When RoBERTa — specifically the Cardiff NLP Twitter-RoBERTa model, one of the most cited open-source sentiment analysis models in current NLP research — was applied to a corpus of 50 political news articles from 17 international media outlets, it classified 70% of them as neutral.
70% classified as neutral
23% of “neutral” articles had negative scores above 0.30
4% classified as positive
These are not bland articles. The corpus includes coverage of the ouster of Australia's first female Liberal Party leader, Mexican-UK diplomatic tensions over asylum policy, and Franco-German military disagreements. Each of these carries rich political framing, ideological orientation, and rhetorical strategy. Yet the model sees none of it.
Neutral collapse: naming the problem
The paper introduces a precise term for this phenomenon: neutral collapse. It describes the systematic tendency of sentiment models to assign a neutral label to political news that is, from a social science perspective, substantively rich in framing, bias, and rhetorical strategy.
Neutral collapse occurs because journalistic writing employs hedging, attribution, balanced sourcing, and formal register — linguistic features that sentiment models interpret as absence of valence, even when the underlying content carries significant ideological and rhetorical weight.
The problem is not that RoBERTa is broken. The model performs exactly as designed — it classifies emotional polarity. The problem is that political news articles are not primarily characterised by emotional polarity. They are characterised by framing, ideological positioning, and rhetorical strategy. Sentiment analysis was built for a different question.
The paper quantifies this mismatch at the decision boundary. Of the 35 articles classified as neutral, 8 had negative probability scores above 0.30, meaning the model detected substantial negative content that was only narrowly outweighed by the neutral class. One BBC article on Australia's Liberal Party leadership was separated from a negative label by a margin of just 0.02. A researcher studying gender and political leadership would find this “neutral” classification actively misleading.
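The boundary check the paper describes can be pictured as a simple post-processing step over per-class probabilities. The function and scores below are our own illustration, not the paper's actual data or code; only the 0.30 threshold and the 0.02 margin come from the study.

```python
def near_boundary(scores, neg_threshold=0.30):
    """Flag articles labelled neutral despite substantial negative
    probability mass: an illustrative 'neutral collapse' check."""
    flagged = []
    for article_id, s in scores.items():
        label = max(s, key=s.get)          # winning class
        margin = s["neutral"] - s["negative"]
        if label == "neutral" and s["negative"] > neg_threshold:
            flagged.append((article_id, round(margin, 2)))
    return flagged

# Hypothetical per-article probabilities (positive/neutral/negative sum to 1)
scores = {
    "bbc_liberal_party": {"positive": 0.06, "neutral": 0.48, "negative": 0.46},
    "routine_summit":    {"positive": 0.10, "neutral": 0.75, "negative": 0.15},
}
print(near_boundary(scores))  # only the near-boundary article is flagged
```

A researcher could use such a screen to pull the contested "neutral" articles out for closer qualitative reading rather than accepting the label at face value.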
What LLMs see that sentiment analysis cannot
The paper's second contribution is a direct comparison. The same 50 articles were analysed using an LLM-based platform that processes the full text — not just the first 512 tokens — across four dimensions corresponding to established social science analytical frameworks:
Political bias
Direction (left / neutral / right) with a continuous intensity score (0–100). Addresses the core social sciences and humanities (SSH) question: from which political perspective is content presented?
Sensationalism
Degree of emotional exaggeration, clickbait patterns, and hyperbolic language. Captures media quality concerns central to journalism studies.
Emotional appeal
Extent to which the article uses emotional rather than rational argumentation. Operates orthogonally to polarity — an article can be "neutral" in sentiment but high in emotional appeal.
Political framing
Intensity of rhetorical devices: fear appeal, scapegoating, us-vs-them dichotomies, victim/hero narratives. The core object of study in framing theory.
The structural difference is revealing. RoBERTa provides one dimension of variation — polarity — with high reproducibility but low social science relevance. The LLM-based approach provides four continuous dimensions plus categorical bias direction, all with direct correspondence to how political communication scholars actually conceptualise media influence.
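One way to picture this structural difference is as two record types. The field names and scales below are our own sketch of the two output shapes, not the platform's actual schema (the paper specifies a 0–100 intensity scale only for bias; we assume it for the other dimensions for illustration).

```python
from dataclasses import dataclass

@dataclass
class SentimentResult:
    """What a polarity classifier returns: one categorical label."""
    polarity: str            # "positive" | "neutral" | "negative"

@dataclass
class MultiDimResult:
    """Illustrative shape of the four-dimensional LLM-based output."""
    bias_direction: str      # "left" | "neutral" | "right"
    bias_intensity: float    # 0-100 continuous (per the paper)
    sensationalism: float    # 0-100 assumed: exaggeration, clickbait
    emotional_appeal: float  # 0-100 assumed: orthogonal to polarity
    framing_intensity: float # 0-100 assumed: fear appeal, us-vs-them, etc.

# An article can be sentiment-"neutral" yet score high elsewhere
flat = SentimentResult("neutral")
rich = MultiDimResult("right", 62.0, 35.0, 71.0, 58.0)
```

The point of the sketch is the asymmetry: the first type carries one bit of analytical leverage, the second carries five fields that map onto distinct social science constructs.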
There is also a practical asymmetry. RoBERTa's 512-token input limit means it analyses only the first third to half of a typical news article. Political framing often operates through cumulative article structure — the rhetorical arc across an entire piece — which is preserved only by full-text processing.
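The truncation effect can be illustrated with a rough whitespace-token stand-in for subword tokenisation (real tokenisers split words more finely, so the cutoff lands even earlier in the article than this sketch suggests):

```python
def truncate(text, limit=512):
    """Crude stand-in for a transformer's fixed input window:
    keep only the first `limit` whitespace-separated tokens."""
    tokens = text.split()
    return " ".join(tokens[:limit]), len(tokens)

# A hypothetical ~1,200-word article loses its second half entirely,
# including any rhetorical arc that builds across the full piece.
article = "word " * 1200
kept, total = truncate(article)
print(f"kept {len(kept.split())} of {total} tokens")  # kept 512 of 1200 tokens
```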
Complementary, not a substitute
The paper is careful not to argue that LLM-based analysis should replace sentiment analysis. Each tool has distinctive strengths. RoBERTa is open-source, deterministic, computationally cheap, and well suited to large-scale longitudinal tracking of emotional valence — if emotional valence is what you are studying.
But for questions about framing, ideology, or media quality — the core concerns of political communication scholarship — sentiment analysis alone is insufficient. The paper proposes a complementary framework: sentiment analysis provides a first-pass polarity screen, and LLM-based multi-dimensional analysis provides the deeper, SSH-aligned analytical layer.
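The proposed two-layer workflow might look like the following sketch. Both `polarity_screen` and `llm_dimensions` are hypothetical stand-ins for the two tools, not real APIs, and escalating only the "neutral" bulk to the costlier LLM layer is one plausible design choice, not the paper's prescription.

```python
def polarity_screen(article: str) -> str:
    """Stage 1 (hypothetical stub): cheap first-pass sentiment screen,
    e.g. a fine-tuned transformer. Returns a polarity label."""
    return "neutral" if "summit" in article else "negative"

def llm_dimensions(article: str) -> dict:
    """Stage 2 (hypothetical stub): full-text multi-dimensional
    LLM analysis, reserved for where polarity gives no leverage."""
    return {"bias": "right", "framing_intensity": 58}

def analyse(corpus: dict) -> dict:
    """Run the polarity screen on everything; escalate only the
    'neutral' articles to the deeper, more expensive layer."""
    results = {}
    for doc_id, text in corpus.items():
        polarity = polarity_screen(text)
        deep = llm_dimensions(text) if polarity == "neutral" else None
        results[doc_id] = {"polarity": polarity, "deep": deep}
    return results
```

Under this design the expensive layer runs on exactly the 70% of the corpus where the sentiment label carries no information, which is where the cost is easiest to justify.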
When 70% of a corpus is classified as “neutral,” the researcher is left with a category that provides no analytical leverage. Researchers who rely on sentiment analysis as their primary tool may be drawing conclusions from only the most extreme 30% of their corpus.
The challenges ahead
The paper is transparent about the limitations of the LLM-based approach. Reproducibility is a concern: proprietary models may produce slightly different outputs across API versions. Cost is significant: LLM inference is orders of magnitude more expensive than running a fine-tuned transformer. And validation remains an open question — while LLM outputs are intuitively richer, their alignment with expert human judgments requires systematic evaluation.
There is also the question of opacity. LLM-based analysis produces more interpretable outputs — bias labels, framing categories, intensity scores — but relies on less transparent internal processes. For social science researchers accustomed to methodological transparency, this trade-off deserves careful consideration.
Why this matters
The paper contributes to a growing argument in computational social science: that the field needs AI tools configured for social-scientific reasoning, not merely repurposed from NLP benchmarks. The sentiment analysis model performs well on what it was trained to do. The problem is that what it was trained to do is not what social scientists need it to do.
The study was conducted on a deliberately diverse corpus — 17 outlets spanning Western Anglophone, European, and Global South media, covering parliamentary governance, diplomatic tensions, election law, civil rights, and corruption. The neutral collapse pattern held across all outlet regions, with neutral labels dominating at 62–74% regardless of geographic or editorial context.
The study in numbers
50 political news articles from 17 international media outlets
70% classified as neutral by RoBERTa (35 of 50 articles)
23% of neutral articles had negative scores above 0.30
4 continuous dimensions captured by the LLM-based approach
512 tokens — RoBERTa's input limit vs. full-text LLM processing
Future work will extend the comparison to larger, multilingual corpora, incorporate expert evaluation by political communication scholars, and investigate whether fine-tuning open-source LLMs can approximate the multi-dimensional analysis currently achieved with proprietary models.
Maryam Fooladi and Federico Bottino are affiliated with Kakashi Ventures Accelerator.
Download the technical report
Beyond Sentiment: Comparing Traditional NLP and LLM-Based Multi-Dimensional Analysis for Political News Evaluation
PDF · 7 pages · February 2026
Download PDF ↓
For enquiries about the research or partnership opportunities, contact the KVA team.
Internal Press · Kakashi Ventures Accelerator Srl · Turin, Italy
Published February 2026