FIRE 2025

Forum for Information Retrieval Evaluation

Indian Institute of Technology (BHU) Varanasi

17th - 20th December

Large language models (LLMs) are transforming how people search for, summarize, and interact with information. Their ability to generate fluent text and synthesize complex materials is reshaping traditional information-seeking behaviors. However, their growing influence raises concerns: LLMs can produce hallucinations (plausible but false statements) and reinforce confirmation bias by mirroring users’ assumptions. Because their outputs sound convincing, users may struggle to judge reliability, risking the spread of misinformation and erosion of trust in digital knowledge sources. The talk will outline these limitations, explore how user behavior contributes to them, and present strategies to mitigate such risks.
Scientific communication is experiencing unprecedented growth, with publication volumes increasing at a scale that overwhelms researchers’ capacity to process and evaluate information. This information overload is not only a byproduct of legitimate scholarly activity but is increasingly driven by low-quality and even fraudulent content. Alongside rigorous, well-designed studies, the scholarly record is also populated by weak methodologies, poorly vetted results, and intentional manipulation. The rise of AI-accelerated publishing, paper mills, tortured phrases, and other forms of “fake science” intensifies this problem, creating massive noise and undermining the reliability of academic information systems. For the IR community, this poses both critical challenges and unique opportunities. On the one hand, information overload and quality degradation pose a challenge that needs to be addressed more directly by retrieval models, which have traditionally focused mainly on topical relevance. On the other hand, advances in AI, NLP, and bibliometric-enhanced IR offer promising directions for filtering, ranking, and contextualising scholarly information. In this keynote, I will examine the evolving problem of fake science and its role in driving information overload. I will outline recent developments in scholarly information access, highlight open research problems — from detecting low-quality and fraudulent content to designing veracity-aware retrieval and recommendation models — and discuss how IR research can contribute to ensuring that high-quality knowledge remains discoverable, trustworthy, and actionable in an era of overwhelming information abundance.
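To make the idea of veracity-aware ranking concrete, the sketch below combines a topical-relevance score with an estimated veracity prior through a simple weighted sum. It is a minimal illustration under assumed names and weights (ScoredDocument, veracity_prior, alpha), not a model presented in the talk.

```python
from dataclasses import dataclass

@dataclass
class ScoredDocument:
    doc_id: str
    topical_score: float   # e.g. BM25 or dense-retrieval similarity
    veracity_prior: float  # estimated quality/trustworthiness in [0, 1]

def veracity_aware_rank(docs, alpha=0.7):
    """Re-rank documents by a weighted combination of topical relevance
    and an estimated veracity prior; alpha trades off the two signals."""
    return sorted(
        docs,
        key=lambda d: alpha * d.topical_score + (1 - alpha) * d.veracity_prior,
        reverse=True,
    )

# Toy example: a highly topical but dubious paper drops below a slightly
# less topical but trustworthy one.
candidates = [
    ScoredDocument("paper-A", topical_score=0.92, veracity_prior=0.30),
    ScoredDocument("paper-B", topical_score=0.85, veracity_prior=0.95),
]
print([d.doc_id for d in veracity_aware_rank(candidates)])  # ['paper-B', 'paper-A']
```

In practice, such a veracity prior might be estimated from signals like retraction notices, tortured-phrase detection, or other indicators of low-quality and fraudulent content.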
Social networks have become a central part of how people interact today — sharing ideas, expressing opinions, and engaging with communities. But this constant stream of online interaction also brings conflict. While research has made progress in detecting openly hostile behaviours like bullying, harassment, and threats, subtle forms of conflict—such as teasing, criticism, and sarcasm—have received much less attention. Yet, these forms can be just as harmful, both socially and psychologically.
In this talk, I’ll introduce our work on detecting both overt and subtle types of online conflict using a multi-class, multi-objective model. We developed a new conflict dataset that captures this full range of behaviours and designed a novel classification approach based on class-specific reward functions. These rewards help the model learn more effectively by penalising certain kinds of misclassifications—an important step in complex, multi-class problems. Our architecture leverages the Decision Transformer, allowing us to treat classification as a reinforcement learning task and better manage ambiguity between classes.
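As a minimal sketch of what class-specific rewards look like, the toy example below uses an assumed four-class label set and made-up reward values; the actual classes, rewards, and Decision Transformer training procedure in our work are not reproduced here.

```python
import numpy as np

# Illustrative label set (assumed for this sketch).
CLASSES = ["neutral", "teasing", "criticism", "hostility"]

# Class-specific reward matrix R[true, predicted]: correct predictions earn +1,
# while different misclassifications are penalised unequally, e.g. collapsing
# hostility into "neutral" costs more than confusing two adjacent conflict classes.
REWARD = np.array([
    # pred:  neutral teasing criticism hostility
    [ 1.0,  -0.5,   -0.5,    -1.0],  # true: neutral
    [-1.0,   1.0,   -0.3,    -0.5],  # true: teasing
    [-1.0,  -0.3,    1.0,    -0.3],  # true: criticism
    [-1.5,  -0.5,   -0.3,     1.0],  # true: hostility
])

def episode_return(y_true, y_pred):
    """Sum of class-specific rewards over a sequence of classification
    decisions, treated as a single reinforcement-learning episode."""
    return float(sum(REWARD[t, p] for t, p in zip(y_true, y_pred)))

# Mislabelling hostility as neutral is punished more than other confusions.
print(episode_return(y_true=[3, 1, 0], y_pred=[0, 1, 0]))  # -1.5 + 1.0 + 1.0 = 0.5
```

Conditioning a return-conditioned model such as the Decision Transformer on a high target return then steers its predictions away from the costliest confusions.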
Across three benchmark datasets, our approach achieved significant improvements in recall, precision, F1-score, and overall accuracy compared to state-of-the-art deep learning models. Finally, I’ll share insights from our thematic analysis of model misclassifications, highlighting what they reveal about the blurry boundaries between teasing, criticism, and hostility in online communication.
Becoming scientifically literate is more important than ever. Objective scientific information helps people navigate a world where misinformation, disinformation, and other unverified claims are just one click or chat away. Generative AI models that simplify scientific text could give everyone direct access to the latest, objective scientific information in the academic literature. However, these models can also overgenerate, presenting users with the truth and more than the truth… This talk will cover recent efforts to speed up scientific text simplification—especially at the CLEF SimpleText Track—and look at the near future. Can we develop new models that not only answer questions but also question the answers?
Access to information is critical to collective sense-making of our place and relationships in this world. Information (and access to information) has therefore always been saliently political. Throughout history, authoritarian forces have tried to control what information is disseminated and how; and information access media have been sites of conflict between liberation and oppression. While there is currently much excitement in the IR community around the application of generative AI for Information Access, we must critically consider the systemic risks that these technologies pose with respect to concentrating control over our online information ecosystems in the hands of a few privileged individuals and institutions, as well as building effective tools for mass manipulation and persuasion. This talk is a provocation for the IR community to recognize the role of computer-mediated information access in our emancipatory struggles and acknowledge our own responsibilities and role in realizing more equitable, emancipatory, and sustainable futures. We are calling on the community to develop a new emancipatory IR research agenda that embraces humanistic values, commits to universal emancipation and social justice, challenges systems of oppression, grounds itself in practices of organizing and movement building, and works in solidarity with scholars and experts from other disciplines as well as with legal and policy experts, civil rights activists, movement organizers, and artists, among others. Collectively, we must reimagine both post-oppressive futures and the role of IR in leading us there.
Evaluation has long been an important part of information retrieval research. Over decades of research, well-established methodologies have been created and refined that, for years, have provided reliable, relatively low-cost benchmarks for assessing the effectiveness of retrieval systems. With the rise of generative AI and the explosion of interest in Retrieval Augmented Generation (RAG), evaluation is having to be rethought. In this talk, I will speculate on possible solutions for evaluating RAG systems and highlight some of the opportunities that are opening up. As important as it is to evaluate the new generative retrieval systems, it is also important to recognize that traditional information retrieval has not (yet) gone away. However, the way that these systems are being evaluated is undergoing a revolution. I will detail the transformation that is currently taking place in evaluation research. Here I will highlight some of the work that we've been doing at RMIT University as part of the exciting, though controversial, new research directions that generative AI is enabling.
Generative AI and NLP are now widely used in quantitative finance for information extraction and signal discovery from unstructured textual data such as news and company filings. We show how extracting sentiment and topics using ML methods can lead to profitable quantitative trading strategies.
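As a hedged illustration of the general pattern, and not of any specific strategy covered in the talk, the snippet below maps a daily aggregate news-sentiment score per ticker to long/short/flat positions; the thresholds and toy data are assumptions.

```python
import pandas as pd

def sentiment_signal(scores: pd.Series, long_th: float = 0.2, short_th: float = -0.2) -> pd.Series:
    """Map an aggregate news-sentiment score per ticker to a position:
    +1 (long) above long_th, -1 (short) below short_th, otherwise 0 (flat)."""
    return scores.apply(lambda s: 1 if s > long_th else (-1 if s < short_th else 0))

# Toy aggregate sentiment for one trading day (illustrative values only).
daily_sentiment = pd.Series({"AAA": 0.45, "BBB": -0.31, "CCC": 0.05})
print(sentiment_signal(daily_sentiment))  # AAA: +1 (long), BBB: -1 (short), CCC: 0 (flat)
```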
We will summarize BloombergGPT, a 50-billion-parameter LLM trained on a wide range of financial data. We constructed a 700-billion-token dataset based on Bloomberg’s extensive data sources and some external sources, and validated BloombergGPT on standard LLM benchmarks as well as on open financial benchmarks.
Finally, we will share our analysis of potential geopolitical biases embedded in LLMs, especially with regard to financial sentiment analysis of news stories. We analyzed various large language models, including GPT-4o, Llama, and Claude, to find biases in sentiment with respect to specific countries, regions, or industries. There is also evidence of bias related to the language of the news stories; specifically, English-language versions tend to be scored more positively than the original news stories in Chinese or Japanese.
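A minimal sketch of the kind of cross-language comparison involved, assuming sentiment scores for each story have already been produced by some model for both the original language and the English translation; the records and numbers below are purely illustrative.

```python
from statistics import mean

# Illustrative records: the same stories scored in the original language and
# in English translation by a sentiment model (all values are made up).
records = [
    {"story": "s1", "lang": "zh", "orig_score": -0.10, "en_score": 0.15},
    {"story": "s2", "lang": "ja", "orig_score":  0.05, "en_score": 0.25},
    {"story": "s3", "lang": "zh", "orig_score":  0.20, "en_score": 0.30},
]

def language_gap(records):
    """Mean difference between English-translation and original-language
    sentiment; a positive gap means the English versions score more positively."""
    return mean(r["en_score"] - r["orig_score"] for r in records)

print(f"mean English-vs-original sentiment gap: {language_gap(records):+.3f}")
```

The same kind of per-group mean can be computed per country, region, or model to surface systematic differences in sentiment.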


To be announced soon.