FIRE 2025

Forum for Information Retrieval Evaluation

Indian Institute of Technology (BHU) Varanasi

17th - 20th December

Scientific communication is experiencing unprecedented growth, with publication volumes increasing at a scale that overwhelms researchers’ capacity to process and evaluate information. This information overload is not only a byproduct of legitimate scholarly activity but is increasingly driven by low-quality and even fraudulent content. Alongside rigorous, well-designed studies, the scholarly record is also populated by weak methodologies, poorly vetted results, and intentional manipulation. The rise of AI-accelerated publishing, paper mills, tortured phrases, and other forms of “fake science” intensifies this problem, creating massive noise and undermining the reliability of academic information systems. For the IR community, this poses both critical challenges and unique opportunities. On the one hand, information overload and quality degradation must be addressed more directly by retrieval models that have traditionally focused mainly on topical relevance. On the other hand, advances in AI, NLP, and bibliometric-enhanced IR offer promising directions for filtering, ranking, and contextualising scholarly information. In this keynote, I will examine the evolving problem of fake science and its role in driving information overload. I will outline recent developments in scholarly information access, highlight open research problems, from detecting low-quality and fraudulent content to designing veracity-aware retrieval and recommendation models, and discuss how IR research can contribute to ensuring that high-quality knowledge remains discoverable, trustworthy, and actionable in an era of overwhelming information abundance.
Social networks have become a central part of how people interact today: sharing ideas, expressing opinions, and engaging with communities. But this constant stream of online interaction also brings conflict. While research has made progress in detecting openly hostile behaviours like bullying, harassment, and threats, subtle forms of conflict, such as teasing, criticism, and sarcasm, have received much less attention. Yet these forms can be just as harmful, both socially and psychologically. In this talk, I’ll introduce our work on detecting both overt and subtle types of online conflict using a multi-class, multi-objective model. We developed a new conflict dataset that captures this full range of behaviours and designed a novel classification approach based on class-specific reward functions. These rewards help the model learn more effectively by penalising certain kinds of misclassifications, an important step in complex, multi-class problems. Our architecture leverages the Decision Transformer, allowing us to treat classification as a reinforcement learning task and better manage ambiguity between classes. Across three benchmark datasets, our approach achieved significant improvements in recall, precision, F1-score, and overall accuracy compared to state-of-the-art deep learning models. Finally, I’ll share insights from our thematic analysis of model misclassifications, highlighting what they reveal about the blurry boundaries between teasing, criticism, and hostility in online communication.
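
As a rough illustration of the kind of class-specific reward scheme described above, the Python sketch below encodes misclassification penalties in a small reward matrix; the conflict classes, penalty values, and the reward() helper are hypothetical placeholders for exposition, not taken from the talk or its dataset.

    import numpy as np

    # Hypothetical conflict classes; the taxonomy used in the talk may differ.
    CLASSES = ["neutral", "teasing", "criticism", "sarcasm", "hostility"]

    # Illustrative reward matrix R[true, predicted]: correct predictions earn +1,
    # while the confusions that matter most (e.g. labelling hostility as mere teasing)
    # receive larger penalties than confusions between neighbouring subtle classes.
    REWARDS = np.array([
        #  neu    tea    cri    sar    hos   <- predicted
        [ 1.0,  -0.5,  -0.5,  -0.5,  -1.0],  # true: neutral
        [-0.5,   1.0,  -0.3,  -0.3,  -0.5],  # true: teasing
        [-0.5,  -0.3,   1.0,  -0.3,  -0.5],  # true: criticism
        [-0.5,  -0.3,  -0.3,   1.0,  -0.5],  # true: sarcasm
        [-2.0,  -1.5,  -1.0,  -1.0,   1.0],  # true: hostility (missing hostility costs most)
    ])

    def reward(true_label: str, predicted_label: str) -> float:
        """Class-specific reward that could serve as the training signal in an RL-style classifier."""
        return float(REWARDS[CLASSES.index(true_label), CLASSES.index(predicted_label)])

    # Misclassifying hostility as teasing is punished harder than confusing teasing with criticism.
    print(reward("hostility", "teasing"))   # -1.5
    print(reward("teasing", "criticism"))   # -0.3

In a Decision-Transformer-style setup, rewards of this kind could replace a uniform loss, letting the model treat some confusions as more costly than others.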
Becoming scientifically literate is more important than ever. Objective scientific information helps people navigate a world where misinformation, disinformation, and other unverified claims are just one click or chat away. Generative AI models that simplify scientific text could give everyone direct access to the latest, objective scientific information in the academic literature. However, these models can also overgenerate, presenting users with the truth and more than the truth… This talk will cover recent efforts to speed up scientific text simplification—especially at the CLEF SimpleText Track—and look at the near future. Can we develop new models that not only answer questions but also question the answers?
Evaluation has long been an important part of information retrieval research. Over decades of research, well-established methodologies have been created and refined that, for years, have provided reliable, relatively low-cost benchmarks for assessing the effectiveness of retrieval systems. With the rise of generative AI and the explosion of interest in Retrieval Augmented Generation (RAG), evaluation is having to be rethought. In this talk, I will speculate on possible solutions for evaluating RAG systems, as well as highlight some of the opportunities that are opening up. As important as it is to evaluate the new generative retrieval systems, it is also important to recognize that traditional information retrieval has not (yet) gone away. However, the way that these systems are being evaluated is undergoing a revolution. I will detail the transformation that is currently taking place in evaluation research. Here I will highlight some of the work that we've been doing at RMIT University as part of the exciting, though controversial, new research directions that generative AI is enabling.


TCS Research


ACM SIGIR


To be announced soon.