Scientific communication is experiencing unprecedented growth, with publication volumes increasing at a scale that overwhelms researchers’ capacity to process and evaluate information. This information overload is not only a byproduct of legitimate scholarly activity but is increasingly driven by low-quality and even fraudulent content. Alongside rigorous, well-designed studies, the scholarly record is also populated by work built on weak methodologies, poorly vetted results, and intentional manipulation. The rise of AI-accelerated publishing, paper mills, tortured phrases, and other forms of “fake science” intensifies this problem, creating massive noise and undermining the reliability of academic information systems.
For the IR community, this development presents both critical challenges and unique opportunities. On the one hand, information overload and quality degradation must be addressed more directly by models that have traditionally focused mainly on topical relevance. On the other hand, advances in AI, NLP, and bibliometric-enhanced IR offer promising directions for filtering, ranking, and contextualising scholarly information. In this keynote, I will examine the evolving problem of fake science and its role in driving information overload. I will outline recent developments in scholarly information access, highlight open research problems — from detecting low-quality and fraudulent content to designing veracity-aware retrieval and recommendation models — and discuss how IR research can contribute to ensuring that high-quality knowledge remains discoverable, trustworthy, and actionable in an era of overwhelming information abundance.