Adam Wyner, Swansea University, UK
Legal services must be provided to citizens and businesses. Yet the legal system is overburdened by cases and understaffed. If justice is not properly and efficiently served, a range of fundamental social and economic values is undermined. To upgrade legal service provision, we must analyse the data; given that the law is largely represented in language, we are faced with a vast, highly complex corpus of unstructured legal texts: primary and secondary legislation, regulations, case law, and others. The question is how this corpus can be processed, at the requisite level of quality, to serve the needs of the legal system. Subordinate to this question are questions bearing on subtypes of corpora, legal jurisdictions, diversity of users, user expectations, legal concepts and reasoning, sources of analysis, ambiguity, interpretation, and others. Addressing these questions, Legal Informatics and LegalTech are fast-growing areas of Computer Science research, development, and deployment. The talk briefly overviews the current landscape, then outlines an India-UK collaboration on a deep learning study of a corpus of legal cases, which resulted in a JURIX conference best paper award. Several suggestions are proposed for deepening and extending the opportunities for further collaboration, in support of upgrading technologies for legal services.
Ellen M. Voorhees, NIST, USA
The Text REtrieval Conference (TREC), an information retrieval (IR) evaluation workshop organized by the U.S. National Institute of Standards and Technology (NIST), has spent close to thirty years investigating strategies for building large, reusable test collections to support IR research. Then, in late March 2020, NIST received a request to quickly build a collection for searching the scientific literature related to COVID-19. It was time to put that experience into action.
The result was TREC-COVID (ir.nist.gov/covidSubmit/), a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic. A key characteristic of pandemic search is the accelerated rate of change: the topics of interest evolve as the pandemic progresses and the scientific literature in the area explodes. The COVID-19 pandemic provided an opportunity to capture this progression as it happened. Over the course of five rounds, approximately 90 participating teams from around the globe submitted more than 500 retrieval runs using the CORD-19 corpus and test questions harvested from medical library search logs. Each round produced its own small test collection, and the union of the rounds' collections forms a larger, more comprehensive collection we call TREC-COVID Complete. This talk will explain the design decisions behind the construction of TREC-COVID Complete and examine its quality as an IR test collection.
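To illustrate the idea of combining the rounds' judgments into one cumulative collection, the following is a minimal Python sketch, not the official TREC-COVID tooling. It assumes standard TREC-format qrels files (topic, iteration, document id, relevance); the file names and the simple precision-at-k scorer are illustrative assumptions only.

    # Illustrative sketch: merge per-round relevance judgments (qrels) into one
    # cumulative test collection, in the spirit of TREC-COVID Complete.
    from collections import defaultdict

    def load_qrels(path):
        """Read a TREC-format qrels file: topic, iteration, docid, relevance."""
        qrels = defaultdict(dict)
        with open(path) as f:
            for line in f:
                topic, _iteration, docid, rel = line.split()
                qrels[topic][docid] = int(rel)
        return qrels

    def merge_qrels(rounds):
        """Union the rounds' judgments; later rounds override earlier ones."""
        merged = defaultdict(dict)
        for qrels in rounds:
            for topic, judgments in qrels.items():
                merged[topic].update(judgments)
        return merged

    def precision_at_k(ranked_docids, judged, k=10):
        """Fraction of the top-k retrieved documents judged relevant."""
        top = ranked_docids[:k]
        return sum(1 for d in top if judged.get(d, 0) > 0) / k

    # Hypothetical usage: five rounds of judgments combined into one collection.
    rounds = [load_qrels(f"qrels-round{i}.txt") for i in range(1, 6)]
    complete = merge_qrels(rounds)

A run could then be scored topic by topic against the merged judgments; the real evaluation uses standard TREC measures and handles residual-collection issues that this sketch omits.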
Paul Clough, The University of Sheffield and Peak Indicators, UK
Increasingly, algorithms drive information systems and services, influencing people’s decision-making and behaviours. However, recent coverage in the media and academic research has shown negative effects of data-driven methods, such as discrimination and the reinforcement of social biases. This talk will review algorithmic bias, transparency, and fairness, and reflect on the results of research conducted with colleagues on gender stereotypes and backlash within image search. The results highlighted the need to understand how and why biases enter search algorithms, and at which stages of the engineering process. The findings also align with current concerns about the algorithms that underlie information services, especially search engines: the view of the world they present and the extent to which it is biased. The talk will also summarise initiatives, such as Microsoft’s Fairness, Accountability, Transparency and Ethics (FATE) in AI, and potential technical responses, such as Explainable AI and tools to assist with the discovery and prevention of data and algorithmic biases.
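As a purely illustrative aside, and not the method used in the research described above, one simple way to quantify representation bias in ranked image results is to compare the share of a demographic label in the top-k results with a reference proportion (for example, workforce statistics). The labels, reference value, and results in this Python sketch are hypothetical.

    def representation_bias(result_labels, target_label, reference_share, k=20):
        """Difference between the label's share in the top-k results and a reference share."""
        top = result_labels[:k]
        observed = sum(1 for lab in top if lab == target_label) / len(top)
        return observed - reference_share

    # Hypothetical example: gender labels for the top results of an occupation query.
    labels = ["man"] * 15 + ["woman"] * 5
    print(representation_bias(labels, "woman", reference_share=0.45))  # -0.20: under-representation

A negative value indicates under-representation relative to the reference; published work on image search bias uses richer measures, but the comparison of observed versus reference proportions captures the basic intuition.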