LIRIAe: Intelligent reader and search tool for environmental authorities
Every year, environmental authorities handle nearly 4,000 files on projects and programs such as wind farms, industrial facilities, and urban developments, under tight deadlines and with limited staff. Some files are not processed at all, or not examined in enough depth, which leaves environmental impacts that could have been avoided, reduced, or compensated for. Much of the time and difficulty in processing a file comes from searching for information, checking its consistency, and drafting well-informed opinions.
The Ecolab of the CGDD is developing tools to assist with reading and information search, with the support of DNUM/MTECT, INRIA via labIA/DINUM, and domain experts from DREAL Bretagne and DREAL Bourgogne-Franche-Comté. The needs and initial features described below were identified through workshops and discussions with auditors. Feedback from about ten users, especially in the “pilot” regions, helped validate and adjust the working hypotheses at each stage, broaden the scope of the tools, and improve their performance.
These tools aim to make reading smoother by consolidating all the relevant documents of a file, which typically runs from several hundred to several thousand pages spread across multiple non-standardized PDF documents. A user-friendly navigation panel showing the hierarchy of titles (sections, subsections, etc.) is built automatically and then validated by the user. Advanced search within a file and automatic tags make it easier to identify the sections that deal with a given theme (e.g., impacts on environments and landscapes, flora and fauna). Shareable notes make it possible to pinpoint critical points in the file and support a collaborative review and validation process.
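As an illustration only, the automatic title hierarchy and theme tags could be sketched as follows. This is a minimal sketch, not the LIRIAe implementation: it assumes headings have already been extracted from the PDFs as (level, title) pairs, and the names `Section`, `THEME_KEYWORDS`, `tag_title`, and `build_outline` as well as the keyword lists are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical theme vocabulary; a real tool would use a curated or learned one.
THEME_KEYWORDS = {
    "flora and fauna": {"flora", "fauna", "species", "habitat"},
    "landscape": {"landscape", "visual"},
}


@dataclass
class Section:
    level: int
    title: str
    children: list["Section"] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)


def tag_title(title: str) -> set[str]:
    """Return every theme whose keywords appear among the words of the title."""
    words = set(title.lower().split())
    return {theme for theme, keywords in THEME_KEYWORDS.items() if words & keywords}


def build_outline(headings: list[tuple[int, str]]) -> Section:
    """Nest a flat, ordered list of (level, title) headings into a tree."""
    root = Section(level=0, title="ROOT")
    stack = [root]  # path from the root to the most recently added section
    for level, title in headings:
        # Climb back up until the top of the stack is a valid parent.
        while len(stack) > 1 and stack[-1].level >= level:
            stack.pop()
        node = Section(level, title, tags=tag_title(title))
        stack[-1].children.append(node)
        stack.append(node)
    return root


headings = [
    (1, "Project description"),
    (2, "Site location"),
    (1, "Impacts"),
    (2, "Impacts on flora and fauna"),
    (2, "Landscape and visual impact"),
]
outline = build_outline(headings)
```

In this sketch the generated outline is only a proposal: the user validation step mentioned above would let a reader correct heading levels before the navigation panel is shown.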
More advanced AI techniques will improve the relevance of results and the user experience of information search: queries can be formulated in natural language, and passages that address an issue (e.g., soil pollution) can be identified even when the query terms do not appear in them. The search could extend to external sources, such as a corpus of files and opinions from MRAe/Ae, framing documents, and legal texts. Large language models (LLMs) could be used to propose summaries and drafting suggestions. These developments will make it possible to integrate an environmental component into the models, applicable to similar use cases such as classified installation files, urban planning documents, and PCAET.
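The idea of retrieving a passage whose wording differs from the query can be illustrated with a toy example. This is a sketch under strong assumptions: a real system would use a learned text-embedding model, whereas here the hand-written `TOY_VECTORS` table (three made-up concept dimensions: contamination, biodiversity, noise) stands in for it. The names `embed`, `cosine`, and `search` and the sample passages are hypothetical.

```python
import math

# Hand-written stand-in for a learned embedding model (assumption).
# Dimensions: (contamination, biodiversity, noise).
TOY_VECTORS = {
    "soil": (1.0, 0.0, 0.0),
    "pollution": (1.0, 0.0, 0.0),
    "hydrocarbons": (1.0, 0.0, 0.0),
    "contaminated": (1.0, 0.0, 0.0),
    "bats": (0.0, 1.0, 0.0),
    "nesting": (0.0, 1.0, 0.0),
    "noise": (0.0, 0.0, 1.0),
    "decibels": (0.0, 0.0, 1.0),
}


def embed(text: str) -> list[float]:
    """Sum the concept vectors of the known words in the text."""
    vec = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        word_vec = TOY_VECTORS.get(word.strip(".,;:"), (0.0, 0.0, 0.0))
        for i, value in enumerate(word_vec):
            vec[i] += value
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, passages: list[str]) -> list[str]:
    """Rank passages by similarity to the query, most similar first."""
    query_vec = embed(query)
    return sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)


passages = [
    "Several bats are nesting in the hedgerows.",
    "Hydrocarbons were detected in the ground near the former fuel depot.",
    "Expected noise stays below 45 decibels at night.",
]
ranked = search("soil pollution", passages)
```

The top-ranked passage here is the one about hydrocarbons, although it contains neither "soil" nor "pollution"; a keyword search over the same passages would return nothing for this query.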