Computational Identification of Regulatory Statements in EU Legislation
Abstract
Identifying regulatory statements in legislation is useful for developing metrics to measure the regulatory density and strictness of legislation. A computational method is valuable for scaling the identification of such statements from a growing body of EU legislation, constituting approximately 180,000 published legal acts between 1952 and 2023. Past work on extraction of these statements varies in the permissiveness of their definitions for what constitutes a regulatory statement. In this work, we provide a specific definition for our purposes based on the institutional grammar tool. We develop and compare two contrasting approaches for automatically identifying such statements in EU legislation, one based on dependency parsing, and the other on a transformer-based machine learning model. We found both approaches performed similarly well with accuracies of 80% and 84% respectively and a K alpha of 0.58. The high accuracies and not exceedingly high agreement suggests potential for combining strengths of both approaches.