Loading...
Loading...
Browse, search and filter the latest cybersecurity research papers from arXiv
This paper establishes a formal equivalence between the architectural classes of modern agentic AI systems and the abstract machines of the Chomsky hierarchy. We posit that the memory architecture of an AI agent is the definitive feature determining its computational power and that it directly maps it to a corresponding class of automaton. Specifically, we demonstrate that simple reflex agents are equivalent to Finite Automata, hierarchical task-decomposition agents are equivalent to Pushdown Automata, and agents employing readable/writable memory for reflection are equivalent to TMs. This Automata-Agent Framework provides a principled methodology for right-sizing agent architectures to optimize computational efficiency and cost. More critically, it creates a direct pathway to formal verification, enables the application of mature techniques from automata theory to guarantee agent safety and predictability. By classifying agents, we can formally delineate the boundary between verifiable systems and those whose behavior is fundamentally undecidable. We address the inherent probabilistic nature of LLM-based agents by extending the framework to probabilistic automata that allow quantitative risk analysis. The paper concludes by outlining an agenda for developing static analysis tools and grammars for agentic frameworks.
This volume contains the proceedings of EXPRESS/SOS 2025: the Combined 32nd International Workshop on Expressiveness in Concurrency and the 22nd Workshop on Structural Operational Semantics, which was held in Aarhus, Denmark, as an affiliated workshop of CONFEST 2025. The EXPRESS/SOS workshop series aims at bringing together researchers interested in the formal semantics of systems and programming concepts, and in the expressiveness of computational models.
Attacks, including the manipulation of sensor readings and the modification of actuator commands, pose a significant challenge to the security and privacy of automated systems. This paper considers discrete event systems that can be modeled with nondeterministic finite state automata that are susceptible to state attacks. A state attack allows an intruder to learn whether or not the current state of a system falls into certain subsets of states. The intruder has a limited total number of state attacks at its disposal, but can launch state attacks at arbitrary instants of its choosing. We are interested on violations of current-state anonymity (resp. opacity), i.e., situations where the intruder, based on the sequence of observations generated by the system and the outcome of any performed state attacks, can ascertain the exact current state of the system (resp. that the current state of the system definitely resides in a subset of secret states). When the system violates current-state anonymity (resp. opacity) under a bounded number of state attacks, a subsequent question is whether the intruder can design an attack strategy such that anonymity-violating (resp. opacity-violating) situations will always be reached. In this latter case, we also design an attack strategy that guarantees that the system will reach a violating situation regardless of system actions. We provide pertinent complexity analysis of the corresponding verification algorithms and examples to illustrate the proposed methods.
Although Shapley additive explanations (SHAP) can be computed in polynomial time for simple models like decision trees, they unfortunately become NP-hard to compute for more expressive black-box models like neural networks - where generating explanations is often most critical. In this work, we analyze the problem of computing SHAP explanations for *Tensor Networks (TNs)*, a broader and more expressive class of models than those for which current exact SHAP algorithms are known to hold, and which is widely used for neural network abstraction and compression. First, we introduce a general framework for computing provably exact SHAP explanations for general TNs with arbitrary structures. Interestingly, we show that, when TNs are restricted to a *Tensor Train (TT)* structure, SHAP computation can be performed in *poly-logarithmic* time using *parallel* computation. Thanks to the expressiveness power of TTs, this complexity result can be generalized to many other popular ML models such as decision trees, tree ensembles, linear models, and linear RNNs, therefore tightening previously reported complexity results for these families of models. Finally, by leveraging reductions of binarized neural networks to Tensor Network representations, we demonstrate that SHAP computation can become *efficiently tractable* when the network's *width* is fixed, while it remains computationally hard even with constant *depth*. This highlights an important insight: for this class of models, width - rather than depth - emerges as the primary computational bottleneck in SHAP computation.
In this work, we extend undecidability of language equivalence for two-dimensional Vector Addition System with States (VASS) accepting by coverability condition. We show that the problem is undecidable even when one of the two-dimensional VASSs is deterministic and the other is history-deterministic. Moreover, we observe, that the languages of two history-deterministic VASSs are equal if and only if each can simulate the other. This observation allows us to extend the undecidability to any equivalence relation between two-sided simulation and language equivalence.
Cloud computing is ubiquitous, with a growing number of services being hosted on the cloud every day. Typical cloud compute systems allow administrators to write policies implementing access control rules which specify how access to private data is governed. These policies must be manually written, and due to their complexity can often be error prone. Moreover, existing policies often implement complex access control specifications and thus can be difficult to precisely analyze in determining their behavior works exactly as intended. Recently, Large Language Models (LLMs) have shown great success in automated code synthesis and summarization. Given this success, they could potentially be used for automatically generating access control policies or aid in understanding existing policies. In this paper, we explore the effectiveness of LLMs for access control policy synthesis and summarization. Specifically, we first investigate diverse LLMs for access control policy synthesis, finding that: although LLMs can effectively generate syntactically correct policies, they have permissiveness issues, generating policies equivalent to the given specification 45.8% of the time for non-reasoning LLMs, and 93.7% of the time for reasoning LLMs. We then investigate how LLMs can be used to analyze policies by introducing a novel semantic-based request summarization approach which leverages LLMs to generate a precise characterization of the requests allowed by a policy. Our results show that while there are significant hurdles in leveraging LLMs for automated policy generation, LLMs show promising results when combined with symbolic approaches in analyzing existing policies.
We propose succinctness as a measure of the expressive power of a transformer in describing a concept. To this end, we prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations of formal languages like finite automata and Linear Temporal Logic (LTL) formulas. As a by-product of this expressivity, we show that verifying properties of transformers is provably intractable (i.e. EXPSPACE-complete).
When does a deterministic computational model define a probability distribution? What are its properties? This work formalises and settles this stochasticity problem for weighted automata, and its generalisation cost register automata (CRA). We show that checking stochasticity is undecidable for CRAs in general. This motivates the study of the fully linear fragment, where a complete and tractable theory is established. For this class, stochasticity becomes decidable in polynomial time via spectral methods, and every stochastic linear CRA admits an equivalent model with locally sub-stochastic update functions. This provides a local syntactic characterisation of the semantics of the quantitative model. This local characterisation allows us to provide an algebraic Kleene-Schutzenberger characterisation for stochastic languages. The class of rational stochastic languages is the smallest class containing finite support distributions, which is closed under convex combination, Cauchy product, and discounted Kleene star. We also introduce Stochastic Regular Expressions as a complete and composable grammar for this class. Our framework provides the foundations for a formal theory of probabilistic computation, with immediate consequences for approximation, sampling, and distribution testing.
We present ZipLex, a verified framework for invertible lexical analysis. Unlike past verified lexers that focus only on satisfying the semantics of regular expressions and the maximal munch property, ZipLex also guarantees that lexing and printing are mutual inverses. Our design relies on two sets of ideas: (1) a new abstraction of token sequences that captures the separability of tokens in a sequence while supporting their efficient manipulation, and (2) a combination of verified data structures and optimizations, including Huet's zippers and memoized derivatives, to achieve practical performance. We implemented ZipLex in Scala and verified its correctness, including invertibility, using the Stainless verifier. Our evaluation demonstrates that ZipLex supports realistic applications such as JSON processing and lexers of programming languages. In comparison to other verified lexers (which do not enforce invertibility), ZipLex is 4x slower than Coqlex and two orders of magnitude faster than Verbatim++, showing that verified invertibility can be achieved without prohibitive cost.
This paper provides a new and more direct proof of the assertion that a Turing computable function of the natural numbers is primitive recursive if and only if the time complexity of the corresponding Turing machine is bounded by a primitive recursive function of the function's arguments. In addition, it provides detailed proofs of two consequences of this fact, which, although well-known in some circles, do not seem to have ever been published. The first is that the Satisfiability Problem, properly construed as a function of natural numbers, is primitive recursive. The second is a generalization asserting that all the problems in NP are similarly primitive recursive. The purpose here is to present these theorems, fully detailed, in an archival journal, thereby giving them a status of permanence and general availability.
This paper focuses on a fundamental problem on information security of bounded labeled Petri nets: non-interference analysis. As in hierarchical control, we assume that a system is observed by users at different levels, namely high-level users and low-level users. The output events produced by the firing of transitions are also partitioned into high-level output events and low-level output events. In general, high-level users can observe the occurrence of all the output events, while low-level users can only observe the occurrence of low-level output events. A system is said to be non-interferent if low-level users cannot infer the firing of transitions labeled with high-level output events by looking at low-level outputs. In this paper, we study a particular non-interference property, namely strong non-deterministic non-interference (SNNI), using a special automaton called SNNI Verifier, and propose a necessary and sufficient condition for SNNI.
This paper develops multihead finite-state compression, a generalization of finite-state compression, complementary to the multihead finite-state dimensions of Huang, Li, Lutz, and Lutz (2025). In this model, an infinite sequence of symbols is compressed by a compressor that produces outputs according to finite-state rules, based on the symbols read by a constant number of finite-state read heads moving forward obliviously through the sequence. The main theorem of this work establishes that for every sequence and every positive integer $h$, the infimum of the compression ratios achieved by $h$-head finite-state information-lossless compressors equals the $h$-head finite-state predimension of the sequence. As an immediate corollary, the infimum of these ratios over all $h$ is the multihead finite-state dimension of the sequence.
The famous problem of Busy Beavers can be stated as the question on how long a $n$-state Turing machine (using a 2-symbol alphabet or -- in a generalization -- a $m$-symbol alphabet) can run if it is started on the blank tape before it holds. Thus, not halting Turing machines are excluded. For up to four states the answer to this question is well-known. Recently, it could be verified that the widely assumed candidate for five states is in fact the champion. And there is progress in searching for good candidates with six or more states. We investigate a variant of this problem: Additionally to the requirement that the Turing machines have to start from the blank tape we only consider such Turing machines that hold on the blank tape, too. For this variant we give definitive answers on how long such a Turing machine with up to five states can run, analyze the behavior of a six-states candidate and give some findings on the generalization of Turing-machines with $m$-symbol alphabet.
Traditional approaches to inference of deterministic finite-state automata (DFA) stem from symbolic AI, including both active learning methods (e.g., Angluin's L* algorithm and its variants) and passive techniques (e.g., Biermann and Feldman's method, RPNI). Meanwhile, sub-symbolic AI, particularly machine learning, offers alternative paradigms for learning from data, such as supervised, unsupervised, and reinforcement learning (RL). This paper investigates the use of Q-learning, a well-known reinforcement learning algorithm, for the passive inference of deterministic finite automata. It builds on the core insight that the learned Q-function, which maps state-action pairs to rewards, can be reinterpreted as the transition function of a DFA over a finite domain. This provides a novel bridge between sub-symbolic learning and symbolic representations. The paper demonstrates how Q-learning can be adapted for automaton inference and provides an evaluation on several examples.
A key challenge in reinforcement learning (RL) is reward (mis)specification, whereby imprecisely defined reward functions can result in unintended, possibly harmful, behaviours. Indeed, reward functions in RL are typically treated as black-box mappings from state-action pairs to scalar values. While effective in many settings, this approach provides no information about why rewards are given, which can hinder learning and interpretability. Reward Machines address this issue by representing reward functions as finite state automata, enabling the specification of structured, non-Markovian reward functions. However, their expressivity is typically bounded by regular languages, leaving them unable to capture more complex behaviours such as counting or parametrised conditions. In this work, we build on the Runtime Monitoring Language (RML) to develop a novel class of language-based Reward Machines. By leveraging the built-in memory of RML, our approach can specify reward functions for non-regular, non-Markovian tasks. We demonstrate the expressiveness of our approach through experiments, highlighting additional advantages in flexible event-handling and task specification over existing Reward Machine-based methods.
We study the most elementary family of cellular automata defined over an arbitrary group universe $G$ and an alphabet $A$: the lazy cellular automata, which act as the identity on configurations in $A^G$, except when they read a unique active transition $p \in A^S$, in which case they write a fixed symbol $a \in A$. As expected, the dynamical behavior of lazy cellular automata is relatively simple, yet subtle questions arise since they completely depend on the choice of $p$ and $a$. In this paper, we investigate the order of a lazy cellular automaton $\tau : A^G \to A^G$, defined as the cardinality of the set $\{ \tau^k : k \in \mathbb{N} \}$. In particular, we establish a general upper bound for the order of $\tau$ in terms of $p$ and $a$, and we prove that this bound is attained when $p$ is a quasi-constant pattern.
Model checking for real-timed systems is a rich and diverse topic. Among the different logics considered, Metric Interval Temporal Logic (MITL) is a powerful and commonly used logic, which can succinctly encode many interesting timed properties especially when past and future modalities are used together. In this work, we develop a new approach for MITL model checking in the pointwise semantics, where our focus is on integrating past and maximizing determinism in the translated automata. Towards this goal, we define synchronous networks of timed automata with shared variables and show that the past fragment of MITL can be translated in linear time to synchronous networks of deterministic timed automata. Moreover determinism can be preserved even when the logic is extended with future modalities at the top-level of the formula. We further extend this approach to the full MITL with past, translating it into networks of generalized timed automata (GTA) with future clocks (which extend timed automata and event clock automata). We present an SCC-based liveness algorithm to analyse GTA. We implement our translation in a prototype tool which handles both finite and infinite timed words and supports past modalities. Our experimental evaluation demonstrates that our approach significantly outperforms the state-of-the-art in MITL satisfiability checking in pointwise semantics on a benchmark suite of 72 formulas. Finally, we implement an end-to-end model checking algorithm for pointwise semantics and demonstrate its effectiveness on two well-known benchmarks.
Group cellular automata are continuous, shift-commuting endomorphisms of $G^\mathbb{Z}$, where $G$ is a finite group. We provide an easy-to-check characterization of expansivity for group cellular automata on abelian groups and we prove that expansivity is a decidable property for general (non-abelian) groups. Moreover, we show that the class of expansive group cellular automata is strictly contained in that of topologically transitive injective group cellular automata.