Browse, search and filter the latest cybersecurity research papers from arXiv
In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs, a task that Large Language Models seem to struggle with. Objectives: We introduce a differentiable neuro-symbolic architecture and a loss function dedicated to learning how to solve NP-hard reasoning problems. Methods: Our new probabilistic loss allows for learning both the constraints and the objective, thus delivering a complete model that can be scrutinized and completed with side constraints. By pushing the combinatorial solver out of the training loop, our architecture also offers scalable training while exact inference gives access to maximum accuracy. Results: We empirically show that it can efficiently learn how to solve NP-hard reasoning problems from natural inputs. On three variants of the Sudoku benchmark -- symbolic, visual, and many-solution -- our approach requires a fraction of the training time of other hybrid methods. On a visual Min-Cut/Max-Cut task, it optimizes the regret better than a Decision-Focused-Learning regret-dedicated loss. Finally, it efficiently learns the energy optimization formulation of the large real-world problem of designing proteins.
Metis is an ordered paramodulation prover built into the Isabelle/HOL proof assistant. It attempts to close the current goal using a given list of lemmas. Typically these lemmas are found by Sledgehammer, a tool that integrates external automatic provers. We present a new tool that analyzes successful Metis proofs to derive variable instantiations. These increase Sledgehammer's success rate, improve the speed of Sledgehammer-generated proofs, and help users understand why a goal follows from the lemmas.
We formalize a proof that any stochastic and iterative global optimization algorithm is consistent over Lipschitz continuous functions if and only if it samples the whole search space. To achieve this, we use the L$\exists$$\forall$N theorem prover and the Mathlib library. The major challenge of this formalization, apart from the technical aspects of the proof itself, is to converge to a definition of a stochastic and iterative global optimization algorithm that is both general enough to encompass all algorithms of this type and specific enough to be used in a formal proof. We define such an algorithm as a pair of an initial probability measure and a sequence of Markov kernels that describe the distribution of the next point sampled by the algorithm given the previous points and their evaluations. We then construct a probability measure on finite and infinite sequences of iterations of the algorithm using the Ionescu-Tulcea theorem.
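The definition described in this abstract can be written out as follows (the symbols below are one possible notation, not necessarily the paper's own):

```latex
\[
  \mathcal{A} \;=\; \bigl(\mu,\ (\kappa_n)_{n \in \mathbb{N}}\bigr),
  \qquad
  \mu \in \mathcal{P}(X),
  \qquad
  \kappa_n \colon \bigl(X \times \mathbb{R}\bigr)^{n+1} \rightsquigarrow X,
\]
```

where $\mu$ is the initial distribution on the search space $X$ and the Markov kernel $\kappa_n$ gives the law of the $(n+1)$-st sampled point conditioned on the previously sampled points and their evaluations; the Ionescu-Tulcea theorem then yields the probability measure on finite and infinite sequences of iterations.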
Machine-learning models are increasingly driving decisions in high-stakes settings, such as finance, law, and hiring, thus highlighting the need for transparency. However, the key challenge is to balance transparency -- clarifying `why' a decision was made -- with recourse: providing actionable steps on `how' to achieve a favourable outcome from an unfavourable outcome. Counterfactual explanations reveal `why' an undesired outcome occurred and `how' to reverse it through targeted feature changes (interventions). Current counterfactual approaches have limitations: 1) they often ignore causal dependencies between features, and 2) they typically assume all interventions can happen simultaneously, an unrealistic assumption in practical scenarios where actions are typically taken in a sequence. As a result, these counterfactuals are often not achievable in the real world. We present P2C (Path-to-Counterfactuals), a model-agnostic framework that produces a plan (ordered sequence of actions) converting an unfavourable outcome to a causally consistent favourable outcome. P2C addresses both limitations by 1) explicitly modelling causal relationships between features and 2) ensuring that each intermediate state in the plan is feasible and causally valid. P2C uses the goal-directed Answer Set Programming system s(CASP) to generate the plan accounting for feature changes that happen automatically due to causal dependencies. Furthermore, P2C refines cost (effort) computation by only counting changes actively made by the user, resulting in realistic cost estimates. Finally, P2C highlights how its causal planner outperforms standard planners, which lack causal knowledge and thus can generate illegal actions.
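A minimal sketch of the ordering idea behind such a plan: interventions are sequenced so that causally upstream features are changed first, with downstream features following their ancestors. The feature names and the causal graph below are hypothetical, and this is a toy illustration, not the P2C/s(CASP) implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical causal graph, mapping each feature to its causal parents:
# education -> income -> savings.
causal_parents = {
    "education": [],
    "income": ["education"],
    "savings": ["income"],
}

def plan_order(interventions, parents):
    """Order the user's interventions so that every action is applied
    only after its causal ancestors, as a sequential plan requires."""
    order = TopologicalSorter(parents).static_order()
    return [f for f in order if f in interventions]

plan = plan_order({"savings", "education"}, causal_parents)
print(plan)  # education comes before savings
```

A cost model in the spirit of the abstract would then charge only for the two actions the user takes, not for the automatic change to `income` induced by the causal dependency.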
The study of categories abstracting the structural properties of relations has been extensively developed over the years, resulting in a rich and diverse body of work. A previous paper offered a survey providing a modern and comprehensive presentation of these ``categories for relations'' as instances of gs-monoidal categories, showing how they arise as Kleisli categories of suitable symmetric monoidal monads. The end result was a taxonomy that organised numerous related concepts in the literature, including in particular Markov and restriction categories. This paper further enriches the taxonomy: it proposes two categories that are once more instances of gs-monoidal categories, yet more abstract than Markov and restriction categories. They are characterised by an axiomatic notion of mass and domain of an arrow, the latter being one of the key ingredients of restriction categories, which generalises the domain of partial functions. The paper then introduces mass and domain preserving monads, proving that the associated Kleisli categories in fact preserve the corresponding equations and that these monads arise naturally for the categories of semiring-weighted relations.
The past decade has witnessed substantial developments in string solving. Motivated by the complexity of string solving strategies adopted in existing string solvers, we investigate a simple and generic method for solving string constraints: regular constraint propagation (RCP). The method repeatedly computes pre- or post-images of regular languages under the string functions present in a string formula, inferring more and more knowledge about the possible values of string variables, until either a conflict is found or satisfiability of the string formula can be concluded. Such a propagation strategy is applicable to string constraints with multiple operations like concatenation, replace, and almost all flavors of string transductions. We demonstrate the generality and effectiveness of this method theoretically and experimentally. On the theoretical side, we show that RCP is sound and complete for a large fragment of string constraints, subsuming both straight-line and chain-free constraints, two of the most expressive decidable fragments for which some modern string solvers provide formal completeness guarantees. On the practical side, we implement regular constraint propagation within the open-source string solver OSTRICH. Our experimental evaluation shows that this addition significantly improves OSTRICH's performance and makes it competitive with existing solvers. In fact, it substantially outperforms other solvers on random PCP and bioinformatics benchmarks. The results also suggest that incorporating regular constraint propagation alongside other techniques could lead to substantial performance gains for existing solvers.
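To make the pre-/post-image idea concrete, here is a toy fixpoint propagation for a single constraint x = y ++ z, with explicit finite sets of strings standing in for regular languages (the actual method works symbolically on automata; this is only an illustration of the propagation loop):

```python
def propagate(Lx, Ly, Lz):
    """Toy regular-constraint propagation for x = y ++ z.
    Each round shrinks every language using the images of the others
    under concatenation, until a fixpoint (or a conflict) is reached."""
    changed = True
    while changed:
        changed = False
        # post-image: x must be a concatenation of some y and z
        new_Lx = Lx & {y + z for y in Ly for z in Lz}
        # pre-images: y (resp. z) must extend to some word in L(x)
        new_Ly = {y for y in Ly if any(y + z in Lx for z in Lz)}
        new_Lz = {z for z in Lz if any(y + z in Lx for y in Ly)}
        if (new_Lx, new_Ly, new_Lz) != (Lx, Ly, Lz):
            Lx, Ly, Lz = new_Lx, new_Ly, new_Lz
            changed = True
    return Lx, Ly, Lz

Lx, Ly, Lz = propagate({"ab", "cd", "ae"}, {"a", "c"}, {"b", "d"})
# "ae" is pruned from L(x): no pair (y, z) concatenates to it.
```

An empty language for any variable would signal a conflict; a fixpoint with all languages non-empty is the point at which a solver can try to conclude satisfiability.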
Formal verification is crucial for ensuring the robustness of security protocols against adversarial attacks. The Needham-Schroeder protocol, a foundational authentication mechanism, has been extensively studied, including its integration with Physical Layer Security (PLS) techniques such as watermarking and jamming. Recent research has used ProVerif to verify these mechanisms in terms of secrecy. However, the ProVerif-based approach limits the ability to improve understanding of security beyond verification results. To overcome these limitations, we re-model the same protocol using an Isabelle formalism that generates sound animation, enabling interactive and automated formal verification of security protocols. Our modelling and verification framework is generic and highly configurable, supporting both cryptography and PLS. For the same protocol, we have conducted a comprehensive analysis (secrecy and authenticity in four different eavesdropper locations under both passive and active attacks) using our new web interface. Our findings not only successfully reproduce and reinforce previous results on secrecy but also reveal an uncommon but expected outcome: authenticity is preserved across all examined scenarios, even in cases where secrecy is compromised. We have proposed a PLS-based Diffie-Hellman protocol that integrates watermarking and jamming, and our analysis shows that it is secure for deriving a session key with required authentication. These findings highlight the advantages of our novel approach, demonstrating its robustness in formally verifying security properties beyond conventional methods.
We present a comprehensive system for addressing Tasks A, B, and C of the LLMs4OL 2025 challenge, which together span the full ontology construction pipeline: term extraction, typing, and taxonomy discovery. Our approach combines retrieval-augmented prompting, zero-shot classification, and attention-based graph modeling -- each tailored to the demands of the respective task. For Task A, we jointly extract domain-specific terms and their ontological types using a retrieval-augmented generation (RAG) pipeline. Training data was reformulated into a correspondence from documents to terms and types, while test-time inference leverages semantically similar training examples. This single-pass method requires no model finetuning and improves overall performance through lexical augmentation. Task B, which involves assigning types to given terms, is handled via a dual strategy. In the few-shot setting (for domains with labeled training data), we reuse the RAG scheme with few-shot prompting. In the zero-shot setting (for previously unseen domains), we use a zero-shot classifier that combines cosine similarity scores from multiple embedding models using confidence-based weighting. In Task C, we model taxonomy discovery as graph inference. Using embeddings of type labels, we train a lightweight cross-attention layer to predict is-a relations by approximating a soft adjacency matrix. These modular, task-specific solutions enabled us to achieve top-ranking results in the official leaderboard across all three tasks. Taken together, these strategies showcase the scalability, adaptability, and robustness of LLM-based architectures for ontology learning across heterogeneous domains. Code is available at: https://github.com/BelyaevaAlex/LLMs4OL-Challenge-Alexbek
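A small sketch of the zero-shot combination step: each embedding model scores the candidate types by cosine similarity, and the per-model scores are combined with a confidence weight. The abstract only says "confidence-based weighting"; the top-two-score margin used below is one plausible choice, and all vectors and type names are made up.

```python
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def zero_shot_type(term_vec_per_model, type_vecs_per_model, type_names):
    """Combine per-model cosine similarities with confidence weights.
    Confidence = margin between a model's top two scores (an assumption;
    a sharper margin means the model is more decisive)."""
    total = np.zeros(len(type_names))
    for term_v, type_vs in zip(term_vec_per_model, type_vecs_per_model):
        scores = np.array([cosine(term_v, tv) for tv in type_vs])
        ranked = np.sort(scores)
        confidence = ranked[-1] - ranked[-2]  # top-two margin
        total += confidence * scores
    return type_names[int(np.argmax(total))]

# Two hypothetical embedding models with different geometries,
# both of which place the term closest to type "A".
pred = zero_shot_type(
    term_vec_per_model=[[1.0, 0.0], [0.0, 1.0]],
    type_vecs_per_model=[[[1.0, 0.0], [0.0, 1.0]],   # model 1: A, B
                         [[0.0, 1.0], [1.0, 0.0]]],  # model 2: A, B
    type_names=["A", "B"],
)
```

The weighting lets a decisive model dominate an uncertain one, which is the usual motivation for confidence-weighted ensembling.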
We study the extension of Presburger arithmetic by the class of sub-polynomial Hardy field functions, and show the majority of these extensions to be undecidable. More precisely, we show that the theory $\mathrm{Th}(\mathbb{Z}; <, +, \lfloor f \rceil)$, where $f$ is a Hardy field function and $\lfloor \cdot \rceil$ the nearest integer operator, is undecidable when $f$ grows polynomially faster than $x$. Further, we show that when $f$ grows sub-linearly quickly, but still as fast as some polynomial, the theory $\mathrm{Th}(\mathbb{Z}; <, +, \lfloor f \rceil)$ is undecidable.
A main open question in contemporary AI research is quantifying the forms of reasoning neural networks can perform when perfectly trained. This paper answers this question by interpreting reasoning tasks as circuit emulation, where the gates define the type of reasoning; e.g. Boolean gates for predicate logic, tropical circuits for dynamic programming, arithmetic and analytic gates for symbolic mathematical representation, and hybrids thereof for deeper reasoning, e.g. higher-order logic. We present a systematic meta-algorithm that converts essentially any circuit into a feedforward neural network (NN) with ReLU activations by iteratively replacing each gate with a canonical ReLU MLP emulator. We show that, on any digital computer, our construction emulates the circuit exactly -- no approximation, no rounding, modular overflow included -- demonstrating that no reasoning task lies beyond the reach of neural networks. The number of neurons in the resulting network (parametric complexity) scales with the circuit's complexity, and the network's computational graph (structure) mirrors that of the emulated circuit. This formalizes the folklore that NNs trade algorithmic run-time (circuit runtime) for space complexity (number of neurons). We derive a range of applications of our main result, from emulating shortest-path algorithms on graphs with cubic-size NNs, to simulating stopped Turing machines with roughly quadratically large NNs, and even the emulation of randomized Boolean circuits. Lastly, we demonstrate that our result is strictly more powerful than a classical universal approximation theorem: any universal function approximator can be encoded as a circuit and directly emulated by a NN.
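The gate-by-gate replacement idea can be illustrated with exact ReLU emulators for Boolean gates on inputs in {0, 1}. These are standard canonical choices, not necessarily the paper's exact construction:

```python
def relu(t):
    return max(t, 0.0)

# Exact ReLU emulators for Boolean gates on {0, 1} inputs.
def AND(x, y):
    return relu(x + y - 1.0)            # outputs 1 only when x = y = 1

def OR(x, y):
    return (x + y) - relu(x + y - 1.0)  # clips x + y at 1

def NOT(x):
    return 1.0 - x                      # purely affine, no ReLU needed

# Composing gates mirrors the circuit's computational graph:
# XOR(x, y) = (x OR y) AND NOT(x AND y)
def XOR(x, y):
    return AND(OR(x, y), NOT(AND(x, y)))
```

On {0, 1} inputs these identities hold exactly, not approximately, which is the sense in which a circuit built from such gates is emulated without rounding error; the number of ReLU units grows with the number of gates, matching the space-for-time trade-off the abstract describes.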
We analyse the complexity of the satisfiability problem ssmSAT for State Space Models (SSM), which asks whether an input sequence can lead the model to an accepting configuration. We find that ssmSAT is undecidable in general, reflecting the computational power of SSM. Motivated by practical settings, we identify two natural restrictions under which ssmSAT becomes decidable and establish corresponding complexity bounds. First, for SSM with bounded context length, ssmSAT is NP-complete when the input length is given in unary and in NEXPTIME (and PSPACE-hard) when the input length is given in binary. Second, for quantised SSM operating over fixed-width arithmetic, ssmSAT is PSPACE-complete or in EXPSPACE, depending on the bit-width encoding. While these results hold for diagonal gated SSM, we also establish complexity bounds for time-invariant SSM. Our results establish a first complexity landscape for formal reasoning in SSM and highlight fundamental limits and opportunities for the verification of SSM-based language models.
Reactive synthesis addresses the problem of generating a controller for a temporal specification in an adversarial environment; it was typically studied for LTL. Driven by applications ranging from AI to business process management, LTL modulo first-order theories over finite traces (LTLfMT) has recently gained traction, where propositional variables in properties are replaced by first-order constraints. Though reactive synthesis for LTLf with some first-order features has been addressed, existing work in this direction strongly restricts or excludes the possibility to compare variables across instants, a limitation that severely restricts expressiveness and applicability. In this work we present a reactive synthesis procedure for LTLfMT, where properties support "lookback" to model cross-instant comparison of variables. Our procedure works for full LTLfMT with lookback, subsuming the fragments of LTLfMT for which realizability was studied earlier. However, the setting with cross-instant comparison is inherently highly complex, as realizability is undecidable even over decidable background theories. Hence termination of our approach is in general not guaranteed. Nevertheless, we prove its soundness, and show that it is complete if a bound on the strategy length exists. Finally, we show that our approach constitutes a decision procedure for several relevant fragments of LTLfMT, at once re-proving known decidability results and identifying new decidable classes.
Concurrent separation logic with fractional permissions (CSLPerm) provides a promising reasoning system for verifying complex sequential and fine-grained concurrent programs. The logic, with strong and weak separating conjunctions, offers a solid foundation for producing concise and precise proofs. However, it lacks automation and compositionality support. This paper addresses this limitation by introducing a compositional verification system for concurrent programs that manipulate regions of shared memory. At the centre of our system are novel logical principles and an entailment procedure that can infer the residual heaps in the frame rule for a fragment of CSLPerm with explicit arithmetical constraints on memory heaps' disjointness. This procedure enables compositional reasoning for concurrent threads and function calls. We have implemented the proposal in a prototype tool called CoSl, tested it on 10 challenging concurrent programs, including some beyond the state of the art, and confirmed the advantage of our approach.
Lightweight validation techniques, such as those based on random testing, are sometimes practical alternatives to full formal verification -- providing valuable benefits, such as finding bugs, without requiring a disproportionate effort. In fact, they can be useful even for fully formally verified tools, by exercising the parts of a complex system that go beyond the reach of formal models. In this context, this paper introduces BCC: a model-based testing technique for the Boogie intermediate verifier. BCC combines the formalization of a small, deterministic subset of the Boogie language with the generative capabilities of the PLT Redex language engineering framework. Basically, BCC uses PLT Redex to generate random Boogie programs, and to execute them according to a formal operational semantics; then, it runs the same programs through the Boogie verifier. Any inconsistency between the two executions (in PLT Redex and with Boogie) may indicate a potential bug in Boogie's implementation. To understand whether BCC can be useful in practice, we used it to generate three million Boogie programs. These experiments found 2% of cases indicative of completeness failures (i.e., spurious verification failures) in Boogie's toolchain. These results indicate that lightweight analysis tools, such as those for model-based random testing, are also useful to test and validate formal verification tools such as Boogie.
Multi-objective probabilistic model checking is a powerful technique for verifying stochastic systems against multiple (potentially conflicting) properties. To enhance the trustworthiness and explainability of model checking tools, we present independently checkable certificates and witnesses for multi-objective ω-regular queries in Markov decision processes. For the certification, we extend and improve existing certificates for the decomposition of maximal end components and reachability properties. We then derive mixed-integer linear programs (MILPs) for finding minimal witnessing subsystems. For the special case of Markov chains and LTL properties, we use unambiguous Büchi automata to find witnesses, resulting in an algorithm that requires single-exponential space. Existing approaches based on deterministic automata require doubly-exponential space in the worst case. Finally, we consider the practical computation of our certificates and witnesses and provide an implementation of the developed techniques, along with an experimental evaluation, demonstrating the efficacy of our techniques.
Monitoring is a runtime verification technique that allows one to check whether an ongoing computation of a system (partial trace) satisfies a given formula. It does not need a complete model of the system, but it typically requires the construction of a deterministic automaton doubly exponential in the size of the formula (in the worst case), which limits its practicality. In this paper, we show that, when considering finite, discrete traces, monitoring of pure past (co)safety fragments of Signal Temporal Logic (STL) can be reduced to trace checking, that is, evaluation of a formula over a trace, which can be performed in time polynomial in the size of the formula and the length of the trace. By exploiting such a result, we develop a GPU-accelerated framework for interpretable early failure detection based on vectorized trace checking, which employs genetic programming to learn temporal properties from historical trace data. The framework shows a 2-10% net improvement in key performance metrics compared to state-of-the-art methods.
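The polynomial-time bound comes from the fact that each pure past operator can be evaluated over the whole trace in a single left-to-right pass, so a formula with k subformulas costs O(k·n) on a trace of length n. A minimal sketch over Boolean truth vectors (a simplification of STL, shown here only to illustrate the recurrences):

```python
def since(phi, psi):
    """phi Since psi: psi held at some past point, and phi has held
    at every point after it. One left-to-right pass, O(n)."""
    out, prev = [], False
    for p, q in zip(phi, psi):
        prev = q or (prev and p)
        out.append(prev)
    return out

def once(psi):
    """'Once in the past psi' = True Since psi."""
    return since([True] * len(psi), psi)

def historically(phi):
    """'phi has always held' is the dual of once."""
    return [not v for v in once([not v for v in phi])]
```

Each such vector evaluation maps naturally onto an elementwise scan, which is also why the trace-checking formulation lends itself to the vectorized, GPU-accelerated setting the abstract mentions.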
We present a family of paraconsistent counterparts of the constructive modal logic CK. These logics aim to formalise reasoning about contradictory but non-trivial propositional attitudes like beliefs or obligations. We define their Kripke-style semantics based on intuitionistic frames with two valuations which provide independent support for truth and falsity; they are connected by strong negation as defined in Nelson's logic. A family of systems is obtained depending on whether both modal operators are defined using the same or by different accessibility relations for their positive and negative support. We propose Hilbert-style axiomatisations for all logics determined by this semantic framework. We also propose a family of modular cut-free sequent calculi that we use to establish decidability.
Partial derivatives of regular expressions, introduced by Antimirov, define an elegant algorithm for generating equivalent non-deterministic finite automata (NFA) with a limited number of states. Here we focus on runtime verification (RV) of simple properties expressible with regular expressions. In this case, words are finite traces of monitorable events forming the language's alphabet, and the generated NFA may have an intractable number of states. This typically occurs when sub-traces of mutually independent events are allowed to interleave. To address this issue, regular expressions used for RV are extended with the shuffle operator to make specifications more compact and easier to read. Exploiting partial derivatives enables a rewriting-based approach to RV, where only one derivative is stored at each step, avoiding the construction of an intractably large automaton. This raises the question of the space complexity of the largest generated partial derivative. While the total number of generated partial derivatives is known to be linear in the size of the initial regular expression, no results can be found in the literature regarding the size of the largest partial derivative. We study this problem w.r.t. two metrics (height and size of regular expressions), and show that the former increases by at most one, while the latter is quadratic in the size of the regular expression. Surprisingly, these results also hold with shuffle.
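The rewriting-based monitoring loop the abstract describes, where only the current set of partial derivatives is stored, can be sketched as follows. The tuple encoding of regular expressions is ours, and the shuffle operator is omitted for brevity:

```python
# Antimirov partial derivatives, with regexes as tuples:
# ('sym', a), ('cat', r, s), ('alt', r, s), ('star', r), and EPS.
EPS = ('eps',)

def cat(p, s):
    """Smart concatenation: EPS . s simplifies to s."""
    return s if p == EPS else ('cat', p, s)

def nullable(r):
    """Does r accept the empty word?"""
    if r[0] in ('eps', 'star'):
        return True
    if r[0] == 'sym':
        return False
    if r[0] == 'alt':
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])      # 'cat'

def pderiv(r, a):
    """Set of partial derivatives of r w.r.t. event a."""
    if r[0] == 'eps':
        return set()
    if r[0] == 'sym':
        return {EPS} if r[1] == a else set()
    if r[0] == 'alt':
        return pderiv(r[1], a) | pderiv(r[2], a)
    if r[0] == 'cat':
        d = {cat(p, r[2]) for p in pderiv(r[1], a)}
        return d | pderiv(r[2], a) if nullable(r[1]) else d
    return {cat(p, r) for p in pderiv(r[1], a)}   # 'star'

def monitor(r, trace):
    """Step through the trace keeping only the current derivative set,
    never materializing the full NFA."""
    state = {r}
    for a in trace:
        state = set().union(*(pderiv(q, a) for q in state))
        if not state:
            return False          # violation detected as early as possible
    return any(nullable(q) for q in state)

ab_star = ('star', ('cat', ('sym', 'a'), ('sym', 'b')))  # (ab)*
```

Here the monitor's memory at any step is exactly one set of partial derivatives, and the size results in the abstract bound how large the expressions in that set can grow relative to the initial specification.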