Codifying mathematical theories in a proof assistant or computer algebra system is a challenging task, of which the most difficult part is, counterintuitively, structuring definitions. This results in a steep learning curve for new users and slow progress in formalizing even undergraduate level mathematics. There are many considerations one has to make, such as level of generality, readability, and ease of use in the type system, and there are typically multiple equivalent or related definitions from which to choose. Often, a definition that is ultimately selected for formalization is settled on after a lengthy trial and error process. This process involves testing potential definitions for usability by formalizing standard theorems about them, and weeding out the definitions that are unwieldy. Inclusion of a formal definition in a centralized community-run mathematical library is typically an indication that the definition is "good." For this reason, in this survey, we make some observations about what makes a definition "good," and examine several case studies of the refining process for definitions that have ultimately been added to the Lean Theorem Prover community-run mathematical library, mathlib. We observe that some of the difficulties are shared with the design of libraries for computer algebra systems, and give examples of related issues in that context.
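To make the design space concrete, here is a minimal Lean 4 sketch of one possible way to bundle a monoid as a single structure. This is purely illustrative and is not mathlib's actual definition (mathlib's `Monoid` is a typeclass built by extending a hierarchy of smaller classes); it only shows the kind of packaging decision, between generality, readability, and ease of use, that the survey discusses.

```lean
-- A minimal, illustrative bundling of a monoid as one structure.
-- mathlib's real `Monoid` is instead a typeclass assembled from a hierarchy
-- (`Semigroup`, `MulOneClass`, ...); this sketch only shows the kind of
-- design decision (what to bundle, how much generality) discussed above.
structure MyMonoid (M : Type) where
  mul : M → M → M
  one : M
  mul_assoc : ∀ a b c : M, mul (mul a b) c = mul a (mul b c)
  one_mul : ∀ a : M, mul one a = a
  mul_one : ∀ a : M, mul a one = a

-- A concrete instance: natural numbers under addition.
def natAddMonoid : MyMonoid Nat where
  mul := Nat.add
  one := 0
  mul_assoc := Nat.add_assoc
  one_mul := Nat.zero_add
  mul_one := Nat.add_zero
```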
This article proposes a paraconsistent framework for evaluating similarity in knowledge bases. Unlike classical approaches, this framework explicitly integrates contradictions, enabling a more robust and interpretable similarity measure. A new measure $S^*$ is introduced, which penalizes inconsistencies while rewarding shared properties. Paraconsistent super-categories $\Xi_K^*$ are defined to hierarchically organize knowledge entities. The model also includes a contradiction extractor $E$ and a repair mechanism, ensuring consistency in the evaluations. Theoretical results guarantee reflexivity, symmetry, and boundedness of $S^*$. This approach offers a promising solution for managing conflicting knowledge, with potential applications in multi-agent systems.
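As a toy illustration of the kind of measure described above, the following Python sketch rewards shared properties and penalizes pairs flagged as contradictory, while satisfying reflexivity, symmetry, and boundedness. The function names, the Jaccard-style normalization, and the penalty weight are assumptions for illustration, not the paper's actual definitions of $S^*$ or $E$.

```python
# Hypothetical sketch of a contradiction-aware similarity in the spirit of S*:
# reward shared properties, penalize pairs of properties known to contradict.
# The weights and the contradiction extractor below are illustrative
# assumptions, not the paper's definitions.

def extract_contradictions(props_a, props_b, contradiction_pairs):
    """Return the pairs (p, q), p from A and q from B, that are known to contradict."""
    return {(p, q) for p in props_a for q in props_b
            if (p, q) in contradiction_pairs or (q, p) in contradiction_pairs}

def similarity(props_a, props_b, contradiction_pairs, penalty=0.5):
    """Bounded similarity in [0, 1]: Jaccard overlap minus a scaled contradiction term."""
    props_a, props_b = set(props_a), set(props_b)
    union = props_a | props_b
    if not union:
        return 1.0                      # two empty descriptions are trivially similar
    shared = len(props_a & props_b) / len(union)
    contras = len(extract_contradictions(props_a, props_b, contradiction_pairs)) / len(union)
    return max(0.0, min(1.0, shared - penalty * contras))

# Example: entities sharing "bird" and "has_wings" but contradicting on flight.
pairs = {("can_fly", "cannot_fly")}
print(similarity({"bird", "has_wings", "can_fly"},
                 {"bird", "has_wings", "cannot_fly"}, pairs))   # 0.375
```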
Neurosymbolic (NeSy) frameworks combine neural representations and learning with symbolic representations and reasoning. Combining the reasoning capacities, explainability, and interpretability of symbolic processing with the flexibility and power of neural computing allows us to solve complex problems more reliably and with greater data efficiency. However, this rapidly growing area poses a challenge to developers because of its learning curve and the lack of user-friendly tools, libraries, and unifying frameworks. In this paper, we characterize the technical facets of existing NeSy frameworks, such as the symbolic representation language, integration with neural models, and the underlying algorithms. Most NeSy research focuses on algorithms rather than on generic frameworks for declarative problem specification that support problem solving. To highlight the key aspects of Neurosymbolic modeling, we showcase three generic NeSy frameworks -- \textit{DeepProbLog}, \textit{Scallop}, and \textit{DomiKnowS}. We identify the challenges within each facet, laying the foundation for assessing the expressivity of each framework in solving a variety of problems. Building on this foundation, we aim to spark transformative action and encourage the community to rethink this problem in novel ways.
Upgrading Programmable Logic Controller (PLC) software is common practice to accommodate evolving industrial requirements. Verifying the correctness of such upgrades remains a significant challenge. In this paper, we propose a verification-based approach to ensure that the existing functionality is preserved in the upgraded version of a PLC program. The method converts the older and the newer versions of the sequential function chart (SFC) into two Petri net models. We then verify whether one model is contained within the other, based on a novel containment checking algorithm grounded in symbolic path equivalence. For this purpose, we have developed an in-house Petri net-based containment checker. Experimental evaluation on 80 real-world benchmarks from the OSCAT library highlights the scalability and effectiveness of the framework. We have compared our approach with verifAPS, a popular tool for verifying software upgrades, and observed nearly 4x performance improvement.
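To make the containment question concrete, the following Python sketch compares the firing-sequence languages of two small Petri nets by bounded enumeration. It is illustrative only: the paper's checker is symbolic and based on path equivalence, whereas this brute-force version just shows what "one model is contained within the other" means on toy examples.

```python
# Illustrative only: a naive, bounded check that every firing sequence of net A
# (up to a given depth) is also a firing sequence of net B.

class PetriNet:
    def __init__(self, transitions, initial_marking):
        # transitions: {label: (pre, post)} with pre/post as {place: tokens}
        self.transitions = transitions
        self.m0 = dict(initial_marking)

    def enabled(self, marking, label):
        pre, _ = self.transitions[label]
        return all(marking.get(p, 0) >= n for p, n in pre.items())

    def fire(self, marking, label):
        pre, post = self.transitions[label]
        m = dict(marking)
        for p, n in pre.items():
            m[p] -= n
        for p, n in post.items():
            m[p] = m.get(p, 0) + n
        return m

    def sequences(self, depth):
        """All firing sequences (tuples of transition labels) of length <= depth."""
        result, frontier = {()}, [((), self.m0)]
        for _ in range(depth):
            nxt = []
            for seq, m in frontier:
                for t in self.transitions:
                    if self.enabled(m, t):
                        s = seq + (t,)
                        result.add(s)
                        nxt.append((s, self.fire(m, t)))
            frontier = nxt
        return result

def contained(net_a, net_b, depth=6):
    """True if every length-<=depth firing sequence of net_a is one of net_b."""
    return net_a.sequences(depth) <= net_b.sequences(depth)

# Tiny example: B extends A with an extra transition, so A is contained in B.
A = PetriNet({"t1": ({"p0": 1}, {"p1": 1})}, {"p0": 1})
B = PetriNet({"t1": ({"p0": 1}, {"p1": 1}),
              "t2": ({"p1": 1}, {"p0": 1})}, {"p0": 1})
print(contained(A, B), contained(B, A))   # True False
```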
In automated planning, control parameters extend standard action representations through the introduction of continuous numeric decision variables. Existing state-of-the-art approaches have primarily embedded control parameters as constraints alongside other temporal and numeric restrictions, implicitly treating them as additional constraints rather than as decision points in the search space. In this paper, we propose an efficient alternative that explicitly handles control parameters as true decision points within a systematic search scheme. We develop a best-first, heuristic search algorithm that operates over the infinite decision spaces defined by control parameters and prove a notion of completeness in the limit under certain conditions. Our algorithm leverages the concept of delayed partial expansion, where a state is not fully expanded but instead incrementally expands a subset of its successors. Our results demonstrate that this novel search algorithm is a competitive alternative to existing approaches for solving planning problems involving control parameters.
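The following Python sketch illustrates the idea of delayed partial expansion in a best-first search over a continuous control parameter: each time a node is popped, only a finite batch of parameter samples at the current refinement level is expanded, and the node is re-inserted so that finer samples can be generated later. The refinement schedule, the re-insertion penalty, and the toy problem are assumptions for illustration, not the paper's planner.

```python
# Sketch of delayed partial expansion in best-first search (illustrative only).
import heapq

REFINE_PENALTY = 0.1   # assumed constant; favours already-generated successors

def partial_expansion_search(start, is_goal, successors, h, max_pops=100_000):
    """successors(state, level) must return finitely many (child, cost) pairs."""
    tie = 0                                    # tie-breaker for the heap
    frontier = [(h(start), 0.0, tie, start, 0)]
    for _ in range(max_pops):
        if not frontier:
            break
        f, g, _, state, level = heapq.heappop(frontier)
        if is_goal(state):
            return state, g
        # Expand only the successors of the current refinement level ...
        for child, cost in successors(state, level):
            tie += 1
            heapq.heappush(frontier, (g + cost + h(child), g + cost, tie, child, 0))
        # ... then re-insert the node so deeper refinements are generated later.
        tie += 1
        heapq.heappush(frontier, (f + REFINE_PENALTY, g, tie, state, level + 1))
    return None

# Toy problem: pick control values v in (0, 10] (one per action) to reach
# position >= 9; refinement level k samples v on a grid of step 10 / 2**(k+1).
def successors(pos, level):
    step = 10 / 2 ** (level + 1)
    return [(pos + i * step, 1.0) for i in range(1, 2 ** (level + 1) + 1)]

print(partial_expansion_search(0.0, lambda s: s >= 9, successors,
                               h=lambda s: max(0.0, 9 - s) / 10))
```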
These notes originate from a reading course held by the authors in the spring of 2024 at the Universit\`a di Genova. They provide a hands-on introduction to the F4 and FGLM algorithms. In addition to the notes, we present two implementations of the algorithms: FGLM in CoCoALib and F4 in Sage. These implementations closely follow the structure of the algorithms as described here and are intended to help readers experiment with them in practice, thereby gaining a deeper understanding.
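For readers who want to experiment immediately, the order-change computation performed by FGLM can also be tried from plain Python with sympy (the notes' own implementations are in CoCoALib and Sage, not shown here): compute a Gröbner basis of a zero-dimensional ideal in a cheap ordering and convert it to the lexicographic ordering.

```python
# Quick experiment with the change-of-order idea behind FGLM, using sympy.
from sympy import groebner
from sympy.abc import x, y

F = [x**2 - 3*y - x + 1, y**2 - 2*x + y - 1]

G_grlex = groebner(F, x, y, order='grlex')   # usually cheaper to compute
G_lex = G_grlex.fglm('lex')                  # order change via FGLM

print(G_grlex)
print(G_lex)                                 # agrees with groebner(F, x, y, order='lex')
```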
Symbolic regression (SR) has emerged as a powerful tool for automated scientific discovery, enabling the derivation of governing equations from experimental data. A growing body of work illustrates the promise of integrating domain knowledge into SR to improve the generality and usefulness of the discovered equations. Physics-informed SR (PiSR) addresses this by incorporating domain knowledge, but current methods often require specialized formulations and manual feature engineering, limiting their accessibility to domain experts. In this study, we leverage pre-trained Large Language Models (LLMs) to facilitate knowledge integration in PiSR. By harnessing the contextual understanding of LLMs trained on vast scientific literature, we aim to automate the incorporation of domain knowledge, reducing the need for manual intervention and making the process accessible to a broader range of scientific problems. Specifically, the LLM is integrated into the SR loss function through an additional term based on the LLM's evaluation of the equation produced by the SR. We extensively evaluate our method using three SR algorithms (DEAP, gplearn, and PySR) and three pre-trained LLMs (Falcon, Mistral, and Llama 2) across three physical dynamics (dropping ball, simple harmonic motion, and electromagnetic wave). The results demonstrate that LLM integration consistently improves the reconstruction of physical dynamics from data, enhancing the robustness of SR models to noise and complexity. We further explore the impact of prompt engineering, finding that more informative prompts significantly improve performance.
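A schematic of how such an LLM term can enter the SR loss is sketched below in Python. The function `llm_score_equation` is a hypothetical stand-in for a call to a pre-trained LLM, and the weighting is an assumption; the snippet only shows the general shape of combining a data-fit term with an LLM plausibility score.

```python
# Schematic only: a data-fit term (MSE) plus an LLM-based penalty for
# implausible equations. `llm_score_equation` is a placeholder, not an API.
import numpy as np

def llm_score_equation(equation_str, context_prompt):
    """Stand-in for an LLM call: here we simply reward shorter expressions.
    In the paper's setting this would be a prompt to a pre-trained LLM asking
    how consistent the equation is with the described physics, scored in [0, 1]."""
    return 1.0 / (1.0 + 0.05 * len(equation_str))

def pisr_loss(equation_str, predict_fn, X, y, context_prompt, lam=0.1):
    """Combined loss: mean squared error plus lam * (1 - LLM plausibility)."""
    mse = float(np.mean((predict_fn(X) - y) ** 2))
    plausibility = llm_score_equation(equation_str, context_prompt)
    return mse + lam * (1.0 - plausibility)

# Usage on synthetic free-fall data.
X = np.linspace(0, 2, 50)
y = 9.81 * X ** 2 / 2
print(pisr_loss("0.5 * 9.81 * t**2", lambda t: 0.5 * 9.81 * t ** 2, X, y,
                context_prompt="object dropped from rest"))
```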
Large language models (LLMs) typically deploy safety mechanisms to prevent harmful content generation. Most current approaches focus narrowly on risks posed by malicious actors, often framing risks as adversarial events and relying on defensive refusals. However, in real-world settings, risks also come from non-malicious users seeking help while under psychological distress (e.g., self-harm intentions). In such cases, the model's response can strongly influence the user's next actions. Simple refusals may lead them to repeat, escalate, or move to unsafe platforms, creating worse outcomes. We introduce Constructive Safety Alignment (CSA), a human-centric paradigm that protects against malicious misuse while actively guiding vulnerable users toward safe and helpful results. Implemented in Oyster-I (Oy1), CSA combines game-theoretic anticipation of user reactions, fine-grained risk boundary discovery, and interpretable reasoning control, turning safety into a trust-building process. Oy1 achieves state-of-the-art safety among open models while retaining high general capabilities. On our Constructive Benchmark, it shows strong constructive engagement, close to GPT-5, and unmatched robustness on the Strata-Sword jailbreak dataset, nearing GPT-o1 levels. By shifting from refusal-first to guidance-first safety, CSA redefines the model-user relationship, aiming for systems that are not just safe, but meaningfully helpful. We release Oy1, code, and the benchmark to support responsible, user-centered AI.
An effective method for optimizing path planning for a specific model of a 6-degree-of-freedom (6-DOF) robot manipulator is presented as part of the motion planning of the manipulator using computer algebra. We assume that we are given a path in the form of a set of line segments that the end-effector should follow. We also assume that we have a method to solve the inverse kinematic problem of the manipulator at each via-point of the trajectory. The proposed method consists of three steps. First, we calculate the feasible region of the manipulator under a specific configuration of the end-effector. Next, we find a trajectory on the line segments and a sequence of joint configurations that the manipulator should follow to move the end-effector along the specified trajectory. Finally, we find the optimal combination of solutions to the inverse kinematic problem at each via-point along the trajectory by reducing the problem to a shortest-path problem on a graph and applying Dijkstra's algorithm. We demonstrate the effectiveness of the proposed method through experiments.
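The final step can be illustrated with a short Python sketch (generic code, not the paper's implementation): build a layered graph whose nodes are the inverse-kinematics solutions at each via-point, weight edges between consecutive via-points by a joint-space distance, and run Dijkstra's algorithm to select one configuration per via-point.

```python
# Generic sketch: choose one IK solution per via-point minimizing total
# joint-space motion, via Dijkstra on a layered graph of (via-point, solution).
import heapq
import math

def joint_distance(q1, q2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))

def best_configuration_sequence(ik_solutions):
    """ik_solutions[i] is the list of joint configurations at via-point i."""
    n = len(ik_solutions)
    start_nodes = [((0, j), 0.0) for j in range(len(ik_solutions[0]))]
    dist = dict(start_nodes)
    prev = {}
    heap = [(c, node) for node, c in start_nodes]
    heapq.heapify(heap)
    while heap:
        c, (i, j) = heapq.heappop(heap)
        if c > dist.get((i, j), math.inf):
            continue                         # stale heap entry
        if i == n - 1:
            # Reconstruct the chosen joint configuration at every via-point.
            path, node = [], (i, j)
            while node is not None:
                path.append(node)
                node = prev.get(node)
            return c, [ik_solutions[k][s] for k, s in reversed(path)]
        for j2, q2 in enumerate(ik_solutions[i + 1]):
            c2 = c + joint_distance(ik_solutions[i][j], q2)
            if c2 < dist.get((i + 1, j2), math.inf):
                dist[(i + 1, j2)] = c2
                prev[(i + 1, j2)] = (i, j)
                heapq.heappush(heap, (c2, (i + 1, j2)))
    return math.inf, []

# Toy example with two via-points, each admitting two IK solutions.
solutions = [[(0.0, 0.0), (3.1, 0.0)], [(0.1, 0.2), (3.0, -0.1)]]
print(best_configuration_sequence(solutions))
```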
We propose an effective method for solving the inverse kinematic problem of a specific model of 6-degree-of-freedom (6-DOF) robot manipulator using computer algebra. It is known that when the rotation axes of three consecutive rotational joints of a manipulator intersect at a single point, the inverse kinematics problem can be divided into determining position and orientation. We extend this method to more general manipulators in which the rotational axes of two consecutive joints intersect. This extension broadens the class of 6-DOF manipulators for which the inverse kinematics problem can be solved, and is expected to enable more efficient solutions. The inverse kinematic problem is solved using a Comprehensive Gr\"obner System (CGS), with the joint parameters of the robot appearing as parameters in the coefficients to avoid repeated Gr\"obner basis computations. The effectiveness of the proposed method is demonstrated through experiments.
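For reference, the special case being generalized is the standard position/orientation decoupling for a spherical wrist: writing the end-effector pose as position $p$ and orientation $R$, with wrist offset $d_6$ along the tool axis $R e_3$, the wrist centre depends only on the first three joints (notation is ours, not the paper's):
\[
p_{w} \;=\; p \;-\; d_{6}\,R\,e_{3} \;=\; f(\theta_{1},\theta_{2},\theta_{3}),
\qquad
R_{3}^{6}(\theta_{4},\theta_{5},\theta_{6}) \;=\; \bigl(R_{0}^{3}(\theta_{1},\theta_{2},\theta_{3})\bigr)^{\top} R,
\]
so the first three joint angles are determined from the wrist-centre position $p_{w}$ alone, and the last three from the residual orientation.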
Parametric linear systems are linear systems of equations in which some symbolic parameters, that is, symbols that are not considered to be candidates for elimination or solution in the course of analyzing the problem, appear in the coefficients of the system. In this work we assume that the symbolic parameters appear polynomially in the coefficients and that the only variables to be solved for are those of the linear system. The consistency of the system and expression of the solutions may vary depending on the values of the parameters. It is well-known that it is possible to specify a covering set of regimes, each of which is a semi-algebraic condition on the parameters together with a solution description valid under that condition. We provide a method of solution that requires time polynomial in the matrix dimension and the degrees of the polynomials when there are up to three parameters. In previous methods the number of regimes needed is exponential in the system dimension and polynomial degree of the parameters. Our approach exploits the Hermite and Smith normal forms that may be computed when the system coefficient domain is mapped to the univariate polynomial domain over suitably constructed fields. Our approach effectively identifies intrinsic singularities and ramification points where the algebraic and geometric structure of the matrix changes.
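The normal forms underlying the approach can be explored with sympy; the snippet below works over the integers for concreteness, whereas the paper works over univariate polynomial rings over suitably constructed fields (both are principal ideal domains, so the same Hermite/Smith machinery applies).

```python
# Flavor of the Hermite and Smith normal forms used by the method (over ZZ
# here for simplicity; the paper maps coefficients into univariate polynomial
# domains over constructed fields).
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import hermite_normal_form, smith_normal_form

M = Matrix([[12, 6, 4],
            [3, 9, 6],
            [2, 16, 14]])

print(hermite_normal_form(M))             # triangular form via unimodular operations
print(smith_normal_form(M, domain=ZZ))    # diagonal of invariant factors
```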
In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs, a task that Large Language Models seem to struggle with. Objectives: We introduce a differentiable neuro-symbolic architecture and a loss function dedicated to learning how to solve NP-hard reasoning problems. Methods: Our new probabilistic loss allows for learning both the constraints and the objective, thus delivering a complete model that can be scrutinized and completed with side constraints. By pushing the combinatorial solver out of the training loop, our architecture also offers scalable training, while exact inference gives access to maximum accuracy. Results: We empirically show that it can efficiently learn how to solve NP-hard reasoning problems from natural inputs. On three variants of the Sudoku benchmark -- symbolic, visual, and many-solution -- our approach requires a fraction of the training time of other hybrid methods. On a visual Min-Cut/Max-Cut task, it optimizes regret better than a dedicated Decision-Focused Learning regret loss. Finally, it efficiently learns the energy optimization formulation of the large real-world problem of designing proteins.
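As a generic illustration of keeping the combinatorial solver out of the training loop (not the paper's exact architecture or loss), the PyTorch sketch below learns pairwise costs over discrete variables with a pseudo-likelihood-style probabilistic loss; an exact solver would only be invoked at inference time. The shapes, the toy data, and the pseudo-likelihood choice are assumptions.

```python
# Generic sketch: learn directed pairwise costs C[i, j, a, b] from example
# solutions with a pseudo-likelihood loss; no combinatorial solver is called
# during training.
import torch

n_vars, n_vals = 8, 4
C = torch.zeros(n_vars, n_vars, n_vals, n_vals, requires_grad=True)

def pseudo_loglik(solution, C):
    """Sum over variables of log P(x_i = solution_i | rest) under the Gibbs
    distribution defined by the (directed) pairwise costs."""
    total = 0.0
    for i in range(n_vars):
        scores = torch.zeros(n_vals)
        for j in range(n_vars):
            if j != i:
                scores = scores - C[i, j, :, solution[j]]
        total = total + scores[solution[i]] - torch.logsumexp(scores, dim=0)
    return total

# Toy "dataset": assignments where adjacent variables always differ.
solutions = [torch.tensor([(k + t) % n_vals for k in range(n_vars)])
             for t in range(n_vals)]

opt = torch.optim.Adam([C], lr=0.1)
for step in range(100):
    opt.zero_grad()
    loss = -sum(pseudo_loglik(s, C) for s in solutions) / len(solutions)
    loss.backward()
    opt.step()
print(float(loss))
```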
We refine the bit complexity analysis of an algorithm for computing at least one point per connected component of a smooth real algebraic set, yielding an exponential speedup (with respect to the number of variables) compared to prior works. The analyzed algorithm is based on the critical point method, which reduces the problem to computing the critical points of generic projections onto lines, restricted to the studied variety. Our refinement, and the resulting improved complexity statement, comes from a better exploitation of the multi-affine structure of the polynomial systems encoding these sets of critical points. The estimates on the bit size of the output produced by this algorithm are also improved by this refinement.
Analytic combinatorics studies asymptotic properties of families of combinatorial objects using complex analysis on their generating functions. In their reference book on the subject, Flajolet and Sedgewick describe a general approach that allows one to derive precise asymptotic expansions starting from systems of combinatorial equations. When the combinatorial system involves only Cartesian products and disjoint unions, the generating functions satisfy polynomial systems with positivity constraints, for which many results and algorithms are known. We extend these results to the general situation. This produces an almost complete algorithmic chain going from combinatorial systems to asymptotic expansions. Thus, it is possible to compute asymptotic expansions of all generating functions produced by the symbolic method of Flajolet and Sedgewick when they have algebraic-logarithmic singularities (which can be decided), under the assumption that Schanuel's conjecture from number theory holds. That conjecture is not needed for systems that do not involve the set and cycle constructions.
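The classic textbook instance of this chain, for binary trees counted by nodes, goes from a purely polynomial combinatorial system to an asymptotic expansion via a square-root singularity (Schanuel's conjecture plays no role in this polynomial case):
\[
\mathcal{B} \;=\; \mathcal{Z} \;+\; \mathcal{Z}\times\mathcal{B}\times\mathcal{B}
\;\;\Longrightarrow\;\;
B(z) \;=\; z + z\,B(z)^{2}
\;\;\Longrightarrow\;\;
B(z) \;=\; \frac{1-\sqrt{1-4z^{2}}}{2z},
\]
so $B$ has an algebraic (square-root) singularity at $z=1/2$, and singularity analysis yields
\[
[z^{2n+1}]\,B(z) \;=\; \frac{1}{n+1}\binom{2n}{n} \;\sim\; \frac{4^{n}}{\sqrt{\pi}\,n^{3/2}}, \qquad n\to\infty.
\]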
This paper presents optimizations that improve the scalability of reachability analysis on a subclass of hybrid automata extended with stochasticity. The optimizations target different components of the analysis, such as quantifier elimination for state set projection and automated parameter selection during numerical integration. Most importantly, whereas the original method combines forward and backward reachability, we show that the use of backward reachability is optional for computing maximal reachability probabilities.
We present a comprehensive system for addressing Tasks A, B, and C of the LLMs4OL 2025 challenge, which together span the full ontology construction pipeline: term extraction, typing, and taxonomy discovery. Our approach combines retrieval-augmented prompting, zero-shot classification, and attention-based graph modeling -- each tailored to the demands of the respective task. For Task A, we jointly extract domain-specific terms and their ontological types using a retrieval-augmented generation (RAG) pipeline. Training data was reformulated into a document-to-terms-and-types correspondence, while test-time inference leverages semantically similar training examples. This single-pass method requires no model finetuning and improves overall performance through lexical augmentation. Task B, which involves assigning types to given terms, is handled via a dual strategy. In the few-shot setting (for domains with labeled training data), we reuse the RAG scheme with few-shot prompting. In the zero-shot setting (for previously unseen domains), we use a zero-shot classifier that combines cosine similarity scores from multiple embedding models using confidence-based weighting. In Task C, we model taxonomy discovery as graph inference. Using embeddings of type labels, we train a lightweight cross-attention layer to predict is-a relations by approximating a soft adjacency matrix. These modular, task-specific solutions enabled us to achieve top-ranking results on the official leaderboard across all three tasks. Taken together, these strategies showcase the scalability, adaptability, and robustness of LLM-based architectures for ontology learning across heterogeneous domains. Code is available at: https://github.com/BelyaevaAlex/LLMs4OL-Challenge-Alexbek
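The zero-shot component of Task B can be sketched as follows in Python: each embedding model scores candidate types by cosine similarity to the term, and the per-model scores are fused with confidence-based weights. The toy embedder and the margin-based confidence definition below are illustrative assumptions, not the system's actual encoders or weighting.

```python
# Sketch of confidence-weighted fusion of cosine-similarity scores from
# several embedding models (illustrative assumptions throughout).
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def zero_shot_type(term, candidate_types, embedders):
    """embedders: list of callables text -> vector (e.g. different sentence encoders)."""
    fused = np.zeros(len(candidate_types))
    for embed in embedders:
        t = embed(term)
        scores = np.array([cosine(t, embed(c)) for c in candidate_types])
        # Assumed confidence of this embedder: margin between its top two scores.
        top2 = np.sort(scores)[-2:]
        confidence = float(top2[1] - top2[0])
        fused += confidence * scores
    return candidate_types[int(np.argmax(fused))]

# Toy bag-of-characters "embedder", only so the example runs; real sentence
# encoders are needed for sensible predictions.
def char_embed(text):
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord('a')] += 1
    return v

print(zero_shot_type("aspirin", ["drug", "disease", "gene"], [char_embed]))
```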
Contemporary artificial intelligence (AI) development largely centers on rational decision-making, valued for its measurability and suitability for objective evaluation. Yet in real-world contexts, an intelligent agent's decisions are shaped not only by logic but also by deeper influences such as beliefs, values, and preferences. The diversity of human decision-making styles emerges from these differences, highlighting that "style" is an essential but often overlooked dimension of intelligence. This dissertation introduces playstyle as an alternative lens for observing and analyzing the decision-making behavior of intelligent agents, and examines its foundational meaning and historical context from a philosophical perspective. By analyzing how beliefs and values drive intentions and actions, we construct a two-tier framework for style formation: the external interaction loop with the environment and the internal cognitive loop of deliberation. On this basis, we formalize style-related characteristics and propose measurable indicators such as style capacity, style popularity, and evolutionary dynamics. The study focuses on three core research directions: (1) Defining and measuring playstyle, proposing a general playstyle metric based on discretized state spaces, and extending it to quantify strategic diversity and competitive balance; (2) Expressing and generating playstyle, exploring how reinforcement learning and imitation learning can be used to train agents exhibiting specific stylistic tendencies, and introducing a novel approach for human-like style learning and modeling; and (3) Practical applications, analyzing the potential of these techniques in domains such as game design and interactive entertainment. Finally, the dissertation outlines future extensions, including the role of style as a core element in building artificial general intelligence (AGI).
The graph database (GDB) is an increasingly common storage model for data involving relationships between entries. Beyond their widespread use in the database industry, the advantages of GDBs indicate strong potential for constructing symbolic artificial intelligence (AI) and retrieval-augmented generation (RAG), where knowledge of data inter-relationships plays a critical role in implementation. However, current GDB models are not optimised for hardware acceleration, leading to bottlenecks in storage capacity and computational efficiency. In this paper, we propose a hardware-friendly GDB model, called Views. We describe its data structure and organisation, tailored for efficient storage and retrieval of graph data, and demonstrate its equivalence to traditional graph representations. We further demonstrate its symbolic processing abilities in semantic reasoning and cognitive modelling with practical examples and provide a short perspective on future developments.
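As a generic illustration of what a hardware-friendly layout can look like (the Views format itself is not reproduced here), the Python sketch below stores a graph in a CSR-style compressed adjacency: two flat arrays and index arithmetic in place of pointer chasing.

```python
# Generic CSR-style compressed adjacency, not the paper's Views format:
# neighbour lookups become contiguous array slices rather than pointer chasing.
class CompressedGraph:
    def __init__(self, num_nodes, edges):
        # edges: list of (src, dst) pairs
        counts = [0] * num_nodes
        for s, _ in edges:
            counts[s] += 1
        self.offsets = [0] * (num_nodes + 1)
        for i, c in enumerate(counts):
            self.offsets[i + 1] = self.offsets[i] + c
        self.targets = [0] * len(edges)
        cursor = list(self.offsets[:-1])
        for s, d in edges:
            self.targets[cursor[s]] = d
            cursor[s] += 1

    def neighbours(self, node):
        return self.targets[self.offsets[node]:self.offsets[node + 1]]

# Equivalent to an adjacency list, but laid out as two flat arrays.
g = CompressedGraph(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print(g.neighbours(0), g.neighbours(2))   # [1, 2] [3]
```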