Myopic optimization (MO) outperforms reinforcement learning (RL) in portfolio management: RL yields lower or negative returns, higher variance, larger costs, heavier CVaR, lower profitability, and greater model risk. We model execution/liquidation frictions with mark-to-market accounting. Using Malliavin calculus (Clark-Ocone/BEL), we derive policy gradients and a risk shadow price, unifying HJB and KKT conditions. This yields duality-gap and convergence results: geometric convergence for MO versus error floors for RL. We quantify phantom profit in RL via a Malliavin policy-gradient contamination analysis and define a control-affects-dynamics (CAD) premium for RL, which we argue is plausibly positive.
Reinforcement learning has emerged as a promising framework for developing adaptive and data-driven strategies, enabling market makers to optimize decision-making policies based on interactions with the limit order book environment. This paper explores the integration of a reinforcement learning agent in a market-making context, where the underlying market dynamics have been explicitly modeled to capture observed stylized facts of real markets, including clustered order arrival times, non-stationary spreads and return drifts, stochastic order quantities, and price volatility. These mechanisms aim to enhance the stability of the resulting control agent and serve to incorporate domain-specific knowledge into the agent's policy learning process. Our contributions include a practical implementation of a market-making agent based on the Proximal Policy Optimization (PPO) algorithm, alongside a comparative evaluation of the agent's performance under varying market conditions via a simulator-based environment. As evidenced by our analysis of financial return and risk metrics when compared to a closed-form optimal solution, our results suggest that the reinforcement learning agent can effectively be used under non-stationary market conditions, and that the proposed simulator-based environment can serve as a valuable tool for training and pre-training reinforcement learning agents in market-making scenarios.
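The closed-form benchmark used for comparison is not specified in the abstract; a standard candidate in this literature is the Avellaneda-Stoikov market-making solution. A minimal sketch, with illustrative parameter names and values that are assumptions rather than the paper's actual setup:

```python
import math

def avellaneda_stoikov_quotes(s, q, gamma, sigma, k, T_minus_t):
    """Closed-form Avellaneda-Stoikov (2008) approximation.
    s: mid-price, q: signed inventory, gamma: risk aversion,
    sigma: volatility, k: order-arrival decay, T_minus_t: time to horizon."""
    # Inventory-skewed reservation price.
    r = s - q * gamma * sigma**2 * T_minus_t
    # Total optimal spread around the reservation price.
    spread = gamma * sigma**2 * T_minus_t + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return r - spread / 2.0, r + spread / 2.0  # (bid, ask)

bid, ask = avellaneda_stoikov_quotes(s=100.0, q=2, gamma=0.1, sigma=2.0,
                                     k=1.5, T_minus_t=1.0)
```

With positive inventory the reservation price sits below the mid-price, so both quotes are skewed downward to encourage inventory reduction.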
Prediction markets have gained adoption as on-chain mechanisms for aggregating information, with platforms such as Polymarket demonstrating demand for stablecoin-denominated markets. However, denominating in non-interest-bearing stablecoins introduces inefficiencies: participants face opportunity costs relative to the fiat risk-free rate, and Bitcoin holders in particular lose exposure to BTC appreciation when converting into stablecoins. This paper explores the case for prediction markets denominated in Bitcoin, treating BTC as a deflationary settlement asset analogous to gold under the classical gold standard. We analyse three methods of supplying liquidity to a newly created BTC-denominated prediction market: cross-market making against existing stablecoin venues, automated market making, and DeFi-based redirection of user trades. For each approach we evaluate execution mechanics, risks (slippage, exchange-rate risk, and liquidation risk), and capital efficiency. Our analysis shows that cross-market making provides the most user-friendly risk profile, though it requires active professional makers or platform-subsidised liquidity. DeFi redirection offers rapid bootstrapping and reuse of existing USDC liquidity, but exposes users to liquidation thresholds and exchange-rate volatility, reducing capital efficiency. Automated market making is simple to deploy but capital-inefficient and exposes liquidity providers to permanent loss. The results suggest that BTC-denominated prediction markets are feasible, but their success depends critically on the choice of liquidity provisioning mechanism and the trade-off between user safety and deployment convenience.
We study how sentiment shocks propagate through equity returns and investor clientele using four independent proxies with sign-aligned kappa-rho parameters. A structural calibration links a one standard deviation innovation in sentiment to a pricing impact of 1.06 basis points with persistence parameter rho = 0.940, yielding a half-life of 11.2 months. The impulse response peaks around the 12-month horizon, indicating slow-moving amplification. Cross-sectionally, a simple D10-D1 portfolio earns 4.0 basis points per month with Sharpe ratios of 0.18-0.85, consistent with tradable exposure to the sentiment factor. Three regularities emerge: (i) positive sentiment innovations transmit more strongly than negative shocks (amplification asymmetry); (ii) effects are concentrated in retail-tilted and non-optionable stocks (clientele heterogeneity); and (iii) responses are state-dependent across volatility regimes - larger on impact in high-VIX months but more persistent in low-VIX months. Baseline time-series fits are parsimonious (R2 ~ 0.001; 420 monthly observations), yet the calibrated dynamics reconcile modest impact estimates with sizable long-short payoffs. Consistent with Miller (1977), a one standard deviation sentiment shock has 1.72-8.69 basis points larger effects in low-breadth stocks across horizons of 1-12 months, is robust to institutional flows, and exhibits volatility state dependence - larger on impact but less persistent in high-VIX months, smaller on impact but more persistent in low-VIX months.
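The quoted half-life follows directly from the persistence parameter: with rho = 0.940 per month, a shock halves once rho**h = 0.5, i.e. h = ln(0.5)/ln(rho) ≈ 11.2 months. A one-line check:

```python
import math

def ar1_half_life(rho):
    """Periods until an AR(1) shock decays to half its initial size:
    rho**h = 0.5  =>  h = ln(0.5) / ln(rho)."""
    return math.log(0.5) / math.log(rho)

h = ar1_half_life(0.940)  # approximately 11.2, matching the abstract
```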
Developing professional, structured reasoning on par with human financial analysts and traders remains a central challenge in AI for finance, where markets demand interpretability and trust. Traditional time-series models lack explainability, while LLMs face challenges in turning natural-language analysis into disciplined, executable trades. Although reasoning LLMs have advanced in step-by-step planning and verification, their application to risk-sensitive financial decisions is underexplored. We present Trading-R1, a financially-aware model that incorporates strategic thinking and planning for comprehensive thesis composition, facts-grounded analysis, and volatility-adjusted decision making. Trading-R1 aligns reasoning with trading principles through supervised fine-tuning and reinforcement learning with a three-stage easy-to-hard curriculum. Training uses Tauric-TR1-DB, a 100k-sample corpus spanning 18 months, 14 equities, and five heterogeneous financial data sources. Evaluated on six major equities and ETFs, Trading-R1 demonstrates improved risk-adjusted returns and lower drawdowns compared to both open-source and proprietary instruction-following models as well as reasoning models. The system generates structured, evidence-based investment theses that support disciplined and interpretable trading decisions. Trading-R1 Terminal will be released at https://github.com/TauricResearch/Trading-R1.
To understand the emergence of Ultrafast Extreme Events (UEEs), the influence of algorithmic trading or high-frequency traders is of major interest as they make it extremely difficult to intervene and to stabilize financial markets. In an empirical analysis, we compare various characteristics of UEEs over different years for the US stock market to assess the possible non-stationarity of the effects. We show that liquidity plays a dominant role in the emergence of UEEs and find a general pattern in their dynamics. We also empirically investigate the after-effects in view of the recovery rate. We find common patterns for different years. We explain changes in the recovery rate by varying market sentiments for the different years.
Recent regulation on intraday electricity markets has led to the development of shared order books with the intention to foster competition and increase market liquidity. In this paper, we address the question of the efficiency of such regulations by analysing the situation of two exchanges sharing a single limit order book, i.e. a quote by a market maker can be hit by a trade arriving on the other exchange. We develop a Principal-Agent model where each exchange acts as the Principal of her own market maker acting as her Agent. Exchanges and market makers have all CARA utility functions with potentially different risk-aversion parameters. In terms of mathematical result, we show existence and uniqueness of the resulting Nash equilibrium between exchanges, give the optimal incentive contracts and provide numerical solution to the PDE satisfied by the certainty equivalent of the exchanges. From an economic standpoint, our model demonstrates that incentive provision constitutes a public good. More precisely, it highlights the presence of a competitiveness spillover effect: when one exchange optimally incentivizes its market maker, the competing exchange also reaps indirect benefits. This interdependence gives rise to a free-rider problem. Given that providing incentives entails a cost, the strategic interaction between exchanges may lead to an equilibrium in which neither platform offers incentives -- ultimately resulting in diminished overall competition.
We study the problem of optimal liquidity withdrawal for a representative liquidity provider (LP) in an automated market maker (AMM). LPs earn fees from trading activity but are exposed to impermanent loss (IL) due to price fluctuations. While existing work has focused on static provision and exogenous exit strategies, we characterise the optimal exit time as the solution to a stochastic control problem with an endogenous stopping time. Mathematically, the LP's value function is shown to satisfy a Hamilton-Jacobi-Bellman quasi-variational inequality, for which we establish uniqueness in the viscosity sense. To solve the problem numerically, we develop two complementary approaches: an Euler scheme based on operator splitting and a Longstaff-Schwartz regression method. Calibrated simulations highlight how the LP's optimal exit strategy depends on the oracle price volatility, fee levels, and the behaviour of arbitrageurs and noise traders. Our results show that while arbitrage generates both fees and IL, the LP's optimal decision balances these opposing effects based on the pool state variables and price misalignments. This work contributes to a deeper understanding of dynamic liquidity provision in AMMs and provides insights into the sustainability of passive LP strategies under different market regimes.
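Impermanent loss has a standard closed form for a 50/50 constant-product pool (a simplification relative to the paper's AMM setting, which may involve concentrated liquidity or fees):

```python
import math

def impermanent_loss(price_ratio):
    """Value of a 50/50 constant-product LP position relative to simply
    holding the two assets, minus one; price_ratio = P_now / P_entry.
    Always <= 0: the LP never beats holding, absent fee income."""
    k = price_ratio
    return 2.0 * math.sqrt(k) / (1.0 + k) - 1.0

impermanent_loss(1.0)  # 0.0: no price move, no IL
impermanent_loss(4.0)  # about -0.20: a 4x move costs the LP ~20% vs holding
```

The fee income the abstract mentions is what compensates the LP for this systematically negative term; the paper's stopping problem weighs the two against each other.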
Simulating limit order books (LOBs) has important applications across forecasting and backtesting for financial market data. However, deep generative models struggle in this context due to the high noise and complexity of the data. Previous work uses autoregressive models, although these suffer error accumulation over long sequences. We introduce a novel approach, converting LOB data into a structured image format and applying diffusion models with inpainting to generate future LOB states. This method leverages spatio-temporal inductive biases in the order book and enables parallel generation of long sequences, overcoming error accumulation. We also publicly contribute to LOB-Bench, the industry benchmark for LOB generative models, to allow fair comparison between models using Level-2 and Level-3 order book data (without or with message-level data, respectively). We show that our model achieves state-of-the-art performance on LOB-Bench, despite using lower-fidelity data as input. We also show that our method prioritises coherent global structures over local, high-fidelity details, providing significant improvements over existing methods on certain metrics. Overall, our method lays a strong foundation for future research into generative diffusion approaches to LOB modelling.
The inherent non-stationarity of financial markets and the complexity of multi-modal information pose significant challenges to existing quantitative trading models. Traditional methods relying on fixed structures and unimodal data struggle to adapt to market regime shifts, while large language model (LLM)-driven solutions - despite their multi-modal comprehension - suffer from static strategies and homogeneous expert designs, lacking dynamic adjustment and fine-grained decision mechanisms. To address these limitations, we propose MM-DREX: a Multimodal-driven, Dynamically-Routed EXpert framework based on large language models. MM-DREX explicitly decouples market state perception from strategy execution to enable adaptive sequential decision-making in non-stationary environments. Specifically, it (1) introduces a vision-language model (VLM)-powered dynamic router that jointly analyzes candlestick chart patterns and long-term temporal features to allocate real-time expert weights; (2) designs four heterogeneous trading experts (trend, reversal, breakout, positioning) generating specialized fine-grained sub-strategies; and (3) proposes an SFT-RL hybrid training paradigm to synergistically optimize the router's market classification capability and experts' risk-adjusted decision-making. Extensive experiments on multi-modal datasets spanning stocks, futures, and cryptocurrencies demonstrate that MM-DREX significantly outperforms 15 baselines (including state-of-the-art financial LLMs and deep reinforcement learning models) across key metrics: total return, Sharpe ratio, and maximum drawdown, validating its robustness and generalization. Additionally, an interpretability module traces routing logic and expert behavior in real time, providing an audit trail for strategy transparency.
This work extends and complements our previous theoretical paper on the subtle interplay between impact, order flow and volatility. In the present paper, we generate synthetic market data following the specification of that paper and show that the approximations made there are actually justified, which provides quantitative support for our conclusion that price volatility can be fully explained by the superposition of correlated metaorders which all impact prices, on average, as a square root of executed volume. One of the most striking predictions of our model is the structure of the correlation between generalized order flow and returns, which is observed empirically and reproduced using our synthetic market generator. Furthermore, we were able to construct proxy metaorders from our simulated order flow that reproduce the square-root law of market impact, lending further credence to the proposal made in Ref. [2] to measure the impact of real metaorders from tape data (i.e. anonymized trades), which was long thought to be impossible.
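The square-root law referenced here is commonly written I = Y σ √(Q/V), with Y a constant of order one; a minimal sketch (the function name, Y, and the numbers below are illustrative assumptions, not the paper's calibration):

```python
import math

def sqrt_impact(Q, V_daily, sigma_daily, Y=1.0):
    """Expected relative price impact of a metaorder of Q shares:
    I = Y * sigma_daily * sqrt(Q / V_daily), with Y an O(1) constant."""
    return Y * sigma_daily * math.sqrt(Q / V_daily)

# Executing 1% of daily volume with 2% daily volatility moves the
# price by about 0.2% on average under the square-root law.
sqrt_impact(Q=10_000, V_daily=1_000_000, sigma_daily=0.02)
```

Note the concavity: doubling the executed volume multiplies the impact by √2, not 2, which is what makes the law non-trivial to recover from anonymized tape data.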
This paper presents a comprehensive study on the empirical dynamics of Uniswap v3 liquidity, which we model as a time-tick surface, $L_t(x)$. Using a combination of functional principal component analysis (FPCA) and dynamic factor methods, we analyze three distinct pools over multiple sample periods. Our findings offer three main contributions: a statistical characterization of automated market maker liquidity, an interpretable and portable basis for dimension reduction, and a robust analysis of liquidity dynamics using rolling window metrics. For the 5 bps pools, the leading empirical eigenfunctions explain the majority of cross-tick variation and remain stable, aligning closely with a low-order Legendre polynomial basis. This alignment provides a parsimonious and interpretable structure, similar to the dynamic Nelson-Siegel method for yield curves. The factor coefficients exhibit a time series structure well-captured by AR(1) models with clear GARCH-type heteroskedasticity and heavy-tailed innovations.
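The AR(1) structure of the factor coefficients can be checked with a simple lag regression; a stdlib-only sketch on simulated data (the GARCH heteroskedasticity and heavy tails the abstract reports are omitted here for brevity):

```python
import random

def fit_ar1(x):
    """OLS estimate of phi in x_t = phi * x_{t-1} + eps_t (zero-mean series)."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(a * a for a in x[:-1])
    return num / den

# Simulate an AR(1) factor path and recover its persistence.
random.seed(0)
phi_true, x = 0.8, [0.0]
for _ in range(5000):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))
phi_hat = fit_ar1(x)  # close to 0.8 in large samples
```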
This work builds upon the long-standing conjecture that linear diffusion models are inadequate for complex market dynamics. Specifically, it provides experimental validation for the author's prior arguments that realistic market dynamics are governed by higher-order (cubic and higher) non-linearities in the drift. As the diffusion drift is given by the negative gradient of a potential function, a non-linear drift translates into a non-quadratic potential. These arguments were based both on general theoretical grounds and on a structured approach to modeling price dynamics which incorporates money flows and their impact on market prices. Here, we find direct confirmation of this view by analyzing high-frequency cryptocurrency data at different time scales ranging from minutes to months. We find that markets can be characterized by either a single-well or a double-well potential, depending on the time period and sampling frequency, where a double-well potential may signal market uncertainty or stress.
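A cubic drift of the kind described corresponds to a quartic potential, e.g. U(x) = a*x^4 - b*x^2, whose two wells at x = ±sqrt(b/(2a)) produce the bimodal behavior the abstract reports. A toy Euler-Maruyama simulation (parameters illustrative, not calibrated to the paper's data):

```python
import math, random

def simulate_langevin(a=1.0, b=2.0, sigma=0.5, dt=0.01, n=20000, seed=1):
    """Euler-Maruyama path for dx = -U'(x) dt + sigma dW with the
    double-well potential U(x) = a*x**4 - b*x**2 (wells at +/- sqrt(b/(2a)))."""
    random.seed(seed)
    x, path = 0.01, []
    for _ in range(n):
        drift = -(4.0 * a * x**3 - 2.0 * b * x)   # -U'(x): cubic drift
        x += drift * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
        path.append(x)
    return path

path = simulate_langevin()  # path settles near one well (|x| close to 1 here)
```

With small noise the path stays near one well for long stretches and occasionally hops, which is the signature one would look for in the empirical drift estimates.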
This study examines the impact of different computational implementations of clearing mechanisms on multi-asset price dynamics within an artificial stock market framework. We show that sequential processing of order books introduces a systematic and significant bias by affecting the allocation of traders' capital within a single time step. This occurs because applying budget constraints sequentially grants assets processed earlier preferential access to funds, distorting individual asset demand and consequently their price trajectories. The findings highlight that while the overall price level is primarily driven by macro factors like the money-to-stock ratio, the market's microstructural clearing mechanism plays a critical role in the allocation of value among individual assets. This underscores the necessity of careful consideration and validation of clearing mechanisms in artificial markets to accurately model complex financial behaviors.
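The sequential-clearing bias can be illustrated with a toy budget-constraint step: processing assets in order gives earlier assets first claim on funds, while a joint scaling rule (one possible simultaneous scheme, assumed here for contrast) is order-independent:

```python
def clear_sequential(budget, orders):
    """Apply the budget constraint asset by asset, in list order:
    earlier assets get preferential access to funds."""
    fills = []
    for cost in orders:
        spend = min(cost, budget)
        fills.append(spend)
        budget -= spend
    return fills

def clear_simultaneous(budget, orders):
    """Scale all orders by a common factor so the budget binds jointly."""
    total = sum(orders)
    scale = min(1.0, budget / total)
    return [cost * scale for cost in orders]

orders = [60.0, 60.0]              # desired spend per asset
clear_sequential(100.0, orders)    # [60.0, 40.0]: depends on processing order
clear_simultaneous(100.0, orders)  # both assets get ~50: order-independent
```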
Through a novel approach, this paper shows that substantial changes in stock market behavior have a statistically and economically significant impact on equity risk premium predictability in both in-sample and out-of-sample settings. In line with Auer's "Bullish ratio", a "Bullish Index" is introduced to measure changes in stock market behavior, which we describe through a "fluctuation detrending moving average analysis" (FDMAA) of returns. We consider 28 indicators. We find that a "positive shock" to the Bullish Index is closely related to strong equity risk premium predictability for forecasts based on macroeconomic variables for horizons of up to six months. In contrast, a "negative shock" is associated with strong equity risk premium predictability for forecasts based on technical indicators for horizons of up to nine months.
Over the past decade, many dealers have implemented algorithmic models to automatically respond to RFQs and manage flows originating from their electronic platforms. In parallel, building on the foundational work of Ho and Stoll, and later Avellaneda and Stoikov, the academic literature on market making has expanded to address trade size distributions, client tiering, complex price dynamics, alpha signals, and the internalization versus externalization dilemma in markets with dealer-to-client and interdealer-broker segments. In this paper, we tackle two critical dimensions: adverse selection, arising from the presence of informed traders, and price reading, whereby the market maker's own quotes inadvertently reveal the direction of their inventory. These risks are well known to practitioners, who routinely face informed flows and algorithms capable of extracting signals from quoting behavior. Yet they have received limited attention in the quantitative finance literature, beyond stylized toy models with limited actionability. Extending the existing literature, we propose a tractable and implementable framework that enables market makers to adjust their quotes with greater awareness of informational risk.
This paper explores the bifurcative dynamics of an artificial stock market exchange (ASME) with endogenous, myopic traders interacting through a limit order book (LOB). We show that agent-based price dynamics possess intrinsic bistability, which is not a result of randomness but an emergent property of micro-level trading rules, where even identical initial conditions lead to qualitatively different long-run price equilibria: a deterministic zero-price state and a persistent positive-price equilibrium. The study also identifies a metastable region with elevated volatility between the basins of attraction and reveals distinct transient behaviors for trajectories converging to these equilibria. Furthermore, we observe that the system is neither entirely regular nor fully chaotic. By highlighting the emergence of divergent market outcomes from uniform beginnings, this work contributes a novel perspective on the inherent path dependence and complex dynamics of artificial stock markets.
Financial markets are critical to global economic stability, yet trade-based manipulation (TBM) often undermines their fairness. Spoofing, a particularly deceptive TBM strategy, exhibits multilevel anomaly patterns that have not been adequately modeled. These patterns are usually concealed within the rich, hierarchical information of the Limit Order Book (LOB), which is challenging to leverage due to high dimensionality and noise. To address this, we propose a representation learning framework combining a cascaded LOB representation pipeline with supervised contrastive learning. Extensive experiments demonstrate that our framework consistently improves detection performance across diverse models, with Transformer-based architectures achieving state-of-the-art results. In addition, we conduct systematic analyses and ablation studies to investigate multilevel anomalies and the contributions of key components, offering broader insights into representation learning and anomaly detection for complex sequential data. Our code will be released later at this URL.