Cryptocurrencies are digital tokens built on blockchain technology, with thousands actively traded on centralized exchanges (CEXs). Unlike stocks, which are backed by real businesses, cryptocurrencies are recognized by researchers as a distinct class of assets. How do investors treat this new category of asset in trading? Are cryptocurrencies similar to stocks as an investment tool? We answer these questions by investigating cryptocurrencies' and stocks' price time series, which can reflect investors' attitudes towards the targeted assets. Concretely, we use different machine learning models to classify cryptocurrencies' and stocks' price time series over the same period and obtain an extremely high accuracy rate, which indicates that cryptocurrency investors trade differently from stock investors. We then extract features from these price time series to explain the difference in price patterns, including the mean, variance, maximum, minimum, kurtosis, skewness, and first- to third-order autocorrelations, and use machine learning methods, including logistic regression (LR), random forest (RF), and support vector machines (SVMs), for classification. The classification results show that these extracted features help to explain the difference in price time series patterns between cryptocurrencies and stocks.
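The summary features this abstract lists can be computed with the standard library alone. The sketch below is illustrative only: the function names and the exact return and moment definitions are assumptions, not taken from the paper.

```python
# Illustrative feature extraction from a price series: mean, variance, max, min,
# skewness, kurtosis, and lag-1..3 autocorrelations of simple returns.
import statistics

def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mu = statistics.fmean(xs)
    var = sum((x - mu) ** 2 for x in xs)
    cov = sum((xs[t] - mu) * (xs[t + lag] - mu) for t in range(n - lag))
    return cov / var

def series_features(prices):
    """Map a price series to a dictionary of summary features of its simple returns."""
    rets = [(b - a) / a for a, b in zip(prices, prices[1:])]
    mu = statistics.fmean(rets)
    sd = statistics.pstdev(rets)
    skew = statistics.fmean([((r - mu) / sd) ** 3 for r in rets])
    kurt = statistics.fmean([((r - mu) / sd) ** 4 for r in rets])
    return {
        "mean": mu, "variance": sd ** 2,
        "max": max(rets), "min": min(rets),
        "skewness": skew, "kurtosis": kurt,
        "ac1": autocorr(rets, 1), "ac2": autocorr(rets, 2), "ac3": autocorr(rets, 3),
    }

feats = series_features([100, 101, 99, 102, 104, 103, 105, 104, 106, 108])
```

Feeding such fixed-length vectors to LR, RF, or SVM classifiers is then a routine supervised-learning step.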
In this study, we propose a novel integrated Generalized Autoregressive Conditional Heteroskedasticity-Gated Recurrent Unit (GARCH-GRU) model for financial volatility modeling and forecasting. The model embeds the GARCH(1,1) formulation directly into the GRU cell architecture, yielding a unified recurrent unit that jointly captures both traditional econometric properties and complex temporal dynamics. This hybrid structure leverages the strengths of GARCH in modeling key stylized facts of financial volatility, such as clustering and persistence, while utilizing the GRU's capacity to learn nonlinear dependencies from sequential data. Compared to the GARCH-LSTM counterpart, the GARCH-GRU model demonstrates superior computational efficiency, requiring significantly less training time, while maintaining, and in some cases improving, forecasting accuracy. Empirical evaluation across multiple financial datasets confirms the model's robust outperformance in terms of mean squared error (MSE) and mean absolute error (MAE) relative to a range of benchmarks, including standard neural networks, alternative hybrid architectures, and classical GARCH-type models. As an application, we compute Value-at-Risk (VaR) using the model's volatility forecasts and observe lower violation ratios, further validating the predictive reliability of the proposed framework in practical risk management settings.
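For reference, the GARCH(1,1) recursion that the paper embeds inside the GRU cell can be sketched in a few lines; the parameter values below are illustrative, not fitted, and the standalone recursion here omits the GRU coupling.

```python
# GARCH(1,1) variance recursion: sigma^2_t = omega + alpha*r_{t-1}^2 + beta*sigma^2_{t-1}.
# Illustrative parameters only; in the paper this recursion is wired into the GRU
# cell rather than run standalone.
def garch11_variances(returns, omega=1e-6, alpha=0.1, beta=0.85):
    assert alpha + beta < 1, "stationarity condition"
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    out = [sigma2]
    for r in returns:
        sigma2 = omega + alpha * r * r + beta * sigma2
        out.append(sigma2)
    return out

vols = garch11_variances([0.01, -0.03, 0.02, 0.05, -0.01])
```

A volatility-based VaR forecast at level q is then z_q times the square root of the forecast variance, which is how the abstract's risk-management application uses the model output.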
In [8], an easily computable, scale-invariant estimator $\widehat{\mathscr{R}}^s_n$ was constructed to estimate the Hurst parameter of the drifted fractional Brownian motion $X$ from its antiderivative. This paper extends this convergence result by proving that $\widehat{\mathscr{R}}^s_n$ also consistently estimates the Hurst parameter when applied to the antiderivative of $g \circ X$ for a general nonlinear function $g$. We also establish an almost sure rate of convergence in this general setting. Our result applies, in particular, to the estimation of the Hurst parameter of a wide class of rough stochastic volatility models from discrete observations of the integrated variance, including the fractional stochastic volatility model.
In the theory of financial markets, a stylized fact is a qualitative summary of a pattern in financial market data that is observed across multiple assets, asset classes and time horizons. In this article, we test a set of eleven stylized facts for financial market data. Our main contribution is to consider a broad range of geographical regions across Asia, continental Europe, and the US over a time period of 150 years, as well as two of the most traded cryptocurrencies, thus providing insights into the robustness and generalizability of commonly known stylized facts.
In this study, we employ the k-means clustering algorithm on polyspectral means to analyze 49 stocks in the Indian stock market. We use spectral and bispectral means computed with different weight functions, which give varying insights into the temporal patterns of the stocks. In particular, higher-order polyspectral means can provide significantly more information than power spectra alone and can also unveil nonlinear trends in a time series. Through rigorous analysis, we identify five distinctive clusters, uncovering nuanced market structures. Notably, one cluster emerges as that of a conglomerate powerhouse, featuring ADANI, BIRLA, TATA, and, unexpectedly, the government-owned bank SBI. Another cluster spotlights the IT sector with WIPRO and TCS, while a third combines private banks, government entities, and RELIANCE. The final cluster comprises publicly traded companies with dispersed ownership. Such clustering of stocks sheds light on intricate financial relationships within the stock market, providing valuable insights for investors and analysts navigating the dynamic landscape of the Indian stock market.
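As a toy illustration of the feature type involved, a weighted spectral mean (the second-order case; bispectral means extend this to pairs of frequencies) can be computed as below. The naive DFT and the unit weight function are stand-ins, not the paper's choices.

```python
# Weighted spectral mean of a time series: a weighted average of the periodogram.
import cmath, math

def periodogram(xs):
    """Naive periodogram I(f_j) = |DFT(x)_j|^2 / n at Fourier frequencies j/n."""
    n = len(xs)
    mu = sum(xs) / n
    xs = [x - mu for x in xs]  # demean so the zero frequency drops out
    I = []
    for j in range(1, n // 2 + 1):
        d = sum(x * cmath.exp(-2j * math.pi * j * t / n) for t, x in enumerate(xs))
        I.append(abs(d) ** 2 / n)
    return I

def spectral_mean(xs, weight=lambda f: 1.0):
    """Weighted average of the periodogram over frequencies in (0, 1/2]."""
    n = len(xs)
    I = periodogram(xs)
    fs = [j / n for j in range(1, n // 2 + 1)]
    return sum(weight(f) * i for f, i in zip(fs, I)) / len(I)

m = spectral_mean([1, 2, 1, 3, 2, 4, 2, 5, 3, 4, 2, 6])
```

Stacking several such means, computed with different weight functions, yields the feature vectors that k-means then clusters.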
Financial scenario simulation is essential for risk management and portfolio optimization, yet it remains challenging especially in high-dimensional and small data settings common in finance. We propose a diffusion factor model that integrates latent factor structure into generative diffusion processes, bridging econometrics with modern generative AI to address the challenges of the curse of dimensionality and data scarcity in financial simulation. By exploiting the low-dimensional factor structure inherent in asset returns, we decompose the score function--a key component in diffusion models--using time-varying orthogonal projections, and this decomposition is incorporated into the design of neural network architectures. We derive rigorous statistical guarantees, establishing nonasymptotic error bounds for both score estimation at O(d^{5/2} n^{-2/(k+5)}) and generated distribution at O(d^{5/4} n^{-1/2(k+5)}), primarily driven by the intrinsic factor dimension k rather than the number of assets d, surpassing the dimension-dependent limits in the classical nonparametric statistics literature and making the framework viable for markets with thousands of assets. Numerical studies confirm superior performance in latent subspace recovery under small data regimes. Empirical analysis demonstrates the economic significance of our framework in constructing mean-variance optimal portfolios and factor portfolios. This work presents the first theoretical integration of factor structure with diffusion models, offering a principled approach for high-dimensional financial simulation with limited data.
This paper examines the empirical failure of uncovered interest parity (UIP) and proposes a structural explanation based on a mean-reverting risk premium. We define a realized premium as the deviation between observed exchange rate returns and the interest rate differential, and demonstrate its strong mean-reverting behavior across multiple horizons. Motivated by this pattern, we model the risk premium using an Ornstein-Uhlenbeck (OU) process embedded within a stochastic differential equation for the exchange rate. Our model yields closed-form approximations for future exchange rate distributions, which we evaluate using coverage-based backtesting. Applied to USD/KRW data from 2010 to 2025, the model shows strong predictive performance at both short-term and long-term horizons, while underperforming at intermediate (3-month) horizons and showing conservative behavior in the tails of long-term forecasts. These results suggest that exchange rate deviations from UIP may reflect structured, forecastable dynamics rather than pure noise, and point to future modeling improvements via regime-switching or time-varying volatility.
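A minimal simulation sketch of such a mean-reverting (Ornstein-Uhlenbeck) premium, using the exact one-step transition, is shown below; the parameter values are illustrative, not the paper's estimates.

```python
# OU process dq_t = kappa*(theta - q_t) dt + sigma dW_t, simulated exactly:
# q_{t+dt} ~ N(theta + (q_t - theta) e^{-kappa dt}, sigma^2 (1 - e^{-2 kappa dt}) / (2 kappa)).
import math, random

def simulate_ou(q0, kappa, theta, sigma, dt, steps, rng):
    decay = math.exp(-kappa * dt)
    v = sigma ** 2 * (1 - decay ** 2) / (2 * kappa)  # exact conditional variance
    path = [q0]
    for _ in range(steps):
        mean = theta + (path[-1] - theta) * decay
        path.append(rng.gauss(mean, math.sqrt(v)))
    return path

rng = random.Random(7)
path = simulate_ou(q0=0.05, kappa=2.0, theta=0.0, sigma=0.02, dt=1 / 252, steps=252, rng=rng)
```

Because the transition is Gaussian with known mean and variance, the same formulas give closed-form forecast distributions at any horizon, which is what coverage-based backtesting evaluates.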
This study analyzes the financial resilience of agricultural and food production companies in Spain amid the Ukraine-Russia war using cluster analysis based on financial ratios. This research utilizes centered log-ratios to transform financial ratios for compositional data analysis. The dataset comprises financial information from 1197 firms in Spain's agricultural and food sectors over the period 2021-2023. The analysis reveals distinct clusters of firms with varying financial performance, characterized by metrics of solvency and profitability. The results highlight an increase in resilient firms by 2023, underscoring sectoral adaptation to the conflict's economic challenges. These findings together provide insights for stakeholders and policymakers to improve sectoral stability and strategic planning.
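The centered log-ratio (clr) transform mentioned above is a one-liner per component; the composition vector in this sketch is made up.

```python
# clr(x)_i = log(x_i / g(x)), where g(x) is the geometric mean of the (positive) parts.
import math

def clr(parts):
    logs = [math.log(p) for p in parts]
    g = sum(logs) / len(logs)  # log of the geometric mean
    return [l - g for l in logs]

z = clr([0.4, 0.3, 0.2, 0.1])
```

The clr coordinates of any composition sum to zero, which makes Euclidean tools such as cluster distances meaningful on compositional data.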
Probabilistic electricity price forecasting (PEPF) is a key task for market participants in short-term electricity markets. The increasing availability of high-frequency data and the need for real-time decision-making in energy markets require online estimation methods for efficient model updating. We present an online, multivariate, regularized distributional regression model, allowing for the modeling of all distribution parameters conditional on explanatory variables. Our approach combines multivariate distributional regression with an efficient online learning algorithm based on online coordinate descent for LASSO-type regularization. Additionally, we propose to regularize the estimation along a path of increasingly complex dependence structures of the multivariate distribution, allowing for parsimonious estimation and early stopping. We validate our approach through one of the first forecasting studies focusing on multivariate probabilistic forecasting in the German day-ahead electricity market while using only online estimation methods. We compare our approach to online LASSO-ARX models with adaptive marginal distributions and to online univariate distributional models combined with an adaptive copula. We show that the multivariate distributional regression, which allows modeling all distribution parameters - including the mean and the dependence structure - conditional on explanatory variables such as renewable in-feed or past prices, provides superior forecasting performance compared to modeling the marginals only while keeping a static/unconditional dependence structure. Additionally, online estimation yields a speed-up over batch fitting by a factor of 80 to more than 400.
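At the core of LASSO-type coordinate descent sits the soft-thresholding operator. The toy batch version below illustrates the update; the paper's online variant updates sufficient statistics recursively instead of re-scanning the data, and the unit column-scaling here is a simplifying assumption.

```python
# Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1, assuming each
# column of X has mean-square 1 (otherwise divide the update by that quantity).
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, sweeps=100):
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # correlation of column j with the partial residual (b_j excluded)
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                      for i in range(n)) / n
            b[j] = soft_threshold(rho, lam)
    return b
```

Running the same update as each new observation arrives, with the cross-products maintained recursively, gives the online scheme the abstract describes.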
We study decades-long historic distributions of accumulated S\&P500 returns, from daily returns to those over several weeks. The time series of the returns emphasize major upheavals in the markets -- Black Monday, the Tech Bubble, the Financial Crisis and the Covid Pandemic -- which are reflected in the tail ends of the distributions. De-trending the overall gain, we concentrate on comparing distributions of gains and losses. Specifically, we compare the tails of the distributions, which are believed to exhibit power-law behavior and possibly contain outliers. Towards this end we find confidence intervals of the linear fits of the tails of the complementary cumulative distribution functions on a log-log scale, and conduct a statistical U-test in order to detect outliers. We also study probability density functions of the full distributions of the returns with an emphasis on their asymmetry. The key empirical observations are that the mean of the de-trended distributions increases near-linearly with the number of days of accumulation, while the overall skew is negative -- consistent with the heavier tails of losses -- and depends little on the number of days of accumulation. At the same time the variance of the distributions exhibits near-perfect linear dependence on the number of days of accumulation; that is, it remains constant when scaled by the latter. Finally, we discuss the theoretical framework for understanding accumulated returns. Our main conclusion is that the current state of theory, which predicts symmetric or near-symmetric distributions of returns, cannot explain the aggregate of empirical results.
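The near-linear variance scaling described above is the benchmark behavior of i.i.d. returns, which the toy check below reproduces on simulated data (the empirical finding is that market data match this scaling so closely).

```python
# Variance of k-day accumulated returns should grow roughly linearly in k for
# independent daily returns; checked here on a seeded Gaussian toy series.
import random, statistics

def accumulated_variance(returns, k):
    """Variance of non-overlapping k-day accumulated returns."""
    sums = [sum(returns[i:i + k]) for i in range(0, len(returns) - k + 1, k)]
    return statistics.pvariance(sums)

rng = random.Random(42)
daily = [rng.gauss(0.0, 0.01) for _ in range(100_000)]
v1 = accumulated_variance(daily, 1)
v5 = accumulated_variance(daily, 5)  # expected to be roughly 5 * v1
```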
Loss Given Default (LGD) is a key risk parameter in determining a bank's regulatory capital. During LGD-estimation, realised recovery cash flows are to be discounted at an appropriate rate. Regulatory guidance mandates that this rate should allow for the time value of money, as well as include a risk premium that reflects the "undiversifiable risk" within these recoveries. Having extensively reviewed earlier methods of determining this rate, we propose a new approach that is inspired by the cost of capital approach from the Solvency II regulatory regime. Our method involves estimating a market-consistent price for a portfolio of defaulted loans, from which an associated discount rate may be inferred. We apply this method to mortgage and personal loans data from a large South African bank. The results reveal the main drivers of the discount rate to be the mean and variance of these recoveries, as well as the bank's cost of capital in excess of the risk-free rate. Our method therefore produces a discount rate that reflects both the undiversifiable risk of these recoveries and the time value of money, thereby satisfying regulatory requirements. This work can subsequently enhance the LGD-component within the modelling of both regulatory and economic capital.
Quantitative investment (quant) is an emerging, technology-driven approach in asset management, increasingly shaped by advancements in artificial intelligence. Recent advances in deep learning and large language models (LLMs) for quant finance have improved predictive modeling and enabled agent-based automation, suggesting a potential paradigm shift in this field. In this survey, taking alpha strategy as a representative example, we explore how AI contributes to the quantitative investment pipeline. We first examine the early stage of quant research, centered on human-crafted features and traditional statistical models with an established alpha pipeline. We then discuss the rise of deep learning, which enabled scalable modeling across the entire pipeline from data processing to order execution. Building on this, we highlight the emerging role of LLMs in extending AI beyond prediction, empowering autonomous agents to process unstructured data, generate alphas, and support self-iterative workflows.
Although the valuation of life contingent assets has been thoroughly investigated under the framework of mathematical statistics, little financial economics research pays attention to the pricing of these assets in a no-arbitrage, complete market. In this paper, we first revisit the Fundamental Theorem of Asset Pricing (FTAP) and its short proof. Then we point out that a discounted asset price is a martingale only when dividends are zero in all random states of the world, using a simple proof based on the pricing kernel. Next, we apply the FTAP to derive valuation formulas for life contingent assets, including life insurance policies and life contingent annuities. Last but not least, we state the assumption of a static portfolio in a dynamic economy, and clarify the FTAP that accommodates the valuation of a portfolio of life contingent policies.
We report the first application of a tailored Complexity-Entropy Plane designed for binary sequences and structures. We do so by considering the daily up/down price fluctuations of the largest cryptocurrencies by capitalization (stablecoins excluded), which are worth circa $90\%$ of the total crypto market capitalization. With that, we focus on the basic elements of price motion that compare with the random walk backbone features associated with mathematical properties of the Efficient Market Hypothesis. From the location of each crypto on the Binary Complexity-Entropy Plane (BiCEP) we define an inefficiency score, $\mathcal I$, and rank them accordingly. The results based on the BiCEP analysis, which we substantiate with statistical testing, indicate that only Shiba Inu (SHIB) is significantly inefficient, whereas the largest stake of crypto trading is reckoned to operate in close-to-efficient conditions. Generically, our $\mathcal I$-based ranking hints that the design and consensus architecture of a crypto is at least as relevant to its efficiency as the features usually taken into account in appraising the efficiency of financial instruments, namely canonical fiat money. Lastly, this set of results supports the validity of the binary complexity analysis.
In the modern financial sector, the exponential growth of data has made efficient and accurate financial data analysis increasingly crucial. Traditional methods, such as statistical analysis and rule-based systems, often struggle to process and derive meaningful insights from complex financial information effectively. These conventional approaches face inherent limitations in handling unstructured data, capturing intricate market patterns, and adapting to rapidly evolving financial contexts, resulting in reduced accuracy and delayed decision-making processes. To address these challenges, this paper presents an intelligent financial data analysis system that integrates Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) technology. Our system incorporates three key components: a specialized preprocessing module for financial data standardization, an efficient vector-based storage and retrieval system, and a RAG-enhanced query processing module. Using the NASDAQ financial fundamentals dataset from 2010 to 2023, we conducted comprehensive experiments to evaluate system performance. Results demonstrate significant improvements across multiple metrics: the fully optimized configuration (gpt-3.5-turbo-1106+RAG) achieved 78.6% accuracy and 89.2% recall, surpassing the baseline model by 23 percentage points in accuracy while reducing response time by 34.8%. The system also showed enhanced efficiency in handling complex financial queries, though with a moderate increase in memory utilization. Our findings validate the effectiveness of integrating RAG technology with LLMs for financial analysis tasks and provide valuable insights for future developments in intelligent financial data processing systems.
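The retrieval step of such a RAG pipeline reduces to nearest-neighbor search over embeddings. The toy in-memory version below uses made-up three-dimensional vectors and document ids, not the paper's system or dataset.

```python
# Toy vector-store retrieval: rank documents by cosine similarity to a query
# embedding and return the top matches (the generation step is omitted).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, store, top_k=2):
    """Return the top_k document ids ranked by cosine similarity."""
    scored = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

store = {
    "AAPL-10K": [0.9, 0.1, 0.0],
    "MSFT-10K": [0.8, 0.2, 0.1],
    "rates-note": [0.0, 0.1, 0.9],
}
hits = retrieve([1.0, 0.0, 0.0], store, top_k=2)
```

In a full RAG system the retrieved documents are then concatenated into the LLM prompt; production systems replace the linear scan with an approximate nearest-neighbor index.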
Financial time-series forecasting remains a challenging task due to complex temporal dependencies and market fluctuations. This study explores the potential of hybrid quantum-classical approaches to assist in financial trend prediction by leveraging quantum resources for improved feature representation and learning. A custom Quantum Neural Network (QNN) regressor is introduced, designed with a novel ansatz tailored for financial applications. Two hybrid optimization strategies are proposed: (1) a sequential approach where classical recurrent models (RNN/LSTM) extract temporal dependencies before quantum processing, and (2) a joint learning framework that optimizes classical and quantum parameters simultaneously. Systematic evaluation using TimeSeriesSplit, k-fold cross-validation, and predictive error analysis highlights the ability of these hybrid models to integrate quantum computing into financial forecasting workflows. The findings demonstrate how quantum-assisted learning can contribute to financial modeling, offering insights into the practical role of quantum resources in time-series analysis.
We study the asymptotic properties of the GLS estimator in multivariate regression with heteroskedastic and autocorrelated errors. We derive Wald statistics for linear restrictions and assess their performance. The statistics remain robust to heteroskedasticity and autocorrelation.
Using a panel data local projections model and controlling for firm characteristics, procurement bid attributes, and macroeconomic conditions, the study estimates the dynamic effects of procurement awards on new lending, a more precise measure than the change in the stock of credit. The analysis further examines heterogeneity in credit responses based on firm size, industry, credit maturity, and value chain position of the firms. The empirical evidence confirms that public procurement awards significantly increase new lending, with NGEU-funded contracts generating stronger credit expansion than traditional procurement during the recent period. The results show that the impact of NGEU procurement programs aligns closely with historical procurement impacts, with differences driven mainly by lower utilization rates. Moreover, integrating high-frequency financial data with procurement records highlights the potential of Big Data in refining public policy design.