Loading...
Loading...
Browse, search and filter the latest cybersecurity research papers from arXiv
Pansharpening refers to the process of integrating a high resolution panchromatic (PAN) image with a lower resolution multispectral (MS) image to generate a fused product, which is pivotal in remote sensing. Despite the effectiveness of CNNs in addressing this challenge, they are inherently constrained by the uniform application of convolutional kernels across all spatial positions, overlooking local content variations. To overcome this issue, we introduce RAPNet, a new architecture that leverages content-adaptive convolution. At its core, RAPNet employs the Receptive-field Adaptive Pansharpening Convolution (RAPConv), designed to produce spatially adaptive kernels responsive to local feature context, thereby enhancing the precision of spatial detail extraction. Additionally, the network integrates the Pansharpening Dynamic Feature Fusion (PAN-DFF) module, which incorporates an attention mechanism to achieve an optimal balance between spatial detail enhancement and spectral fidelity. Comprehensive evaluations on publicly available datasets confirm that RAPNet delivers superior performance compared to existing approaches, as demonstrated by both quantitative metrics and qualitative assessments. Ablation analyses further substantiate the effectiveness of the proposed adaptive components.
Accurate and timely cancer diagnosis from histopathological slides is vital for effective clinical decision-making. This paper introduces DepViT-CAD, a deployable AI system for multi-class cancer diagnosis in histopathology. At its core is MAViT, a novel Multi-Attention Vision Transformer designed to capture fine-grained morphological patterns across diverse tumor types. MAViT was trained on expert-annotated patches from 1008 whole-slide images, covering 11 diagnostic categories, including 10 major cancers and non-tumor tissue. DepViT-CAD was validated on two independent cohorts: 275 WSIs from The Cancer Genome Atlas and 50 routine clinical cases from pathology labs, achieving diagnostic sensitivities of 94.11% and 92%, respectively. By combining state-of-the-art transformer architecture with large-scale real-world validation, DepViT-CAD offers a robust and scalable approach for AI-assisted cancer diagnostics. To support transparency and reproducibility, software and code will be made publicly available at GitHub.
Reliable in-vitro models are used for optoelectronic device development such as fluorescence detection devices for fluorescence-guided surgery of gliomas. A common approach is based on inducing gliomas in animal models. This is followed by a dosage of 5-ALA to induce Protoporphyrin IX (PpIX) in the glioma and which fluoresces. Although these approaches excel in capturing key biomolecular and physiological features of the tumour, they are inherently indeterministic. This limits the scope of their use for preclinical device development, where consistent and controllable tumour reproduction across multiple animals is needed. Approaches using fluorescence markers in gelatine provide a simple replication but fail to capture the complexities of in-vivo models. In this study, we introduce an exogenous brain tumour model for assessing PpIX fluorescence detection. The model was developed by injecting a PpIX solution into the cortical region of a resected adult rat brain, the injection site simulated a tumoral region with elevated PpIX concentration. The tumoral region had a gradient of concentrations, with a peak at the centre and a decrease towards the margins, akin to in-vivo gliomas. The fluorescence profile was compared to in-vivo conditions using 5-ALA and correlated well with other reported works, achieving a correlation of R2>0.93. The model's validity was tested by examining the effect of the solvent, DMSO, on the Autofluorescence (AF) of the brain sample and the short-term effect of storage on AF was analysed. Examinations confirmed the solvent did not alter AF, and the brain sample should be stored in Hanks Balanced Salt Solution and refrigerated to maintain moisture and preserve AF. The model accurately replicated surgical fluorescence conditions and offers a suitable alternative to glioma induction, benefiting the development of fluorescence detection devices across design iterations.
We present Spatial Lifting (SL), a novel methodology for dense prediction tasks. SL operates by lifting standard inputs, such as 2D images, into a higher-dimensional space and subsequently processing them using networks designed for that higher dimension, such as a 3D U-Net. Counterintuitively, this dimensionality lifting allows us to achieve good performance on benchmark tasks compared to conventional approaches, while reducing inference costs and significantly lowering the number of model parameters. The SL framework produces intrinsically structured outputs along the lifted dimension. This emergent structure facilitates dense supervision during training and enables robust, near-zero-additional-cost prediction quality assessment at test time. We validate our approach across 19 benchmark datasets (13 for semantic segmentation and 6 for depth estimation), demonstrating competitive dense prediction performance while reducing the model parameter count by over 98% (in the U-Net case) and lowering inference costs. Spatial Lifting introduces a new vision modeling paradigm that offers a promising path toward more efficient, accurate, and reliable deep networks for dense prediction tasks in vision.
Brain tumor segmentation plays a critical role in clinical diagnosis and treatment planning, yet the variability in imaging quality across different MRI scanners presents significant challenges to model generalization. To address this, we propose the Edge Iterative MRI Lesion Localization System (EdgeIMLocSys), which integrates Continuous Learning from Human Feedback to adaptively fine-tune segmentation models based on clinician feedback, thereby enhancing robustness to scanner-specific imaging characteristics. Central to this system is the Graph-based Multi-Modal Interaction Lightweight Network for Brain Tumor Segmentation (GMLN-BTS), which employs a Modality-Aware Adaptive Encoder (M2AE) to extract multi-scale semantic features efficiently, and a Graph-based Multi-Modal Collaborative Interaction Module (G2MCIM) to model complementary cross-modal relationships via graph structures. Additionally, we introduce a novel Voxel Refinement UpSampling Module (VRUM) that synergistically combines linear interpolation and multi-scale transposed convolutions to suppress artifacts while preserving high-frequency details, improving segmentation boundary accuracy. Our proposed GMLN-BTS model achieves a Dice score of 85.1% on the BraTS2017 dataset with only 4.58 million parameters, representing a 98% reduction compared to mainstream 3D Transformer models, and significantly outperforms existing lightweight approaches. This work demonstrates a synergistic breakthrough in achieving high-accuracy, resource-efficient brain tumor segmentation suitable for deployment in resource-constrained clinical environments.
Precise segmentation of brain tumors from magnetic resonance imaging (MRI) is essential for neuro-oncology diagnosis and treatment planning. Despite advances in deep learning methods, automatic segmentation remains challenging due to tumor morphological heterogeneity and complex three-dimensional spatial relationships. Current techniques primarily rely on visual features extracted from MRI sequences while underutilizing semantic knowledge embedded in medical reports. This research presents a multi-level fusion architecture that integrates pixel-level, feature-level, and semantic-level information, facilitating comprehensive processing from low-level data to high-level concepts. The semantic-level fusion pathway combines the semantic understanding capabilities of Contrastive Language-Image Pre-training (CLIP) models with the spatial feature extraction advantages of 3D U-Net through three mechanisms: 3D-2D semantic bridging, cross-modal semantic guidance, and semantic-based attention mechanisms. Experimental validation on the BraTS 2020 dataset demonstrates that the proposed model achieves an overall Dice coefficient of 0.8567, representing a 4.8% improvement compared to traditional 3D U-Net, with a 7.3% Dice coefficient increase in the clinically important enhancing tumor (ET) region.
Super-resolution (SR) has been a pivotal task in image processing, aimed at enhancing image resolution across various applications. Recently, look-up table (LUT)-based approaches have attracted interest due to their efficiency and performance. However, these methods are typically designed for fixed scale factors, making them unsuitable for arbitrary-scale image SR (ASISR). Existing ASISR techniques often employ implicit neural representations, which come with considerable computational cost and memory demands. To address these limitations, we propose Interpolation Mixing LUT (IM-LUT), a novel framework that operates ASISR by learning to blend multiple interpolation functions to maximize their representational capacity. Specifically, we introduce IM-Net, a network trained to predict mixing weights for interpolation functions based on local image patterns and the target scale factor. To enhance efficiency of interpolation-based methods, IM-Net is transformed into IM-LUT, where LUTs are employed to replace computationally expensive operations, enabling lightweight and fast inference on CPUs while preserving reconstruction quality. Experimental results on several benchmark datasets demonstrate that IM-LUT consistently achieves a superior balance between image quality and efficiency compared to existing methods, highlighting its potential as a promising solution for resource-constrained applications.
This study investigates the effectiveness of U-Net architectures integrated with various convolutional neural network (CNN) backbones for automated lung cancer detection and segmentation in chest CT images, addressing the critical need for accurate diagnostic tools in clinical settings. A balanced dataset of 832 chest CT images (416 cancerous and 416 non-cancerous) was preprocessed using Contrast Limited Adaptive Histogram Equalization (CLAHE) and resized to 128x128 pixels. U-Net models were developed with three CNN backbones: ResNet50, VGG16, and Xception, to segment lung regions. After segmentation, CNN-based classifiers and hybrid models combining CNN feature extraction with traditional machine learning classifiers (Support Vector Machine, Random Forest, and Gradient Boosting) were evaluated using 5-fold cross-validation. Metrics included accuracy, precision, recall, F1-score, Dice coefficient, and ROC-AUC. U-Net with ResNet50 achieved the best performance for cancerous lungs (Dice: 0.9495, Accuracy: 0.9735), while U-Net with VGG16 performed best for non-cancerous segmentation (Dice: 0.9532, Accuracy: 0.9513). For classification, the CNN model using U-Net with Xception achieved 99.1 percent accuracy, 99.74 percent recall, and 99.42 percent F1-score. The hybrid CNN-SVM-Xception model achieved 96.7 percent accuracy and 97.88 percent F1-score. Compared to prior methods, our framework consistently outperformed existing models. In conclusion, combining U-Net with advanced CNN backbones provides a powerful method for both segmentation and classification of lung cancer in CT scans, supporting early diagnosis and clinical decision-making.
Central to Earth observation is the trade-off between spatial and temporal resolution. For temperature, this is especially critical because real-world applications require high spatiotemporal resolution data. Current technology allows for hourly temperature observations at 2 km, but only every 16 days at 100 m, a gap further exacerbated by cloud cover. Earth system models offer continuous hourly temperature data, but at a much coarser spatial resolution (9-31 km). Here, we present a physics-guided deep learning framework for temperature data reconstruction that integrates these two data sources. The proposed framework uses a convolutional neural network that incorporates the annual temperature cycle and includes a linear term to amplify the coarse Earth system model output into fine-scale temperature values observed from satellites. We evaluated this framework using data from two satellites, GOES-16 (2 km, hourly) and Landsat (100 m, every 16 days), and demonstrated effective temperature reconstruction with hold-out and in situ data across four datasets. This physics-guided deep learning framework opens new possibilities for generating high-resolution temperature data across spatial and temporal scales, under all weather conditions and globally.
Pneumonia is a leading cause of mortality in children under five, requiring accurate chest X-ray diagnosis. This study presents a machine learning-based Pediatric Chest Pneumonia Classification System to assist healthcare professionals in diagnosing pneumonia from chest X-ray images. The CNN-based model was trained on 5,863 labeled chest X-ray images from children aged 0-5 years from the Guangzhou Women and Children's Medical Center. To address limited data, we applied augmentation techniques (rotation, zooming, shear, horizontal flipping) and employed GANs to generate synthetic images, addressing class imbalance. The system achieved optimal performance using combined original, augmented, and GAN-generated data, evaluated through accuracy and F1 score metrics. The final model was deployed via a Flask web application, enabling real-time classification with probability estimates. Results demonstrate the potential of deep learning and GANs in improving diagnostic accuracy and efficiency for pediatric pneumonia classification, particularly valuable in resource-limited clinical settings https://github.com/AbdulManaf12/Pediatric-Chest-Pneumonia-Classification
Medical Imagings are considered one of the crucial diagnostic tools for different bones-related diseases, especially bones fractures. This paper investigates the robustness of pre-trained deep learning models for classifying bone fractures in X-ray images and seeks to address global healthcare disparity through the lens of technology. Three deep learning models have been tested under varying simulated equipment quality conditions. ResNet50, VGG16 and EfficientNetv2 are the three pre-trained architectures which are compared. These models were used to perform bone fracture classification as images were progressively degraded using noise. This paper specifically empirically studies how the noise can affect the bone fractures detection and how the pre-trained models performance can be changes due to the noise that affect the quality of the X-ray images. This paper aims to help replicate real world challenges experienced by medical imaging technicians across the world. Thus, this paper establishes a methodological framework for assessing AI model degradation using transfer learning and controlled noise augmentation. The findings provide practical insight into how robust and generalizable different pre-trained deep learning powered computer vision models can be when used in different contexts.
Large-area microscopy with submicron resolution is limited by tradeoffs between field of view (FOV), resolution, and imaging speed. Samples are rarely flat across centimeter-scale FOV, which often requires existing solutions to use mechanical scanning to ensure focused capture at reduced throughput. Here, we present PANORAMA, a single-shot, re-imaging microscope that achieves seamless, gigapixel imaging over a 16.3$\times$18.8 $\text{mm}^2$ FOV at 0.84 um resolution without mechanical scanning. By using a telecentric photolithography lens, a large-aperture tube lens, and a flat micro-camera array with adaptive per-camera focus control, PANORAMA maintains submicron focus across flat, curved or uneven samples that span centimeters. This approach improves imaging throughput and adaptability, enabling gigapixel multi-modal microscopy of large flat and non-flat samples in one shot, thus broadening its applications in biomedical and materials imaging.
High-resolution elevation estimations are essential to understand catchment and hillslope hydrology, study urban morphology and dynamics, and monitor the growth, decline, and mortality of terrestrial ecosystems. Various deep learning approaches (e.g., super-resolution techniques, monocular depth estimation) have been developed to create high-resolution Digital Elevation Models (DEMs). However, super-resolution techniques are limited by the upscaling factor, and monocular depth estimation lacks global elevation context, making its conversion to a seamless DEM restricted. The recently introduced technique of prompt-based monocular depth estimation has opened new opportunities to extract estimates of absolute elevation in a global context. We present here a framework for the estimation of high-resolution DEMs as a new paradigm for absolute global elevation mapping. It is exemplified using low-resolution Shuttle Radar Topography Mission (SRTM) elevation data as prompts and high-resolution RGB imagery from the National Agriculture Imagery Program (NAIP). The approach fine-tunes a vision transformer encoder with LiDAR-derived DEMs and employs a versatile prompting strategy, enabling tasks such as DEM estimation, void filling, and updating. Our framework achieves a 100x resolution gain (from 30-m to 30-cm), surpassing prior methods by an order of magnitude. Evaluations across three diverse U.S. landscapes show robust generalization, capturing urban structures and fine-scale terrain features with < 5 m MAE relative to LiDAR, improving over SRTM by up to 18%. Hydrological analysis confirms suitability for hazard and environmental studies. We demonstrate scalability by applying the framework to large regions in the U.S. and Israel. All code and pretrained models are publicly available at: https://osherr1996.github.io/prompt2dem_propage/.
Phase retrieval involves recovering a signal from intensity-only measurements, crucial in many fields such as imaging, holography, optical computing, crystallography, and microscopy. Although there are several well-known phase retrieval algorithms, including classical iterative solvers, the reconstruction performance often remains sensitive to initialization and measurement noise. Recently, image-to-image diffusion models have gained traction in various image reconstruction tasks, yielding significant theoretical insights and practical breakthroughs. In this work, we introduce a novel phase retrieval approach based on an image-to-image diffusion framework called Inversion by Direct Iteration. Our method begins with an enhanced initialization stage that leverages a hybrid iterative technique, combining the Hybrid Input-Output and Error Reduction methods and incorporating a novel acceleration mechanism to obtain a robust crude estimate. Then, it iteratively refines this initial crude estimate using the learned image-to-image pipeline. Our method achieves substantial improvements in both training efficiency and reconstruction quality. Furthermore, our approach utilizes aggregation techniques to refine quality metrics and demonstrates superior results compared to both classical and contemporary techniques. This highlights its potential for effective and efficient phase retrieval across various applications.
We propose a novel framework for phase retrieval that leverages Langevin dynamics to enable efficient posterior sampling, yielding reconstructions that explicitly balance distortion and perceptual quality. Unlike conventional approaches that prioritize pixel-wise accuracy, our method navigates the perception-distortion tradeoff through a principled combination of stochastic sampling, learned denoising, and model-based updates. The framework comprises three variants of increasing complexity, integrating theoretically grounded Langevin inference, adaptive noise schedule learning, parallel reconstruction sampling, and warm-start initialization from classical solvers. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple benchmarks, both in terms of fidelity and perceptual quality.
In Inverse Synthetic Aperture Radar (ISAR), random missing entries of the received radar echo matrix deteriorate the imaging quality, compromising target distinction from the background. Compressive sensing techniques or matrix completion prior to conventional imaging have been used in recent years to solve this issue. However, while the former techniques fail to preserve target continuity due to the sparsity constraint, the latter fails for high missing ratios. This paper proposes to use deep image prior (DIP) to complete the complex radar data and then obtain the radar image by conventional Fourier imaging. Real and imaginary parts are separately completed by independent deep structures and then put together for the imaging part. The proposed DIP based imaging method has been compared with IALM, 2D-SL0 and NNM methods visually and quantitatively for both simulated and real data. The results demonstrate an increase of 100% for some extreme cases in terms of RMSE, 50% increase on Correlation and 30% increase on IC metrics quantitatively.
Path smoothness is often overlooked in path imitation learning from expert demonstrations. In this paper, we introduce a novel learning method, termed deep angular A* (DAA*), by incorporating the proposed path angular freedom (PAF) into A* to improve path similarity through adaptive path smoothness. The PAF aims to explore the effect of move angles on path node expansion by finding the trade-off between their minimum and maximum values, allowing for high adaptiveness for imitation learning. DAA* improves path optimality by closely aligning with the reference path through joint optimization of path shortening and smoothing, which correspond to heuristic distance and PAF, respectively. Throughout comprehensive evaluations on 7 datasets, including 4 maze datasets, 2 video-game datasets, and a real-world drone-view dataset containing 2 scenarios, we demonstrate remarkable improvements of our DAA* over neural A* in path similarity between the predicted and reference paths with a shorter path length when the shortest path is plausible, improving by 9.0% SPR, 6.9% ASIM, and 3.9% PSIM. Furthermore, when jointly learning pathfinding with both path loss and path probability map loss, DAA* significantly outperforms the state-of-the-art TransPath by 6.7% SPR, 6.5% PSIM, and 3.7% ASIM. We also discuss the minor trade-off between path optimality and search efficiency where applicable.
Patch-based transformer surrogates have become increasingly effective for modeling spatiotemporal dynamics, but the fixed patch size is a major limitation for budget-conscience deployment in production. We introduce two lightweight, architecture-agnostic modules-the Convolutional Kernel Modulator (CKM) and Convolutional Stride Modulator (CSM)-that enable dynamic patch size control at inference in patch based models, without retraining or accuracy loss. Combined with a cyclic patch-size rollout, our method mitigates patch artifacts and improves long-term stability for video-like prediction tasks. Applied to a range of challenging 2D and 3D PDE benchmarks, our approach improves rollout fidelity and runtime efficiency. To our knowledge, this is the first framework to enable inference-time patch-size tunability in patch-based PDE surrogates. Its plug-and-play design makes it broadly applicable across architectures-establishing a general foundation for compute-adaptive modeling in PDE surrogate tasks.