Diffusion-Aided Bandwidth-Efficient Semantic Communication with Adaptive Requests
Abstract
Semantic communication focuses on conveying the intrinsic meaning of data rather than its raw symbolic representation. For visual content, this paradigm shifts from traditional pixel-level transmission toward leveraging the semantic structure of images to communicate visual meaning. Existing approaches generally follow one of two paths: transmitting only text descriptions, which often fail to capture precise spatial layouts and fine-grained appearance details, or transmitting text alongside dense latent visual features, which tends to introduce substantial semantic redundancy. A key challenge, therefore, is to reduce semantic redundancy while preserving semantic understanding and visual fidelity, thereby improving overall transmission efficiency. This paper introduces a diffusion-aided semantic communication framework with adaptive retransmission requests. The transmitter sends a concise text description together with a small set of key latent visual features, and the receiver employs a diffusion-based inpainting model to reconstruct the image. A receiver-side semantic consistency mechanism evaluates the alignment between the reconstructed image and the original text description; when a semantic discrepancy is detected, the receiver requests a small set of additional latent blocks and refines the reconstruction. This approach substantially reduces bandwidth usage while maintaining high semantic accuracy, achieving an efficient balance between reconstruction quality and transmission overhead.
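To make the adaptive-request loop concrete, the following is a minimal sketch of the receiver side: it scores the reconstructed image against the transmitted description with a CLIP-style image-text similarity and requests additional latent blocks whenever the score falls below a threshold. The CLIP model choice, the threshold value, and the `reconstruct` and `request_blocks` callbacks are illustrative assumptions, not the paper's exact components.

```python
# Hypothetical sketch of the receiver-side semantic consistency check.
# Assumes a CLIP-based alignment score; the paper's actual metric,
# model, and protocol details may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def semantic_consistency(image: Image.Image, description: str) -> float:
    """Cosine similarity between CLIP image and text embeddings, in [-1, 1]."""
    inputs = processor(text=[description], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.T).item()

def receive_image(description, initial_blocks, request_blocks, reconstruct,
                  threshold=0.30, max_rounds=3):
    """Reconstruct, then run bounded retransmission rounds on low alignment.

    `reconstruct` wraps the diffusion-based inpainting model and
    `request_blocks` asks the transmitter for extra latent blocks;
    both are placeholders for components described in the paper.
    The 0.30 threshold is an assumed operating point for CLIP scores.
    """
    blocks = list(initial_blocks)
    image = reconstruct(description, blocks)
    for _ in range(max_rounds):
        if semantic_consistency(image, description) >= threshold:
            break  # reconstruction is semantically aligned; stop requesting
        blocks += request_blocks()  # small set of additional latent blocks
        image = reconstruct(description, blocks)
    return image
```

Bounding the number of request rounds (`max_rounds`) keeps the worst-case transmission overhead predictable while still allowing the receiver to recover from semantically misaligned reconstructions.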