RJD-BASE: Multi-Modal Spectral Clustering via Randomized Joint Diagonalization
Abstract
We revisit spectral clustering in multimodal settings, where each data modality is encoded as a graph Laplacian. Classical approaches, including joint diagonalization, spectral co-regularization, and multiview clustering, attempt to align embeddings across modalities, but they often rely on costly iterative refinement and may fail to directly target the spectral subspace relevant for clustering. In this work, we introduce two key innovations. First, we bring randomization to this setting by sampling random convex combinations of the Laplacians, a simple and scalable alternative to explicit eigenspace alignment. Second, we propose a principled selection rule based on Bottom-$k$ Aggregated Spectral Energy (BASE), a $k$-dimensional extension of the directional smoothness objective from recent minimax formulations, which, in contrast to those formulations, we apply as a selection mechanism rather than as an optimization target. The result is Randomized Joint Diagonalization with BASE Selection (RJD-BASE), a method that is easy to implement, computationally efficient, aligned with the clustering objective, and built on decades of progress in standard eigensolvers. Through experiments on synthetic and real-world datasets, we show that RJD-BASE reliably selects high-quality embeddings and outperforms classical multimodal clustering methods at low computational cost.
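To make the sample-then-select idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define BASE precisely, so the scoring function here is an assumption: it takes the worst-case (largest) value of $\mathrm{tr}(U^\top L_m U)$ over modalities $m$, loosely mirroring a minimax smoothness objective. The function names, the Dirichlet sampling of convex weights, and the default number of samples are likewise illustrative choices.

```python
import numpy as np


def bottom_k_eigvecs(L, k):
    """Return the k eigenvectors of a symmetric matrix with smallest eigenvalues."""
    # np.linalg.eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]


def spectral_energy_score(U, laplacians):
    """ASSUMED stand-in for BASE: worst-case smoothness tr(U^T L_m U) over modalities."""
    return max(float(np.trace(U.T @ L @ U)) for L in laplacians)


def rjd_base_sketch(laplacians, k, n_samples=50, seed=0):
    """Sample random convex combinations of Laplacians and keep the best-scoring embedding."""
    rng = np.random.default_rng(seed)
    best_score, best_U, best_w = np.inf, None, None
    for _ in range(n_samples):
        # Dirichlet draws are nonnegative and sum to 1, i.e. convex weights (assumed sampler).
        w = rng.dirichlet(np.ones(len(laplacians)))
        L_mix = sum(wi * Li for wi, Li in zip(w, laplacians))
        U = bottom_k_eigvecs(L_mix, k)
        score = spectral_energy_score(U, laplacians)
        if score < best_score:
            best_score, best_U, best_w = score, U, w
    return best_U, best_w, best_score
```

In this sketch each candidate costs one standard symmetric eigendecomposition, and the score is only used to choose among candidates, never optimized directly. The rows of the returned embedding could then be passed to any standard clustering routine such as k-means.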