The Optimal Condition Number for the ReLU Function
Abstract
ReLU is a widely used activation function in deep neural networks. This paper explores the stability properties of the ReLU map. For any weight matrix $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ and bias vector $\boldsymbol{b} \in \mathbb{R}^{m}$ at a given layer, we define the condition number $\beta_{\boldsymbol{A},\boldsymbol{b}}$ as $\beta_{\boldsymbol{A},\boldsymbol{b}} = \frac{\mathcal{U}_{\boldsymbol{A},\boldsymbol{b}}}{\mathcal{L}_{\boldsymbol{A},\boldsymbol{b}}}$, where $\mathcal{U}_{\boldsymbol{A},\boldsymbol{b}}$ and $\mathcal{L}_{\boldsymbol{A},\boldsymbol{b}}$ are the upper and lower Lipschitz constants, respectively. We first demonstrate that for any given $\boldsymbol{A}$ and $\boldsymbol{b}$, the condition number satisfies $\beta_{\boldsymbol{A},\boldsymbol{b}} \geq \sqrt{2}$. Moreover, when the weights of the network at a given layer are initialized as i.i.d. Gaussian random variables and the bias term is set to zero, the condition number asymptotically approaches this lower bound. This theoretical finding suggests that, in the setting of randomly initialized deep neural networks, Gaussian weight initialization is optimal for preserving distances.
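To make the quantities concrete, the following is a minimal numerical sketch (not from the paper) that estimates $\beta_{\boldsymbol{A},\boldsymbol{b}}$ for a wide layer $\boldsymbol{x} \mapsto \mathrm{ReLU}(\boldsymbol{A}\boldsymbol{x} + \boldsymbol{b})$ with i.i.d. Gaussian weights and zero bias, by sampling pairs of unit-norm inputs at varying angles and taking the ratio of the largest to the smallest observed distortion. The function names (e.g. `estimate_beta`) and the sampling scheme are ours for illustration; since every sampled distortion lies between $\mathcal{L}_{\boldsymbol{A},\boldsymbol{b}}$ and $\mathcal{U}_{\boldsymbol{A},\boldsymbol{b}}$, the estimate can only underestimate the true condition number.

```python
import numpy as np


def relu_map(A, b, x):
    """Apply the layer map x -> ReLU(Ax + b) columnwise (x has shape (n, k))."""
    return np.maximum(A @ x + b[:, None], 0.0)


def estimate_beta(A, b, n_dirs=50, n_angles=200, rng=None):
    """Crude Monte Carlo estimate of beta = U / L for the map above.

    Each sampled distortion ||f(y) - f(x)|| / ||y - x|| lies between the
    lower and upper Lipschitz constants, so max/min over samples is a
    lower bound on the true condition number.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    thetas = np.linspace(1e-2, np.pi, n_angles)
    ratios = []
    for _ in range(n_dirs):
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x)
        u = rng.standard_normal(n)
        u -= (u @ x) * x                      # make u orthogonal to x
        u /= np.linalg.norm(u)
        # points y on the unit sphere at angle theta from x
        y = np.cos(thetas) * x[:, None] + np.sin(thetas) * u[:, None]
        num = np.linalg.norm(relu_map(A, b, y) - relu_map(A, b, x[:, None]), axis=0)
        den = np.linalg.norm(y - x[:, None], axis=0)
        ratios.append(num / den)
    ratios = np.concatenate(ratios)
    return ratios.max() / ratios.min()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 50, 4000                               # wide layer, m >> n
    A = rng.standard_normal((m, n)) / np.sqrt(m)  # i.i.d. Gaussian weights
    b = np.zeros(m)                               # zero bias
    print("empirical beta estimate:", estimate_beta(A, b, rng=1))
    print("lower bound sqrt(2):   ", np.sqrt(2))
```

With zero bias the map is positively homogeneous, so restricting the sweep to unit-norm inputs loses no generality; the angles near $0$ probe the upper Lipschitz constant and the angles near $\pi$ probe the lower one, and for large $m$ the printed estimate should sit close to the $\sqrt{2}$ bound stated above.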