Search Machine Learning Repository: Learning Polynomials with Neural Networks
Authors: Alexandr Andoni, Rina Panigrahy, Gregory Valiant and Li Zhang
Conference: Proceedings of the 31st International Conference on Machine Learning (ICML-14)
Year: 2014
Pages: 1908-1916
Abstract: We study the effectiveness of learning low degree polynomials using neural networks by the gradient descent method. While neural networks have been shown to have great expressive power, and gradient descent has been widely used in practice for learning neural networks, few theoretical guarantees are known for such methods. In particular, it is well known that gradient descent can get stuck at local minima, even for simple classes of target functions. In this paper, we present several positive theoretical results to support the effectiveness of neural networks. We focus on two-layer neural networks (i.e. one hidden layer) where the top layer node is a linear function, similar to~\cite{barron93}. First we show that for a randomly initialized neural network with sufficiently many hidden units, the gradient descent method can learn any low degree polynomial. Secondly, we show that if we use complex-valued weights (the target function can still be real), then under suitable conditions, there are no ``robust local minima'': the neural network can always escape a local minimum by performing a random perturbation. This property does not hold for real-valued weights. Thirdly, we discuss whether sparse polynomials can be learned with \emph{small} neural networks, where the size is dependent on the sparsity of the target function.
[pdf] [BibTeX]

authors venues years
Suggest Changes to this paper.
Brought to you by the WUSTL Machine Learning Group. We have open faculty positions (tenured and tenure-track).