What is the difference between Deep Learning and traditional Artificial Neural Network machine learning?

The standard backpropagation algorithm (gradient descent) runs into serious issues when the number of layers becomes large. The probability of getting stuck in a local minimum of the error function increases with every layer. And it is not only local minima in the strict mathematical sense that cause problems: there are also flat regions (plateaus) in the error function, where modifying one or more weights does not significantly change the error, so gradient descent makes no progress.
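To illustrate the flat-region problem, here is a minimal sketch (the layer count, width, weight scale, and sigmoid activations are all assumptions chosen for illustration) showing how the error gradient typically shrinks toward zero as it is propagated back through many layers:

```python
# Sketch: gradients propagated back through many sigmoid layers tend to shrink
# toward zero, leaving gradient descent on a nearly flat error surface.
# All sizes and scales below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 20, 50
weights = [rng.standard_normal((width, width)) * 0.1 for _ in range(n_layers)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass, keeping the activations for the backward pass
a = rng.random((1, width))
activations = [a]
for W in weights:
    a = sigmoid(a @ W)
    activations.append(a)

# Backward pass with an arbitrary error signal at the output
grad = np.ones((1, width))
for W, act in zip(reversed(weights), reversed(activations[1:])):
    grad = (grad * act * (1 - act)) @ W.T    # chain rule through sigmoid and weights
    print(np.abs(grad).mean())               # shrinks by orders of magnitude over the layers
```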

On the other hand, networks with many layers can solve more difficult problems, because every additional layer of units can provide another level of abstraction.

Deep Learning addresses exactly this problem. The basic idea is to perform an unsupervised learning procedure on every single layer, in addition to running gradient descent on the network as a whole. The goal of the unsupervised step is to make each layer extract characteristic features from its input that can then be used by subsequent layers.
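Historically, this layer-wise idea was often realized with stacked autoencoders (or restricted Boltzmann machines). Below is a minimal sketch of greedy layer-wise pre-training with autoencoders, written with the Keras API; the data, layer sizes, and hyperparameters are illustrative assumptions, not part of the original answer:

```python
# Sketch: greedy layer-wise unsupervised pre-training with autoencoders,
# followed by supervised fine-tuning of the whole stack.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy stand-ins for real data; shapes and sizes are assumptions for illustration.
X_train = np.random.rand(1000, 784).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 10, 1000), 10)

layer_sizes = [256, 128, 64]
pretrained = []              # trained encoder layers, one per hidden layer
current = X_train

for size in layer_sizes:
    # Unsupervised step: train an autoencoder to reconstruct its own input.
    inp = keras.Input(shape=(current.shape[1],))
    encoded = layers.Dense(size, activation="sigmoid")(inp)
    decoded = layers.Dense(current.shape[1], activation="sigmoid")(encoded)
    autoencoder = keras.Model(inp, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(current, current, epochs=5, batch_size=64, verbose=0)

    # Keep the trained encoder and use its output as input for the next layer.
    encoder = keras.Model(inp, encoded)
    pretrained.append(encoder.layers[-1])
    current = encoder.predict(current, verbose=0)

# Supervised step: rebuild the stack, copy in the pre-trained weights,
# add a classifier head, and fine-tune the whole network with gradient descent.
model = keras.Sequential([keras.Input(shape=(784,))])
for size, pre in zip(layer_sizes, pretrained):
    dense = layers.Dense(size, activation="sigmoid")
    model.add(dense)
    dense.set_weights(pre.get_weights())
model.add(layers.Dense(10, activation="softmax"))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, batch_size=64, verbose=0)
```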

Although the term "Deep Learning" is currently used far too broadly, it is more than just marketing hype.

Edit: A few years ago, many people, including myself, believed that unsupervised pre-training was the main enabler of deep learning. Since then, other techniques have become popular that produce even better results in many cases. As mentioned in the comment by @Safak Okzan (below his own answer), these include the following (a short sketch combining them follows the list):

  • Residual Networks

  • Batch normalization

  • Rectified linear units
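
For concreteness, here is a minimal sketch of a residual block that combines all three ingredients: a skip connection (the defining feature of Residual Networks), batch normalization, and ReLU activations. It is written with the Keras API; the filter count and input shape are illustrative assumptions.

```python
# Sketch: a residual block with batch normalization and ReLU activations.
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)          # rectified linear unit
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])           # skip connection: add the input back
    return layers.Activation("relu")(y)

inputs = keras.Input(shape=(32, 32, 64))      # assumed input shape
outputs = residual_block(inputs)
model = keras.Model(inputs, outputs)
```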


I beg to differ with @Frank Puffer's answer. I don't understand what he meant by performing an unsupervised learning procedure on the hidden layers etc.

Deep Learning refers to Neural Network models with generally more than 2 or 3 hidden layers. Most DL models have 10 to 100 or more layers.

The recent revolution in the Deep Learning models relies on two things:
1. the availability of lots of data--which is a product of the internet age
2. the availability of GPUs

The algorithm used to optimize DL models is called backpropagation, which is essentially an efficient way of computing the gradients needed for gradient descent. Backprop has actually been around since at least the 1980s; it is not a DL-specific thing.
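As a rough illustration of what backpropagation does, here is a minimal NumPy sketch of a tiny two-layer network trained with gradient descent; the data, layer sizes, loss, and learning rate are all assumptions chosen for illustration:

```python
# Sketch: backpropagation and gradient descent for a tiny two-layer network.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 4))                         # 100 samples, 4 features (assumed)
y = rng.integers(0, 2, (100, 1)).astype(float)   # binary labels (assumed)

W1, b1 = rng.standard_normal((4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)) * 0.1, np.zeros(1)
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(1000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)                 # hidden layer
    p = sigmoid(h @ W2 + b2)                 # output probability

    # Backward pass: the chain rule applied layer by layer (this is backprop)
    dz2 = (p - y) / len(X)                   # gradient of cross-entropy loss w.r.t. output pre-activation
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * h * (1 - h)         # propagate the error through the hidden layer
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # Gradient descent step: move each weight against its partial derivative
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```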

DL models generally require copious amounts of data due to the complexity and size of the models. They typically have millions of tunable weight parameters. The optimization requires high compute power because of the size of the training data and the millions of partial derivatives (one per weight) that need to be computed at each iteration.
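To get a feel for how quickly the parameter count grows, here is a short sketch that counts the weights in a fully connected stack; the layer sizes are illustrative assumptions:

```python
# Sketch: counting tunable parameters in a fully connected stack (assumed sizes).
layer_sizes = [784, 2048, 2048, 2048, 10]
params = sum(n_in * n_out + n_out            # weight matrix plus bias vector per layer
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(params)                                # about 10 million tunable parameters
```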

In essence, Deep Learning is not marketing hype. It is a large, multi-layered neural network model that requires lots of data and powerful GPUs to train. And once trained, these models achieve super-human accuracy at certain tasks.


In recent years, the models developed to solve various machine learning problems have become far more complex, with a very large number of layers. For example, Google's Inception-v3 model has (I think) 42 layers. Traditional neural networks typically used only a handful of hidden layers. The term "Deep" in "Deep Learning" and "Deep Convolutional Neural Nets" is a nod to the substantial number of layers involved.