What is the difference between Q-learning, Deep Q-learning and Deep Q-network?


In Q-learning (and in value-based reinforcement learning more generally) we are typically interested in learning a Q-function, $Q(s, a)$. This is defined as
$$Q(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right]\;,$$
where $G_t$ is the (discounted) return from time $t$ onwards.

For tabular Q-learning, where you have a finite state and action space, you can maintain a lookup table that stores your current estimate of the Q-value for every state-action pair. Note that in practice, the spaces being finite might not be enough to avoid DQN: if, for example, your state space contains a very large number of states, say $10^{10000}$, then it may not be manageable to maintain a separate Q-value estimate for each state-action pair.
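As a rough illustration of what "table lookup" means here, below is a minimal sketch of tabular Q-learning in Python. It assumes a Gymnasium-style environment with a small discrete (hashable) state space, and the hyperparameters are placeholders rather than recommendations.

```python
import random
from collections import defaultdict

def tabular_q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    n_actions = env.action_space.n
    # The lookup table itself: one list of Q-value estimates per state.
    q = defaultdict(lambda: [0.0] * n_actions)

    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection from the current table.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])

            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated

            # Q-learning update: move Q(s, a) towards the bootstrapped target.
            target = reward + (0.0 if terminated else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

The point is simply that `q` grows one entry per visited state, which is exactly what stops scaling once the state space becomes huge.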

When you have an infinite state space (and/or action space), it becomes impossible to use a table, and so you need to use function approximation to generalise across states. This is typically done with a deep neural network because of its expressive power. As a technical aside, Q-networks don't usually take a state-action pair as input; instead they take a representation of the state (e.g. a $d$-dimensional vector, or an image) and output a real-valued vector of size $|\mathcal{A}|$, where $\mathcal{A}$ is the action space.
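To make that aside concrete, here is a small sketch of such a Q-network, assuming PyTorch; the state dimension, number of actions, and hidden width are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a d-dimensional state vector to a vector of |A| Q-values."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # shape: (batch, |A|)

# Greedy action selection: one forward pass scores every action at once.
q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
action = q_net(state).argmax(dim=1)
```

Outputting all $|\mathcal{A}|$ values in one pass is what makes the $\max_a Q(s, a)$ in the Q-learning target cheap to compute.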

Now, it seems from your question that you're confused about why a model (the neural network) is used when Q-learning is, as you rightly say, model-free. The answer is that when we describe a reinforcement learning algorithm as model-free, we are not talking about how its value function or policy is parameterised; we are talking about whether the algorithm uses a model of the transition dynamics to help with its learning. That is, a model-free algorithm doesn't use any knowledge of $p(s' \mid s, a)$, whereas model-based methods use this transition function to perform planning with the dynamics, either because it is known exactly (such as in Atari environments) or because it has to be approximated.
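To see concretely why Q-learning is model-free, note that its update only ever uses a single sampled transition $(s, a, r, s')$ from the environment:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]\;.$$
Nowhere does $p(s' \mid s, a)$ appear; the expectation over next states is estimated implicitly from the samples the agent experiences, and that is what makes the method model-free, regardless of whether $Q$ is stored in a table or in a neural network.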