What's the difference between backtracking and depth first search?
Backtracking is a more general purpose algorithm.
Depth-First search is a specific form of backtracking related to searching tree structures. From Wikipedia:
One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.
It uses backtracking as part of its means of working with a tree, but is limited to a tree structure.
Backtracking, though, can be used on any type of structure where portions of the domain can be eliminated - whether or not it is a logical tree. The Wiki example uses a chessboard and a specific problem - you can look at a specific move, and eliminate it, then backtrack to the next possible move, eliminate it, etc.
I think this answer to another related question offers more insights.
For me, the difference between backtracking and DFS is that backtracking handles an implicit tree and DFS deals with an explicit one. This seems trivial, but it means a lot. When the search space of a problem is visited by backtracking, the implicit tree gets traversed and pruned in the middle of it. Yet for DFS, the tree/graph it deals with is explicitly constructed and unacceptable cases have already been thrown, i.e. pruned, away before any search is done.
So, backtracking is DFS for implicit tree, while DFS is backtracking without pruning.
Backtracking is usually implemented as DFS plus search pruning. You traverse search space tree depth-first constructing partial solutions along the way. Brute-force DFS can construct all search outcomes, even the ones, that do not make sense practically. This can be also very inefficient to construct all solutions (n! or 2^n). So in reality as you do DFS, you need to also prune partial solutions, which do not make sense in context of the actual task, and focus on the partial solutions, which can lead to valid optimal solutions. This is the actual backtracking technique - you discard partial solutions as early as possible, make a step back and try to find local optimum again.
Nothing stops to traverse search space tree using BFS and execute backtracking strategy along the way, but it doesn't make sense in practice because you would need to store search state layer by layer in the queue, and tree width grows exponentially to the height, so we would waste a lot of space very quickly. That's why trees are usually traversed using DFS. In this case search state is stored in the stack (call stack or explicit structure) and it can't exceed tree height.