How exactly does wheel factorization work and what is it used for?
Since trial division is mostly useless for factoring large numbers, and using a prime number sieve for factoring is just a minor refinement of trial division, you shouldn't think of it as a factorization algorithm. Instead, this is a prime-generating algorithm: the goal is to generate the list of primes in the set $[n] := \{1, 2, 3, \dots, n\}$ as quickly as possible.
We are trying to improve on the sieve of Eratosthenes in efficiency, which does $\Theta(n \cdot \log \log n)$ arithmetical operations on elements of $[n]$.
Wheel factorization does this by using the fact that for the first few primes, the sieve we're constructing is periodic, and there's no point in extending the periodic pattern all the way to $n$. Instead, we only generate the list of numbers not divisible by the first $k$ primes $p_1, p_2, \dots, p_k$ only up to their product $p_1 p_2 \dotsm p_k$. That is, we:
- Begin by generating the list of numbers in $\{1,2,3,4,5,6\}$ not divisible by $2$ or $3$: it is $\{1,5\}$.
- Extend this to the list of numbers in $\{1,2,3,\dots,30\}$ not divisible by $2$, $3$, or $5$: it is $\{1,7,11,13,17,19,23,29\}$.
- Extend this to the list of numbers in $\{1,2,3,\dots, 210\}$ not divisible by $2$, $3$, $5$, or $7$, and so on.
For each extension step, if the set we've generated is $S$ and the next prime we're adding is $p$, then the next set consists of $p$ translated copies of $S$, with $p \cdot S$ removed. For example, if $S = \{1,5\}$ and $p=5$, then we repeat $S$ $5$ times (to get $\{1,5\} \cup \{7,11\} \cup \{13,17\} \cup \{19, 23\} \cup \{25, 29\}$) and remove $5\cdot S = \{5,25\}$. By the way, $p$ is also easy to find: it is the element of $S$ after $1$.
Once $p_1 p_2 \dotsm p_k > n$, we no longer take repeated copies of $S$, and just remove $p \cdot S$ from $S$ to extend. We stop, as with the sieve of Eratosthenes, when $p_k > \sqrt n$. At this point, $S$ contains all the primes larger than $p_k$; the primes smaller than $p_k$ are the ones we used along the way, which we keep track of separately.
According to this paper, this only requires $\Theta(\frac{n}{\log \log n})$ arithmetical operations on elements of $[n]$, if implemented carefully.