How to formally model the "hesitation" in the hat-guessing puzzle?
The puzzle is older than its mathematical formalizations, and a version of it dates back to the fifties. The most common way to model such a situation is the partitional model introduced by Robert Aumann. There is a finite set $\Omega$ of states (finiteness can be relaxed somewhat). A state describes everything relevant that could be the case. In the puzzle, a state describes who wears which hat, so there are seven states (every assignment except all three hats being black) $$\Omega=\{www,wwb,wbw,bww,bbw,bwb,wbb\},$$ where for example $wbw$ stands for kid $1$ wearing a white hat, kid $2$ wearing a black hat, and kid $3$ wearing a white hat.

Now we have to model what the kids know. We do this in the following way: a person is able to distinguish between certain states, but not others. We model this by giving that person a partition of $\Omega$; the person is told which element of the partition contains the true state, but not the precise state. In the puzzle, a kid is unable to distinguish states in which everyone else wears the same hats. For example, kid $2$ has the partition $$P_2=\big\{\{www,wbw\}, \{bww,bbw\},\{wwb,wbb\},\{bwb\}\big\}.$$
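The construction of $\Omega$ and of the partitions can be sketched in a few lines of Python. This is only an illustration: the names `OMEGA` and `partition` are my own, and states are encoded as three-letter strings exactly as in the text.

```python
from itertools import product

# States: hat colours of kids 1, 2, 3 as a string such as "wbw".
# The all-black state is left out, as in the seven-state Omega above.
OMEGA = {"".join(s) for s in product("wb", repeat=3)} - {"bbb"}

def partition(kid):
    """Information partition of `kid` (1-based, hypothetical helper).

    Two states land in the same cell exactly when the other two kids
    wear the same hats in both, i.e. when `kid` sees the same thing.
    """
    cells = {}
    for state in OMEGA:
        seen = state[:kid - 1] + state[kid:]  # drop the kid's own hat
        cells.setdefault(seen, set()).add(state)
    return {frozenset(cell) for cell in cells.values()}

P2 = partition(2)
for cell in sorted(sorted(c) for c in P2):
    print(cell)
```

Grouping by "what the kid sees" is just one convenient way to generate the partition; writing the cells out by hand would work equally well for three kids.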
Note that kid $2$ knows the color of her hat only if the true state is $bwb$. Kid $1$ knows it if the true state is $wbb$, and kid $3$ knows it if it is $bbw$. Generally, let $P_i$ be the information partition of a person $i$. If the true state is $\omega$, we let $P_i(\omega)$ be the element of the partition that contains $\omega$, and we interpret it as the set of states $i$ deems possible. An event is simply a set of states. We say that $i$ knows the event $E$ at state $\omega$ if $P_i(\omega)\subseteq E$. We let $K_i(E)$ be the set of states at which $i$ knows $E$. Note that $K_i(E)$ is an event itself. So one can write things like $K_1(K_2(K_1(E)))$, which can be interpreted as $1$ knowing that $2$ knows that $1$ knows $E$.
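The knowledge operator is easy to compute directly from its definition. The following is a minimal sketch, assuming the same string encoding of states as above; `partition` and `K` are hypothetical helper names, not standard notation from any library.

```python
from itertools import product

OMEGA = {"".join(s) for s in product("wb", repeat=3)} - {"bbb"}

def partition(kid):
    """Cells of states that `kid` (1-based) cannot tell apart."""
    cells = {}
    for state in OMEGA:
        seen = state[:kid - 1] + state[kid:]  # hats of the other two kids
        cells.setdefault(seen, set()).add(state)
    return [frozenset(cell) for cell in cells.values()]

def K(kid, event):
    """K_i(E): the states whose partition cell is contained in `event`."""
    return {s for cell in partition(kid) if cell <= event for s in cell}

# Kid 2 knows her own colour where she knows "white" or knows "black".
white2 = {s for s in OMEGA if s[1] == "w"}
black2 = OMEGA - white2
knows_own_hat_2 = K(2, white2) | K(2, black2)

# K_i(E) is itself an event, so the operator can be nested.
E = {"www", "wwb", "wbw", "bww"}
nested = K(1, K(2, K(1, E)))
```

Here `knows_own_hat_2` comes out as `{'bwb'}`, in line with the remark above. For this particular `E`, the nested event `K(1, K(2, K(1, E)))` is empty: kid $1$ knows $E$ at $www$ and $bww$, but kid $2$ can never be sure that she does.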
Now in the puzzle, the true state is $www$. But seeing two white hats does not tell a kid anything about the color of her own hat. Nobody knows the color of her hat, for otherwise she would say it. The event that nobody knows the color of her hat is $$E=\{www,wwb,wbw,bww\},$$ which is exactly the set of states at which at most one kid wears a black hat. Everyone knows this and therefore gets a new partition in which this knowledge is incorporated. For example, $$P_2'=\big\{\{www,wbw\}, \{bww\},\{bbw\},\{wwb\},\{wbb\},\{bwb\}\big\}.$$ Formally, this is the coarsest partition that is at least as fine as the two partitions $P_2$ and $\{E,E^C\}$, and this is how learning is modeled. Even with this partition, no kid knows the color of her hat at $www$. Everyone knows this, and this allows each kid to deduce that the state is not one in which the other two kids have white hats and she has a black hat, for the other kids could then have deduced that they have white hats, and they did not. From this, every kid can deduce that the true state is $www$, or more formally $P''_i(www)=\{www\}$ for $i=1,2,3$, so the informal argument becomes a formal statement.
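The two rounds of learning can be checked mechanically. The sketch below is one possible encoding, with hypothetical helper names (`partition`, `nobody_knows`, `refine`); it assumes that silence is a public announcement of the event "nobody knows her own color", which is how the argument above treats it.

```python
from itertools import product

OMEGA = {"".join(s) for s in product("wb", repeat=3)} - {"bbb"}

def partition(kid):
    """Initial partition of `kid` (0-based here): group by the others' hats."""
    cells = {}
    for state in OMEGA:
        seen = state[:kid] + state[kid + 1:]
        cells.setdefault(seen, set()).add(state)
    return {frozenset(cell) for cell in cells.values()}

def nobody_knows(partitions):
    """Event that no kid knows her own colour, given current partitions."""
    ignorant = set(OMEGA)
    for kid, cells in enumerate(partitions):
        for cell in cells:
            # The kid knows her colour on a cell where it does not vary.
            if len({state[kid] for state in cell}) == 1:
                ignorant -= cell
    return ignorant

def refine(cells, event):
    """Coarsest partition at least as fine as `cells` and {E, E^C}."""
    pieces = set()
    for cell in cells:
        for piece in (cell & event, cell - event):
            if piece:
                pieces.add(frozenset(piece))
    return pieces

P = [partition(kid) for kid in range(3)]
E1 = nobody_knows(P)                    # first silence
P1 = [refine(cells, E1) for cells in P]
E2 = nobody_knows(P1)                   # second silence
P2 = [refine(cells, E2) for cells in P1]

# After two silences every kid's cell at "www" is the singleton {"www"}.
final_cells = [next(c for c in cells if "www" in c) for cells in P2]
```

With this encoding, `E1` equals the event $E$ above, `P1[1]` equals $P_2'$, and each entry of `final_cells` is `frozenset({'www'})`.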
A survey of this kind of modeling has been written by John Geanakoplos for the Handbook of Game Theory with Economic Applications. The survey can be found here. It also discusses essentially the same puzzle. The article is somewhat technical, and a slightly simplified version can be found here.
The variations of this question that I've seen fall into two types:
- Time is very carefully quantized (e.g. every day at noon they gather to say something if possible), so that it is possible to make inferences of the sort "Nobody said anything during the last unit of time".
- All one needs to know is that the other people could not answer from the information they were given immediately, so one can gain that information by patiently waiting long enough that any reasonable person would have given an answer if they could.
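The first, time-quantized type can be simulated directly: each silent round publicly removes every state at which somebody would have spoken. The sketch below applies this to the three-kid hat puzzle with the all-black state excluded (an assumption taken from the seven-state model discussed above); `knowers` and `silent_rounds` are hypothetical names.

```python
from itertools import product

OMEGA = {"".join(s) for s in product("wb", repeat=3)} - {"bbb"}

def knowers(state, possible):
    """Kids (0-based) who can deduce their own hat colour at `state`,
    given the publicly known set `possible` of remaining states."""
    result = []
    for kid in range(3):
        seen = state[:kid] + state[kid + 1:]
        candidates = {s for s in possible if s[:kid] + s[kid + 1:] == seen}
        if len({s[kid] for s in candidates}) == 1:
            result.append(kid)
    return result

def silent_rounds(true_state):
    """Number of rounds that pass in silence before some kid can answer."""
    possible = set(OMEGA)
    rounds = 0
    while not knowers(true_state, possible):
        # Silence rules out every state at which someone would have spoken.
        remaining = {s for s in possible if not knowers(s, possible)}
        if remaining == possible:
            raise RuntimeError("silence carries no further information")
        possible = remaining
        rounds += 1
    return rounds

rounds_by_state = {state: silent_rounds(state) for state in sorted(OMEGA)}
```

In this model a state with two black hats is resolved immediately, a state with one black hat after one silent round, and $www$ after two. The count is only meaningful because the rounds are discrete, which is exactly what the vague notion of "hesitation" lacks.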
The question you state, I believe, is a bad one (at least as translated). Hesitation is, as you put it, too vague; without knowledge of how quickly the other people can make deductions (and knowledge of how quickly they think you can make deductions, et cetera), it becomes difficult to impossible to make reliable statements as the number of people grows.
These types of questions are analyzed in game theory and microeconomics, and typically rely on probability filtrations (information updating in discrete or continuous time) or algebraic topological methods. The hesitation, as suggested by the comment above, is information updated over time; it results from inaction, since a different information set would have led to action. This is surprisingly hard stuff, but analyzing a similar problem made someone from my school his year's job market superstar (the Sherlock Holmes and Professor Moriarty puzzle, using hierarchies of beliefs).
This does not really show you how to model your particular problem. But you seem genuinely interested in understanding this better, and googling any of the above keywords (also Repeated Games; Bounded Rationality) should help.
P.S.: The muddy children problem, say, would probably be modeled using hierarchies of beliefs, which are sequences of reasoning such as "You know x", "I know that you know x", "You know that I know that you know x", and so on, which are then (often) examined for fixed points.