Where does the $i$ come from in the Schrödinger equation?
Let $U$ be a unitary operator. Write $$ U=\mathbb I+\epsilon A $$ for some small $\epsilon\in\mathbb C$ and some operator $A$.
Unitarity means $U^\dagger U=\mathbb I$, i.e., $$ U^\dagger U=(\mathbb I+\epsilon^* A^\dagger)(\mathbb I+\epsilon A)=\mathbb I+\epsilon^*A^\dagger+\epsilon A+\mathcal O(\epsilon^2) $$
Therefore, if $U^\dagger U=\mathbb I$, we must have $$ \epsilon^*A^\dagger+\epsilon A=0 $$
How can we achieve this? We can always redefine both $\epsilon$ and $A$ so that $\epsilon$ is real. If we do this, we get $A^\dagger=-A$, i.e., $A$ is anti-hermitian. In principle this is perfectly valid, but we can do better.
If we choose $\epsilon$ imaginary, we get $A^\dagger=A$. We like this option better, because we like hermitian operators. If $A$ is to be identified with the Hamiltonian, $\epsilon$ had better be imaginary, because otherwise $H$ cannot be hermitian (i.e., an observable).
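As a quick numerical sanity check (a sketch with numpy, using a randomly generated Hermitian matrix as a stand-in for the Hamiltonian), one can verify that $U=\mathbb I+i\epsilon H$ fails to be unitary only at order $\epsilon^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random Hermitian matrix standing in for the Hamiltonian (illustrative choice).
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (M + M.conj().T) / 2  # H† = H

eps = 1e-4
U = np.eye(3) + 1j * eps * H  # imaginary epsilon times a hermitian generator

# U†U = I + eps² H², so the deviation from unitarity is O(eps²), not O(eps).
deviation = np.linalg.norm(U.conj().T @ U - np.eye(3))
print(deviation)
```

The first-order terms $\epsilon^* A^\dagger+\epsilon A$ cancel exactly; only the $\mathcal O(\epsilon^2)$ piece survives.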
Note that Susskind writes $U=\mathbb I-i\epsilon H$ instead of $U=\mathbb I+i\epsilon H$. The negative sign is just convention: it is what everybody does, but in principle it could be a $+$ sign. It doesn't affect the physics (though we must be consistent with our choice). This is similar to certain ODEs in classical mechanics (the driven harmonic oscillator, RLC circuits, etc.), where we use the ansatz $x(t)=\mathrm e^{-i\omega t}$, with a minus sign for historical reasons.
So, we include the factor $i$ in $U$, and we end up with the Schrödinger equation. Had we not included the $i$, we would have got $$ \frac{\partial\psi}{\partial t}=\nabla^2\psi $$ where I take $\hbar=2m=1$ and $V=0$ to simplify the analysis (this doesn't change the conclusions). Note that this is the heat equation. The general solution of the heat equation is
$$ \psi(x,t)=\int\mathrm dy\ \frac{\psi(y,0)}{\sqrt{4\pi t}}\mathrm e^{-(x-y)^2/4t} $$
No matter what $\psi(y,0)$ is, this solution is non-propagating and non-oscillatory, and it decays in time. Therefore, the "particles" described by the heat equation don't move, and they slowly disappear! (For example, "stationary" solutions are of the form $\psi(x,t)=\mathrm e^{-Et}\phi(x)$, which goes to zero as $t\to\infty$.)
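This decay is easy to see numerically. Here is a sketch that discretizes the heat-kernel integral above, taking an arbitrary Gaussian bump as the initial profile (an illustrative choice):

```python
import numpy as np

# Discretize psi(x,t) = ∫ dy psi(y,0) e^{-(x-y)²/4t} / sqrt(4 pi t)
y = np.linspace(-10, 10, 2001)
dy = y[1] - y[0]
psi0 = np.exp(-y**2)  # arbitrary initial bump (illustrative choice)

def heat_evolve(x, t):
    kernel = np.exp(-(x[:, None] - y[None, :])**2 / (4 * t)) / np.sqrt(4 * np.pi * t)
    return kernel @ psi0 * dy

x = np.linspace(-10, 10, 401)
peaks = [heat_evolve(x, t).max() for t in (0.1, 1.0, 10.0)]
print(peaks)  # the maximum only decreases: the bump flattens and fades
```

Whatever the initial data, convolution with the spreading Gaussian kernel can only flatten the profile.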
Also, if it were not for the $i$ in the Schrödinger equation, $\int\mathrm dx\ |\psi(x,t)|^2$ wouldn't be time-independent, so the total probability would change in time, which makes no sense. Therefore, the $i$ in the Schrödinger equation is what makes the Born interpretation of the wave function possible!
Some things you might want to check out:
- Continuous groups, Lie groups, and Lie algebras, for example in http://www.cmth.ph.ic.ac.uk/people/d.vvedensky/groups/Chapter7.pdf
- Wigner's theorem: symmetries are realised through linear/unitary operators, or through antilinear/antiunitary operators.
- The translation operator: in quantum mechanics (and in classical mechanics as well, in a sense), space/time translations are represented through unitary operators, whose generators are the energy/momentum operators.
- The spectral theorem: hermitian operators have real eigenvalues.
- The maximum principle for the heat equation: if $\psi(x,t)$ solves the heat equation and $t_2>t_1$, then $\max_x\psi(x,t_2)\le\max_y\psi(y,t_1)$, which means $\psi$ "decreases" in time (therefore, probability "is destroyed", or "particles disappear").
- Schrödinger versus heat equations
Generally, the $i$ doesn't have to be there: we can just define the time evolution to be $e^{tK}$, but then $K$ will be anti-hermitian. It really makes no difference. I guess it's more convenient to deal with hermitian operators, so people just factored out an $i$: $$K=iH.$$ As for why hermitian operators are nicer to deal with, note that Dirac's beloved bra-ket sandwiches, all over QM, would be slightly more awkward with this operator $K$: since $K^\dagger=-K$, we'd get $$\langle K\psi|\psi\rangle=-\langle\psi|K\psi\rangle$$ And so we'd get minus signs, depending on whether $K$ acts on the left or on the right. It's simply easier to have a hermitian operator around.
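A small numpy sketch of that inconvenience (with a randomly built anti-hermitian $K$, an illustrative choice): the expectation value of $K$ comes out purely imaginary, and factoring out the $i$ leaves a hermitian operator with a real expectation value:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
K = (M - M.conj().T) / 2          # anti-hermitian: K† = -K (illustrative choice)
psi = rng.standard_normal(3) + 1j * rng.standard_normal(3)

exp_K = psi.conj() @ K @ psi
print(exp_K.real)                 # ~ 0: <psi|K|psi> is purely imaginary

H = -1j * K                       # factor out the i: H† = H
exp_H = psi.conj() @ H @ psi
print(exp_H.imag)                 # ~ 0: <psi|H|psi> is real
```

So the $i$ is just bookkeeping that trades imaginary expectation values for real ones.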
My loosey-goosey intuition for why the $i$ appears in Schrödinger's equation comes down to the logarithm of a unitary matrix being $i$ times a Hermitian matrix, and differential equations tending to exponentiate things.
All quantum operations are unitary (besides measurement, depending on your favorite interpretation). Unitary matrices have a lot of nice properties. In particular, their eigendecomposition has perpendicular eigenvectors with eigenvalues from the complex unit circle. Symbolically:
$$\begin{align} U=&\sum_{k} e^{i\theta_k} \left|v_k\right\rangle \left\langle v_k \right| \end{align}$$
where each $\theta_k$ is between $0$ and $2 \pi$, and the $v_k$'s are all mutually perpendicular.
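This is easy to check numerically (a sketch, assuming we build a random unitary via a QR decomposition): the eigenvalues land on the complex unit circle, and the eigenvectors come out mutually perpendicular:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)            # QR of a random matrix gives a random unitary

eigvals, eigvecs = np.linalg.eig(U)
print(np.abs(eigvals))            # all 1: eigenvalues are phase factors e^{i theta_k}
overlaps = eigvecs.conj().T @ eigvecs
print(np.round(np.abs(overlaps), 6))  # ~ identity: perpendicular eigenvectors
```
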
If you want to turn a unitary matrix into a continuous action over time, you need to interpolate it. You could do that either by raising it to a fractional power like $U_t = U^t$, or by scaling its logarithm in an exponential like $U_t = e^{t \ln(U)}$. We're going to use the logarithm approach because it's more elegant, and more useful for differential equations (*cough* *cough*).
Because we have an eigendecomposition for $U$ in terms of $e^\text{whatever}$, the logarithm has a very nice form. The phase factors' angles drop down:
$$\begin{align} \ln(U)&= \ln\sum_{k} e^{i\theta_k} \left|v_k\right\rangle \left\langle v_k \right| \\&= \sum_{k} \ln(e^{i\theta_k}) \left|v_k\right\rangle \left\langle v_k \right| \\&= \sum_{k} i \theta_k \left|v_k\right\rangle \left\langle v_k \right| \\&= i \sum_{k} \theta_k \left|v_k\right\rangle \left\langle v_k \right| \\&= i H \end{align}$$
Here we've defined $H$ in terms of an eigendecomposition with perpendicular vectors scaled by real eigenvalues. Therefore $H$ is Hermitian. So the logarithm of a unitary matrix is $i$ times a Hermitian matrix, and our interpolation-by-exponentiating-a-scaled-logarithm becomes:
$$U_t = e^{t \ln(U)} = e^{i t H}$$
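Numerically (a sketch using `scipy.linalg.logm` and `expm` on a random unitary): the principal logarithm divided by $i$ indeed comes out hermitian, and exponentiating a scaled logarithm really does interpolate $U$:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)                 # a random unitary

H = logm(U) / 1j                       # ln(U) = iH, so H = ln(U)/i
print(np.linalg.norm(H - H.conj().T))  # ~ 0: H is hermitian

U_half = expm(0.5 * logm(U))           # the interpolated action U_{1/2}
print(np.linalg.norm(U_half @ U_half - U))  # ~ 0: half-step squared recovers U
```
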
If you were paying attention, you saw where the $i$ came from.
- It initially appeared because unitary matrices only apply phase factors to their eigenvectors, so their eigenvalues are like $e^{i \theta_k}$.
- When we computed the logarithm, to interpolate the unitary matrix's effect, the $i$ dropped out of the exponential and escaped the sum.
- The remaining sum was a Hermitian matrix.
Since we want our differential equation to have unitary solutions, and differential equations tend to exponentiate things, we'd better put $i$ times a Hermitian matrix in there.