How can a set contain itself?
Yes, this is an issue.
Naively, this issue cannot be dealt with, and we'll get to that in a moment. But in 1917 mathematicians had already noticed that "normal sets" do not contain themselves, and in fact have an even stronger property. Namely, there are no infinite decreasing chains in $\in$: not only is $a\notin a$, but also $a\notin b$ whenever $b\in a$, and $a\notin c$ whenever $c\in b$ for some $b\in a$; more generally, there is no sequence $x_n$ such that $x_{n+1}\in x_n$ for all $n$.
This is exactly what the axiom of regularity was introduced to formalize. It says that the membership relation is well-founded, which, assuming the axiom of choice, is equivalent to saying that there are no infinite decreasing chains. In particular, $A\notin A$ for any set $A$.
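To see why regularity rules out $A\in A$, here is a short derivation. It assumes the usual first-order statement of regularity (every nonempty set $S$ has an element disjoint from $S$) and uses the axiom of pairing to form $\{A\}$. Neither is spelled out above, so read this as a sketch under those assumptions.

$$
\begin{aligned}
&\text{(Regularity)}\quad \forall S\,\bigl(S\neq\varnothing \rightarrow \exists y\in S\;\, y\cap S=\varnothing\bigr)\\
&\text{Suppose } A\in A \text{ and put } S=\{A\}.\\
&\text{The only element of } S \text{ is } A, \text{ so regularity gives } A\cap\{A\}=\varnothing.\\
&\text{But } A\in A \text{ and } A\in\{A\}, \text{ hence } A\in A\cap\{A\}.
\end{aligned}
$$

The last two lines contradict each other, so $A\notin A$.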
But we know nowadays that it is consistent relative to the other axioms of modern set theory (read: $\sf ZFC$) that there are sets which contain themselves, that is, sets $x$ with $x\in x$. We can even go as far as having $x=\{x\}$, and we can even arrange for infinitely many sets of this form.
This shows that naively we can neither prove nor disprove that sets which contain themselves exist. Naive set theory has no formal axioms; it is usually taken to rely on only a small fragment of the axioms of $\sf ZFC$, and it certainly does not include the axiom of regularity.
But it also tells us that we cannot point to a particular set which contains itself, even when we do not assume the axiom of regularity, since such sets cannot be defined in a nontrivial way. They may or may not exist, depending on the universe of sets we are in. But we do know that in order to do naive set theory, and even more, we can safely assume that this situation never occurs.
I guess you are asking what makes a great man trouble himself with such a trivial problem. The following excerpt is Russell's own explanation of his mental journey:
I was led to this contradiction by considering Cantor's proof that there is no greatest cardinal number. I thought, in my innocence, that the number of all the things there are in the world must be the greatest possible number, and I applied his proof to this number to see what would happen. This process led me to the consideration of a very peculiar class. Thinking along the lines which had hitherto seemed adequate, it seemed to me that a class sometimes is, and sometimes is not, a member of itself. The class of teaspoons, for example, is not another teaspoon, but the class of things that are not teaspoons, is one of the things that are not teaspoons. There seemed to be instances that are not negative: for example, the class of all classes is a class. The application of Cantor's argument led me to consider the classes that are not members of themselves; and these, it seemed, must form a class. I asked myself whether this class is a member of itself or not. If it is a member of itself, it must possess the defining property of the class, which is to be not a member of itself. If it is not a member of itself, it must not possess the defining property of the class, and therefore must be a member of itself. Thus each alternative leads to its opposite and there is a contradiction.
At first I thought there must be some trivial error in my reasoning. I inspected each step under logical microscope, but I could not discover anything wrong. I wrote to Frege about it, who replied that arithmetic was tottering and that he saw that his Law V was false. Frege was so disturbed by this contradiction that he gave up the attempt to deduce arithmetic from logic, to which, until then, his life had been mainly devoted. Like the Pythagoreans when confronted with incommensurables, he took refuge in geometry and apparently considered that his life's work up to that moment had been misguided.
Source: Russell, Bertrand. My Philosophical Development. Chapter VII, "Principia Mathematica: Philosophical Aspects". New York: Simon and Schuster, 1959.
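In symbols, the argument in the excerpt can be sketched as follows. The name $R$ is introduced here only for illustration, and the sketch assumes that the classes "not members of themselves" really do form a class, which is exactly the step Russell describes.

$$
R=\{x \mid x\notin x\}
\quad\Longrightarrow\quad
\forall y\,\bigl(y\in R \leftrightarrow y\notin y\bigr)
\quad\Longrightarrow\quad
\bigl(R\in R \leftrightarrow R\notin R\bigr).
$$

The last step simply takes $y=R$. A statement that is equivalent to its own negation is a contradiction, which is the "each alternative leads to its opposite" of the quotation.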
The reason we've learned how to develop logic and set theory with the "depth" thing is precisely to avoid the paradoxes of naive set theory.
One of the key ideas that naive set theory runs with is the idea of equating a logical predicate with the set of all things satisfying the predicate.
This is, I believe, actually an ancient philosophical idea: "What is blue?" "The collection of all things that we would call blue."
With the idea that sets can be used to translate logical notions into actual mathematical objects (sets) that we can then reason with, Cantor gave us (unrestricted) comprehension: for any logical predicate $\varphi$, there is a set of all things satisfying $\varphi$. In class-builder notation, Cantor said the following is a set:
$$ \{ x \mid \varphi(x) \} $$
There is nothing here to prevent a set from containing itself. In fact, we can prove there are sets containing themselves: if you select $\varphi$ to be the predicate "___ is a set", then unrestricted comprehension tells us that there is a set of all sets. And since it is a set, it must be a member of itself.
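Written out, this looks as follows. The label $V$ is introduced here for illustration, and "$x=x$" is one possible way to express "$x$ is a set" in a setting where every object is a set; both are assumptions of this sketch.

$$
V=\{x \mid x=x\},
\qquad
V=V \;\Longrightarrow\; V\in V.
$$

Since every $x$ satisfies $x=x$, the set $V$ contains everything, and in particular it contains $V$.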
Zermelo's axioms for set theory are based on constructions; e.g. the axiom of pairing says that if $x$ and $y$ are sets, then $\{ x,y \}$ is a set. All sets we can explicitly construct using these constructions do have 'depth', but Zermelo's axioms are lacking any sort of induction principle that would allow us to prove that all sets are 'constructible', or even that they have a 'depth'.
And, in fact, Z set theory is consistent with the existence of sets that contain themselves. Likewise, if you remove the axiom of foundation (also called regularity) from ZFC, the resulting theory is also consistent with the existence of sets that contain themselves.
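To make the 'depth' idea concrete, here is one possible formalization. It assumes that 'depth' means the usual rank of a set, an ordinal defined by recursion on membership; the name $\operatorname{depth}$ and this reading are assumptions of the sketch, not something fixed by the text above.

$$
\operatorname{depth}(x)=\sup\{\operatorname{depth}(y)+1 \mid y\in x\},
\qquad
\operatorname{depth}(\varnothing)=0,\quad
\operatorname{depth}(\{\varnothing\})=1,\quad
\operatorname{depth}(\{\varnothing,\{\varnothing\}\})=2.
$$

Under this definition, a set with $x\in x$ would need

$$
\operatorname{depth}(x)\;\ge\;\operatorname{depth}(x)+1,
$$

which no ordinal satisfies. So a set that contains itself cannot be assigned a depth at all, which is why such sets can only appear when nothing forces every set to have one.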