The "functions" of untyped lambda calculus are not (set theoretic) functions so what are they?
Actually it is possible to interpret $\lambda$-terms (I assume you use $\lambda$-function as a synonym for $\lambda$-term) as some sort of functions.
This is due to a result of Dana Scott, who introduced domain theory exactly for the purpose of giving a function-like based semantics for the (untyped) $\lambda$-calculus.
The idea is to model the universe of the $\lambda$-terms as a very special domain $D$, which is a poset satisfying some properties. One of these particular properties of this domain is that there is an isomorphism between the domain $D$ and the domain $[D,D]$ of the scott continuous functions from $D$ into itself.
Through the said isomorphism is possible to interpret each $\lambda$-term (i.e. an element of $D$) as a function (as an element of $[D,D]$) and, on the other hand, every function in $[D,D]$ is (i.e. corresponds via the isomorphism to) a $\lambda$-term.
The critical point here is that $[D,D]$ is not the set of functions from $D$ into itself: it would not be possible to represent all these functions through $\lambda$-terms due to cardinality reasons, but it is possible to represent $\lambda$-terms as a very well suited subset of functions over some set.
I've added a section to the bottom of my answer on a more general approach that shows how you can treat functions that can be applied to themselves within set theory, representing them as something other than sets of ordered pairs.
First, in spite of the name, a $\lambda$-function isn't a function in the mathematical sense. It's a string (or some equivalent mathematical object) that represents an algorithm — in other words, it's a computer program. (This is the similar to the sense in which the word "function" is used in computer languages, which is different from the usual mathematical usage.)
Mathematical functions are extensional; in other words, if $f$ and $g$ have the same values on all their arguments, then $f=g.$ This is why functions can be considered to be sets of ordered pairs. Only the argument-value pairs matter, not how they're computed or determined.
Computer programs, in contrast, are intensional. Two different programs that compute the same values on all their arguments are still different programs. In fact we have no way of even determining (algorithmically) whether two computer programs compute the same function.
A better name than "$\lambda$-function" is "$\lambda$-expression" because these things are expressions or strings in a language, not actual functions themselves.
Formally, if you have an alphabet $A$ and if $S$ is the set of strings on that alphabet, a $\lambda$-expression $e$ is a member of $S$ satisfying certain syntactical requirements. It isn't a function, but it does describe an algorithm that computes a certain function $\phi_e\colon D_e\to S$ for some set $D_e \subseteq S$ (the domain $D_e$ may not be all of $S$ because the computer program that $e$ describes may not halt on all possible input strings). Since $e\in S,$ you can compute $\phi_e(e)$ just as you can compute $\phi_e(x)$ for any $x\in S$ (of course, any such computation may or may not halt, so it's possible that $\phi_e(x)$ doesn't happen to have a value).
Now $\phi_e$ is just a normal mathematical function; you can regard it as a set of ordered pairs. From that point of view, $\phi_e \subseteq S\times S.$ Notice that $e$ itself is a member of just $S.$ You can have $e_1\ne e_2$ (different programs) but $\phi_{e_1}=\phi_{e_2}$ (the different programs happen to always compute the same output if given the same input).
So we never apply a function to itself. We apply a function to a string, and we also happen to use strings as computer programs that can be used to compute functions.
Incidentally, you can also define the application function $\alpha$ mapping some subset of $S\times S$ to $S$ by setting $\alpha(x,y)=\phi_x(y);$ the ordered pair $\langle x,y \rangle$ belongs to the domain of $\alpha$ iff the string $x$ is a $\lambda$-expression and the computation $\phi_x(y)$ halts.
And there's no problem coding all of this inside ZF.
There's a somewhat different approach to all this which is more general. Let's view the question as asking: How can functions that can be applied to themselves be handled, in light of the fact that a function in set theory is never a member of its own domain?
To avoid confusion, I'll call these hypothetical new functions *-functions just to distinguish them from the actual functions of set theory.
To do this, we need to start with a set $F$ (which is intended to serve eventually as both the domain of these *-functions and the set of all *-functions), a subset $D$ of $F\times F,$ and and a function $\alpha\colon D\to F.$ Note that these are defined in set theory; in particular, $\alpha$ is a function, not a *-function. The function $\alpha$ is intended to represent the application of a *-function to an argument.
So now we'll define a *-function to be a member of $F.$ If $f$ and $g$ are *-functions, we'll define $f(g) = \alpha(f, g)$ if $\langle f,g \rangle \in D;$ if $\langle f,g \rangle \not\in D,$ then $f(g)$ is undefined. This essentially means that we can use each *-function as a partial "function" from $F$ to $F,$ that is, a "function" from some subset of $F$ to $F.$ (I've put "function" in quotes here since these aren't sets of ordered pairs, as functions are usually defined, but they can be used just like functions because we have a definition of what $f(x)$ means for any $f$ and $x$ in $F$. On the other hand, for each $f\in F,$ the function $x \mapsto \alpha(f,x)$ is just a regular mathematical function; we could call it, say, $\varphi_f,$ and treat it as a set of ordered pairs, as usual. We could even have started with the function $f\mapsto \varphi_f$ instead of $\alpha.$ But notice that, in line with the extensionality/intensionality distinction in the first part of this answer, it's possible to have $\varphi_f=\varphi_g$ even when $f\ne g.)$
Incidentally, if you want to avoid having partial "functions," just distinguish one element in $F,$ which I'll call $\uparrow.$ Make $\alpha$ a total function now, with domain all of $F\times F.$ If $\alpha(x,y)$ was undefined, set it equal to $\uparrow$ now. Also set $\alpha(x,\uparrow)=\alpha(\uparrow,x)=\,\uparrow$ for all $x\in F.$ But I think the approach with partial "functions" is clearer, since it better represents the situation.
What really underlies this entire approach is that the definition of function as a set of ordered pairs isn't sacrosanct; it's just a convenient coding that has been adopted to represent functions as sets. Some other coding would be fine too (just as real numbers can be represented as Cauchy sequences or Dedekind cuts or something else, as long as they have the right properties). The key thing for functions is that we need to have a way to apply a function to an argument.
We can define composition
$$ f \circ g = \lambda x. f(gx) $$
(where $x$ is chosen so that it does not appear in $f$ or $g$)
If you take $\alpha$, $\beta$, and $\eta$ reduction as specifying that two terms are equal, then composition is an associative, binary operation with identity
$$ i = \lambda x.x $$
and thus the lambda terms with this operation form a monoid $\Lambda$, and so you can do all of the usual monoid things with it.
Monoids can act on sets. Any monoid can act on its set of elements (e.g. where the action is given by the monoid operation). Application of lambda terms gives another interesting way for $\Lambda$ to act on itself.
Since you bring up category theory, the simply typed $\lambda$-calculus is very important — in a suitable sense, simply typed $\lambda$ calculi and cartesian closed categories are equivalent notions.