What is "code" in "toric code"?
Let me try to give you the answer in just the right amount of generality. "Quantum code" is just short for "quantum error-correcting code". It is a special embedding of one vector space into a larger one that satisfies some additional properties. If we start with a Hilbert space $H$, then a code is a decomposition $H = (A \otimes B) \oplus C$. The quantum information is encoded into system $A$. If $B$ is trivial, then the code is just a subspace of $H$. When $B$ is nontrivial, we call it a subsystem code. Let's specialize to the case of $n$ qubits, so that $H = (\mathbb{C}^2)^{\otimes n}$. It is easiest to imagine that the dimensions of $A$, $B$, and $C$ are all powers of 2, though of course this discussion could be generalized.
Let $P$ be the orthogonal projector onto $A\otimes B$, and let $\mathcal{E}$ be an arbitrary quantum channel, i.e. a completely positive, trace-preserving linear map. We say that $\mathcal{E}$ is recoverable (or correctable) if there exists another quantum channel $\mathcal{R}$ such that for all states $\rho_A \otimes \rho_B$, we have $$\mathcal{R}\circ\mathcal{E}(\rho_A \otimes \rho_B) = \rho_A \otimes \rho'_B,$$ where $\rho'_B$ is arbitrary. This says that for any state which is supported on $A\otimes B$ and is initially a product state, we can reverse the effects of $\mathcal{E}$ up to a change on system $B$.
Fortunately, there are simpler equivalent conditions that one can check instead. For example, an equivalent condition can be stated in terms of the Kraus operators $E_j$ for the channel $\mathcal{E}$. The subsystem $A$ is correctable for $\mathcal{E}(\rho) = \sum_j E_j \rho E_j^\dagger$ if and only if for all $i,j$, there exists an operator $g^{ij}_B$ on subsystem $B$ such that $$ P E_i^\dagger E_j P = 1\hspace{-3pt}\mathrm{l}_A \otimes g^{ij}_B.$$ You can interpret this condition as saying that no error process associated to the channel $\mathcal{E}$ can gain any information about subsystem $A$.
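To make this condition concrete, here is a minimal NumPy sketch for the special case where $B$ is trivial, so that the condition reduces to $P E_i^\dagger E_j P = c_{ij} P$ for some scalars $c_{ij}$. The three-qubit repetition code spanned by $|000\rangle$ and $|111\rangle$ is my own choice of example, the Kraus operators are taken up to normalization (which does not affect the check), and the helper names `kron_all`, `basis_state`, and `is_correctable` are hypothetical.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def basis_state(bits):
    """Computational basis vector labelled by a bit string such as '000'."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

# Projector onto the code space span{|000>, |111>} (assumed example code).
codewords = [basis_state("000"), basis_state("111")]
P = sum(np.outer(v, v) for v in codewords)

def is_correctable(P, errors, tol=1e-12):
    """Check P E_i^dagger E_j P = c_ij P for every pair (trivial B only)."""
    for Ei, Ej in itertools.product(errors, repeat=2):
        M = P @ Ei.conj().T @ Ej @ P
        c = np.trace(M) / np.trace(P)  # the only possible value of c_ij
        if not np.allclose(M, c * P, atol=tol):
            return False
    return True

identity = kron_all([I2, I2, I2])
single_flips = [identity,
                kron_all([X, I2, I2]),
                kron_all([I2, X, I2]),
                kron_all([I2, I2, X])]

# No flip or one flip: the condition holds.
single_flip_ok = is_correctable(P, single_flips)

# Allowing a two-qubit flip as well breaks it, because X_3 followed by
# X_1 X_2 acts as a logical flip on the code space.
double_flip_ok = is_correctable(P, single_flips + [kron_all([X, X, I2])])

# A single phase flip already violates the condition for this code.
phase_flip_ok = is_correctable(P, [identity, kron_all([Z, I2, I2])])

print(single_flip_ok, double_flip_ok, phase_flip_ok)
```

In this sketch the failing cases are exactly those where some product $E_i^\dagger E_j$ distinguishes or exchanges the two codewords, which matches the reading that a correctable error process learns nothing about the encoded information.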
Consider error channels whose Kraus operators, when expanded in the Pauli basis, have support on at most $t$ of the $n$ qubits in our Hilbert space. If every such channel is correctable for subsystem $A$, then we say our code corrects $t$ errors. The distance $d$ of the code is the smallest weight of a Pauli operator that acts nontrivially on the encoded information in $A$; a code of distance $d$ corrects arbitrary errors on up to $\lfloor (d-1)/2 \rfloor$ qubits. For the toric code, the distance is the linear size of the lattice.
In general, a "code" in quantum information is a collection of "codewords". One takes a (relatively large) number of physical qubits and considers only a limited set of states, i.e. a low-dimensional subspace of the full Hilbert space (formally, the code is that subspace). For stabilizer codes, all of these states are ground states of some gapped Hamiltonian. Because there is a good deal of redundancy, the encoded information is more robust against errors and decoherence.
Perhaps a clearer example would be the repetition code. Say you have six qubits, but you restrict yourself to the subspace spanned by $$\{|000,000\rangle,|000,111\rangle,|111,000\rangle,|111,111\rangle\}.$$ You can then think of these long combinations as "codewords" for the logical states $$\{|00\rangle,|01\rangle,|10\rangle,|11\rangle\},$$ and they can still be recognized if any single qubit suffers a bit-flip error, since a majority vote within each block of three recovers the intended value. (This simple code does not protect against phase errors.)
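The majority-vote idea behind the six-qubit repetition example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it treats codewords as classical bit strings (the labels of the basis states), handles bit-flip errors only, and says nothing about superpositions or phase errors. The function names `decode_block`, `decode`, and `flip` are hypothetical.

```python
def decode_block(bits):
    """Majority vote over one block of three bits."""
    return "1" if bits.count("1") >= 2 else "0"

def decode(word):
    """Map a six-bit string to its two-bit logical label."""
    assert len(word) == 6
    return decode_block(word[:3]) + decode_block(word[3:])

def flip(word, k):
    """Return a copy of word with the bit at position k flipped."""
    flipped = "1" if word[k] == "0" else "0"
    return word[:k] + flipped + word[k + 1:]

# The four codewords and the logical states they stand for.
codewords = {
    "000000": "00",
    "000111": "01",
    "111000": "10",
    "111111": "11",
}

# Every single bit flip on every codeword is decoded back correctly.
all_recovered = all(
    decode(flip(word, k)) == logical
    for word, logical in codewords.items()
    for k in range(6)
)
print(all_recovered)
```

Two flips inside the same block of three defeat the majority vote, which is why this code only guarantees recovery from one bit flip per block.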