Why is the Kubernetes source code an order of magnitude larger than other container orchestrators?
First and foremost, don't be misled by the raw line count: most of it comes from dependencies in the vendor
folder that are not part of the core logic (utilities, client libraries, gRPC, etcd, etc.).
Raw LoC Analysis with cloc
To put things into perspective, for Kubernetes:
$ cloc kubernetes --exclude-dir=vendor,_vendor,build,examples,docs,Godeps,translations
7072 text files.
6728 unique files.
1710 files ignored.
github.com/AlDanial/cloc v 1.70 T=38.72 s (138.7 files/s, 39904.3 lines/s)
---------------------------------------------------------
Language              files     blank   comment      code
---------------------------------------------------------
Go                     4485    115492    139041   1043546
JSON                     94         5         0    118729
HTML                      7       509         1     29358
Bourne Shell            322      5887     10884     27492
YAML                    244       374       508     10434
JavaScript               17      1550      2271      9910
Markdown                 75      1468         0      5111
Protocol Buffers         43      2715      8933      4346
CSS                       3         0         5      1402
make                     45       346       868       976
Python                   11       202       305       958
Bourne Again Shell       13       127       213       655
sed                       6         5        41       152
XML                       3         0         0        88
Groovy                    1         2         0        16
---------------------------------------------------------
SUM:                   5369    128682    163070   1253173
---------------------------------------------------------
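To see what `--exclude-dir` does to the count, here is a minimal Python sketch that walks a checkout, prunes directories by name at any depth (roughly as the cloc invocation above does) and tallies non-blank lines per file extension. It is an approximation, not cloc's algorithm: comment lines are counted as code, so its totals will come out higher than cloc's `code` column. The helper name `count_lines` is made up for illustration.

```python
import os
from collections import Counter

# Directory names to skip at any depth, mirroring the cloc --exclude-dir list above.
EXCLUDED_DIRS = {"vendor", "_vendor", "build", "examples", "docs", "Godeps", "translations"}


def count_lines(root, excluded=EXCLUDED_DIRS):
    """Return a Counter mapping file extension -> number of non-blank lines.

    Simplification (assumption): comment lines are counted as code, unlike cloc.
    """
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                    totals[ext] += sum(1 for line in f if line.strip())
            except (UnicodeDecodeError, OSError):
                # Skip binary or unreadable files rather than aborting the walk.
                continue
    return totals
```

For example, `count_lines("kubernetes").most_common(5)` lists the five extensions with the most lines, and passing `excluded=set()` brings the vendor folder back in, which makes the weight of dependencies in the raw number easy to see.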
Here is Docker (but not Swarm or Swarm mode, which include more features like volumes, networking, and plugins that are not in these repositories). Projects like Machine, Compose, and libnetwork are not included either, so in reality the whole Docker platform probably contains many more LoC:
$ cloc docker --exclude-dir=vendor,_vendor,build,docs
2165 text files.
2144 unique files.
255 files ignored.
github.com/AlDanial/cloc v 1.70 T=8.96 s (213.8 files/s, 30254.0 lines/s)
----------------------------------------------------------
Language               files     blank   comment      code
----------------------------------------------------------
Go                      1618     33538     21691    178383
Markdown                 148      3167         0     11265
YAML                       6       216       117      7851
Bourne Again Shell        66       838       611      5702
Bourne Shell              46       768       612      3795
JSON                      10        24         0      1347
PowerShell                 2        87       120       292
make                       4        60        22       183
C                          8        27        12       179
Windows Resource File      3        10         3        32
Windows Message File       1         7         0        32
vim script                 2         9         5        18
Assembly                   1         0         0         7
----------------------------------------------------------
SUM:                    1915     38751     23193    209086
----------------------------------------------------------
Please note that these are very rough estimates obtained with cloc; a deeper analysis might be worthwhile.
Roughly, the project itself accounts for about half of the LoC mentioned in the question (~1,250K LoC once vendored dependencies are excluded; whether dependencies should count is subjective).
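The Go and SUM rows above also make the size gap concrete. This is only arithmetic on the `code` column reported by cloc in this answer, so it inherits all the caveats of those raw counts:

```python
# "code" column values copied from the cloc output in this answer
# (vendor, build, docs and similar directories excluded).
KUBERNETES = {"Go": 1_043_546, "SUM": 1_253_173}
DOCKER = {"Go": 178_383, "SUM": 209_086}


def size_ratio(row):
    """How many times larger Kubernetes is than docker for one cloc row."""
    return KUBERNETES[row] / DOCKER[row]


for row in ("Go", "SUM"):
    print(f"{row}: {size_ratio(row):.1f}x")
```

This prints about 5.9x for Go and 6.0x overall: once vendored dependencies are excluded, the Kubernetes repository is roughly six times the size of the docker repository measured here, rather than a full order of magnitude larger.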
What is included in Kubernetes that makes it so big?
Most of the bloat comes from libraries supporting various cloud providers, either to ease bootstrapping on their platforms or to support specific features (volumes, etc.) through plugins. It also has a lot of examples that should be left out of the line count. A fair LoC estimate needs to exclude a lot of documentation and example directories.
It is also much more feature-rich than Docker Swarm, Nomad, or Dokku, to name a few. It supports advanced networking scenarios, has built-in load balancing, and includes PetSets, Cluster Federation, volume plugins, and other features that other projects do not support yet.
It supports multiple container engines, so it is not limited to running Docker containers and could also run containers through other engines (such as rkt).
A lot of the core logic involves interacting with other components (key-value stores, client libraries, plugins, etc.), which goes far beyond simple scenarios.
Distributed systems are notoriously hard, and Kubernetes seems to support most of the tooling from key players in the container industry without compromise (where other solutions do make such compromises). As a result, the project can look bloated and too big for its core mission (deploying containers at scale). In reality, these statistics are not that surprising.
Key idea
Comparing Kubernetes to Docker or Dokku is not really appropriate. The scope of the project is far bigger, and it includes many more features because it is not limited to the Docker family of tooling.
While Docker has many of its features scattered across multiple libraries, Kubernetes tends to keep everything in its core repository (which inflates the line count substantially but also explains the popularity of the project).
Considering this, the LoC statistic is not that surprising.
Aside from the reasons given by @abronan, the Kubernetes codebase contains a lot of duplicated and generated files, which artificially inflate the code size. The actual size of the code that does "real work" is much smaller.
For example, take a look at the staging directory. This directory contains about 500,000 LOC, but none of it is original code; it is all copied from elsewhere in the Kubernetes repo and rearranged. This artificially inflates the total LOC.
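A rough way to measure this kind of duplication is to hash each file's contents and count a file's lines only the first time that content is seen. The sketch below (`count_unique_lines` is a hypothetical helper) only catches byte-identical copies; code that was copied and then rearranged or edited would need fuzzier matching, so its estimate of duplication is a lower bound.

```python
import hashlib
import os


def count_unique_lines(root):
    """Return (total_lines, unique_lines) for every file under root.

    Lines are counted as newline characters. A file whose SHA-256 digest has
    already been seen adds to total_lines only, so only byte-identical copies
    are treated as duplicates.
    """
    seen = set()
    total = unique = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            lines = data.count(b"\n")
            total += lines
            digest = hashlib.sha256(data).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique += lines
    return total, unique
```

For a tree like staging, `total - unique` is the number of lines that exist only because files were copied verbatim.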
There are also Swagger API files, which are auto-generated and describe the Kubernetes API in the OpenAPI format. Here are some places where I found these files:
kubernetes/api/
kubernetes/federation/apis/swagger-spec
kubernetes/federation/apis/openapi-spec
Together, these files account for ~116,000 LOC, and all they do is describe the Kubernetes API in OpenAPI format!
And these are just the OpenAPI definition files; the total number of LOC required to support OpenAPI is probably much higher. For instance, I found a ~12,000 LOC file and a ~13,000 LOC file related to supporting Swagger/OpenAPI, and there are very likely many more files related to this feature.
The point is that the code doing the actual heavy lifting behind the scenes may be small compared with the supporting code required to make Kubernetes a maintainable and scalable project.
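Generated Go files can often be filtered out mechanically: `go help generate` documents a convention whereby generated source carries a line matching `^// Code generated .* DO NOT EDIT\.$` before the first non-comment, non-blank text. The sketch below applies that check, assuming the files of interest follow the convention; the helper names are hypothetical. JSON specs like the ones listed above carry no such header, so they would have to be excluded by path instead.

```python
import re

# Header convention documented by `go help generate`.
GENERATED_HEADER = re.compile(r"^// Code generated .* DO NOT EDIT\.$")


def is_generated_go(source):
    """True if the header appears before the first non-comment, non-blank line.

    Simplification: only // line comments are recognised; a /* */ block
    comment ends the scan.
    """
    for line in source.splitlines():
        if GENERATED_HEADER.match(line):
            return True
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            return False
    return False


def split_generated(files):
    """Split {path: Go source} into (handwritten, generated) non-blank line counts."""
    handwritten = generated = 0
    for source in files.values():
        n = sum(1 for line in source.splitlines() if line.strip())
        if is_generated_go(source):
            generated += n
        else:
            handwritten += n
    return handwritten, generated
```

Applied to a map of path to file contents, `split_generated` shows how much of the Go line count is machine-written rather than maintained by hand.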