Git-Based Source Control in the Enterprise: Suggested Tools and Practices?
I'm the SCM engineer for a reasonably large development organization, and we converted to git from svn over the last year or so. We use it in a centralized fashion.
We use gitosis to host the repositories. We broke our monolithic svn repositories up into many smaller git repositories as git's branching unit is basically the repository. (There are ways around that, but they're awkward.) If you want per-branch kinds of access controls, gitolite might be a better approach. There's also an inside-the-firewall version of GitHub if you care to spend the money. For our purposes, gitosis is fine because we have pretty open permissions on our repositories. (We have groups of people who have write access to groups of repositories, and everyone has read access to all repositories.) We use gitweb for a web interface.
As for some of your specific concerns:
- merges: You can use a visual merge tool of your choice; there are instructions in various places on how to set it up. The fact that you can do the merge and check its validity totally on your local repo is, in my opinion, a major plus for git; you can verify the merge before you push anything.
- GUIs: We have a few people using TortoiseGit but I don't really recommend it; it seems to interact in odd ways with the command line. I have to agree that this is an area that needs improvement. (That said, I am not a fan of GUIs for version control in general.)
- small-group tracking branches: If you use something that provides finer-grained ACLs like gitolite, it's easy enough to do this, but you can also create a shared branch by connecting various developers' local repositories — a git repo can have multiple remotes.
We switched to git because we have lots of remote developers, and because we had many issues with Subversion. We're still experimenting with workflows, but at the moment we basically use it the same way as we used to use Subversion. Another thing we liked about it was that it opened up other possible workflows, like the use of staging repositories for code review and sharing of code among small groups. It's also encouraged a lot of people to start tracking their personal scripts and so forth because it's so easy to create a repository.
Against the common opinion, I think that using a DVCS is an ideal choice in an enterprise setting because it enables very flexible workflows. I will talk about using a DVCS vs. CVCS first, best-practices and then about git in particular.
DVCS vs. CVCS in an enterprise context:
I wont talk about the general pros/cons here, but rather focus on your context. It is the common conception, that using a DVCS requires a more disciplined team than using a centralized system. This is because a centralized system provides you with an easy way to enforce your workflow, using a decentralized system requires more communication and discipline to stick to the established of conventions. While this may seem like it induces overhead, I see benefit in the increased communication necessary to make it a good process. Your team will need to communicate about code, about changes and about project status in general.
Another dimension in the context of discipline is encouraging branching and experiments. Here's a quote from Martin Fowler's recent bliki entry on Version Control Tools, he has found a very concise description for this phenomenon.
DVCS encourages quick branching for experimentation. You can do branches in Subversion, but the fact that they are visible to all discourages people from opening up a branch for experimental work. Similarly a DVCS encourages check-pointing of work: committing incomplete changes, that may not even compile or pass tests, to your local repository. Again you could do this on a developer branch in Subversion, but the fact that such branches are in the shared space makes people less likely to do so.
DVCS enables flexible workflows because they provide changeset tracking via globally unique identifiers in a directed acyclic graph (DAG) instead of simple textual diffs. This allows them to transparently track the origin and history of a changeset, which can be quite important.
Workflows:
Larry Osterman (a Microsoft dev working on the Windows team) has a great blog post about the workflow they employ at the Windows team. Most notably they have:
- A clean, high quality code only trunk (master repo)
- All development happens on feature branches
- Feature teams have team repos
- They do regularily merge the latest trunk changes into their feature branch (Forward Integrate)
- Complete features must pass several quality gates e.g. review, test coverage, Q&A (repos on their own)
- If a feature is completed and has acceptable quality it is merged into the trunk (Reverse Integrate)
As you can see, having each of these repositories live on their own you can decouple different teams advancing at different paces. Also the possibility to implement a flexible quality gate system distinguishes DVCS from a CVCS. You can solve your permission issues at this level too. Only a handful of people should be allowed access to the master repo. For each level of the hierachy, have a seperate repo with the corresponding access policies. Indeed, this approach can be very flexible on the team level. You should leave it up to each team to decide wether they want to share their team repo among themselves or if they want a more hierachical approach where only the team lead may commit to the team repo.
(The picture is stolen from Joel Spolsky's hginit.com.)
One thing remains to be said at this point though:- even though DVCS provides great merging capabilities, this is never a replacement for using Continuous Integration. Even at that point you have a great deal of flexibility: CI for the trunk repo, CI for team repos, Q&A repos etc.
Git in an enterprise context:
Git is maybe not the ideal solution for an enterprise context as you have already pointed out. Repeating some of your concerns, I think most notably they are:
Still somewhat immature support on Windows (please correct me if that changed recently)Now windows has github windows client , tortoisegit , SourceTree from atlassian- Lack of mature GUI tools, no first class citizen vdiff/merge tool integration
- Inconsistent interface with a very low level of abstractions on top of its inner workings
- A very steep learning curve for svn users
- Git is very powerful and makes it easy to modify history, very dangerous if you don't know what you are doing (and you will sometimes even if you thought you knew)
- No commercial support options available
I don't want to start a git vs. hg flamewar here, you have already done the right step by switching to a DVCS. Mercurial addresses some of the points above and I think it is therefore better suited in an enterprise context:
- All plattforms that run python are supported
- Great GUI tools on all major plattforms (win/linux/OS X), first class merge/vdiff tool integration
- Very consistent interface, easy transition for svn users
- Can do most of the things git can do too, but provides a cleaner abstraction. Dangerous operations are are always explicit. Advanced features are provided via extensions that must explicitly be enabled.
- Commercial support is available from selenic.
In short, when using DVCS in an enterprise I think it's important to choose a tool that introduces the least friction. For the transition to be successful it's especially important to consider the varying skill between developers (in regards to VCS).
Reducing friction:
Ok, since you appear to be really stuck with the situation, there are two options left IMHO. There is no tool to make git less complicated; git is complicated. Either you confront this or work around git:-
- Get a git introductory course for the whole team. This should include the basics only and some exercises (important!).
- Convert the master repo to svn and let the "young-stars" git-svn. This gives most of the developers an easy to use interface and may compensate for the lacking discipline in your team, while the young-stars can continue to use git for their own repos.
To be honest, I think you really have a people problem rather than a tool problem. What can be done to improve upon this situation?
- You should make it clear that you think your current process will end up with a maintainable codebase.
- Invest some time into Continous Integration. As I outlined above, regardless which kind of VCS you use, there's never a replacement for CI. You stated that there are people who push crap into the master repo: Have them fix their crap while a red alert goes off and blames them for breaking the build (or not meeting a quality metric or whatever).