Psychedelic Panorama of Foo

My name is Inigo Montoya. You killed my father. Prepare to die.

Sunday, August 3, 2008

 

Impressions of Git VCS

Hello. People have been asking me a lot about git lately. Well, really the questions have been about distributed VCS in general (git, Mercurial, darcs, etc.), so I figured I would answer some of them in this blog. Distributed VCS is becoming increasingly popular, so much so that the companies behind PVCS, Perforce, and ClearCase are looking into distributed features for their products. Even Atlassian cannot ignore it. The shortcomings of traditional VCS are underscored by Fisheye.

Before getting started: the VCSs I have used are SVN, CVS, SCCS, RCS, PVCS, CMVC, ClearCase, and git. Ones I have no experience with at all are darcs, Perforce, BitKeeper, and Mercurial.

What is Distributed VCS?

You can get really semantic and technical here, but I'm going to keep it simple. Think of distributed VCS as P2P and traditional VCS as client-server. That's about as simple as it gets. In fact, the advantages and disadvantages of each parallel those of P2P versus client-server architectures.

Compared to Traditional VCS (Pros and Cons)

I'm not going to separate these into pros and cons because, depending on your situation, they can vary. You'll see what I mean.
  • Distributed bandwidth
    If you are on remote networks, the bandwidth load is distributed, so not everyone is pushing/pulling source from the same server, cluster of servers, or storage. If you're all on the same network (an intranet, for example), this helps less, because the traffic is no longer really distributed: it all shares the same network.
  • Security
    This is kind of a double-edged sword. Distributed or not, it can go either way for you.
    • Distributed
      When distributed, you don't have to worry about things like access control or user privileges. Everyone is treated the same, as though they were on a remote system. No one needs access to the main repository, because the repository effectively lives everywhere.

      Sounds like it's actually a good thing, but there are still some logistical issues. You now have to work out how to distribute the changes. It's easy to get a repository started: usually you have at least read access to someone else's repository, so you clone it. OK, now what? What if I want to send someone changes? There are a number of ways to do this, and it's really up to you. I was recently on a project where some of us had write access to each other's repositories and others did not, so some of us had to do things a little differently. The way it worked out, we ended up mailing each other patches. We were a small group, so that was fine.

      I have also worked on a project where we distributed patches via an RSS feed. I saw a project on Google Code that actually used an SMTP-to-NNTP gateway with git's git-send-email, delivering patches over email so that they were published as a newsfeed. Mail works pretty well for me. I recently developed a procmail script that does the equivalent of SVN's update: it grabs all the changes that have not been applied yet and applies them automatically. Something like that would take a little more work for RSS or a newsfeed.
    • Traditional
      With traditional VCS, there are a number of ways to host a repository securely. IMHO, the best way is through SSH. Of course, SSH requires anyone who gets access to the repository to also have access to the machine, and that spawns a number of system administration headaches.

      If users get access through single sign-on, your problems are probably lessened, since you can likely rely on your organization's identity management to handle this. Even if you are only using LDAP or Kerberos, your life is muuuuch easier. It's when you decide to do something different that things get complicated. Suppose you decide to use local authn/authz. That means you are on your own for managing access control and user credentials. If you absolutely have to do this, I recommend using SSH and setting up a jail. That way you can be fairly liberal with access and just manage credentials.
  • Storage
    Normally, who cares about storage? Does a source code repository really take up that much space? No, not really, but it's a point worth bringing up.
    • Distributed
      Of course, the amount of storage used is proportional to the number of users. On the upside, though, we get a kind of virtualized backup system: since every user holds a full copy, the backup redundancy also grows with the number of users. That's actually pretty cool. You get a VCS and backups all in one. If you're already getting backup virtualization like I am, though, this is really no gain at all.
    • Traditional
      It gets points for more centralized storage, but then your industrial projects will require redundancy and backups. That's something you get easily from distributed VCS; with traditional VCS, you have to pay out a good deal of money for it.
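
The procmail script mentioned above isn't shown in the post, so here is only a rough idea of how an "svn update"-style patch applier could work. It is a minimal Python sketch under stated assumptions: patches arrive in a local mbox file as git send-email mails whose subjects start with "[PATCH", and applied patches are tracked by Message-ID in a plain-text state file. The function names and the state file are hypothetical; `git am` and `git am --abort` are the only real git commands used.

```python
import mailbox
import subprocess
from pathlib import Path


def pending_patches(mbox_path, applied_ids):
    """Return (Message-ID, message) pairs for patch mails not applied yet.

    Assumption: patches arrive as git format-patch / git send-email mails
    whose Subject starts with "[PATCH"; all other mail is ignored.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (msg["Message-ID"], msg)
            for msg in box
            if (msg["Subject"] or "").startswith("[PATCH")
            and msg["Message-ID"]
            and msg["Message-ID"] not in applied_ids
        ]
    finally:
        box.close()


def apply_pending(mbox_path, state_file, repo_dir):
    """Feed each unapplied patch to `git am`, recording the ones that succeed.

    state_file is a hypothetical text file with one Message-ID per line.
    """
    state = Path(state_file)
    applied = set(state.read_text().split()) if state.exists() else set()
    for msg_id, msg in pending_patches(mbox_path, applied):
        # Rebuild the mbox "From " separator line so git am sees a mailbox.
        raw = b"From " + msg.get_from().encode() + b"\n" + msg.as_bytes()
        result = subprocess.run(["git", "am"], cwd=repo_dir, input=raw)
        if result.returncode != 0:
            # Leave the repository clean and stop at the first failure.
            subprocess.run(["git", "am", "--abort"], cwd=repo_dir)
            break
        with state.open("a") as f:
            f.write(msg_id + "\n")
```

Keying on Message-ID rather than subject means a resent patch with the same subject is still picked up, and a patch that fails to apply stays pending for the next run.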

Beyond Distributed or Traditional

Enough of the distributed/traditional blah. Let's talk about git and SVN. How do I think they match up outside of all that?

Git

First, I'd like to declare that I like SVN better than git. There, I said it. My reasons have nothing to do with distributed versus traditional VCS, though. They have to do with git's user interface. I don't mean a GUI, either. I mean the command-line interface and how intuitive (or not) the commands are. It may just be me, since I've been using SVN for the last several years.

Git also likes to store a hash of each file in its index. How is this a bad thing? Well, it's not. It's actually pretty helpful for managing binary files. I'm not sure I can say that either one (SVN or git) is better in this regard. Every VCS really needs an index of some kind: handling so many file changes, differences, and so much history is really difficult without a quick reference map like an index. Indexes are also nice because they are derived data. They can always be rebuilt, so if one gets corrupted, it's not a big deal.
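
To make "stores a hash of each file" concrete: git names every stored file version by a SHA-1 over a short header plus the file's bytes, and the index records that name for each tracked path. Below is a minimal Python sketch of that hash. It assumes git's classic SHA-1 object format, and for a given file it should agree with `git hash-object <file>`.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Object name git gives a file's contents (SHA-1 object format).

    git hashes the header "blob <size>" plus a NUL byte, followed by the raw
    bytes, so the name depends only on content, never on file name or history.
    """
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name comes purely from content, the same bytes always produce the same hash. That is why this works as well for binary files as for text.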

I'm not really sure about this, but it looks like git isn't very efficient in the way it stores files, either. Of course, it uses the index and stores them by hash, but I'm not exactly clear on checkins. It looks like it copies each checkin. That's something I consider undesirable, at least for me.
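
For what it's worth, a git commit is a full snapshot, but the snapshot refers to content-addressed objects. A file that didn't change is not stored again; only changed files produce new zlib-compressed objects. (Running `git gc` later packs objects with delta compression.) The toy Python model below shows only the first part. It is a simplified sketch that ignores trees, commits, and packfiles.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store in the spirit of git's loose objects.

    Simplified model: blobs only, no trees/commits, no packfiles.
    """

    def __init__(self):
        self.objects = {}  # object name -> zlib-compressed object bytes

    def put(self, content: bytes) -> str:
        data = b"blob " + str(len(content)).encode() + b"\0" + content
        name = hashlib.sha1(data).hexdigest()
        # Identical content hashes to the same name, so it is stored once.
        if name not in self.objects:
            self.objects[name] = zlib.compress(data)
        return name

    def commit(self, files: dict) -> dict:
        """Snapshot a {path: bytes} mapping as {path: object name}."""
        return {path: self.put(body) for path, body in files.items()}


store = ObjectStore()
v1 = store.commit({"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"})
v2 = store.commit({"README": b"hello\n", "main.c": b"int main(void) { return 1; }\n"})
print(len(store.objects))  # 3: README is shared, only the changed main.c is new
```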

All the bad stuff aside, I still really like git. I use git for smaller, personal projects. I know, that's really weird, because I'm totally not taking advantage of git's distributed nature. Well, that's just it: I never would. What I really like about git isn't that it's distributed. It's how easy it is to get set up. Creating a project and creating a repository are the same command. Sharing my code is a piece of cake. It's really perfect for versioning small, personal projects, particularly when I haven't decided yet whether I'll want to share them later. I just don't have to think about things like that, and I can concentrate on writing the software.

It definitely simplifies branching, merging, and tagging. When I clone a repository, I can branch directly from there, and I can easily merge my changes back into the repository I cloned from. This really encourages branching. It especially benefits projects like the Linux kernel. Normally you wouldn't want to branch very often, but the Linux kernel has a branch for just about every committer, and that works really well for the project.
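
Part of why merging back is cheap in git is that history is a graph of commits. A merge starts from the "merge base", the best common ancestor of the two branch tips; this is roughly what `git merge-base --all` reports. Here is a minimal Python sketch of that idea on a hypothetical toy history. Real git works over commit objects and is far more optimized.

```python
from collections import deque


def ancestors(graph, commit):
    """All commits reachable from `commit` (inclusive) via parent links."""
    seen, queue = {commit}, deque([commit])
    while queue:
        for parent in graph[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(graph, a, b):
    """Best common ancestors of a and b: common ancestors that are not
    themselves ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(graph, d) for d in common)
    }


# Hypothetical history (commit -> parents): "topic" branched from master at B.
history = {
    "A": [], "B": ["A"],
    "C": ["B"], "D": ["C"],  # master
    "E": ["B"], "F": ["E"],  # topic
}
print(merge_bases(history, "D", "F"))  # {'B'}
```

Only the changes made since B on each side need to be combined, no matter how long the branches have existed.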

Subversion

First, let's talk about what I like about SVN. All of my projects and source code live in one location. I don't have to worry about keeping track of where my projects are, because they are all contained within the repository. With git, I tend to create a new repository per project. That's probably not the best thing to do, but I can't help it; it's so easy.

It is also extremely secure: it's as secure as I make the server it's hosted on. The way I use SVN, it isn't really remote at all, and this is where SVN really shines. I use SSH with SVN, but in such a way that SVN commands are piped through SSH to svnserve. This is essentially the same as logging into the machine and running svnserve there to handle version control of the source code, so everything is contained within the server. The client/server communication goes exclusively through SSH, and authn/authz is handled by whatever I'm using on my server. In the end, I use SVN as though I were local to the machine.
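
The "piped through SSH to svnserve" setup is what svn+ssh:// URLs do. The svn client starts an ssh session (or the command in the SVN_SSH environment variable) and runs `svnserve -t` (tunnel mode) on the server as the logged-in user. The Python sketch below only approximates that tunnel command for a given URL. It is simplified: it ignores ports and the client's [tunnels] configuration, and the function name is hypothetical.

```python
import os
import shlex
from urllib.parse import urlsplit


def svn_ssh_tunnel(url: str) -> list:
    """Approximate the tunnel command an svn client spawns for svn+ssh:// URLs.

    Simplified sketch: ignores ports and the [tunnels] client config section.
    """
    parts = urlsplit(url)
    if parts.scheme != "svn+ssh":
        raise ValueError("not an svn+ssh:// URL: " + url)
    # SVN_SSH, if set, replaces the plain `ssh` command.
    ssh = shlex.split(os.environ.get("SVN_SSH", "ssh"))
    target = f"{parts.username}@{parts.hostname}" if parts.username else parts.hostname
    # svnserve -t speaks the svn protocol over the SSH session's stdin/stdout,
    # so authentication is whatever the server's sshd already enforces.
    return ssh + [target, "svnserve", "-t"]


print(svn_ssh_tunnel("svn+ssh://alice@svn.example.com/repos/project/trunk"))
# ['ssh', 'alice@svn.example.com', 'svnserve', '-t']  (when SVN_SSH is unset)
```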

SVN, unlike git, likes to store everything as a revision. This actually sets it apart from a number of other VCSs. What it means is that only incremental changes are stored; even the first version of a file is stored as a revision. I really like this concept. Even though I shouldn't be concerned about storage, I am. When I see files just getting copied, it disturbs me. I feel like I should only be committing what changed.
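
Here is an illustration of "only incremental changes are stored": each revision is kept as copy/insert instructions against the previous one, and the first revision is a delta against an empty file. This is a minimal Python sketch using difflib on lines. Real Subversion uses its binary svndiff format and smarter delta chains, so treat this as a model of the idea only.

```python
from difflib import SequenceMatcher


def make_delta(old: list, new: list) -> list:
    """Encode `new` as copy/insert instructions against `old` (lists of lines)."""
    delta = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))  # only new text is stored
        # "delete": nothing stored, the old lines are simply not copied
    return delta


def apply_delta(old: list, delta: list) -> list:
    """Rebuild a revision from its predecessor and its delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


revisions = [
    ["int main(void) {\n", "  return 0;\n", "}\n"],
    ["#include <stdio.h>\n", "int main(void) {\n", "  return 0;\n", "}\n"],
]
stored, previous = [], []  # revision 1 is a delta against an empty file
for text in revisions:
    stored.append(make_delta(previous, text))
    previous = text

rebuilt = []
for delta in stored:
    rebuilt = apply_delta(rebuilt, delta)
print(rebuilt == revisions[-1])  # True
```

The second revision is stored as one inserted line plus a copy of three existing lines, instead of a second full copy of the file.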

Now for the downsides. Setting up a repository is a pain. If I make a mistake, I have to start over. For example, if I created a repository using Berkeley DB instead of the filesystem (FSFS) backend, converting it is much harder than just starting over. On the upside, I only have to do it once, but then again, only doing it once makes it easy to forget by the next time. If it weren't for the branches/, trunk/, tags/ convention, I fear setting up a new project would be difficult as well, because of the svnadmin tool.
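
One way to avoid relearning the setup each time is to script it. The Python sketch below assumes a local repository path and project name that you choose (both hypothetical). It builds the commands for an FSFS repository with the trunk/branches/tags layout, and it shows the usual dump/load route for moving a Berkeley DB repository to FSFS. The function names are hypothetical; the `svnadmin` and `svn mkdir` options are standard.

```python
import subprocess
from pathlib import Path


def new_repo_commands(repo_path: str, project: str) -> list:
    """Commands to create an FSFS repository with the trunk/branches/tags layout."""
    url = Path(repo_path).absolute().as_uri()  # file:// URL for local access
    return [
        ["svnadmin", "create", "--fs-type", "fsfs", repo_path],
        ["svn", "mkdir", "--parents", "-m", f"Create {project} layout"]
        + [f"{url}/{project}/{d}" for d in ("trunk", "branches", "tags")],
    ]


def create_project_repo(repo_path: str, project: str) -> None:
    """Run the setup commands; requires the Subversion tools on PATH."""
    for cmd in new_repo_commands(repo_path, project):
        subprocess.run(cmd, check=True)


def convert_to_fsfs(old_repo: str, new_repo: str, dump_file: str) -> None:
    """Move history from an existing (e.g. Berkeley DB) repository into FSFS."""
    with open(dump_file, "wb") as out:
        subprocess.run(["svnadmin", "dump", old_repo], stdout=out, check=True)
    subprocess.run(["svnadmin", "create", "--fs-type", "fsfs", new_repo], check=True)
    with open(dump_file, "rb") as src:
        subprocess.run(["svnadmin", "load", new_repo], stdin=src, check=True)
```

Keeping the command list separate from running it makes it easy to print and check the commands before anything touches disk.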

Conclusion

To put it plainly, SVN is probably best for organizations that have internal projects and need to manage them with a VCS. git is probably better for widely scattered OSS projects that require free-for-all branching and merging. git is also very good for personal projects, but it has management overhead that needs to be considered.

I like git, but I probably would never find as much use for it as I do for SVN.
