Git, (mostly) nontechnically, for beginners
I wrote this up as a proxy for actually teaching a friend how Git works the other day. I figure it might as well exist on the internet. A lot of Git tutorials focus on the “how”, but I find that too few focus on the “why”. This tries to suggest the how while grounding the why. You definitely want to read something on the “how” after this though.
A lot of the time, people working on computers are working on files. Files, over the course of their existence, have a tendency to change. Keeping track of these changes can help you maintain sanity, because sometimes you care about things like:
- What exactly changed?
- When did it change?
- Who changed it?
- Did anything else change at the same time?
If you are familiar with things like “Track changes” in Word or the revision history feature of Google Docs, you already know how good this can be, especially when collaborating with others, but frequently also when you are working alone.
Git is version control software, meaning it’s software that helps you control versions of a collection of files. Such a collection is called a repository, and may be just a single file or millions of files.
Git prefers plain text files (like source code), but it can also handle binary files (like images) as long as they’re not huge and don’t change very often. (There are other version control systems, like SVN, which are better at handling binary data, but let’s not get into that.)
When you are collaborating with other people on a common repository, you may want to have a common upstream repository where everybody’s work goes. Often people use services like GitLab or GitHub to store these upstream repositories, but technically they can be anywhere as long as everybody who needs to can access them.
To work on the repository, you need to have a local copy of it on your computer, which you normally obtain by cloning the upstream repository.
Then you make your changes to the files as you want, and every now and then you will want to commit them to your local repository for safe keeping. In order to do that, you first stage the files you want to include in this individual commit, and then you commit them.
Now, the fact that you’ve committed them to your local repository doesn’t mean your teammates can access the changes. For that, you need to push them to the upstream repository. However, as a rule, you’ll want to start by pulling from upstream to check whether your teammates have made any changes, and merge those into your local repository first, before pushing.
A typical workflow
Here we’ll show the workflow using the git command line, but there are lots of graphical tools available for those who like such things; the terminology should translate to those reasonably well.
In practice, you might start the day by checking whether somebody has pushed anything while you weren’t looking, which you do with the pull command.
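As a minimal sketch of that morning check: the only command you would normally type is git pull, inside your clone. Everything before it is throwaway setup so the snippet can run anywhere. The sandbox directory, the file names, the stand-in upstream.git repository and the "Example Person" identity are all assumptions made up for illustration.

```shell
set -e
# Hypothetical identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME="Example Person" GIT_AUTHOR_EMAIL="person@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Throwaway setup: a stand-in upstream repository plus two clones of it.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q seed
(cd seed && echo "first draft" > notes.txt && git add notes.txt && git commit -q -m "Add notes")
git clone -q --bare seed upstream.git   # plays the role of the shared upstream
git clone -q upstream.git mine          # your local copy
git clone -q upstream.git teammate      # a teammate's local copy

# While you weren't looking, a teammate pushed a change.
(cd teammate && echo "buy milk" > todo.txt && git add todo.txt \
    && git commit -q -m "Add todo list" && git push -q)

# Start of your day: pull to pick up whatever is new upstream.
cd mine
git pull
git log --oneline   # the teammate's commit is now part of your history
```

In a real project the upstream would typically live on a hosting service rather than in a directory next to your clone, but the pull step looks the same.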
Then you can start working, and every now and then you might decide to save your changes, first by staging them (with the add command) and then by committing them with a message (with the commit command).
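A small sketch of staging and committing, run in a scratch repository so nothing real is touched. The file name notes.txt, the commit message and the identity are placeholders I made up; the git status calls are only there so you can watch the file move from untracked to staged to committed.

```shell
set -e
# Hypothetical identity so the commit works in a fresh environment.
export GIT_AUTHOR_NAME="Example Person" GIT_AUTHOR_EMAIL="person@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Scratch repository standing in for your local copy.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Do some work.
echo "first draft" > notes.txt
git status --short    # notes.txt shows up as untracked

# Stage the file: this picks what goes into the next commit.
git add notes.txt
git status --short    # notes.txt now shows up as staged

# Commit the staged changes with a message describing them.
git commit -m "Add first draft of notes"
git log --oneline     # one commit so far
```

Staging separately from committing is what lets you change five files but commit only the two that belong together.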
After a few commits, you might want to share your progress with the team. First, though, you should pull from upstream in case anybody else has been busy, and then push your own commits.
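Sketched out, sharing comes down to a pull followed by a push. As before, the sandbox, file names and identity are invented so the snippet runs on its own. The --no-rebase and --no-edit flags are my choice here, not something this article prescribes: they ask Git to merge the teammate's work with a default message instead of stopping to ask how to combine the two histories.

```shell
set -e
# Hypothetical identity so commits work in a fresh environment.
export GIT_AUTHOR_NAME="Example Person" GIT_AUTHOR_EMAIL="person@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Throwaway setup: a stand-in upstream repository plus two clones of it.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q seed
(cd seed && echo "first draft" > notes.txt && git add notes.txt && git commit -q -m "Add notes")
git clone -q --bare seed upstream.git
git clone -q upstream.git mine
git clone -q upstream.git teammate

# A teammate has been busy and pushed a new file.
(cd teammate && echo "buy milk" > todo.txt && git add todo.txt \
    && git commit -q -m "Add todo list" && git push -q)

# Meanwhile, you committed some work of your own locally.
cd mine
echo "second draft" >> notes.txt
git add notes.txt
git commit -q -m "Expand notes"

# Pull first: fetch the teammate's commit and merge it into your work.
git pull --no-rebase --no-edit

# Then push: upstream now contains both your commit and the teammate's.
git push
```

Because the two changes touched different files, the merge happens without any questions. When both sides edit the same lines, Git stops and asks you to resolve the conflict, which is one of the details a proper tutorial will cover.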
There are a lot more details, but this should be enough to get you started.
Basic concept recap
To recap, the most basic concepts are:
- Repository - a collection of files
- Clone - a copy of another repository, perhaps somewhere on the Internet
- Upstream repository - a repository that your repository is a clone of
- Pull - the action of fetching changes from an upstream repository and merging them into your own
- Push - the action of sending your changes to an upstream repository
- Stage - the act of specifying which changes will be included in a commit
- Commit - the act of adding a specific set of changes to a repository
Other uses for version control systems
While the primary use for a version control system, whether it is Git, SVN, or something else entirely, is to keep track of changes made to a set of files, there are a number of other uses that present themselves once you have that kind of repository set up anyway.
A few of them are:
- Continuous Integration (CI) - the automatic execution of a set of validation tests on anything that gets committed, so as to verify that nothing dumb was done. This can be as simple as forbidding code that has a “@nocommit” message somewhere in it, as a personal reminder that work has yet to be done, or as complex as running a whole suite of tests and alerting you if something fails that should be passing.
- Continuous Delivery (CD) - automatically deploying whatever you have committed to some kind of live service, whether this means updating a website or delivering new features to your users.
- Event hooks - many version control systems support firing certain kinds of events when certain things happen. You can use this to trigger any manner of automation, and can make your repository go from being a collection of text files to being a powerful mechanism for getting your job done.
An example of CI is that I locally enforce the “@nocommit” rule on my code, and for larger projects in the past I have had test servers that pull any new code and run tests on it, and message me about failures. I’m not doing this on any current projects, although I will be setting this up soon for some things.
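One way a local “@nocommit” rule could be enforced is with Git’s pre-commit hook, which is a script Git runs before recording a commit and which can veto the commit by exiting with a non-zero status. This is a sketch under that assumption, not necessarily the setup described above; the scratch repository, the file name draft.txt and the identity are made up.

```shell
set -e
# Hypothetical identity so the commit works in a fresh environment.
export GIT_AUTHOR_NAME="Example Person" GIT_AUTHOR_EMAIL="person@example.invalid"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Scratch repository standing in for a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Install the hook: Git runs .git/hooks/pre-commit before every commit.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# Reject the commit if any staged, added line contains the marker.
if git diff --cached -U0 | grep -q '^+.*@nocommit'; then
    echo "Refusing to commit: staged changes contain @nocommit" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

# First attempt: the file still carries the reminder, so the hook says no.
echo "finish this paragraph later @nocommit" > draft.txt
git add draft.txt
if git commit -q -m "Add draft"; then
    result="committed"
else
    result="rejected"
fi
echo "First attempt: $result"

# Second attempt: the reminder is gone, so the commit goes through.
echo "finished paragraph" > draft.txt
git add draft.txt
git commit -q -m "Add draft"
git log --oneline
```

Hooks like this live inside your own copy of the repository and are not sent along when you push, so each collaborator who wants the rule has to install it themselves.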
An example of CD is that my website lives in a Git repository, and on my webserver there is a script that fetches and publishes new versions whenever they become available.
This is not the end
This has not been a technical tutorial, but rather a guided whetting of the appetite. There are lots of good technical tutorials out there. The important thing to understand is that while 90% of the time you’re using a version control system, you’ll be doing the same things again and again – and that’s a good thing. The other 10% of the time is a learning opportunity. Version control systems may sound like a simple thing, but this is a pretty deep rabbit hole to go down if you want to.