Git and GitHub

In CS121 and CS122, you became familiar with using Subversion (SVN) to handle revision control (or version control) of your projects and assignments. In this class, we will be using Git instead, which has become the most popular revision control system in the open-source community in recent years. The goals of this lab are to:

Resources

Git

In the olden days (~10 years ago), most open-source projects primarily used CVS or SVN, the latter being a newer replacement for the former. These systems follow a client-server model, where you commit and check out the source code to and from a single server. In our case, that server was PhoenixForge, hosted by the university.

In the last 5-10 years, there has been a move to distributed revision control systems, where there is no technical distinction between client and server. In this paradigm, each computer has its own self-contained copy of the version history, and updates can be shared between any two computers. This offers a lot more flexibility, which you may or may not experience during this course, since a distributed system can still be used in a client-server manner, and that is mainly how we will use Git. Luckily, these newer systems also offer many other improvements over their predecessors.

Three of the most popular such systems today emerged in 2005 and are called Git, Mercurial and Bazaar. We will be using Git, because of its tight integration with a great place called GitHub.

GitHub

GitHub (https://github.com) is a website that offers free hosting of Git repositories for any open-source project, with plenty of additional features that grease the wheels of collaborative work.

For instance, GitHub makes it easy to fork (clone) someone else's open-source project, which means you create your own copy of it, where you have free rein. In your cloned version, you can fix bugs, add features or simply experiment in any way that you want. Anyone can use your cloned version, but if you really think your changes would be useful to everyone, then you can open a pull request, to have those changes merged back into the main repository. GitHub makes it easy to manage your repositories, discuss changes, track bugs/issues, create wikis and visualize statistics for your project, and follow the progress of other developers and their projects.

Git basics

Before we start, Git would like to know your name and e-mail, which will help identify you in the version history. It is important in this step to specify the same e-mail as when signing up to GitHub later in this lab, so please decide which e-mail address you want to use now. To set this up, please run the following:

$ git config --global user.name "Your Name Here"
$ git config --global user.email "your_email@example.com"

These commands will add this information to the file ~/.gitconfig (at least on Linux and OS X). If you start interacting with your repo from a new computer, it is important to repeat this step, or simply copy the .gitconfig file.
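To check what Git has stored, you can read the values back. The following is a minimal sketch; it points HOME at a throwaway directory (an assumption made only so that the example does not touch your real ~/.gitconfig) and reuses the placeholder name and e-mail from above.

```shell
# Use a throwaway HOME so this sketch leaves your real ~/.gitconfig alone.
export HOME="$(mktemp -d)"

git config --global user.name "Your Name Here"
git config --global user.email "your_email@example.com"

# Read the values back to confirm what Git stored.
git config --global --get user.name
git config --global --get user.email
```

On your own account you would skip the first line and simply run the two --get commands.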

Now, let us look at some of the basics of Git. Since Git is decentralized, we do not need a server to get started. To create a new repository, simply do:

$ mkdir new-project
$ cd new-project
$ git init

That's it, we now have an empty local repository. Try:

$ git status

It will say nothing to commit, so let us add a file. Create a text file, add some random text and save it as file1.txt. If you run git status again, you will see that it lists file1.txt under Untracked files. Just like in SVN, run:

$ git add file1.txt

If you run git status yet again, you will see that the file is now listed under Changes to be committed. Now, go ahead and commit:

$ git commit -m "Added file1.txt"

As with SVN, if you omit -m, it will open up a text editor, which allows you to write longer multi-line commit messages. You can specify which editor you want to use by changing the environment variable GIT_EDITOR.
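As a small sketch of setting the editor, the following assumes you want nano (any editor installed on your machine would do) and asks Git which editor it would launch.

```shell
# Choose the editor for this shell session; nano is only an example.
export GIT_EDITOR=nano

# Ask Git which editor it will launch for commit messages.
git var GIT_EDITOR
```

To make the choice permanent, you could put the export line in your shell's startup file.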

Now, modify file1.txt and try running commit again. It will say no changes added to commit, even though we modified the file. This is because Git works a bit differently from SVN: a modified file must be staged with git add before a commit will include it. We could run git add file1.txt again, but instead we shall use the shorthand -a:

$ git commit -a -m "Fixed typo in file1.txt"

This will automatically stage all modified tracked files before committing, giving a workflow similar to svn commit, with one very important distinction: git commit only updates the local repository, whereas svn commit would automatically send the updates to the server. In this example, the local repository is all we have, so let us defer further discussion of this until later.
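The steps so far can be replayed in one go. This is only a sketch: it works in a scratch directory created by mktemp, uses placeholder file contents, and sets the name and e-mail for this one repository so that it runs even without a global configuration.

```shell
# Work in a scratch directory so nothing else is affected.
cd "$(mktemp -d)"
git init -q new-project
cd new-project

# Identity for this repository only (placeholder values).
git config user.name "Your Name Here"
git config user.email "your_email@example.com"

echo "first version" > file1.txt
git status --short                  # ?? file1.txt  (untracked)
git add file1.txt
git commit -q -m "Added file1.txt"

echo "second version" > file1.txt
git status --short                  #  M file1.txt  (modified, not staged)
git commit -q -a -m "Fixed typo in file1.txt"

# Both commits now exist, but only in the local repository.
git log --oneline
```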

To remove a file:

$ git rm file1.txt

If you have modified the file without committing the changes, Git will complain, much like Subversion would. To remove the file anyway, add the flag -f, as the error message suggests.
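A short sketch of that situation, again in a scratch repository with placeholder contents: the first git rm is refused because the file has an uncommitted change, and the second one forces the removal.

```shell
cd "$(mktemp -d)"
git init -q rm-demo
cd rm-demo
git config user.name "Your Name Here"
git config user.email "your_email@example.com"

echo "some text" > file1.txt
git add file1.txt
git commit -q -m "Added file1.txt"

# Modify the file without committing the change.
echo "an uncommitted change" >> file1.txt

# Git refuses, because the local modification would be lost.
git rm file1.txt || echo "git rm refused to remove the file"

# Force the removal, then record it in a commit.
git rm -q -f file1.txt
git commit -q -m "Removed file1.txt"
```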

Instead of having only a local repository, we will let GitHub host our projects and use it in a client-server manner.

GitHub account

Everyone in the class will need a GitHub account. If you already have one, you can skip this step and simply use your existing account.

Go to https://github.com/ and follow the instructions on how to create an account. This is straightforward. Remember to use the same e-mail as you specified earlier in the lab.

Create a repository

Since we will be using GitHub for hosting, instead of running git init on a directory, we will create our Git repository through GitHub's web interface. Once logged in, you should see three icons in the top right corner, next to your username. Click on the left-most icon, which should say "Create a new repo". Pick a name for your project and write a short description. Since you might not have an idea of what your project will be, you can pick a temporary name and write a placeholder description; both can be changed later on.

GitHub only offers free hosting for Public (open-source) projects, and if you select Private you will see that payment options appear. Select Public and also tick the box Initialize this repository with a README.

Finally, GitHub can add an appropriate .gitignore for your project if you know what programming language you will be using. The .gitignore file tells Git to hide certain files from git status that should not be committed, such as .class files in Java or log files. If you do not know the language yet, you can leave this option as is and we can help set things up later if you want.
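To see the effect of a .gitignore, here is a minimal sketch with a hand-written file. The two patterns and the file names are made up for illustration; the file GitHub generates for a given language will be longer.

```shell
cd "$(mktemp -d)"
git init -q ignore-demo
cd ignore-demo

# One pattern per line: ignore compiled Java classes and log files.
printf '%s\n' '*.class' '*.log' > .gitignore

touch Main.java Main.class debug.log

# Main.class and debug.log are not listed as untracked files.
git status --short

# Show which pattern is responsible for ignoring each file.
git check-ignore -v Main.class debug.log
```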

Once the project is created, we need to create a local clone on your lab computer.

Cloning from GitHub

On your repo's front page, you will see something like the following:

[Screenshot: screen-clone.png]

GitHub gives you a choice of three different protocols for handling the transfers between GitHub and your local computer:

  • Git read-only: This gives you a read-only copy of the repo, for instance if you want to install the latest version of someone else's repository.
  • HTTP/SSH: If you have the right permissions, this will allow you to push changes back to GitHub.

Select HTTP and copy the URL to the clipboard. Now, go to an appropriate folder in your home directory where you want to put your local copy. Keep in mind that when we clone (check out) a repository, Git will create a folder with the repo's name for us, so there is no need to do that separately. Now, enter git clone followed by the URL pasted from your clipboard, e.g.:

$ git clone https://github.com/<username>/<reponame>.git

Now, you are all set up to interact with the repository as previously described. Please add a file similar to file1.txt. This can be a temporary file, in which case you should remove it later.

When the file has been added and committed, it will not automatically appear on GitHub, as it would on PhoenixForge for SVN. Since Git is distributed, committing only means accepting the changes to the local repository. The next step is to push these changes to GitHub. This is done by:

$ git push origin master

The parameter origin tells Git to push the changes to the place we cloned from, and master refers to the branch.

Note

By default, a repo has a single branch and it is called master. Branches are used to keep several parallel versions of your source code. You can for instance use a branch for each new feature, and then merge them into master once they are done. A more elaborate example of using branches can be seen in the blog post A successful Git branching model. Using branches in this way requires a lot more Git fluency, so you will probably just stick with one master branch for now.
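For future reference, a feature branch can be sketched in a few commands. The branch name new-feature and the file contents are made up, and the symbolic-ref line is only there so that the first branch is called master regardless of how your Git installation is configured.

```shell
cd "$(mktemp -d)"
git init -q branch-demo
cd branch-demo
git config user.name "Your Name Here"
git config user.email "your_email@example.com"
git symbolic-ref HEAD refs/heads/master   # name the first branch master

echo "stable text" > file1.txt
git add file1.txt
git commit -q -m "Added file1.txt"

# Create a branch for the new feature and commit to it.
git checkout -q -b new-feature
echo "experimental text" >> file1.txt
git commit -q -a -m "Started new feature"

# Switch back to master and merge the finished feature.
git checkout -q master
git merge -q new-feature

git branch                                # lists master and new-feature
```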

The fact that we can work on our project and commit several times without needing to be connected to a server is one of the benefits of distributed revision control.

Finally, to pull changes from GitHub, essentially performing the equivalent of svn up:

$ git pull origin master
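The push and pull cycle can be tried without a network connection. The sketch below uses a bare repository on the local disk (hub.git) as a stand-in for GitHub and two clones standing in for two computers; all directory names are hypothetical, and the identity is supplied through environment variables only so that the example runs anywhere.

```shell
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Your Name Here" GIT_AUTHOR_EMAIL="your_email@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for GitHub: a bare repository holding a single README commit.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # name the branch master
echo "# demo" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed hub.git

# Two clones, standing in for two different computers.
git clone -q hub.git laptop
git clone -q hub.git labmachine

cd laptop
git remote -v                    # origin is the place we cloned from
echo "some text" > file1.txt
git add file1.txt
git commit -q -m "Added file1.txt"
git push -q origin master        # only now does the commit leave the laptop

cd ../labmachine
git pull -q origin master        # fetch and merge what the laptop pushed
ls                               # README.md and file1.txt are both here
```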

If the remote repository (on GitHub) has changed since you last pulled from it, you will need to run git pull before you can run git push. At this point, there might be a conflict between your changes and someone else's that Git can't merge automatically. In this case, it will include both versions in the file, looking something like:

This is a text file.
<<<<<<< HEAD:file1.txt
This is version 1.
=======
This is version 2.
>>>>>>>
Here is some more text.

To resolve this conflict, all you have to do is edit the file in any way that you want as long as you remove the lines with <<<<<<<, ======= and >>>>>>>. When you are done, call git commit -a as usual. If you run into problems, please post a question on Piazza.
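The whole conflict scenario can be reproduced locally. As before, this is a sketch under assumptions: hub.git is a bare repository standing in for GitHub, the directories yours and partner are two clones, and the resolved text is made up. The --no-rebase flag is passed explicitly because newer Git versions otherwise ask how diverging histories should be reconciled.

```shell
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME="Your Name Here" GIT_AUTHOR_EMAIL="your_email@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Local stand-in for GitHub with one committed file.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
printf 'This is a text file.\nThis is version 0.\n' > seed/file1.txt
git -C seed add file1.txt
git -C seed commit -q -m "Added file1.txt"
git clone -q --bare seed hub.git
git clone -q hub.git yours
git clone -q hub.git partner

# Your partner changes the second line and pushes first.
cd partner
printf 'This is a text file.\nThis is version 2.\n' > file1.txt
git commit -q -a -m "Partner's edit"
git push -q origin master

# You change the same line, so the pull cannot merge automatically.
cd ../yours
printf 'This is a text file.\nThis is version 1.\n' > file1.txt
git commit -q -a -m "My edit"
git pull --no-rebase origin master || echo "merge conflict, as expected"
cat file1.txt                    # now contains the conflict markers

# Resolve by writing the final content, then commit and push.
printf 'This is a text file.\nThis is version 1 and 2.\n' > file1.txt
git commit -q -a -m "Merged partner's edit"
git push -q origin master
```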

Committing etiquette

Remember to commit often with descriptive commit messages (remember, these will be public for anyone to see now). Try to avoid committing broken code to the master branch; it will frustrate your partner and anyone else who wants to try your project. If it will take a while before your project runs at all, you should relax this suggestion and still commit often. Please avoid committing files that do not belong in the repository (binaries, logs, etc.), and remove them if you accidentally add them at some point.

Update your README.md file as the project evolves. Notice that it will be visible on the front page of your repo, and should offer a good description of and introduction to your project.

Extra

These are not essential parts of the lab and are here mostly for future reference.

Collaboration

If you work in a group with someone, then let one user create the repository as explained above. Once that is done, go to the main page of your repository and click Settings in the menu bar. You should see Collaborators on a menu to the left. Add your partner using their GitHub username.

Password caching

If you are getting tired of entering your password every time you push your changes to GitHub, you can locally cache your GitHub password as explained in the following link:

Make sure the appropriate help section for your computer system is selected. Scroll down to Password caching and follow the instructions.