Everyone loves GitHub. No one likes to be tricked. That’s why you should always use GitHub’s convenient Fork button to maintain your own copy of a project. This sounds obvious, but you’d be surprised how many people maintain duplicate copies of popular repositories in non-forked form. Sometimes this is a simple mistake or a misunderstanding of how Git and GitHub actually work. Unfortunately, there have been so many cases of “code theft” on GitHub that maintaining your own copy of a repo any other way will be considered abuse by default.
I recently wrote about a case where one of my open source projects was being misrepresented as someone else’s work. This got me thinking about forking and what the MIT license really means. Today I want to get into more detail about why forking is so important and what the MIT license allows you to do with someone else’s work.
The first thing I want to do is make sure we all understand that Git and GitHub are two completely different things. Git has become synonymous with GitHub in the same way people think that “Rollerblade” and “inline skates” are the same thing.
Git is an open source distributed version control system. It is a program that can be installed on any major OS, including Windows. Git, the VCS, allows groups of people to work on the same codebase and merge their changes into a single unified project by pushing to and pulling from remote repositories. The whole point of Git is to let development teams track the history of a project through a series of commits (or, in simple terms, “checkpoints”). Each commit groups changes to one or more files under a single description, which lets you go back through a project’s history to revert changes or view what the project looked like at a certain point in time. That’s Git in a nutshell. It’s a tool for tracking the history of a codebase and collaborating with multiple people on one shared codebase.
GitHub on the other hand is like a social network for code. GitHub allows you to host Git repositories publicly (for free) or privately (for a monthly fee). GitHub hosts remote repositories and makes it easy for anyone to share their code online. GitHub has been a game changer for the open source community. Almost every popular open source project is hosted on GitHub these days. GitHub is basically a UI for Git repos with social features added in on top of existing Git functionality. Rather than exploring a remote repository using the Git command line tool, GitHub lets you browse Git repos visually.
Now that we know what the difference between Git and GitHub is, let’s talk about Forking and forking. Of the additional features built on top of Git by GitHub, the idea of Forking is, in my opinion, the most important.
“forking” vs. “Forking”
Forking is a concept more than a feature when using Git on its own. When combined with GitHub, Forking becomes a social feature. So, just as it’s important to understand the difference between Git and GitHub, we need to explore the differences between “forking” a project and “Forking” a project.
forking (lower-case “F” forking – the concept)
The idea of forking is that you clone a public Git repository and host it yourself so you can maintain a version that diverges from the original. You do this by cloning the repository and updating its remote address to point to a server or repository you control. From then on you maintain a different version of the code, or use it as the starting point for your own software built on top of the original codebase.
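The clone-and-repoint steps described above can be sketched as a short list of Git commands. This is only an illustration under assumptions: the helper name `lowercase_fork_commands` and both URLs are hypothetical, and the function just builds the commands so you can read them before running anything.

```python
from typing import List


def lowercase_fork_commands(original_url: str, my_url: str, directory: str) -> List[List[str]]:
    """Build the Git commands for a manual (lower-case "f") fork.

    Nothing is executed here; the function only returns argument lists.
    """
    return [
        # 1. Clone the public repository, history included.
        ["git", "clone", original_url, directory],
        # 2. Point the "origin" remote at a repository you control.
        ["git", "-C", directory, "remote", "set-url", "origin", my_url],
        # 3. Publish your local branches to your own remote.
        ["git", "-C", directory, "push", "--all", "origin"],
    ]


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    commands = lowercase_fork_commands(
        "https://example.com/original/project.git",
        "https://example.com/me/project.git",
        "project",
    )
    for argv in commands:
        print(" ".join(argv))
```

To actually run the steps you could pass each list to `subprocess.run(argv, check=True)`. Notice that nothing in these commands records where the code came from, which is exactly why a copy made this way carries no visible link back to the original.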
Forking (capital “F” forking – the GitHub way)
When you fork a project on GitHub there’s this implied social contract that basically states “I am keeping a copy of someone else’s code for the purpose of either customizing it to my own needs or contributing to the original in the future”. GitHub has a nice little “Fork” button that creates a copy of another user’s public Git repository on your own public profile. When you visit your Fork of the repository on your GitHub profile page you’ll see a little indicator that the project is a fork and it links to the original project. That’s important.
Why you should always capital “F” Fork
As we’ve already established, Forking on GitHub is a social thing. The idea is that you’re going to maintain a parallel version of the repository so that you can contribute code back to the original project using GitHub’s “Pull Request” feature.
Maybe you want to fix a bug or add a feature. Great! Fork and make a Pull Request. Maybe you want to use the project but maintain slightly different functionality for your own personal use. Good for you! Go for it! Just be sure you Fork the repository.
What happens when you don’t Fork a GitHub repo
Remember that social contract I talked about earlier? If you don’t use that Fork button then you’re breaking the contract. In a scenario where you want your own copy of a project’s code to make it better but don’t want to contribute your changes back to the original project, you need to make it clear who the original author is.
Why is this important? Suppose you’re looking at my open source insurance quote engine and you think you can do a better job. Suppose you want to create a competing version of the software using the original as a base. I may be slightly insulted but there’s nothing wrong with that. But now let’s suppose that instead of properly Forking my repository, you simply download or clone the project and publish it to your GitHub profile under the same name. At this point, how could someone tell which project is the original and which is the fork? Let’s take WordPress as an example. What happens when someone copies the WordPress repository and changes nothing about it? The issue now becomes one of user confusion. Unless users pay really close attention to detail, they may find themselves using some other person’s “fork” of WordPress that’s three versions behind, buggy, and broken.
Properly Forking a project puts a little icon and a link to the original project on the Forked project’s GitHub page. It lets the world know that if they’re looking for the original copy of a project they should follow the link. Without that icon and link to the original, users can be misled into using code that isn’t what they were looking for.
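The same link is available to programs, not just to people browsing the page. In GitHub’s REST API, the JSON for a single repository carries a boolean `fork` field and, for forks, a `parent` object naming the original. Here is a minimal sketch of reading those fields. The helper name `original_of` and the sample data are hypothetical, and the dictionaries are hand-written rather than real API output.

```python
from typing import Any, Dict, Optional


def original_of(repo: Dict[str, Any]) -> Optional[str]:
    """Return the full name of the repository this one was Forked from.

    Returns None when the metadata does not mark the repository as a fork,
    which is also what a manually re-uploaded copy looks like.
    """
    if not repo.get("fork", False):
        return None
    parent = repo.get("parent") or {}
    return parent.get("full_name")


if __name__ == "__main__":
    # Hand-written sample data with hypothetical names.
    forked = {
        "full_name": "me/project",
        "fork": True,
        "parent": {"full_name": "author/project"},
    }
    copied = {"full_name": "me/project", "fork": False}

    print(original_of(forked))
    print(original_of(copied))
```

A copy that was cloned and re-published by hand looks like `copied` here: as far as the metadata is concerned, it is an original project.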
But it’s open source!
Yes, open source licensing lets you use the code in a variety of ways, but it doesn’t give you the right to misrepresent it as your own. Simply forgetting to properly Fork is a forgivable sin, but removing the original author’s name from the license file and not giving credit to the original author turns a simple mistake into stealing.
The MIT License
Open source doesn’t mean you get to take credit for someone else’s work. It doesn’t mean do whatever the fuck you want to unless it’s specifically licensed that way. There are a number of open source licenses to choose from but I’m going to focus on the MIT license for now. I use the MIT license for almost all of my work.
When my code was stolen the person who did it had changed the Readme file to make it sound like it was their own work and replaced my name in the license file with their name. The MIT license doesn’t allow for that. Here’s what it really means.
What you can do with the code
- Use it in commercial work. This includes using it on its own or as part of another project that’s commercial in nature. In short, you can charge money for work that includes MIT licensed code.
- You can distribute the software. Again, this pertains to the original software or derivatives of it.
- You can modify the original source code however you like.
- You can keep it for private use. This means that you can include it as part of a larger project and not distribute the original source code. This would mainly apply to compiled software as most web apps don’t get compiled so it would be pretty hard to hide the source code in your privately maintained copy.
Conditions of use
The MIT license and most open source licenses in general aren’t free-for-alls. There are conditions to the use of MIT licensed software.
The one and only condition of MIT licensed software is that you must include a copy of the license and copyright notice if you publish the code. This is the big one in my opinion. Break this condition and you’re stealing.
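As a rough illustration of that condition, here is a small sketch that checks whether a published license file still contains the copyright line from the original. It is a heuristic under assumptions, not a legal test: the function names and the sample author names are hypothetical, and it only compares lines that begin with the word “Copyright”.

```python
from typing import List


def copyright_lines(license_text: str) -> List[str]:
    """Return the whitespace-normalized lines that start with "Copyright"."""
    lines = (" ".join(line.split()) for line in license_text.splitlines())
    return [line for line in lines if line.lower().startswith("copyright")]


def keeps_copyright_notice(original: str, published: str) -> bool:
    """True if every copyright line in the original also appears in the copy."""
    needed = copyright_lines(original)
    kept = set(copyright_lines(published))
    # An original with no copyright line at all gives us nothing to verify.
    return bool(needed) and all(line in kept for line in needed)


if __name__ == "__main__":
    # Hypothetical license snippets, for illustration only.
    original = "MIT License\n\nCopyright (c) 2014 Original Author\n\nPermission is hereby granted..."
    swapped = "MIT License\n\nCopyright (c) 2014 Someone Else\n\nPermission is hereby granted..."
    extended = (
        "MIT License\n\nCopyright (c) 2014 Original Author\n"
        "Copyright (c) 2015 Someone Else\n\nPermission is hereby granted..."
    )

    print(keeps_copyright_notice(original, swapped))
    print(keeps_copyright_notice(original, extended))
```

In this sketch, replacing the author’s name fails the check, while adding your own copyright line alongside the original passes it.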
Along with the conditions and freedoms, there are limitations to open source software. With MIT licensed software, the limitations are that you cannot hold the original author liable for any damages and that the code comes with no warranties or guarantees. This sounds to me like you could write a virus, distribute it, use it, modify it, and release it and the original author isn’t liable because they provided the source code under the MIT license. I’m not a lawyer though, so maybe I’m wrong about distributing viruses.
The open source community is awesome. People build such cool stuff and there’s a ton of it that I, like many others, want to contribute to or use in our own projects or just keep a fork around in case the original gets taken down one day. Open source software lets us do just that. But remember, there’s a fine line between using open source code and stealing it. The next time you find a project you want to fix, back up, or use, keep it legit and capital-F Fork it.