Version Control: GitHub
What is Version Control?
Version control is a system that helps you track changes in your code or data over time. It lets you:
- Save different versions of your project.
- Collaborate with others without overwriting each other’s work.
- Revert to earlier states when something breaks.
- Understand the evolution of your project through clear history.
The most widely used version control system today is Git, and the most popular platform for hosting Git repositories is GitHub.
What is Git?
Git is a free and open-source version control system created by Linus Torvalds (the creator of Linux). Git runs locally on your machine and helps manage your project's history.
Git works by recording snapshots of your files (called commits), allowing you to move between different versions or branches of your code.
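To make the idea of snapshots concrete, here is a minimal sketch that records two commits in a throwaway repository and then reads the older version of a file back. The directory, file name, and "Demo User" identity are placeholders chosen for illustration, and the identity is set for this demo repository only.

```shell
# Work in a temporary directory so nothing else on your machine is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Placeholder identity, stored only in this demo repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

# First snapshot.
echo "version 1" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Second snapshot of the same file.
echo "version 2" > notes.txt
git add notes.txt
git commit --quiet -m "Update notes"

# Count the commits and read the file as it was one commit ago.
commit_count="$(git rev-list --count HEAD)"
first_version="$(git show HEAD~1:notes.txt)"
echo "$commit_count commits; the earlier snapshot contained: $first_version"
```

The working copy holds "version 2", but the earlier content is still recoverable from history.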
What is GitHub?
GitHub is a cloud-based platform built around Git. It allows you to:
- Store your Git repositories online
- Collaborate with other developers
- Use powerful tools like pull requests, issues, wikis, and CI/CD workflows
You can think of Git as the engine, and GitHub as the car that helps you drive and manage it with others on the road.
Why Data Scientists Should Learn Git & GitHub
- Collaborate with teammates (engineers, analysts, or fellow data scientists)
- Keep track of experiments and model versions
- Share notebooks or datasets
- Make your work reproducible
- Contribute to open-source data science projects
Core Concepts and Terminology
Term | Description |
---|---|
Repository (repo) | A folder or project tracked by Git |
Commit | A snapshot of your changes |
Branch | A parallel version of your codebase |
Merge | Combining changes from one branch into another |
Clone | Downloading a GitHub repository to your local machine |
Push/Pull | Sending or receiving changes between local and remote |
Fork | Creating a copy of someone else's repository (on GitHub) |
Getting Started with Git and GitHub
Step 1: Install Git
- macOS (with Homebrew): brew install git
- Linux (Debian/Ubuntu): sudo apt install git
- Windows: Download from https://git-scm.com
Verify installation:
git --version
Step 2: Set Up Git
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
Step 3: Create a GitHub Account
Go to https://github.com and sign up.
Step 4: Create a New Repository on GitHub
- Click the + icon in the top-right corner → New repository
- Give it a name (e.g., my-first-project)
- Choose public or private
- Add a README (optional but recommended)
- Click Create repository
Step 5: Connect Local Project to GitHub
Option A: Clone an Existing Repository
git clone https://github.com/username/repo-name.git
cd repo-name
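Option B: Create a Local Project and Push to GitHub
# Assumes the GitHub repository is empty (created without a README);
# otherwise the push is rejected until you pull the remote commit first.
mkdir my-project
cd my-project
git init
touch README.md
git add README.md
git commit -m "Initial commit"
# Name the branch main, in case your Git defaults to master
git branch -M main
git remote add origin https://github.com/username/repo-name.git
git push -u origin main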
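If you would like to rehearse the push workflow before touching a real GitHub repository, one option is to let a local bare repository stand in for the remote. This is only a sketch: `fake-github.git` and the "Demo User" identity are made-up placeholders, and a real GitHub remote would use an `https://` or SSH URL instead of a local path.

```shell
work_dir="$(mktemp -d)"
cd "$work_dir"

# A bare repository plays the role of the GitHub remote in this sketch.
git init --quiet --bare fake-github.git
# Point the stand-in remote's default branch at main.
git --git-dir=fake-github.git symbolic-ref HEAD refs/heads/main

# Create the local project, as in Option B.
mkdir my-project
cd my-project
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
touch README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main

# Connect to the stand-in remote and push.
git remote add origin "$work_dir/fake-github.git"
git push --quiet -u origin main

# After the push, local and remote point at the same commit.
local_head="$(git rev-parse main)"
remote_head="$(git ls-remote origin refs/heads/main | cut -f1)"
echo "local:  $local_head"
echo "remote: $remote_head"
```

Both lines should show the same commit hash, which is what a successful push means.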
Common Git Commands You’ll Use
Command | Purpose |
---|---|
git init | Start a new Git repository |
git status | Show current changes |
git add filename | Stage a file for commit |
git commit -m "message" | Commit staged changes |
git log | Show commit history |
git push | Send changes to GitHub |
git pull | Get latest changes from GitHub |
git branch | Show or create branches |
git checkout branch-name | Switch branches |
git merge branch-name | Merge another branch into current one |
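As a rough illustration of how a few of these commands fit together, the sketch below follows one file from untracked, to staged, to committed, checking `git status` at each step. The file name and identity are placeholders, and `--short` is used only to keep the status output compact.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "x = 1" > analysis.py
status_untracked="$(git status --short)"   # file is new and untracked
echo "before add:    $status_untracked"

git add analysis.py
status_staged="$(git status --short)"      # file is staged for the next commit
echo "after add:     $status_staged"

git commit --quiet -m "Add analysis script"
status_clean="$(git status --short)"       # nothing left to commit
echo "after commit:  '$status_clean'"

# The commit now appears in the history.
git log --format='%h %s'
```

In the short format, `??` marks an untracked file and `A` marks a newly staged one; an empty status means the working tree matches the last commit.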
Visualizing the Workflow
[Your Code] → git add → git commit → git push → [GitHub Repository]
When working with others:
[Teammate’s Code] → git push → [GitHub Repo] → git pull → [Your Code]
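The collaborative flow can be tried out locally too. In this sketch a bare repository named `shared.git` stands in for GitHub, and two clones named `you` and `teammate` stand in for two people; all of these names and identities are hypothetical.

```shell
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the shared GitHub repository.
git init --quiet --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/main

# "You": start the project and push the first commit.
# (Git warns that the clone is empty; that is expected here.)
git clone --quiet shared.git you
cd you
git config user.name "You"
git config user.email "you@example.com"
echo "# Analysis" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main
git push --quiet -u origin main

# "Teammate": clone the project, add a file, and push it.
cd "$work_dir"
git clone --quiet shared.git teammate
cd teammate
git config user.name "Teammate"
git config user.email "teammate@example.com"
echo "print('hello')" > model.py
git add model.py
git commit --quiet -m "Add model script"
git push --quiet origin main

# "You": pull to receive the teammate's commit.
cd "$work_dir/you"
git pull --quiet origin main
latest_message="$(git log -1 --format=%s)"
echo "Latest commit after pull: $latest_message"
```

After the pull, your copy contains `model.py` even though you never created it yourself.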
Working with Branches
Branching is key for working on features or experiments without disturbing the main project.
git checkout -b feature-model-tuning
# Work and commit on this branch
git checkout main
git merge feature-model-tuning
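Here is a self-contained version of the same branch workflow that you can run in a scratch directory. It is a sketch under simple assumptions: `params.txt` and its contents are invented for illustration, and because `main` does not change while the branch is open, the merge is a plain fast-forward with no conflicts.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Baseline on main.
echo "learning_rate = 0.1" > params.txt
git add params.txt
git commit --quiet -m "Add baseline parameters"
git branch -M main

# Experiment on a separate branch.
git checkout --quiet -b feature-model-tuning
echo "learning_rate = 0.01" > params.txt
git commit --quiet -am "Tune learning rate"

# Back on main, the experiment is not visible yet.
git checkout --quiet main
main_before="$(cat params.txt)"

# Merge brings the experiment into main.
git merge --quiet feature-model-tuning
main_after="$(cat params.txt)"

echo "main before merge: $main_before"
echo "main after merge:  $main_after"
```

Until the merge, `main` keeps the baseline value, which is why branches are a safe place for experiments.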
Best Practices for GitHub in Data Science
- Commit often with meaningful messages
- Use .gitignore to avoid tracking large or unnecessary files (e.g., .DS_Store, .ipynb_checkpoints, data/)
- Don’t push raw data or secrets (like API keys) to GitHub
- Use branches for experiments and merge only when stable
- Include a README to explain your project
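The `.gitignore` advice can be checked with `git check-ignore`, which reports whether a path is covered by an ignore rule. The sketch below writes a small `.gitignore` using the example patterns from this list and tests two hypothetical files, `data/raw.csv` and `train.py`.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# One pattern per line; a trailing slash matches directories.
printf '%s\n' '.DS_Store' '.ipynb_checkpoints/' 'data/' > .gitignore

mkdir data
echo "id,value" > data/raw.csv
echo "print('train')" > train.py

# git check-ignore exits with 0 when the path is ignored, 1 when it is not.
if git check-ignore --quiet data/raw.csv; then data_ignored=yes; else data_ignored=no; fi
if git check-ignore --quiet train.py; then script_ignored=yes; else script_ignored=no; fi

echo "data/raw.csv ignored: $data_ignored"
echo "train.py ignored:     $script_ignored"
```

Note that `.gitignore` only affects untracked files; a file that was already committed stays tracked until you remove it from the index.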
Using GitHub with Jupyter Notebooks
GitHub can render .ipynb files directly, allowing you and collaborators to view notebooks online without running them locally.
To share notebooks:
- Push them to a GitHub repo
- Share the link
Learn More:
https://docs.github.com/en/get-started/using-git/about-git
https://www.youtube.com/watch?v=8JJ101D3knE
https://www.youtube.com/watch?v=mJ-qvsxPHpY
https://education.github.com/git-cheat-sheet-education.pdf