Git and operating it with command line arguments; GitHub
Note that the purpose of this notebook is to serve as a cheatsheet/reference for future use. There are better ones out there than what I've outlined here; I've listed a few below for your use.
The command line offers an easy way to navigate the computer. From it, we can, for example, run R or Python.
The command line can differ, however, depending on which machine you're running.
If you're on a Mac, a Unix command line comes installed on your machine: the Terminal, an application available on all Macs.
If you're on a Windows machine, you'll need to activate your Ubuntu terminal by turning on developer mode on your computer. Instructions on how to do that can be found here. (There are also other alternatives, such as PuTTY.)
The command line offers more control when interacting with your machine. Moreover, we'll need the command line for most cloud computing connections. It takes some getting used to, but it's well worth it once you get the hang of it.
The point of it (w/r/t our purposes) is that it'll help us:
The following outlines a few common commands that will be useful as you move forward. Disclaimer: some of these commands may differ depending on your operating system, but it's only a quick Google search to find out how things are done on your machine.
The true power of Git shines as a tool for project collaboration and coordination. Often we want to make local changes to a file and then push those changes to the online remote. Before we can push, we need to merge our version of the repository with the current state of the remote. If nobody else changed the same parts of the files we changed, Git merges the versions smoothly and automatically.
However, sometimes there are conflicts between branches or remote versions of a repository. Say you changed part of a file by deleting a function, and a colleague changed the same part of that file by modifying the function. This would be an example of a conflict. Git does not know which version is the correct one, so it marks the conflicting section of the file with special delimiters (conflict markers):
<<<<<<< HEAD
def my_function(x):
    a = []
    for i in x:
        a.append(i)
    return a
=======
def my_function(x):
    for i in x:
        print(i)
>>>>>>> new-branch
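It's up to you to decide how to resolve the conflict. In the example above, the same function was changed in two different ways. The part between <<<<<<< HEAD and ======= is the version on your current branch, and the part between ======= and >>>>>>> new-branch is the incoming version from new-branch. You'll need to edit the file by hand so that it keeps the upper part, the lower part, or a combination of the two, and then delete the marker lines. Once the file is fixed, stage it with git add and commit to finish the merge. The point is that Git deliberately forces you to check when and where discrepancies exist and to resolve them yourself.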
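The following minimal Python sketch makes the structure of the markers concrete by splitting a conflicted file's text into its two competing versions. It assumes Git's default conflict style and non-nested markers. The default style has no ||||||| base section; that section appears with merge.conflictStyle=diff3. split_conflict is a hypothetical helper written for this illustration and is not part of Git.

```python
from typing import Tuple


def split_conflict(text: str) -> Tuple[str, str]:
    """Split text containing conflict markers into (ours, theirs).

    Lines outside a conflict block are copied into both versions.
    Assumes the default conflict style (no ||||||| base section)
    and no nested markers.
    """
    ours, theirs = [], []
    state = None  # None: outside a conflict; "ours"/"theirs": inside one
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<<"):
            state = "ours"  # start of our (HEAD) side
        elif line.startswith("=======") and state == "ours":
            state = "theirs"  # switch to the incoming side
        elif line.startswith(">>>>>>>") and state == "theirs":
            state = None  # end of the conflict block
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
        else:
            # Unconflicted lines belong to both versions.
            ours.append(line)
            theirs.append(line)
    return "".join(ours), "".join(theirs)


conflicted = """<<<<<<< HEAD
def my_function(x):
    a = []
    for i in x:
        a.append(i)
    return a
=======
def my_function(x):
    for i in x:
        print(i)
>>>>>>> new-branch
"""

ours, theirs = split_conflict(conflicted)
print(theirs)
```

In practice you would usually resolve the conflict by editing the file directly. Git can also take one side wholesale with git checkout --ours <file> or git checkout --theirs <file>.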
When updating a local repository, we need to pull or fetch changes made to the remote. Note that fetch downloads the available data without merging it into your current work, whereas pull downloads the changes and then integrates them into your current branch. By default, a pull is a fetch followed by a merge.
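The following sketch makes the difference between fetch and pull concrete by spelling out a pull as its two steps. It is a minimal sketch that assumes a remote named origin and a branch named main; both are hypothetical defaults. With dry_run=True it only returns the commands. With dry_run=False it runs them through subprocess, which assumes git is on your PATH and that you are inside a repository. It mirrors a default pull only roughly: if pull is configured to rebase (pull.rebase), the second step differs.

```python
import subprocess
from typing import List

# Hypothetical defaults for illustration: adjust to your remote and branch.
REMOTE = "origin"
BRANCH = "main"


def fetch_cmd(remote: str = REMOTE) -> List[str]:
    # Downloads new commits into remote-tracking branches (e.g. origin/main);
    # the files you are working on are left untouched.
    return ["git", "fetch", remote]


def merge_cmd(remote: str = REMOTE, branch: str = BRANCH) -> List[str]:
    # Integrates the fetched remote-tracking branch into the current branch.
    return ["git", "merge", f"{remote}/{branch}"]


def pull_as_fetch_and_merge(
    remote: str = REMOTE, branch: str = BRANCH, dry_run: bool = True
) -> List[List[str]]:
    """Roughly what `git pull <remote> <branch>` does with default settings."""
    steps = [fetch_cmd(remote), merge_cmd(remote, branch)]
    if not dry_run:
        for cmd in steps:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return steps


for cmd in pull_as_fetch_and_merge():
    print(" ".join(cmd))
```

Running only the fetch step lets you inspect the incoming commits, for example with git log origin/main, before deciding whether to merge them.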
Sometimes we do not want to track certain file types.
For example, GitHub blocks files larger than 100 MB, so we wouldn't want to push really big data sources to the repository. For this reason, we might want to avoid uploading any data files to our GitHub repository by ignoring specific file types, such as .csv (comma-separated values) or .Rdata (an R data file type). To do this, we need to create a special file that Git reads to know which files not to track.
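Before deciding what to ignore, it can help to check which files in a project would exceed GitHub's per-file limit. The following is a minimal sketch; large_files is a hypothetical helper. The cut-off is written as 100 MiB (100 * 1024 * 1024 bytes), which you should treat as an assumption and check against GitHub's current documentation.

```python
import os
from typing import List, Tuple

# Assumed cut-off: 100 MiB. Check GitHub's documentation for the exact limit.
LIMIT_BYTES = 100 * 1024 * 1024


def large_files(root: str, limit: int = LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for files under root above limit."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git's own internal folder so only project files are checked.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                found.append((os.path.relpath(path, root), size))
    return sorted(found)
```

For example, large_files(".") lists the oversized files under the current folder. Anything it returns is a good candidate for an ignore pattern.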
We can exclude these files by adding a .gitignore file to our project folder, listing one pattern per line:
*.ipynb_checkpoints
*.Rdata
*.csv
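The following sketch shows which files a set of simple patterns would catch. It is a rough approximation built on Python's standard fnmatch module, not Git's actual matching. It handles only plain wildcard patterns like the three above, where a pattern without a slash can match a file or folder name at any depth. It does not handle negation (!), patterns anchored with a leading slash, trailing-slash directory patterns, or **. is_ignored is a hypothetical helper.

```python
import fnmatch
from typing import Iterable

# The patterns from the .gitignore example above.
PATTERNS = ["*.ipynb_checkpoints", "*.Rdata", "*.csv"]


def is_ignored(path: str, patterns: Iterable[str] = PATTERNS) -> bool:
    """Approximate check: does any component of path match any pattern?

    Checking every component mimics the rule that an ignored folder
    hides everything inside it. Matching is case-sensitive here.
    """
    parts = path.strip("/").split("/")
    patterns = list(patterns)
    return any(fnmatch.fnmatchcase(part, pat) for part in parts for pat in patterns)


files = [
    "analysis.ipynb",
    "data/raw.csv",
    ".ipynb_checkpoints/analysis-checkpoint.ipynb",
    "model.Rdata",
    "README.md",
]
tracked = [f for f in files if not is_ignored(f)]
print(tracked)  # ['analysis.ipynb', 'README.md']
```

Because this sketch matches case-sensitively, a file named model.RData would not be caught by *.Rdata. To see what Git itself decides for a given path, git check-ignore -v <path> reports the matching pattern.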
Keep in mind that there are graphical ways to probe a repository's history:
gitk: launched from the command line.

Scott Chacon and Ben Straub (2014). Pro Git, 2nd ed. https://git-scm.com/book/en/v2