Software and Installation
Most of the software you will need is available through Anaconda, a distribution of scientific software for Python (and R). It is the easiest way to install Python with the packages we need. The one additional piece of software we will need is Git.
We are using Python 3.8 (or newer — Python 3.9 or 3.10 is fine) in this class. Older versions of Python may work, but I will be testing my example code and instructions with Python 3.8.
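If you are unsure which Python you have, a quick check from inside the interpreter can help. The following is a minimal sketch; the names `MIN_VERSION` and `version_ok` are made up for this illustration and are not part of any course code.

```python
import sys

# Minimum version used in this class, per the text above.
MIN_VERSION = (3, 8)


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if version_ok():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} should be fine.")
else:
    print("This Python is older than 3.8; the examples are not tested with it.")
```

Run it with the same `python` you plan to use for the class, since a computer can have several Python installations.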
If you want to use the department’s computer lab for your work, see the Onyx setup instructions as well as the instructions for remotely using Onyx.
Our primary software is:
The main Anaconda distribution includes all the software we will be using except the plotnine Python package and Git. If you are using your own computer, and you don’t know what else to do, install Anaconda and you will be good to go.
If you are using the department’s Onyx computers to complete your assignment, you will need:
An SSH client (MobaXterm for Windows, ssh on Mac or Linux)
The Boise State VPN for convenient access to Onyx nodes
Install Anaconda Python in your Onyx home directory, following the Linux instructions.
For building more advanced workflows, there are many text editors you can use. I use Visual Studio Code, which has very good remote editing support and also directly supports Jupyter notebooks (although I do not use this feature much).
The rest of this document walks through some details.
Anaconda (and Miniconda, described later in this document) is a scientific software distribution. That is, it is a collection of software such as Python, Jupyter, and Pandas for supporting scientific computation. It is built around the Conda package manager, which it uses to actually install the software.
Conda installs precompiled binary versions of software, including but not limited to Python packages. The versions of Scientific Python software distributed through Conda are compiled against more optimized versions of the core math libraries than the packages available through standard Python channels (the Python Package Index).
When you are looking online for instructions for Python software, they will usually tell you to install it with pip. However, since we are using Conda, it works better to install the Conda version of a package (with conda) if possible. It will usually, but not always, have the same name as the Python package. You can search for Conda packages on anaconda.org.
Installing Anaconda Python#
If you don’t know what to do, and are working on your own computer, do this.
To install Anaconda on your computer:
Download the appropriate installer for your platform from Anaconda Python.
If your computer has a 64-bit operating system (most do, except for Windows on ARM platforms like the Surface X), download the 64-bit version of Anaconda. Make sure you get the Python 3 version (it is the default).
Run the installer. On macOS and Windows, the installer is a normal installer that you can run by double-clicking. On Linux it is a shell script, which you can run by passing it to bash in your terminal, using the name of the installer file you downloaded.
Install the additional packages we need by running the following at terminal (normal terminal on macOS and Linux, an “Anaconda Prompt” on Windows):
conda install -c conda-forge plotnine
Saving Space with Miniconda#
We don’t need all of Anaconda. If you want to save disk space, Miniconda is a much smaller distribution that just contains Python and the Conda package manager, so you can install the packages you need yourself.
Download the installer for your platform and run it.
Install the base packages we will need:
conda install -c conda-forge pandas scipy scikit-learn notebook ipython seaborn statsmodels plotnine
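After the install finishes, you may want to confirm that Python can actually find the packages. The sketch below is one way to do that using only the standard library. The names `BASE_PACKAGES` and `missing_packages` are invented for this example. It checks only that each package can be located, not that it is a particular version.

```python
from importlib.util import find_spec

# Conda package name -> name used in an import statement.
# Two of them differ: scikit-learn imports as sklearn, ipython as IPython.
BASE_PACKAGES = {
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "notebook": "notebook",
    "ipython": "IPython",
    "seaborn": "seaborn",
    "statsmodels": "statsmodels",
    "plotnine": "plotnine",
}


def missing_packages(packages=BASE_PACKAGES):
    """Return the Conda names of packages that cannot be located for import."""
    return [name for name, module in packages.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All base packages were found.")
```

Anything reported as not found can be installed with conda as described above.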
Installing Additional Packages#
If you need to install additional packages, I recommend using conda to install the Conda packages when they are available. The name usually, but not always, matches the name in PyPI (used by pip). If a package is available in Conda, any binary components are pre-compiled (so you don’t need a working C compiler) and are usually more optimized than the precompiled wheels available via pip.
You can search Conda packages at anaconda.org; the main channel contains the packages available through the default Anaconda or Miniconda installation, and the conda-forge channel is a community-maintained repository of packages that has most of what’s in main along with many packages not yet in the main repository. plotnine is (currently) only in conda-forge, but packages can usually be mixed and matched, so you can install it in a default environment.
It is fine to use pip to install packages that are not available in Conda. It also works to use it to install packages that are, but I do not recommend this, particularly for core compute packages such as scikit-learn, because the versions in Conda are more optimized.
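If you are not sure whether a package in your environment came from conda or from pip, the package metadata often records it. The sketch below reads the `INSTALLER` file that installers typically write next to a package. It assumes that Conda-built packages record `conda` there, which is common but not guaranteed, so treat the result as a hint. The helper name `installer_of` is made up for this example.

```python
from importlib import metadata


def installer_of(dist_name):
    """Return the tool recorded as having installed a distribution, or None.

    None means the distribution is not installed or has no INSTALLER record.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


# Distribution names use the PyPI spelling, e.g. scikit-learn rather than sklearn.
for name in ["numpy", "scipy", "scikit-learn"]:
    print(name, "->", installer_of(name))
```

If a core compute package reports `pip` in a Conda environment, reinstalling it with conda is usually the simpler fix.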
I personally just use the conda-forge channel for all of my Anaconda environments, but the primary installers default to main.
I use seedbank in many of my examples for seeding the random number generator. Seedbank is not currently available through the main Anaconda channel, so you will need to install it with pip:
pip install seedbank
Do this after installing your other packages with conda, so seedbank doesn’t try to pull in a non-Conda NumPy.
Installing Git and Command-Line Tools#
We will be using Git later in the semester.
To install Git, the instructions differ based on your platform:
On Windows, download and run the installer.
On macOS, install Xcode from the App Store.
On Linux, install git from your package manager.
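Whichever platform you are on, you can confirm that Git ended up on your PATH. This is a minimal sketch using the standard library. The helper name `tool_version` is invented here, and it assumes the tool accepts a `--version` flag, which Git does.

```python
import shutil
import subprocess


def tool_version(tool, flag="--version"):
    """Return the version text a command-line tool prints, or None if not on PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True, check=False)
    # Most tools print the version to stdout; fall back to stderr just in case.
    return (result.stdout or result.stderr).strip()


version = tool_version("git")
print(version if version else "git was not found on your PATH")
```

On Windows, run this from a terminal opened after the Git installer finished, since terminals that were already open may not see the updated PATH.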
Installing on Onyx#
Unlike a lot of other software, Anaconda is designed to be installed separately by each user.
To set up Anaconda on Onyx, you need to download the installer and run it on Onyx (or an Onyx node). Once you have set it up on any Onyx node, it will be available across Onyx, no matter which node you log in to. Due to Onyx’s network file system performance, I recommend using Miniconda.
You can do this with:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
/bin/bash Miniconda3-latest-Linux-x86_64.sh
rm Miniconda3*.sh
This will install Miniconda (which by default will also configure your shell to activate the Miniconda environment). Log out and log back in to Onyx and Conda will be active. Install the base packages:
conda install pandas scipy scikit-learn notebook ipython seaborn statsmodels
Onyx already has Git installed.
On Linux and Mac, the installer will, by default, modify your Bash startup scripts to activate Anaconda.
On Windows, it will install a separate ‘Anaconda Prompt’ and ‘Anaconda PowerShell’ that you can use from the start menu.
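To see whether activation took effect in the terminal you are using, you can look at the environment Conda sets up. The sketch below checks whether a `conda` command is on the PATH and reads the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables that Conda sets when an environment is active. The function name `conda_status` is made up for this example.

```python
import os
import shutil


def conda_status(environ=os.environ):
    """Summarize whether Conda looks active in the given environment mapping."""
    return {
        # Is a conda executable reachable through PATH?
        "conda_on_path": shutil.which("conda", path=environ.get("PATH")) is not None,
        # Name of the active environment (e.g. "base"), or None if none is active.
        "active_env": environ.get("CONDA_DEFAULT_ENV"),
        # Directory of the active environment, or None.
        "prefix": environ.get("CONDA_PREFIX"),
    }


for key, value in conda_status().items():
    print(f"{key}: {value}")
```

If `active_env` is empty on Linux or Mac, try opening a new terminal (or logging out and back in on Onyx) so that the modified startup scripts are read.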