Conda

Conda is a package manager and installation tool available on Mac, Windows, and Linux. It makes it easy to find the packages (and versions of packages) you want, and it installs them along with all of their dependencies. It can also help you keep software tools up to date. Conda is also very useful for managing environments. A common issue in bioinformatics is the need to switch between Python 2 and Python 3, which can get really cumbersome. A good alternative is to keep a separate environment for each version of Python and switch into the right one when you need to work with a tool that requires it.
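Once conda is installed (see the next section), the environment-switching workflow described above might look like the following sketch. The environment name py2 is a hypothetical label chosen for illustration, and the hook line assumes conda 4.6 or newer; older releases used source activate instead of conda activate.

```shell
# Hypothetical environment name; pick any name you like.
env_name="py2"

if command -v conda >/dev/null 2>&1; then
    # Make `conda activate` available when these lines run as a script.
    # An interactive shell set up by the installer normally has this already.
    eval "$(conda shell.bash hook)"

    # Create an isolated environment that holds Python 2.7.
    conda create --yes --name "$env_name" python=2.7

    # Switch into it, check which Python is active, then switch back out.
    conda activate "$env_name"
    python --version
    conda deactivate
else
    echo "conda is not on the PATH yet; install it first." >&2
fi
```

Tools that need Python 2 can then be installed into py2 while your main environment stays on Python 3.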

Installing Conda

First, download the installer from https://conda.io/miniconda.html. The table on this page contains the installers for Mac, Windows, and Linux. These instructions will be for a Linux OS.

cd
curl -LO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Then follow the prompts on the screen. The default settings should be good for most installations, but we will also have conda add itself to our PATH. The PATH is a list of directories that your computer uses to search for software. To do this, you will:

  1. press enter to go to the license agreement screen
  2. press q when you finish reading the license
  3. type yes and press enter to agree to the license
  4. press enter to agree to the default installation directory
  5. type yes and press enter to automatically update your PATH.

After conda is installed, it will update a hidden file in your home directory called .bashrc (possibly .bash_profile). This file contains the definition of the PATH. To update your current environment with the new value of the PATH variable, conda recommends logging out and logging back in. However, you can also run the command:

. ~/.bashrc

which will load the updated variables from .bashrc into your current shell session.

Test the installation

Now if you enter the command conda, you should get a help message about how to run conda.
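If the shell instead reports that conda was not found, the PATH has probably not been refreshed yet. The following sketch prints each PATH directory on its own line and reports how the shell resolves the name conda; the variable names are illustrative only.

```shell
# Split the PATH on ':' so each directory appears on its own line.
path_entries=$(printf '%s\n' "$PATH" | tr ':' '\n')
printf '%s\n' "$path_entries"

# Ask the shell how it resolves the name "conda", if it can at all.
if conda_location=$(command -v conda); then
    echo "conda resolves to: $conda_location"
    conda --version
else
    echo "conda is not on the PATH; log out and back in, or re-run: . ~/.bashrc"
fi
```

If the installation directory you accepted during setup does not appear in the printed list, the .bashrc changes have not been loaded into the current session.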

Next, you may want to try installing a piece of software like bwa. Try running:

conda install bwa

At this point in the process, however, we get an error message:

Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - bwa

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/linux-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/linux-64
  - https://repo.anaconda.com/pkgs/pro/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Channels and Bioconda

This error is telling us that conda cannot find a package called bwa. That’s because conda maintains a list of “channels” where it looks for software to install, and BWA lives in a channel called bioconda that isn’t included by default.

We can install bwa by specifying the channel name with the -c flag:

conda install -c bioconda bwa

However, it is more convenient to add the channel to our conda configuration so we don’t have to remember which channel each piece of software lives in.

To add Bioconda (and two other common channels) to your configuration, run the following commands:

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

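The order here is important as it specifies the priority of the channels: each --add places the new channel at the top of the list, so the channel added last (bioconda) gets the highest priority and defaults the lowest. You should now be able to install Bioconda packages without needing to specify the bioconda channel with -c.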

For example:

conda install bwa

should now run and properly install BWA.
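To check the resulting order, you can print the channel list, which conda shows from highest to lowest priority. This is a minimal sketch, assuming the conda config commands above have already been run; channel_report is just an illustrative variable name.

```shell
if command -v conda >/dev/null 2>&1; then
    # Capture the configured channels, listed highest priority first.
    channel_report=$(conda config --show channels)
else
    channel_report="conda not found on PATH"
fi

printf '%s\n' "$channel_report"
```

These settings are normally stored in a file called .condarc in your home directory, so they persist across sessions.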

:::warning Challenge:

R and R utility software like RStudio and RStudio Server are in a channel called r. How would you add the r channel to your conda configuration? How would you install a package in this channel without adding it to your configuration?

:::

Searching for conda packages

You can search for conda packages with the conda search command.

For example, you can run the following to search for samtools and see the different versions available and which channel the package is in:

conda search samtools
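The search can also be narrowed. The sketch below searches a single channel explicitly and then restricts the results to a version range; the 1.5 cutoff is an arbitrary value chosen for illustration.

```shell
# Version constraint used in the second search (arbitrary example value).
query='samtools>=1.5'

if command -v conda >/dev/null 2>&1; then
    # Search one channel explicitly, even if it is not in your configuration.
    conda search -c bioconda samtools

    # Restrict results to a version range. Quoting the spec stops the shell
    # from treating > as output redirection.
    conda search "$query"
else
    echo "conda not found on PATH" >&2
fi
```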

You can install a specific version of a package by typing an equals sign and the version number after the package name. For example, to install samtools version 1.2 rather than the most recent version (1.8 at the time of writing), run:

conda install samtools=1.2

Why might you want to install a specific version of a piece of software instead of the most recent one? You might need a function that has been deprecated in current versions (it’s good practice to read the documentation when following a workflow like this, because functions are often deprecated for very good reasons). You may also be following an older workflow or analysis and trying to reproduce it. This is really common in bioinformatics, where we often take a previously performed analysis that works for a specific type of data and replicate it with a current dataset. There are many more specific examples that I won’t get into right now, but suffice it to say that accessing prior versions of software is a very handy skill to have!
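For the reproducibility case, one possible approach is to pin the older version inside its own environment and record what was installed. This is a sketch rather than the only way to do it: the environment name and the output file name are hypothetical, and it assumes the bioconda channel has already been added to your configuration.

```shell
env_name="samtools-1.2"      # hypothetical environment name
env_file="environment.yml"   # hypothetical output file name

if command -v conda >/dev/null 2>&1; then
    # Build a separate environment that pins the older release, leaving
    # any newer samtools in your main environment untouched.
    conda create --yes --name "$env_name" samtools=1.2

    # Record the exact package versions so the setup can be shared.
    conda env export --name "$env_name" > "$env_file"
else
    echo "conda not found on PATH" >&2
fi
```

A collaborator could then rebuild the same environment with conda env create -f environment.yml.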