Getting started with Python for data analysis.

A dataframe is a 2-dimensional labeled data structure whose columns can hold different types. You can think of it like a spreadsheet. NumPy and pandas are two powerful libraries commonly used to work with datasets in bioinformatics. If you don't have them installed, you can get them as part of the SciPy bundle, or install them individually with conda.

NumPy

NumPy is a library for n-dimensional arrays and fast, element-wise numerical operations on them.
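As a quick illustration of what working with arrays looks like, here is a minimal sketch. The variable names and values are made up for this example and are not part of the class dataset.

```python
import numpy as np

# Hypothetical measurements, chosen only for illustration.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Arithmetic is applied to every element at once, with no explicit loop.
heights_m = heights_cm / 100

# A 2-dimensional array with 2 rows and 3 columns: [[0, 1, 2], [3, 4, 5]].
grid = np.arange(6).reshape(2, 3)

print(heights_m)
print(heights_cm.mean())   # 165.0
print(grid.shape)          # (2, 3)
print(grid.sum(axis=0))    # column sums: [3 5 7]
```

The `axis=0` argument sums down the rows, which gives one total per column. The same idea of operating on whole columns at once carries over to pandas.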

Pandas

Pandas is a Python library for dataframes. It is built on top of NumPy, so understanding the basics of NumPy will be helpful before getting into pandas.
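To make the spreadsheet comparison concrete, the sketch below builds a small dataframe by hand. The column names and values are hypothetical and only there to show columns of different types sitting side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical data, made up for illustration only.
df = pd.DataFrame(
    {
        "gene": ["A", "B", "C"],
        "length_bp": [1200, 850, 2300],
        "expressed": [True, False, True],
    }
)

# Each column has its own type (text, integer, boolean).
print(df.dtypes)

# Select one column by label and summarize it.
print(df["length_bp"].mean())  # 1450.0

# Keep only the rows that satisfy a condition.
long_genes = df[df["length_bp"] > 1000]
print(long_genes)

# A numeric column can be handed back to NumPy as an array.
lengths = df["length_bp"].to_numpy()
print(isinstance(lengths, np.ndarray))  # True
```

Selecting a column with `df["name"]` and filtering rows with a boolean condition are the two operations you will likely use most often when exploring a dataset.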

Tutorial challenge from an introduction to biocomputing class - Prompt and corresponding code

To follow along, download the dataset by running this command in your terminal:

    curl -L https://osf.io/kges5/download -o wages.csv
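Once the file is downloaded, a first look at it might go something like the sketch below. The helper name `preview_csv` is hypothetical, and the code makes no assumption about which columns wages.csv contains. It only reads the file and prints a short overview.

```python
from pathlib import Path

import pandas as pd


def preview_csv(path, n=5):
    """Read a CSV file into a dataframe and print a short overview.

    Hypothetical helper for illustration, not part of pandas.
    """
    df = pd.read_csv(path)
    print(df.shape)    # (number of rows, number of columns)
    print(df.dtypes)   # type of each column
    print(df.head(n))  # first n rows
    return df


# Only runs if the dataset is in the current working directory.
if Path("wages.csv").exists():
    wages = preview_csv("wages.csv")
```

Checking the shape, the column types, and the first few rows before doing any analysis is a quick way to confirm that the file was downloaded and parsed as expected.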