A note on the Pandas library for Python

Pandas is a Python library and one of the most widely used tools for data science and data analysis. Once your data is loaded into a Pandas DataFrame, you can filter, sort, group, reshape and clean it with a few lines of code.

The Data Source

To keep things simple, I have manually typed the top 5 wine-producing countries in 2014 into a Google Sheet, using the Wikipedia page below:

source: https://en.wikipedia.org/wiki/List_of_wine-producing_regions

This is how the data looks on my Google Sheet:


To make the Google Sheet available for anyone to use, I have published it to the web as a CSV file. Its URL is the one assigned to wine_csv_url in the code further down.


The Python Code to Read the CSV from URL

To read the data from the published CSV file, all you need to do is import the pandas library in Python, assign the CSV URL to a string variable and pass that variable as the first argument to the pandas read_csv() function. read_csv() downloads the file and returns its contents as a DataFrame.

Finally, you can verify that the data has loaded successfully by calling the head() method on the resulting DataFrame, which returns its first five rows by default.

Here's the actual code snippet used:

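import pandas as pd
# URL of the Google Sheet published to the web as CSV
wine_csv_url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRXEsWuHw6pj3zWvWJKSqva2PSsaIEVQVXgILSFxQpcQaPejmwKk3AM0bEXNIyEPyCMV7kFPSIc6chm/pub?gid=0&single=true&output=csv'
wine_data = pd.read_csv(wine_csv_url)  # load the CSV into a DataFrame
wine_data.head()  # show the first five rows to confirm the load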
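
If you want to try the same read_csv() and head() calls without depending on the published sheet being reachable, here is a minimal sketch that reads CSV text held in memory instead of a URL. The column names and values are made-up placeholders (they are not the real wine figures), and the variable names sample_csv and sample_data are my own choices for illustration.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the published sheet.
# Column names and values are invented for illustration only.
sample_csv = """country,production
Country A,500
Country B,400
Country C,300
Country D,200
Country E,100
Country F,50
"""

# read_csv() accepts a file-like object as well as a URL or a file path.
sample_data = pd.read_csv(io.StringIO(sample_csv))

# head() returns the first 5 rows by default; pass a number for a different count.
first_rows = sample_data.head()
top_three = sample_data.head(3)

print(first_rows)
print(sample_data.shape)
```

In a Jupyter notebook, a bare sample_data.head() on the last line of a cell is displayed automatically. In a plain Python script you need print(), as above.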

The Output of the Python Code

I use Jupyter Notebook for data analysis with Python and Pandas. More on Jupyter Notebook later.

This is how the code and the output look in my Jupyter notebook:


I hope this post gave you an idea of how the Pandas library for Python can be used inside a Jupyter notebook to read and display data from a CSV file that is published on the web.

In later posts, I'll write about how we can use Pandas to perform the tasks that data scientists usually handle in their day-to-day work.