#100Days of Data Science/Machine Learning (Introduction to Pandas — Series) — Day 1

Pandas is an open source library built on top of NumPy. It allows fast data analysis, cleaning, and preparation. Many people describe Pandas as the Python version of Excel or R. Pandas can also work with data from a wide variety of sources.

Installing Pandas

# With Anaconda
conda install pandas
# Or with pip
pip install pandas
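
A quick way to confirm the installation worked is to import the library and print its version. This is only a minimal sketch; the variable name `installed_version` is made up for illustration.

```python
import pandas as pd

# pd.__version__ holds the installed Pandas version as a string
installed_version = pd.__version__
print(installed_version)
```

If the import fails with `ModuleNotFoundError`, the install most likely went into a different Python environment than the one running the script.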

Pandas Series

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Pandas Series from a NumPy array is that a Series can have axis labels, i.e. it can be indexed by a label rather than only by position.

>>> import numpy as np
>>> import pandas as pd
>>> my_labels = ['a','b','c']
>>> my_data = [1,2,3]
>>> pd.Series(my_data)
0    1
1    2
2    3
dtype: int64
# The key to a Pandas Series is that you can specify what you want the index to be
>>> pd.Series(my_data,my_labels)
a    1
b    2
c    3
dtype: int64
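
To make the relationship with NumPy a bit more concrete, here is a small sketch that looks at the two parts of a Series separately: the underlying array of values and the index of labels. The names `labeled`, `values`, and `doubled` are illustrative only.

```python
import numpy as np
import pandas as pd

labeled = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# The data is exposed as a NumPy array
values = labeled.to_numpy()
print(type(values))          # <class 'numpy.ndarray'>

# The labels live in a separate Index object
print(list(labeled.index))   # ['a', 'b', 'c']

# NumPy-style vectorized arithmetic keeps the labels attached
doubled = labeled * 2
print(doubled['b'])          # 4
```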

Another way to create a Series is to pass a NumPy array:

>>> my_data = [1,2,3]
>>> my_arr = np.array(my_data)
>>> pd.Series(my_arr)
0    1
1    2
2    3
dtype: int64

Another useful option is to pass a dictionary. Pandas automatically takes the keys of the dictionary as the index and sets the corresponding values as the data points.

>>> my_dict = {'a':1,'b':2,'c':3}
>>> pd.Series(my_dict)
a    1
b    2
c    3
dtype: int64

The key to using a Series is understanding its index. Pandas uses these index names or numbers to allow very fast lookups of information, and it works much like a hash table or a dictionary.
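
The dictionary analogy can be sketched with a few label-based operations. This is a minimal illustration, not an exhaustive list of lookup methods; the series `my_series` and the missing label `'z'` are made up for the example.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Membership tests check the index labels, like dictionary keys
print('b' in my_series)       # True
print('z' in my_series)       # False

# .loc looks a value up by its label
print(my_series.loc['c'])     # 3

# .get returns a default instead of raising KeyError for a missing label
print(my_series.get('z', 0))  # 0
```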

>>> my_series1 = pd.Series([1,2,3,4],['a','b','c','d'])
>>> my_series2 = pd.Series([1,2,5,4],['a','b','e','d'])
>>> my_series1
a    1
b    2
c    3
d    4
dtype: int64
>>> my_series2
a    1
b    2
e    5
d    4
dtype: int64
# dtype is int64 because all my data points are integers

Grabbing information from a Series is similar to grabbing information out of a Python dictionary:

>>> my_series1['a']
1

What will happen if I try to add two series?

>>> my_series1 + my_series2
a    2.0
b    4.0
c    NaN
d    8.0
e    NaN
dtype: float64

So what's going on here? Pandas tries to match up the operation based on the index. Wherever it finds the same label in both series it adds the values, and where it doesn't find a match (e.g. c and e) it puts a null (NaN) there.
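
If NaN for unmatched labels is not what you want, one option is the `add` method with `fill_value`, which treats a label missing from one series as the given value before adding. The sketch below reuses the two series from above; choosing 0 as the fill value is an assumption that suits addition, not a general rule.

```python
import pandas as pd

my_series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
my_series2 = pd.Series([1, 2, 5, 4], index=['a', 'b', 'e', 'd'])

# Plain + leaves NaN wherever a label exists in only one series
plain_sum = my_series1 + my_series2

# add() with fill_value=0 treats a label missing from one series as 0
filled_sum = my_series1.add(my_series2, fill_value=0)
print(filled_sum)
```

Here `c` keeps its value from `my_series1` and `e` keeps its value from `my_series2`, while the shared labels are summed as before.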

NOTE: The output here is float64 even though both inputs hold integers. The result contains NaN, which is a floating-point value that an int64 Series cannot store, so Pandas converts the integers to floats rather than lose information.
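
The dtype change is easy to check directly. In this small sketch (the series and their values are made up), the sum stays integer when every label matches and becomes float only when an unmatched label introduces NaN.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Identical labels: no missing values, so the result stays integer
same_labels = left + pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(same_labels.dtype)     # int64

# Partly different labels: NaN appears, so the result is upcast to float
mixed_labels = left + pd.Series([10, 20, 30], index=['a', 'b', 'x'])
print(mixed_labels.dtype)    # float64
```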
