Python and Pandas on Jupyter

Maybe the title should say "in Jupyter"? In any case, I've been studying Python in Jupyter notebooks, and it's some pretty radical stuff. Using numpy and %matplotlib inline can yield some incredible results. This is a list of commonly used pandas features, with a sample of each.

Loading Dataset

import pandas as pd

df18 = pd.read_csv('all_alpha_18.csv')
df18.head()
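To try the loading step without the real file, here is a minimal sketch that reads a small in-memory CSV instead. The sample rows and the Model and Cyl columns are made up for illustration; only the read_csv and head calls come from the snippet above.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = """Model,Cyl,SmartWay
ACURA RDX,4,No
ACURA TLX,4,Yes
AUDI A4,4,Yes
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))  # first two rows only
print(sample.shape)    # (number of rows, number of columns)
```

Calling head() with no argument shows the first five rows, which is usually enough to check that the file parsed the way you expected.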

Concise summary of columns and rows – pandas.DataFrame.info

df18.info()

Show duplicate rows – pandas.DataFrame.duplicated

duplicate = df08[df08.duplicated()]
print("Duplicate Rows:")
duplicate

Count duplicate lines – pandas.DataFrame.duplicated

print(df08.duplicated().sum())

Rows with missing values – pandas.DataFrame.isnull

null_data = df08[df08.isnull().any(axis=1)]
print(null_data)
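The duplicate and missing-value checks are easier to follow on a tiny frame. This is a sketch with made-up data, not the real dataset. The drop_duplicates call at the end is a common next step that I'm adding here as an assumption about what you'd do with the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows 1 and 2 are identical, row 3 has a missing value.
df = pd.DataFrame({
    "model": ["A", "B", "B", "C"],
    "cyl": [4.0, 6.0, 6.0, np.nan],
})

# duplicated() flags repeats only; the first occurrence is not marked.
dupes = df[df.duplicated()]
n_dupes = int(df.duplicated().sum())

# Rows where at least one column is null.
missing = df[df.isnull().any(axis=1)]

# Possible follow-up: keep the first copy of each duplicated row.
deduped = df.drop_duplicates()

print(n_dupes)
print(missing)
print(len(deduped))
```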

Column data types – pandas.DataFrame.dtypes

df08.dtypes
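One thing dtypes is handy for is spotting numbers that were read in as text. A small sketch with invented columns (the "19/25" value is a made-up example of a cell that keeps a column from being numeric):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame: "mpg" holds strings, so pandas cannot treat it as numeric.
df = pd.DataFrame({
    "model": ["A", "B"],
    "cyl": [4, 6],
    "mpg": ["21", "19/25"],
})

print(df.dtypes)

# Check each column programmatically instead of reading the printout.
numeric_cols = [col for col in df.columns if is_numeric_dtype(df[col])]
print(numeric_cols)
```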

Distinct values in columns – pandas.Series.unique

SmartWayColCnt08 = df08['SmartWay'].unique()
SmartWayColCnt08.size
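Note that unique() is a method on a single column (a Series), and its result includes missing values. If you only want the count, nunique() is an alternative that skips missing values by default. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical column with one missing entry.
s = pd.Series(["Yes", "No", "Yes", "Elite", None])

values = s.unique()               # distinct values, missing value included
count_with_missing = values.size

count_without_missing = s.nunique()  # dropna=True by default

print(count_with_missing, count_without_missing)
```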

Dropping columns – pandas.DataFrame.drop

df_08.drop(['Stnd', 'Underhood ID', 'FE Calc Appr', 'Unadj Cmb MPG'], axis=1, inplace=True)
df_18.drop(['Stnd', 'Stnd Description', 'Underhood ID', 'Comb CO2'], axis=1, inplace=True)
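With inplace=True, drop changes the frame directly and returns None, and it raises a KeyError if any listed column is missing. Here is a sketch of a more forgiving variant, on an invented one-row frame, that returns a new frame and ignores labels that aren't there:

```python
import pandas as pd

# Hypothetical frame with only one of the columns we want to remove.
df = pd.DataFrame({"Model": ["A"], "Stnd": ["T3B125"], "Cyl": [4]})

# columns= is equivalent to axis=1; errors="ignore" skips missing labels.
trimmed = df.drop(columns=["Stnd", "Underhood ID"], errors="ignore")

print(list(trimmed.columns))
print(list(df.columns))  # the original frame is unchanged
```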

Rename columns – pandas.DataFrame.rename

df_08.rename(columns={'Sales Area': 'Cert Region'}, inplace=True)

Replace spaces with underscores, lowercase labels

df_08.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
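rename returns a new frame unless you pass inplace=True or assign the result back, which is easy to forget. A sketch chaining both kinds of rename on an empty frame (the Fuel Type and Cmb MPG column names are just examples):

```python
import pandas as pd

# Hypothetical column labels; note the stray leading space in " Fuel Type".
df = pd.DataFrame(columns=["Sales Area", " Fuel Type", "Cmb MPG"])

# A dict renames specific columns; assign the result to keep it.
df = df.rename(columns={"Sales Area": "Cert Region"})

# A function is applied to every label.
df = df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"))

print(list(df.columns))
```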

Confirm column labels are identical

(df_08.columns == df_18.columns).all()
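Comparing two column indexes with == works label by label, so it produces an array of booleans and raises a ValueError when the two frames have different numbers of columns. Here is a sketch of a few ways to get a single answer, using empty frames with made-up labels:

```python
import pandas as pd

# Hypothetical frames: a and b match, c is missing one column.
a = pd.DataFrame(columns=["model", "cyl", "smartway"])
b = pd.DataFrame(columns=["model", "cyl", "smartway"])
c = pd.DataFrame(columns=["model", "cyl"])

# Reduce the elementwise comparison to one boolean (same-length indexes only).
same_elementwise = bool((a.columns == b.columns).all())

# Index.equals handles different lengths without raising.
same_ab = a.columns.equals(b.columns)
same_ac = a.columns.equals(c.columns)

# Labels present in a but not in c, useful when the check fails.
only_in_a = list(a.columns.difference(c.columns))

print(same_elementwise, same_ab, same_ac, only_in_a)
```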
