Maybe this post should have been a Jupyter notebook itself. In any case, I’ve been studying Python in Jupyter notebooks and it’s some pretty radical stuff. Using pandas alongside numpy and %matplotlib inline can yield some incredible results. This is a list of the features I use most often, with a sample of each.
Loading Dataset
import pandas as pd
df18 = pd.read_csv('all_alpha_18.csv')
df18.head()
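To try the loading step without the CSV file on hand, here is a minimal sketch that reads a small in-memory CSV instead. The column names and values are made up for illustration and are not taken from all_alpha_18.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "Model,Cmb MPG\nA,30\nB,25\nC,40\n"

# read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

first_two = toy.head(2)   # head(n) returns the first n rows (default is 5)
print(toy.shape)          # (rows, columns)
print(first_two)
```

In a notebook, leaving `toy.head()` as the last line of a cell renders the table without needing print.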
Concise summary of columns and rows – pandas.DataFrame.info
df18.info()
Print duplicate rows – pandas.DataFrame.duplicated
duplicate = df08[df08.duplicated()]
print("Duplicate Rows :")
duplicate
Count duplicate rows – pandas.DataFrame.duplicated
print(df08.duplicated().sum())
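Here is a small sketch of how duplicated() behaves, using a toy DataFrame whose columns and values are invented for illustration. By default, the first occurrence of a row is not flagged; only later repeats are.

```python
import pandas as pd

# Hypothetical toy data; row 2 repeats row 0 exactly.
toy = pd.DataFrame({
    "model": ["A", "B", "A", "C"],
    "mpg": [30, 25, 30, 40],
})

dup_mask = toy.duplicated()        # boolean Series, True for repeats of an earlier row
dup_rows = toy[dup_mask]           # keep only the flagged rows
dup_count = int(dup_mask.sum())    # each True counts as 1

print(dup_rows)
print(dup_count)
```

Here dup_count is 1, because only the second copy of the ("A", 30) row is flagged.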
Print/Count rows with missing data – pandas.DataFrame.isnull
null_data = df08[df08.isnull().any(axis=1)]
print(null_data)
print(len(null_data))
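As a rough illustration of the difference between rows with missing data and missing values per column, here is a sketch on a toy DataFrame. The data is invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values in two different rows.
toy = pd.DataFrame({
    "model": ["A", "B", "C"],
    "mpg": [30.0, np.nan, 40.0],
    "fuel": ["Gas", "Gas", None],
})

# any(axis=1) checks across columns, giving one True/False per row.
rows_with_nulls = toy[toy.isnull().any(axis=1)]
null_row_count = len(rows_with_nulls)

# sum() on the boolean frame counts missing values column by column.
nulls_per_column = toy.isnull().sum()

print(null_row_count)
print(nulls_per_column)
```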
Column data types – pandas.DataFrame.dtypes
df08.dtypes
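A quick sketch of what dtypes reports, on a toy DataFrame with invented columns. The helper functions from pd.api.types are one way to test a column's type without comparing dtype names by hand.

```python
import pandas as pd

# Hypothetical toy data: one text column, one integer column, one float column.
toy = pd.DataFrame({
    "model": ["A", "B"],
    "cyl": [4, 6],
    "displ": [1.5, 2.0],
})

kinds = toy.dtypes   # Series mapping column name to dtype
print(kinds)

# Type checks that do not depend on the exact dtype name.
cyl_is_int = pd.api.types.is_integer_dtype(toy["cyl"])
displ_is_float = pd.api.types.is_float_dtype(toy["displ"])
model_is_numeric = pd.api.types.is_numeric_dtype(toy["model"])
```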
Distinct values in columns – pandas.Series.unique
SmartWayColCnt08 = df08['SmartWay'].unique()
SmartWayColCnt08.size
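Selecting a single column gives a Series, and unique() returns its distinct values in order of first appearance. The sketch below uses invented values (they are not taken from the real SmartWay column) and also shows nunique(), which gives the count directly.

```python
import pandas as pd

# Hypothetical values for illustration only.
smartway = pd.Series(["yes", "no", "yes", "Elite", "no"], name="SmartWay")

distinct = smartway.unique()         # distinct values, in order of first appearance
distinct_count = smartway.nunique()  # number of distinct values

print(list(distinct))
print(distinct_count)
```

One difference worth knowing: nunique() skips missing values by default, while unique() includes them in its result, so the two counts can differ by one on a column with gaps.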
Dropping columns – pandas.DataFrame.drop
df_08.drop(['Stnd', 'Underhood ID', 'FE Calc Appr', 'Unadj Cmb MPG'], axis=1, inplace=True)
df_18.drop(['Stnd', 'Stnd Description', 'Underhood ID', 'Comb CO2'], axis=1, inplace=True)
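A minimal sketch of drop on a toy DataFrame with invented values. Without inplace=True, drop leaves the original untouched and returns a new DataFrame. The columns= keyword is equivalent to passing the labels with axis=1.

```python
import pandas as pd

# Hypothetical one-row frame reusing a few of the column names above.
toy = pd.DataFrame({"Model": ["A"], "Stnd": ["T2"], "Underhood ID": ["X1"]})

# Returns a new DataFrame; toy itself still has all three columns.
trimmed = toy.drop(columns=["Stnd", "Underhood ID"])

# errors="ignore" skips labels that are not present instead of raising KeyError.
safe = toy.drop(columns=["Stnd", "Not A Column"], errors="ignore")

print(list(trimmed.columns))
print(list(safe.columns))
```

The errors="ignore" form is handy when the same cell gets re-run in a notebook, since a second in-place drop of the same columns would otherwise fail.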
Rename columns – pandas.DataFrame.rename
df_08.rename(columns={'Sales Area': 'Cert Region'}, inplace=True)
Replace spaces with underscores, lowercase labels
df_08.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
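Both rename styles can be tried on an empty toy DataFrame, as in this sketch. The column labels are partly invented. Note that rename returns a new DataFrame unless inplace=True is passed, so the result has to be assigned or it is lost.

```python
import pandas as pd

# Hypothetical labels; one has stray spaces around it.
toy = pd.DataFrame(columns=["Sales Area", " Cmb MPG ", "Model"])

# Dictionary form: rename specific columns by name.
renamed = toy.rename(columns={"Sales Area": "Cert Region"})

# Function form: the callable is applied to every column label.
cleaned = renamed.rename(columns=lambda x: x.strip().lower().replace(" ", "_"))

print(list(cleaned.columns))   # ['cert_region', 'cmb_mpg', 'model']
```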
Confirm column labels are identical
(df_08.columns == df_18.columns).all()
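Comparing two column indexes with == gives one True/False per position rather than a single answer, and it raises an error if the two have different lengths. As a sketch, Index.equals gives a single boolean, and the element-wise result can be used to list which labels differ. The frames and labels below are invented for illustration.

```python
import pandas as pd

# Hypothetical empty frames; c differs from a in its second label.
a = pd.DataFrame(columns=["model", "cert_region", "cmb_mpg"])
b = pd.DataFrame(columns=["model", "cert_region", "cmb_mpg"])
c = pd.DataFrame(columns=["model", "sales_area", "cmb_mpg"])

same_ab = a.columns.equals(b.columns)   # single True/False
same_ac = a.columns.equals(c.columns)

# Element-wise comparison, used as a mask to pick out the mismatched labels.
differs = a.columns != c.columns
mismatched = list(a.columns[differs])

print(same_ab, same_ac)
print(mismatched)
```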