Category Archives: Python


Jupyter Notebook starting directory

We love Jupyter notebooks for data manipulation, but its default starting directory is a bit of a pain on Windows. You can override this behavior and specify the directory it starts in with the following command. Do make note of the forward slashes in place of the usual Windows backslashes; not sure what that is all about, but a find and replace in Normal mode in Notepad++ (NPP) sorts it out and you’re golden.

jupyter notebook --notebook-dir="Z:/onedrive_tt/Archive/client/TCC_Touchpoints-Decommissioning/data/outbound"
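If you’d rather not do the slash swap by hand, here’s a minimal sketch that does it in Python with the standard library’s pathlib. The notebook_command function and the example folder are hypothetical; PureWindowsPath.as_posix() only rewrites the separators and doesn’t check that the folder exists.

```python
from pathlib import PureWindowsPath


def notebook_command(windows_path: str) -> str:
    """Build a `jupyter notebook` command whose start directory uses forward slashes."""
    # as_posix() swaps backslashes for forward slashes and keeps the drive letter
    forward = PureWindowsPath(windows_path).as_posix()
    return f'jupyter notebook --notebook-dir="{forward}"'


# Hypothetical folder, pasted straight out of Windows Explorer
print(notebook_command(r"Z:\projects\data\outbound"))
# jupyter notebook --notebook-dir="Z:/projects/data/outbound"
```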

Python – Read CSV

One of the more important things I need to attend to is reading a CSV file and examining it. While there is a plethora of documentation on this, since this is my blog I’m documenting my most used cases.

import pandas as pd

dfOriginalCSV = pd.read_csv("csvFile.csv", sep=",", dtype=str, keep_default_na=False, encoding='utf-8')

So the file is csvFile.csv. The sep argument provides the separator character; comma is the default, so we don’t have to declare it, but it’s handy in case of those pesky pipes. By declaring a dtype of str we’re saying every column is a string, so pandas doesn’t do odd tricks with numbers such as dropping leading zeros. keep_default_na=False suppresses pandas’ overwhelming desire to put NaN into anything that doesn’t seem like a proper value (blank fields stay as empty strings and text like "NA" stays as text), and of course always account for the encoding.
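To see what those options actually change, here’s a small sketch that reads the same made-up, pipe-delimited sample twice: once with the defaults and once with the settings above. io.StringIO stands in for csvFile.csv so nothing touches the disk (which is also why encoding is left out); the column names and values are invented for illustration.

```python
import io

import pandas as pd

# Made-up pipe-delimited sample standing in for csvFile.csv
raw = "id|zip|note\n001|02134|NA\n002||hello\n"

# Defaults: pandas guesses types and turns "NA" and blanks into NaN
default = pd.read_csv(io.StringIO(raw), sep="|")
# Everything read as text, nothing converted to NaN
as_text = pd.read_csv(io.StringIO(raw), sep="|", dtype=str, keep_default_na=False)

print(default["id"].tolist())    # [1, 2]  (leading zeros gone)
print(default["note"].tolist())  # [nan, 'hello']
print(as_text["id"].tolist())    # ['001', '002']
print(as_text["zip"].tolist())   # ['02134', '']
print(as_text["note"].tolist())  # ['NA', 'hello']
```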


Python – FutureWarning

Being new to Python and running it in Jupyter notebooks, I sometimes get errors and warnings that just don’t make sense, and that’s a bit frustrating. Let’s take this gem:

/home/aron/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3397: FutureWarning: In a future version of pandas all arguments of read_csv except for the argument 'filepath_or_buffer' will be keyword-only.
  exec(code_obj, self.user_global_ns, self.user_ns)

So what the hell does that mean? I understand pandas, arguments, and what read_csv is, but what is keyword-only???

It turns out that keyword-only means that instead of relying on the position of an argument, you have to pass it by name (keyword=value) for everything except filepath_or_buffer. The warning is notice of that change, and pandas 2.0 and later reject the positional form with a TypeError.

So this piece of code triggers the warning:

# Load in the general demographics data.
azdias = pd.read_csv('Udacity_AZDIAS_Subset.csv', ';')

And this is the fix, with the delimiter passed by keyword:

# Load in the general demographics data.
azdias = pd.read_csv('Udacity_AZDIAS_Subset.csv', delimiter=';')
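The same rule exists in plain Python: any parameter after a bare * in a function definition can only be passed by name. Here’s a toy sketch (read_table is a made-up stand-in, not the real pandas signature) showing the keyword call working and the positional call being rejected with a TypeError.

```python
def read_table(path, *, delimiter=","):
    """Toy stand-in for read_csv: every parameter after the bare * is keyword-only."""
    return f"reading {path} split on {delimiter!r}"


# Passing the delimiter by name works
print(read_table("data.csv", delimiter=";"))  # reading data.csv split on ';'

# Passing it by position is rejected
try:
    read_table("data.csv", ";")
except TypeError as err:
    print("TypeError:", err)
```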

Python – Dataframe – Unique Value in Column

This is the pandas equivalent of SQL’s SELECT DISTINCT, returning the individual values of a column:

dataFrame.column.unique()
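A short sketch with a made-up DataFrame showing unique() next to two close relatives: nunique() counts the distinct values, and value_counts() lists them with how often each appears, roughly a GROUP BY with COUNT(*). Bracket notation does the same job as dataFrame.column and is needed when a column name has spaces or clashes with a DataFrame method.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({"state": ["AZ", "CA", "AZ", "NV", "CA"]})

print(df.state.unique())           # AZ, CA, NV in order of first appearance
print(df["state"].unique())        # same thing in bracket notation
print(df["state"].nunique())       # 3
print(df["state"].value_counts())  # each distinct value with its count
```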

Python and Pandas on Jupyter

Maybe this should be filed under Jupyter??? In any case, I’ve been studying Python in Jupyter notebooks and it’s some pretty radical stuff. Using numpy and %matplotlib inline can yield some incredible results. This is a list of the commonly used features and samples of each.


Python – Merge CSV’s

I’m really enjoying Python, and one of the things I’m really digging is pandas, a library that lets you work with CSV files to do a multitude of things. Because I have to pull data out in a loop, I end up with a folder full of CSV files, and this little piece of code stitches them together into one. This sample uses pipe delimiters with UNIX line feeds and quotes all fields.

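import os
import glob
import csv
import pandas as pd

# Work inside the folder holding the exported CSV files
#os.chdir(r"C:\onedrive_tt\Testing\python\CombineCSVs\testApplicationFiles")
os.chdir(r"C:\app\python\CombineCSVs\testApplicationFiles")

extension = 'csv'
output_file = "combined_csv.csv"
# Sorted for a repeatable order; skip the output file so a rerun doesn't re-ingest it
all_filenames = [f for f in sorted(glob.glob('*.{}'.format(extension))) if f != output_file]

# Combine all files in the list, reading every column as text
combined_csv = pd.concat([pd.read_csv(f, sep='|', dtype=str, keep_default_na=False) for f in all_filenames])
# Export as pipe-delimited with UNIX line feeds and every field quoted
# (pandas 1.5+ spells the argument lineterminator; line_terminator was removed in 2.0)
combined_csv.to_csv(output_file, sep='|', index=False, encoding='utf-8', lineterminator="\n", quoting=csv.QUOTE_ALL)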
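For testing without pointing os.chdir at a real folder, here’s the same idea wrapped in a function that takes the folder as a parameter. It’s a sketch: merge_pipe_csvs and the two tiny sample files are hypothetical, a temporary directory stands in for testApplicationFiles, and lineterminator needs pandas 1.5 or later. ignore_index=True just renumbers the rows, which doesn’t matter once index=False drops the index on export.

```python
import csv
import glob
import os
import tempfile

import pandas as pd


def merge_pipe_csvs(folder, output_name="combined_csv.csv"):
    """Stack every pipe-delimited CSV in folder into one fully quoted file."""
    output_path = os.path.join(folder, output_name)
    # Sorted for a stable order; the output file is skipped so reruns don't re-ingest it
    paths = [p for p in sorted(glob.glob(os.path.join(folder, "*.csv"))) if p != output_path]
    frames = [pd.read_csv(p, sep="|", dtype=str, keep_default_na=False) for p in paths]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, sep="|", index=False, encoding="utf-8",
                    lineterminator="\n", quoting=csv.QUOTE_ALL)
    return combined, output_path


# Demo with two tiny made-up exports in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("a.csv", "id|name\n1|Ann\n"), ("b.csv", "id|name\n2|Bob\n")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write(body)
    combined, out = merge_pipe_csvs(tmp)
    with open(out, encoding="utf-8") as fh:
        text = fh.read()

print(text)
```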


Python – Reading CSV as input

One of the more fascinating aspects of Python is its file manipulation capabilities. From time to time I’ve needed to create 160 or so new batch files to call warehouse integrations. With a proper naming convention in development, the whole process is handled with about 20 lines of Python code: the script reads integration names one per line from _RecruitingNewExport.csv and writes a Recruiting_<name>.bat for each. Love it!!!

with open('_RecruitingNewExport.csv') as inFile:
    for line in inFile:
        line = line.strip()           # integration name, newline removed
        if not line:
            continue                  # skip blank lines
        outputLineTwo = line[:-6]     # name minus its last six characters, used for the output CSV
        # Separate handle name so the input file object isn't shadowed
        with open("Recruiting_" + line + ".bat", "w") as outFile:
            outFile.write("REM This batch file executes the " + line + " integration\n")
            outFile.write("\n")
            outFile.write("REM Set directories and variables\n")
            outFile.write("@ECHO OFF\n")
            outFile.write("cd /d \"%~dp0\"\n")
            outFile.write("Call Environment.bat\n")
            outFile.write("TITLE %~nx0 - %TALEO_HOST% - %date% %time%\n")
            outFile.write("SET timeStamp=%date:~10,4%-%date:~4,2%-%date:~7,2%_%time:~0,2%-%time:~3,2%-%time:~6,2%\n")
            outFile.write("SET timeStamp=%timestamp: =0%\n")
            outFile.write("\n")
            outFile.write("REM Run the export integration\n")
            # core\\TCC.bat: the backslash is escaped because "\T" is an invalid escape sequence
            outFile.write("Call core\\TCC.bat \"%SCRIPTS_FOLDER%\\CandidateExportScripts\\" + line + "\\" + line + "_cfg.xml\" \"%SCRIPTS_FOLDER%\\CandidateExportScripts\\"
                          + line + "\\" + line + "_sq.xml\" \"%OUTBOUND_FOLDER%\\" + outputLineTwo + "_%timestamp%.csv\"\n")
            outFile.write("Exit /B %ERRORLEVEL%\n")
            outFile.write("\n")
print("Batch files created.")
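Another way to lay out the same generator, sketched with a single template string instead of a stack of write() calls. write_batch_files and the CandidateExport name are hypothetical; str.format only fills the {name} and {short} placeholders, so batch syntax like %SCRIPTS_FOLDER% and %~dp0 passes through untouched. Batch variables never use braces, which is what makes str.format safe here.

```python
import os
import tempfile

# {name} is the integration name; {short} is the name minus its last six characters
TEMPLATE = (
    "REM This batch file executes the {name} integration\n"
    "\n"
    "REM Set directories and variables\n"
    "@ECHO OFF\n"
    'cd /d "%~dp0"\n'
    "Call Environment.bat\n"
    "TITLE %~nx0 - %TALEO_HOST% - %date% %time%\n"
    "SET timeStamp=%date:~10,4%-%date:~4,2%-%date:~7,2%_%time:~0,2%-%time:~3,2%-%time:~6,2%\n"
    "SET timeStamp=%timestamp: =0%\n"
    "\n"
    "REM Run the export integration\n"
    'Call core\\TCC.bat "%SCRIPTS_FOLDER%\\CandidateExportScripts\\{name}\\{name}_cfg.xml" '
    '"%SCRIPTS_FOLDER%\\CandidateExportScripts\\{name}\\{name}_sq.xml" '
    '"%OUTBOUND_FOLDER%\\{short}_%timestamp%.csv"\n'
    "Exit /B %ERRORLEVEL%\n"
)


def write_batch_files(names, out_dir, prefix="Recruiting_"):
    """Write one .bat per integration name and return the paths created."""
    created = []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        path = os.path.join(out_dir, f"{prefix}{name}.bat")
        with open(path, "w", encoding="utf-8") as fh:
            # short drops the last six characters, mirroring line[:-6] in the script
            fh.write(TEMPLATE.format(name=name, short=name[:-6]))
        created.append(path)
    return created


# Demo in a throwaway folder; the blank entry is skipped
with tempfile.TemporaryDirectory() as tmp:
    files = write_batch_files(["CandidateExport\n", "\n"], tmp)
    with open(files[0], encoding="utf-8") as fh:
        text = fh.read()

print(len(files))             # 1
print(text.splitlines()[-1])  # Exit /B %ERRORLEVEL%
```

In real use you’d hand it the open file directly, e.g. write_batch_files(open('_RecruitingNewExport.csv'), "."), since iterating a file yields one line at a time.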



Python – CSV to Pipe delimiter

So why would a perfectly sane person want a pipe-delimited CSV versus a properly escaped comma-separated value file? Yeah, beats me too: glutton for punishment, not working with the right parser, drug addiction? All could be reasons. But fear not, if this is the road you’d like to go down, there’s a Python script for it, and here it is so I don’t have to reinvent the wheel next time I need it.

import glob
import csv
import os

for entry in glob.glob('applicationResumeAttachmentsManifest/*.csv'):
    # Drop only the ".csv" extension; entry.strip(".csv") would also eat any leading
    # or trailing '.', 'c', 's' or 'v' characters of the name itself ("cases" -> "case")
    baseName = os.path.splitext(entry)[0]
    #outputFile = baseName + "-Pipe.csv"
    outputFile = "C:\\python\\pipe\\" + baseName + "-Pipe.csv"
    os.makedirs(os.path.dirname(outputFile), exist_ok=True)
    # newline='' on both files, as the csv module documentation recommends
    with open(entry, encoding='utf-8', newline='') as inputFile:
        with open(outputFile, 'w', encoding='utf-8', newline='') as writeFile:
            reader = csv.DictReader(inputFile, delimiter=',')
            writer = csv.DictWriter(writeFile, reader.fieldnames, delimiter='|')
            writer.writeheader()
            writer.writerows(reader)
print("Conversion complete.")
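To check the conversion logic without touching C:\python\pipe, here’s a minimal in-memory sketch of the same DictReader/DictWriter pair; comma_to_pipe and the sample rows are hypothetical. It also shows that the result is still a properly escaped file: the comma inside "Smith, Jane" no longer needs quotes, but a value containing a pipe gets quoted automatically. One difference from the script: lineterminator is set to "\n" here, whereas csv.DictWriter defaults to "\r\n".

```python
import csv
import io


def comma_to_pipe(text):
    """Re-emit comma-separated text as pipe-delimited text."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=",")
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, reader.fieldnames, delimiter="|", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


sample = 'name,title\n"Smith, Jane",Engineer\nLee,R|D\n'
print(comma_to_pipe(sample))
# name|title
# Smith, Jane|Engineer
# Lee|"R|D"
```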

Python

I know that the current buzzword for IT is Python and I’m going to have to jump on that bandwagon. In my role as an integration engineer, there are a lot of times that I have to manipulate file data, and I’m super impressed by the ease and power of this language.

So to this end I’ll be creating a category for Python on this blog and storing code snippets that I find handy and useful. Hope you find it as helpful and enjoyable as I do.

I know a lot of people hear Python and think, yeah, a big snake that will constrict you until you’re dead and then swallow you whole. A perfect metaphor for programming. But here’s the truth: Guido van Rossum started developing the language in 1989, and the name Python comes not from the snake but from Monty Python’s Flying Circus, whose published scripts van Rossum was reading while he was implementing the language. Knowledge is power, and sometimes funny too.