June 18, 2017 · data processing · python
At work, I was tasked with filtering library course guide data. The filtering involved removing guides named [Deleted] and associating each guide with the librarian in charge of maintaining it. Since I didn’t want to look through 200-odd records by hand, I decided to use Python! I can’t show you the code, but I can tell you how I coded up a solution.
The first task was to remove the [Deleted] guides, which took just three clicks in Excel. With the [Deleted] guides removed, I saved the Excel sheet as a .csv file so I could process the data using Python’s csv module.
With the .csv file ready, I started copying the guides each librarian was responsible for into their respective .txt files. Even though I was copying the guides from a website, the list was well formatted, which made the copy-paste a breeze.
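A minimal sketch of how those per-librarian lists could be turned into a lookup table. This is not the code from the project. The names `build_assignments` and `sample`, the guide titles, and the "one title per line" layout are all assumptions made for illustration.

```python
def build_assignments(lists_by_librarian):
    """Map each guide title to the librarian whose list contains it.

    lists_by_librarian: dict of librarian name -> text with one guide
    title per line (assumed layout of each .txt file).
    """
    assignments = {}
    for librarian, text in lists_by_librarian.items():
        for line in text.splitlines():
            title = line.strip()
            if title:  # skip blank lines left over from copy-paste
                assignments[title] = librarian
    return assignments


# Hypothetical sample data standing in for the contents of two .txt files.
sample = {
    "Librarian A": "Chemistry 101\nHistory, Ancient and Modern\n\n",
    "Librarian B": "Nursing Research\n",
}
assignments = build_assignments(sample)
```

In practice each text value would come from reading a file, for example with `pathlib.Path(...).read_text()`.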
The next step was to add the librarian responsible for maintaining a given guide to each record. I thought it would be as simple as appending the librarian matched by a given if condition, but there was a big problem: some library guide titles in the comma-separated values file had commas in them, creating a whole new column in the destination comma-separated values file.
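To make the comma problem concrete, here is a small sketch under stated assumptions. The sample rows, the `Status` column, and the `assignments` lookup are hypothetical. It also assumes the title containing a comma is wrapped in double quotes in the input file. Splitting each line on commas breaks that title into two pieces, while `csv.reader` keeps it as one field, so the librarian can be appended as a proper extra column.

```python
import csv
import io

# Hypothetical input: the second record has a comma inside its quoted title.
raw = (
    "Title,Status\n"
    '"History, Ancient and Modern",Published\n'
    "Chemistry 101,Published\n"
)

# Naive approach: splitting on every comma cuts the title in two.
naive = [line.split(",") for line in raw.splitlines()]

# csv.reader understands the quotes and keeps the title in one field.
parsed = list(csv.reader(io.StringIO(raw)))

# Hypothetical lookup of guide title -> librarian.
assignments = {
    "History, Ancient and Modern": "Librarian A",
    "Chemistry 101": "Librarian A",
}

header, *records = parsed
header.append("Librarian")
for record in records:
    # Leave the cell empty when no librarian is listed for a guide.
    record.append(assignments.get(record[0], ""))
```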
With that discovery, I had to come up with a way of writing the comma into the comma-separated file without creating a new column. I tried to escape the commas using regular expressions, but I ended up with a lot of backslashes in the file, each one trying to escape a comma. The same thing happened when I tried to write the rows into the csv file pythonically with the module’s csv.writer.
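For reference, a minimal sketch of how `csv.writer` handles commas with its default settings. The rows are made up, and this is not the code from the project. With the default `QUOTE_MINIMAL` quoting, any field containing a comma is wrapped in double quotes, so it stays in a single column and no backslashes are needed.

```python
import csv
import io

# Hypothetical rows; the first record has a comma inside its title.
rows = [
    ["Title", "Librarian"],
    ["History, Ancient and Modern", "Librarian A"],
    ["Chemistry 101", "Librarian A"],
]

buffer = io.StringIO()          # stands in for a file opened with newline=""
writer = csv.writer(buffer)     # default quoting is csv.QUOTE_MINIMAL
writer.writerows(rows)
output = buffer.getvalue()

# Reading the text back gives the original rows, commas included.
round_trip = list(csv.reader(io.StringIO(output)))
```

By comparison, passing `quoting=csv.QUOTE_NONE` together with `escapechar="\\"` makes the writer put a backslash in front of each comma instead of quoting the field.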
As I was about to give up, I realized that find and replace exists and replaced the backslashes with ‘s. After adding another librarian’s guides, the data was all filtered for my supervisor to review. The data is still being processed, but this was a good start. I’m just wondering whether the time I took to make this was any quicker than doing it manually. To be honest, I think it took about the same time. Still, it’s a good way of using Python to make daily tasks much simpler.