Overview
This project covers the scraping and processing of live parking data collected at five-minute intervals over an extended period.
Skills involved:
- Data collection
- Automation
- Python
- Cron jobs
- Bash scripting
- Statistical analysis
Data source
The data used in this project comes from the City of Los Angeles Open Data website. Specifically, it is the Los Angeles International Airport (LAX) - Parking Lots Current Status page, which provides live data on the current status of the parking garages at LAX. The data updates every five minutes.
For this project, I opted to collect the live data in CSV format via the following URL: https://data.lacity.org/resource/dik5-hwp6.csv
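A snapshot of that endpoint can be fetched with only the standard library. The sketch below is illustrative, not the exact script used in the project; the filename pattern and function names are assumptions.

```python
# Minimal sketch: download the current LAX parking CSV under a
# timestamped filename. Names here are illustrative assumptions.
from datetime import datetime
from urllib.request import urlopen

DATA_URL = "https://data.lacity.org/resource/dik5-hwp6.csv"

def snapshot_filename(now=None):
    """Build a timestamped name like lax_parking_2024-05-01_13-05.csv."""
    now = now or datetime.now()
    return now.strftime("lax_parking_%Y-%m-%d_%H-%M.csv")

def download_snapshot(url=DATA_URL):
    """Fetch the current CSV feed and save it to a timestamped file."""
    name = snapshot_filename()
    with urlopen(url) as resp, open(name, "wb") as f:
        f.write(resp.read())
    return name
```

Embedding the timestamp in the filename keeps every five-minute snapshot distinct, so nothing is overwritten between runs.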
Collection process
Rather than only consuming the live feed, I wanted to save the live data as a CSV file every five minutes over a period of time. To automate this, I created a cron job in a Linux environment that executed a Python script to download the current data as a CSV file with the date and time included in the filename. Once per day, a second cron job executed a Python script that concatenated all of the CSV files collected over the previous 24-hour period into a single CSV file with the date included in its filename.
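The daily roll-up step described above could look roughly like this (a five-minute cron schedule would be `*/5 * * * *` for the download and a once-daily entry for the merge). This is a sketch under assumed filenames; the glob pattern and helper names are not the project's actual ones.

```python
# Sketch of the daily roll-up: merge all snapshot CSVs for one date
# into a single dated file. Filename patterns are assumptions.
import csv
import glob

def concat_csvs(paths, out_path):
    """Concatenate CSVs that share a header, writing the header once."""
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(paths):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)

def rollup_day(date_str):
    """Merge every snapshot for a given date, e.g. '2024-05-01'."""
    paths = glob.glob(f"lax_parking_{date_str}_*.csv")
    concat_csvs(paths, f"lax_parking_{date_str}.csv")
```

Writing the header only once keeps the merged file directly loadable by Pandas without any row filtering.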
Compiling the collected data
Once the collection process concluded, I loaded all of the collected data into a Pandas dataframe in a Jupyter notebook for data exploration. From there, I decided which data to keep and exported it to another CSV file, which I went on to use in a future project for a Tableau visualization and dashboard.
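The exploration step can be sketched as follows. The column names below are assumptions standing in for the feed's actual schema, and a tiny inline sample replaces the full collected dataset.

```python
# Sketch of the exploration/export step; column names and the sample
# rows are illustrative assumptions, not the real LAX schema.
import io
import pandas as pd

sample = io.StringIO(
    "parking_lot,status,request_date\n"
    "P1,OPEN,2024-05-01T13:05:00\n"
    "P2A,FULL,2024-05-01T13:05:00\n"
)

df = pd.read_csv(sample, parse_dates=["request_date"])

# Inspect dtypes and distributions before deciding what to export.
print(df.dtypes)
print(df["status"].value_counts())

# Export the cleaned subset for use in Tableau.
df.to_csv("lax_parking_clean.csv", index=False)
```

Exporting with `index=False` avoids an extra unnamed column when the file is later loaded into Tableau.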