Overview

This project covers the scraping and processing of live parking data collected over an extended period of time.

Skills involved:

  • Data collection
  • Automation
  • Python
  • Cron jobs
  • Bash scripting
  • Statistical analysis

Data source

The data used in this project comes from the City of Los Angeles Open Data website. Specifically, it is the Los Angeles International Airport (LAX) - Parking Lots Current Status page. This page provides live status data for the parking garages at LAX, updated every five minutes.

For this project, I opted to collect the live data in CSV format from the following URL: https://data.lacity.org/resource/dik5-hwp6.csv
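A minimal sketch of fetching one snapshot from that endpoint is shown below. The function and file names are illustrative assumptions, not the original code; only the URL comes from the project itself.

```python
# Sketch of a single download from the LAX parking CSV endpoint.
# Function names and the filename pattern are assumptions for illustration.
from datetime import datetime
from urllib.request import urlretrieve

DATA_URL = "https://data.lacity.org/resource/dik5-hwp6.csv"

def timestamped_filename(now: datetime) -> str:
    """Build a CSV filename that embeds the date and time of collection."""
    return now.strftime("lax_parking_%Y-%m-%d_%H-%M.csv")

def download_snapshot() -> str:
    """Download the current data and save it under a timestamped name."""
    filename = timestamped_filename(datetime.now())
    urlretrieve(DATA_URL, filename)  # fetch the CSV and write it to disk
    return filename

if __name__ == "__main__":
    # A cron entry could run this every five minutes, e.g. (hypothetical path):
    # */5 * * * * python3 /path/to/download_snapshot.py
    print(download_snapshot())
```

Embedding the timestamp in the filename keeps every five-minute snapshot distinct, which is what makes the later concatenation step possible.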

Collection process

Instead of just utilizing the live data, I wanted to save the live data as a CSV file every five minutes over a period of time. To automate this process, I created a cron job in a Linux environment that executed a Python script to download the current data as a CSV file with the time and date included in the file name. Once per day, another cron job executed a Python script that concatenated all the CSV files collected over the previous 24-hour period into a single CSV file with the date included in its filename.
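The daily concatenation step could look something like the sketch below. The glob pattern and output naming are assumptions; the key detail is keeping only one header row while appending every snapshot's data rows.

```python
# Sketch of the once-per-day concatenation job (paths and patterns are
# illustrative assumptions, not the original script).
import csv
import glob

def concatenate_day(pattern: str, out_path: str) -> int:
    """Merge all CSVs matching `pattern` into one file, keeping a single
    header row. Returns the number of data rows written."""
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)  # write the header only once
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Sorting the matched filenames keeps the rows in chronological order, since the timestamps embedded in the filenames sort lexicographically.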

Compiling the collected data

Once I concluded the collection process, I loaded all of the collected data into a Pandas dataframe in a Jupyter notebook. There I did some data exploration and decided which data to keep, then exported it as a new CSV file to use in a future project: a Tableau visualization and dashboard.
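Loading the daily files into a single dataframe could be sketched as below; the file pattern is an assumption, and in practice this step would be followed by the exploration and column selection described above.

```python
# Sketch: combine the daily CSVs into one pandas DataFrame for exploration.
# The glob pattern is a hypothetical example of the daily-file naming.
import glob
import pandas as pd

def load_collected(pattern: str) -> pd.DataFrame:
    """Read every daily CSV matching `pattern` into a single DataFrame."""
    frames = [pd.read_csv(p) for p in sorted(glob.glob(pattern))]
    return pd.concat(frames, ignore_index=True)

# Usage (hypothetical filenames):
# df = load_collected("lax_parking_daily_*.csv")
# df.to_csv("lax_parking_combined.csv", index=False)
```

With everything in one dataframe, exploration and the final export for Tableau reduce to standard pandas operations.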