// The Comment #9: Aye, Yo…TrafficFlow?

Bone-chilling temperatures, but a dope sunset.

The Comment is a weekly digest of the stuff that grabbed my attention or occupied some part of my mind during the past week. Normally, it’ll be one thing that’s really been on my mind, followed by a handful of things I found interesting. The Comment will be published each Monday at 10:30 AM EST.

Thanks for reading.

## What’s up with TrafficFlow?

TrafficFlow was a side project I took on to deepen my understanding of machine learning and TensorFlow through some hands-on experience. I started out with the goal of training a neural network to tell me if an image from a traffic camera shows traffic congestion. I didn’t think that was an ambitious goal at first, but it turned out to be more challenging than I expected. I started this project in the summer of 2017 and just got around to training a neural network on the collected data. I haven’t reached my goals, but I have a few takeaways.

Where am I at now?

I have done a first pass at training a neural network with the data I collected and classified. The Keras code I am using is very similar to a tutorial that walks through training a neural network to recognize cats and dogs. The network tells me there’s congestion in every image I run inference (prediction) on, even on some of the classified training and validation data. Something is very wrong.
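
For reference, the tutorial-style model is roughly the sketch below. The image size, layer sizes, and optimizer here are placeholders rather than my exact configuration.

```python
# Sketch of a tutorial-style binary classifier in Keras.
# Image size, layer sizes, and optimizer are assumptions, not the
# exact TrafficFlow configuration.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
# Single sigmoid output: probability that the image shows congestion
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
```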

Went well – Programming things

Part of the reason TrafficFlow got off to a great start was that I scripted the data collection. I wrote an Android app for the classification stage, and I scripted the data preparation steps. The only manual work was configuring the neural network (more on this in a later article) and using the app to classify the data.

Not so well – Data is Key!

The largest portion of TrafficFlow was data collection and classification. I set up a script that automatically saved an image from a traffic camera every 3 minutes, and it worked flawlessly. The challenge I immediately ran into was dealing with data from rotating cameras. I wanted to add a few constraints to minimize effort, one being that I would train a neural network to recognize congestion on one side of a street or highway during the daytime. It was really easy to throw out images captured at night. It wasn’t as easy throwing away data from a rotated or zoomed camera: the cameras never returned to their previous position perfectly, and sometimes the view would be off-center or zoomed in (or out). I had trouble deciding whether to keep those samples or toss them.
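
Something along these lines handles the daytime cut (the timestamped filename format and the hour cutoffs here are just placeholders):

```python
# Sketch of the daytime filter: keep only images captured between
# rough sunrise and sunset hours. The filename format
# (e.g. "20170814_1732.jpg") and the hour cutoffs are placeholders.
import os
from datetime import datetime

DATA_DIR = 'images'          # hypothetical directory of captured frames
DAY_START, DAY_END = 7, 19   # keep roughly 7 AM through 7 PM

def is_daytime(filename):
    timestamp = os.path.splitext(filename)[0]
    captured = datetime.strptime(timestamp, '%Y%m%d_%H%M')
    return DAY_START <= captured.hour < DAY_END

daytime_images = [f for f in os.listdir(DATA_DIR)
                  if f.endswith('.jpg') and is_daytime(f)]
```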

Another challenge was the lack of data. I captured over 9,500 images, and that was not enough. Over half of these images were thrown out because they were taken at night or the camera’s perspective had changed. When it came time to train, I had ~270 samples showing traffic congestion and ~2,500 samples with no congestion. I estimate I’d need an order of magnitude more data (roughly 2,700 samples of congestion and 25,000 samples of no congestion) to have a shot at a reasonably trained network.

Where do I go from here?

I’ll need to really dive into the configuration of my neural network. I repurposed a configuration from a tutorial that’s used to determine whether a picture contains a cat or a dog, and I have a hunch I’ll need something more purpose-built. This is the reason I got into this side project: to really understand why I would choose one neural network configuration over another.

I’ll be uploading the scripts and code I wrote to GitHub sometime this week.

In the meantime, enjoy a time-lapse generated from the collected data.

/* fini */

TrafficFlow – An Update

Early this year I started a project, TrafficFlow. The goal was to train a neural network to recognize, with reasonable accuracy, congested traffic conditions. I needed data, so every 3 minutes I sampled an NC DOT traffic camera at I-40, exit 289.

This camera, and many others, can rotate.

(Above are images taken from the same traffic camera, one when it’s facing west, the other facing east.)

It would be interesting to see how the neural network responds to this type of data. My guess is I would have ended up with low accuracy because of the differing angles, and because most of the time there weren’t many samples depicting traffic congestion; I would have needed a lot more training data. This forced me to find a new camera, one with a fixed position. I found one in downtown Durham.

It’s even reasonably lit at night. I’ll be sampling this traffic camera every 3 minutes, meaning I’ll (hopefully) have enough data to start training a neural network in a week or two. In the meantime, I’ll be doing a few exercises in deep learning, starting with the Keras tutorial Building powerful image classification models using very little data.
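
The core idea in that tutorial is stretching a small dataset with aggressive augmentation. In Keras that looks roughly like the sketch below; the directory names, target size, and batch size are placeholders.

```python
# Sketch of the augmentation setup from the "very little data" tutorial.
# Directory names, target size, and batch size are placeholders.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# Validation data only gets rescaled, never augmented
test_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    'data/train',            # expects data/train/congested, data/train/clear
    target_size=(150, 150),
    batch_size=16,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=16,
    class_mode='binary')
```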

TrafficFlow – Classifying Data

As a follow-up to my initial TrafficFlow post, I have built some more software to help me classify the dataset I collected over the past few weeks.

TrafficFlow is a project in which I’m developing an algorithm that can “look” at a still image pulled from a traffic camera and determine whether or not traffic is congested. I am using the deep learning framework TensorFlow to build the model behind this algorithm.

Over the past few weeks, I have collected 4,966 still images from an NCDOT traffic camera. I wrote a Python script that takes a snapshot and cron’d it to run every 4 minutes. Now that I have all of this data, how can I efficiently classify it? A few ideas came to mind:

  • A Python script that loaded each image in a picture viewer and presented a question in the terminal. This worked, but the picture viewer grabbed focus and I couldn’t close it automatically. The extra interaction involved made classifying the data this way inefficient, and it limited me to classifying data on my MacBook Pro.
  • An AngularJS web app that let me classify images in a desktop web browser. This was interesting, but I didn’t know a ton of Angular, and it still limited me to classifying data on my MacBook Pro.

I’m an Android developer by day (check out RadioPublic 😉). I figured I’d just build an Android app that would allow me to classify the data, so I did. But first, I needed to gather the collected data into a format that is easily usable in the app. So I wrote a Python script:

This script simply reads a list of files from a directory, creates an entry in a dictionary (saving some additional metadata in the process), and exports that object to JSON.
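
The sketch below captures the idea; the directory, URL prefix, and metadata fields are stand-ins for the real ones (the actual script is in the GitHub repo linked below).

```python
# Sketch of the data-preparation script: walk the image directory,
# build a dictionary entry per file with some metadata, export to JSON.
# The directory, URL prefix, and field names are stand-ins, not the
# exact ones used in TrafficFlow.
import json
import os

IMAGE_DIR = 'images'
BASE_URL = 'https://example.com/trafficflow/'   # where the images were uploaded

entries = {}
for filename in sorted(os.listdir(IMAGE_DIR)):
    if not filename.endswith('.jpg'):
        continue
    key = os.path.splitext(filename)[0]
    entries[key] = {
        'filename': filename,
        'url': BASE_URL + filename,
        'size_bytes': os.path.getsize(os.path.join(IMAGE_DIR, filename)),
        'classification': None,   # filled in later by the Classify app
    }

with open('images.json', 'w') as f:
    json.dump(entries, f, indent=2)
```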

A snippet from the exported data looks something like this (the values below are illustrative):
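
```json
{
  "20170814_1732": {
    "filename": "20170814_1732.jpg",
    "url": "https://example.com/trafficflow/20170814_1732.jpg",
    "size_bytes": 48213,
    "classification": null
  }
}
```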

Next, I uploaded this JSON file to Firebase. Firebase is a backend-as-a-service that allows app developers to quickly build apps without needing to spin up servers or turn into “devops”. Best of all, it’s free to get started and use.

Finally, I uploaded 4,966 images to my web server so that my app can access them.

Now on to the app.  It’s nothing special and particularly ugly, but it works.

It allows me to quickly classify an image as congested (1), not congested (0), or ditch/don’t include (-1). Once I classify an image, it saves the result (and my progress) to Firebase, then automatically loads the next one. It turns this exercise into a single-tap adventure, or rather a 4,966-single-tap adventure.

I’ve uploaded the Python script and Classify Android app to GitHub (https://github.com/emuneee/trafficflow).  I hope to make my dataset available soon as well.

Now onto classification.

Hello TrafficFlow

I am interested in machine learning. I’ve finished most of the Udacity “Intro to Machine Learning” course, and I’ve been thinking of ways to get my feet wet: a practical project that I can start and finish and that will give me some hands-on experience.

Hello TrafficFlow (Traffic + TensorFlow)

I-40 at Wade Avenue in Raleigh, North Carolina

I’ve built an Android app, Traffcams, that lets people view images from traffic cameras. I’ve done the TensorFlow tutorial that walks through image recognition, so I’m thinking I can modify it to tell me if an image from a traffic camera contains a lot of traffic. My first step in training a TensorFlow model is collecting the data. I wrote a Python script that simply saves an image to disk from a given URL.
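
The gist of that script is something like the sketch below; the camera URL and output directory are placeholders, not the actual NCDOT endpoint.

```python
# Sketch of the capture script: fetch the camera's current frame and
# save it with a timestamped filename. The camera URL and output
# directory are placeholders.
import os
import urllib.request
from datetime import datetime

CAMERA_URL = 'https://example.com/ncdot/camera.jpg'
OUTPUT_DIR = 'images'

def save_snapshot():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    filename = datetime.now().strftime('%Y%m%d_%H%M') + '.jpg'
    path = os.path.join(OUTPUT_DIR, filename)
    urllib.request.urlretrieve(CAMERA_URL, path)

if __name__ == '__main__':
    save_snapshot()
    # cron handles the scheduling, e.g. an every-4-minutes entry:
    # */4 * * * * /usr/bin/python3 /home/user/trafficflow/capture.py
```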

I have this script cron’d on an Ubuntu server. It runs every 4 minutes, saving an image from this camera, which means I’ll save 360 images per day. I’ll probably throw away the night pictures (sunset to sunrise is about 8 hours)…so I’ll acquire about 240 usable pictures per day. I’m predicting I’ll need about 2,000 to 3,000 images to train a model. I’ll play it safe and say I’ll need 3,000 images. In 12 and a half days, I’ll have enough data to train.

My next step is to manually classify these images as having a lot of traffic (1) or not (0).  Sounds monotonous.