TrafficFlow – Classifying Data

As a follow-up to my initial TrafficFlow post, I have built some more software to help me classify the dataset I collected over the past few weeks.

TrafficFlow is a project where I develop an algorithm that can “look” at a still image pulled from a traffic camera and determine whether or not traffic is congested.  I am using the deep learning framework, TensorFlow, to build the model that will house this algorithm.

Over the past few weeks, I have collected 4,966 still images from an NCDOT traffic camera.  I wrote a Python script that took a snapshot, then cron’d that script to run every 4 minutes.  Now that I have all of this data, how can I efficiently classify it?  A few ideas came to mind:

  • Python script that loaded the image in a picture viewer and presented a question in the terminal.  This worked, but the picture viewer grabbed focus, and I couldn’t close it automatically.  I determined that the extra interaction involved would make classifying the data this way inefficient.  It also limited me to classifying data on my MacBook Pro only.
  • AngularJS web app that allowed me to classify images in a desktop web browser.  This was interesting, but I didn’t know much Angular, and it too limited me to classifying data on my MacBook Pro only.

I’m an Android developer by day (check out RadioPublic 😉 ).  I figured I’d just build an Android app that would allow me to classify the data, so I did.  But first, I needed to gather the collected data into a format that is easily usable in this app.  So I wrote a Python script:

This script simply reads a list of files from a directory, creates an entry in a dictionary (saving some additional metadata in the process), and exports that object to JSON.
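A minimal sketch of what a script like this could look like.  The field names (file, size_bytes, congested) and the choice to key each entry by the file name without its extension are assumptions for illustration, not necessarily what the script in the GitHub repository does.

```python
import json
import os


def build_manifest(image_dir):
    """Map each image in image_dir to a small metadata record."""
    manifest = {}
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue  # skip anything that is not an image
        # Assumption: key by the name without its extension, which keeps
        # "." out of the key (Firebase Realtime Database rejects it in keys).
        key = os.path.splitext(name)[0]
        manifest[key] = {
            "file": name,
            "size_bytes": os.path.getsize(os.path.join(image_dir, name)),
            "congested": None,  # hypothetical field: no label assigned yet
        }
    return manifest


def export_manifest(image_dir, out_path):
    """Write the manifest for image_dir to out_path as JSON and return it."""
    manifest = build_manifest(image_dir)
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest
```

Calling export_manifest("snapshots", "images.json") on a hypothetical snapshots directory would produce one JSON object with an entry per image, ready to import.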

A snippet from the exported data looks like:

Next, I uploaded this JSON file to Firebase.  Firebase is a backend-as-a-service that allows app developers to quickly build apps without needing to spin up servers or turn into “devops”.  Best of all, it’s free to get started and use.

Finally, I uploaded 4,966 images to my web server so that my app can access them.

Now on to the app.  It’s nothing special and particularly ugly, but it works.

It allows me to quickly classify an image as congested (1), not congested (0), or ditch/don’t include (-1).  Once I classify an image, it saves the result (and my progress) to Firebase, then automatically loads the next one.  It turns this exercise into a single-tap adventure, or rather, a series of 4,966 single taps.
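The app itself is an Android app backed by Firebase, but the labeling flow can be sketched in a few lines of Python.  This is only an illustration of the logic: a plain dictionary stands in for Firebase, and the function names and the label field are hypothetical.

```python
CONGESTED = 1
NOT_CONGESTED = 0
DISCARD = -1
VALID_LABELS = {CONGESTED, NOT_CONGESTED, DISCARD}


def next_unlabeled(entries):
    """Return the first key (in sorted order) with no label, or None when done."""
    for key in sorted(entries):
        if entries[key].get("label") is None:
            return key
    return None


def classify(entries, key, label):
    """Record a label for one image, then return the next image to show."""
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    entries[key]["label"] = label  # the real app would save this to Firebase
    return next_unlabeled(entries)
```

Because progress is derived from which entries already have a label, stopping and resuming later needs no separate bookmark in this sketch.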

I’ve uploaded the Python script and Classify Android app to GitHub (https://github.com/emuneee/trafficflow).  I hope to make my dataset available soon as well.

Now on to classification.

Hello TrafficFlow

I am interested in machine learning.  I’ve finished most of the Udacity “Intro to Machine Learning” course.  I’ve been thinking of ways to get my feet wet in machine learning: a practical project that I can start and finish, and that will give me some hands-on experience.

Hello TrafficFlow (Traffic + TensorFlow)

I-40 at Wade Avenue in Raleigh, North Carolina

I’ve built an Android app, Traffcams, that lets people view traffic images from traffic cameras.  I’ve done the TensorFlow tutorial walking through image recognition.  So I’m thinking that I can modify that tutorial to tell me if an image from a traffic camera contains a lot of traffic.  My first step in training a TensorFlow model is collecting the data.  I wrote a Python script that simply saves an image to disk from a given URL.
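A minimal sketch of a snapshot script along these lines, using only the Python standard library.  The function name, the timestamped file naming, and the .jpg extension are assumptions for illustration.

```python
import os
import time
import urllib.request


def save_snapshot(url, out_dir, timeout=30):
    """Download the image at url and save it in out_dir under a timestamped name."""
    os.makedirs(out_dir, exist_ok=True)
    # Assumption: name each file after the local time it was captured.
    filename = time.strftime("%Y%m%d-%H%M%S") + ".jpg"
    path = os.path.join(out_dir, filename)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```

A small script that calls save_snapshot with the camera URL could then be scheduled with a crontab schedule of */4 * * * * to run every 4 minutes.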

I have this script cron’d on an Ubuntu server.  It runs every 4 minutes, saving an image from this camera, which means I’ll save 360 images per day.  I’ll probably throw away the night pictures (sunset to sunrise is about 8 hours)…so I’ll acquire about 240 usable pictures per day.  I’m predicting I’ll need about 2,000 to 3,000 images to train a model.  I’ll play it safe and say I’ll need 3,000 images.  In 12 and a half days, I’ll have enough data to train.
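The arithmetic behind those numbers, written out as a quick check (the constant names are just for illustration):

```python
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 4     # one snapshot every 4 minutes
NIGHT_HOURS = 8          # rough sunset-to-sunrise estimate
TARGET_IMAGES = 3000     # upper end of the 2,000 to 3,000 guess

images_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES          # 360
usable_per_day = (24 - NIGHT_HOURS) * 60 // INTERVAL_MINUTES  # 240
days_needed = TARGET_IMAGES / usable_per_day                  # 12.5

print(images_per_day, usable_per_day, days_needed)
```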

My next step is to manually classify these images as having a lot of traffic (1) or not (0).  Sounds monotonous.

New Android Stuff Part 1 😍

I’m barely into the things that were released or announced at Google I/O 2017.  I’ve already got a list of stuff that I need to watch and review.  It’s really a lot of stuff and it’s only day 1!

What’s New In Android

After watching the Google I/O Keynote, this is normally the video I watch next.

Kotlin is Officially Supported for Android Development

I’ve been holding off on doing anything major in Kotlin until it was blessed with official support from the Android team.  Well, I’m out of excuses.  Kotlin is an officially supported language for Android development.  Its necessary dependencies and plugins are being integrated into Android Studio, beginning with version 3.0.

Kotlin and Android | Android Developers

New Android Studio Profilers

There are a ton of re-designed profilers for CPU, memory, and network operations in Android Studio 3.0.  I’ll let the pictures do the talking (all taken from Android Developers).

I’m especially pumped about the network profiler!

Android Studio 3.0 Canary 1 | Android Developers

Android O Beta

The next version of the Android O Beta was released today.  If you have a Nexus 5X, 6P, Pixel, Pixel XL, Nexus Player, or Pixel C, you can enroll your device at android.com/beta.  I’ve been using it for a few hours.  The only issues I’ve seen are that Android Pay doesn’t work (it politely lets you know with a splash screen) and that the Google Play Music playback notification re-appears from time to time.

Android O Developer Preview | Android Developers

Android Architecture Components

The Android team has started putting together new tools and guidelines to help Android developers properly architect their apps to prevent memory leaks, make lifecycle management easier (!), and reduce boilerplate code.

There is also a new SQLite object mapper from the Android team, called Room.

Screenshot from the Architecture Components Talk

 

Android Architecture Components | Android Developers

These are just a few of the things that immediately stood out to me as an Android Developer.  I’m looking forward to doing a deeper dive into all of it.

Traffcams, now serving Georgia and Washington

Beginning today, Traffcams has traffic cameras from Washington (WSDOT) and Georgia (GDOT).  Traffcams is a powerful app that puts the traffic cameras around you in your hand.  Traffcams now contains over 3,700 traffic cameras in 3 states.  More locations are on the way.

Get Traffcams free from the Google Play Store today.

A Case for User Data Regulation?

Carole Cadwalladr published a really fascinating piece on disinformation, propaganda, and their influence on the Brexit referendum.  An excerpt I found particularly interesting:

Paul and David, another ex-Cambridge Analytica employee, were working at the firm when it introduced mass data-harvesting to its psychological warfare techniques. “It brought psychology, propaganda and technology together in this powerful new way,” David tells me.

And it was Facebook that made it possible. It was from Facebook that Cambridge Analytica obtained its vast dataset in the first place. Earlier, psychologists at Cambridge University harvested Facebook data (legally) for research purposes and published pioneering peer-reviewed work about determining personality traits, political partisanship, sexuality and much more from people’s Facebook “likes”. And SCL/Cambridge Analytica contracted a scientist at the university, Dr Aleksandr Kogan, to harvest new Facebook data. And he did so by paying people to take a personality quiz which also allowed not just their own Facebook profiles to be harvested, but also those of their friends – a process then allowed by the social network.

Facebook was the source of the psychological insights that enabled Cambridge Analytica to target individuals. It was also the mechanism that enabled them to be delivered on a large scale.

There is no single Federal policy or law in the United States regulating how companies collect, store, and distribute user data.  There are a handful of regulations that guide the storage and distribution of medical and financial data.  When it comes to the data Facebook, Google, Amazon, and other tech companies collect on their users, it essentially comes down to best practices.  Congress recently prevented the FCC from enforcing privacy regulation that was close to going into effect.  This regulation would have prevented ISPs (internet service providers) from selling your user information without consent.

As an observer and participant, I’m sure it’s a matter of when, not if, with regard to regulation of the tech industry and its handling of user data.  Speaking as a software engineer in the tech industry, how user data is collected and protected (from external parties and internal employees) essentially comes down to the culture established by each organization.  Some organizations, like Uber, don’t seem to do a good job at this.  As more of our data, whether voluntarily or involuntarily, is collected and stored in the cloud, it will be a target for hackers and enterprising individuals and organizations looking to exploit it.  Revelations like the ones in the quoted article, reports of companies being hacked, and the exposure of internal user data scandals will increase the desire for some sort of regulation.

Read more at The great British Brexit robbery: how our democracy was hijacked.