// The Comment #12: Crawling, Downloading, Parsing, Notifying

Durm

The Comment is a weekly digest of the stuff that grabbed my attention or occupied some part of my mind during the past week. Normally, it’ll be one thing that’s really been on my mind, followed by a handful of things that I found interesting. The Comment will be published each Monday at 10:30AM EST.

Thanks for reading.

# Open sourced some infrastructure

From 2014 to 2015, I spent a lot of my spare time building a podcast app for Android. I probably spent hundreds of hours coding the Android app and the parsing / syncing backend.  I shut the app down a few months after I released it because I failed to quickly find a good “product-market fit”.  Anyway, I open sourced the Android app a year or so ago, and it even runs as a standalone Android app (no backend needed).

Today I’m open sourcing the backend apps and APIs that ran everything.  There’s nothing spectacular here.  It’s just a bunch of Java and Node.js code.  There are several apps here:

  • PodcastCrawler – Crawled the iTunes directory to discover podcasts.
  • Retriever – Pulled podcast RSS URLs from the database, fetched the feed data, and stuck it into a Redis queue.
  • Parser – Parsed feed data from the Redis queue and stuck Podcast and Episode objects into another Redis queue (see the sketch after this list).
  • Storer – Wrote Podcast and Episode records into the database.  It also built notification records (for push messaging) and stuck those into a Redis queue.
  • Notifier – Sent Google Cloud Messaging pushes to devices for push-to-sync purposes.
  • Billboard – Built the trending podcast lists by crunching user listen data.
  • API – Source of all podcast and episode data, as well as the avenue for account creation and syncing.
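To make the queue hand-off between stages concrete, here’s a minimal sketch of the Retriever → Parser step in Java. It’s not lifted from the repos: it assumes the Jedis Redis client, a made-up queue name (`feeds:raw`), and a placeholder feed URL, so treat the names as illustrative rather than what the actual apps use.

```java
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import redis.clients.jedis.Jedis;

public class FeedPipelineSketch {

    // Hypothetical queue name for raw feed XML waiting to be parsed.
    static final String RAW_FEED_QUEUE = "feeds:raw";

    // Retriever side: fetch a feed's XML over HTTP and push it onto the queue.
    static void retrieve(Jedis redis, HttpClient http, String feedUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(feedUrl)).GET().build();
        String xml = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
        redis.rpush(RAW_FEED_QUEUE, xml); // hand the raw feed off to the parser
    }

    // Parser side: block until raw feed XML arrives, then pull out a couple of fields.
    static void parseNext(Jedis redis) throws Exception {
        // BLPOP returns [queueName, value]
        List<String> item = redis.blpop(0, RAW_FEED_QUEUE);
        String xml = item.get(1);

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String podcastTitle = doc.getElementsByTagName("title").item(0).getTextContent();
        int episodeCount = doc.getElementsByTagName("item").getLength();

        // In the real pipeline, Podcast and Episode objects would go onto another
        // queue for the Storer; here we just print what was parsed.
        System.out.println(podcastTitle + " (" + episodeCount + " episodes)");
    }

    public static void main(String[] args) throws Exception {
        try (Jedis redis = new Jedis("localhost", 6379)) {
            HttpClient http = HttpClient.newHttpClient();
            retrieve(redis, http, "https://example.com/feed.xml"); // placeholder URL
            parseNext(redis);
        }
    }
}
```

The point of the queues is the decoupling: each stage can be restarted or scaled on its own, and a slow parser just means the queue backs up instead of the whole pipeline stalling.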

Looking back, this was the first large-scale backend I built by myself.  It parsed 200k+ podcasts multiple times an hour, making millions of podcast episodes available.  I learned all about DevOps, setting up a VPN, scripting deployments, and the MEAN stack while working on this project.

Check the project out on GitHub.