It’s been a long time since I’ve posted here. I’ve been really busy in general, but I’m excited to publish this post (I actually forced myself to publish this post).
Over the past few weeks, since I returned from vacation, I’ve been working on building a backend that scales. I’m building an Android podcast app, PrēmoFM. My podcast app needs a backend parsing infrastructure that can constantly parse hundreds of thousands of podcast XML files. As I type this, I’m about to unleash hundreds of thousands of podcast channels onto my beta users. Up to this point, they’ve only had access to 75, which mostly consisted of my favorite podcasts. That number will be increasing to north of 200,000. I’m ecstatic about this because these are the first pieces of software I’ve built that operate at this scale.
In order to provide a good user experience to PrēmoFM users, I need to be able to parse at least 5,000 podcasts per minute. On the surface, this would allow me to run through the entire catalog of roughly 200,000 channels once in 40 minutes. I’ve made several architectural decisions that allow me to achieve this. I’ve built highly multi-threaded Java apps that each serve a single purpose:
- HTTP XML data retrievers
- XML Parsers
- Push message senders
Each portion of the chain has its own challenges and constraints, so I needed to build apps that were focused in their function. This also allows me to spin up additional instances of each app to keep my XML parsing pipeline saturated and operating at peak performance, which is exactly what I did. Secondarily, I’m able to iterate on and update each component separately. I can only imagine the frustration if I had one Java app that did the retrieval, XML parsing, and push message sending. I’m still doing a bit of testing before I move everything up to my production VPSes (virtual private servers), but I’ve already blown past my baseline goal of 5,000 podcasts per minute and am peaking at 12,000+ (Update 5/25/2015 @ 1AM: deployed to DigitalOcean, peaking at 27,000 channels per minute, which is bonkers, to me anyway), which is at least a two-orders-of-magnitude increase from where I was a week ago. In fact, things are moving so fast that I’m going to have to significantly cut down on the logging or it’ll begin eating up all of the available hard drive space.
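To give a rough feel for the shape of the chain, here is a minimal sketch of three single-purpose stages connected by queues. It is not the actual PrēmoFM code: the real stages are separate apps, and how they hand work to each other isn't covered in this post. Here everything is squeezed into one process so it can run on its own, and the in-memory queues, the `PipelineSketch` class, the `fetch` and `parseTitle` stand-ins, and the `push:` prefix are all made up for illustration. No network calls or real XML parsing happen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PipelineSketch {
    // Hypothetical end-of-work marker, compared by identity so it can never
    // collide with real data.
    private static final String STOP = new String("STOP");

    // Stand-in for the HTTP retriever: fabricates a tiny feed instead of downloading one.
    public static String fetch(String feedUrl) {
        return "<rss><channel><title>" + feedUrl + "</title></channel></rss>";
    }

    // Stand-in for the XML parser: pulls out the text of the first <title> element.
    public static String parseTitle(String xml) {
        int start = xml.indexOf("<title>") + "<title>".length();
        return xml.substring(start, xml.indexOf("</title>"));
    }

    // Stops accepting new tasks and waits for the running ones to finish.
    private static void drain(ExecutorService pool) throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static List<String> run(List<String> feedUrls, int workers) throws InterruptedException {
        // Bounded queues, so a fast stage blocks instead of burying a slow one.
        BlockingQueue<String> fetched = new LinkedBlockingQueue<>(1000);
        BlockingQueue<String> parsed = new LinkedBlockingQueue<>(1000);
        List<String> sent = Collections.synchronizedList(new ArrayList<>());

        ExecutorService fetchPool = Executors.newFixedThreadPool(workers);
        ExecutorService parsePool = Executors.newFixedThreadPool(workers);
        ExecutorService pushPool = Executors.newFixedThreadPool(workers);

        // Stages 2 and 3 start first so they are already consuming when data arrives.
        for (int i = 0; i < workers; i++) {
            parsePool.submit(() -> {
                for (String xml = fetched.take(); xml != STOP; xml = fetched.take()) {
                    parsed.put(parseTitle(xml));
                }
                return null;
            });
            pushPool.submit(() -> {
                for (String title = parsed.take(); title != STOP; title = parsed.take()) {
                    sent.add("push:" + title); // stand-in for sending a push message
                }
                return null;
            });
        }

        // Stage 1: one retrieval task per feed.
        for (String url : feedUrls) {
            fetchPool.submit(() -> {
                fetched.put(fetch(url));
                return null;
            });
        }

        // Shut the stages down in order, handing each worker one STOP marker.
        drain(fetchPool);
        for (int i = 0; i < workers; i++) fetched.put(STOP);
        drain(parsePool);
        for (int i = 0; i < workers; i++) parsed.put(STOP);
        drain(pushPool);
        return new ArrayList<>(sent);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sent = run(List.of("feed-a", "feed-b", "feed-c"), 4);
        System.out.println("Sent " + sent.size() + " notifications");
    }
}
```

The point of the sketch is the separation: each stage only knows about its input and output queue, so the number of workers per stage can be tuned independently, which is the same reason I can spin up more instances of whichever app is the bottleneck.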
I’ll provide more detail on my backend architecture in the future, when I have time to document. For right now, I’m marching towards release.
Are you an Android user? Sign up for the beta at Prēmo.FM