Roku selects AppDynamics to monitor performance

Roku is the creator of the most popular streaming platform for delivering video, music and casual games to the TV. As the market leader in streaming entertainment devices for television, Roku's streaming players are renowned for their simplicity, variety of entertainment choices and value. With millions of users worldwide, the Roku software had to work seamlessly around the clock.

Challenge: Existing monitoring tools proved inefficient

Nils Pommerien, Manager of Network Engineering at Roku, was responsible for making sure Roku delivered an end-user experience second to none. However, he had very little visibility into the performance of his applications running in production. Pommerien and his team used tools like Nagios and the log4net framework to monitor the health of their applications, but found this was not an efficient way to solve problems.

“About a year ago, we had a performance problem that would occasionally lock up one of our production nodes,” Pommerien said. “There was some convoluted set of behaviors in our application that basically caused an infinite loop in our code.” Every so often a user would set off this process, and Pommerien's team would need to restart the affected server. “We'd recycle the app pool, but we were getting no data about the problem, so solving it was tricky,” he said. “We had five engineers in a room and ten theories about what was going on.”

Ultimately Pommerien and his team decided to attach profilers to each production node. When they saw one of the cores begin to lock up, they began collecting data with the profiler and they found the line of code that was being executed over and over. “It was a one-line problem,” Pommerien said. “It took us 10 days between noticing the problem and getting out a fix. This would have taken five or ten minutes to resolve with an AppDynamics solution.”

AppDynamics streamlined troubleshooting and provided third-party data

The experience led Pommerien to look at application performance management (APM) solutions. Because Roku primarily sold hardware devices, its release schedule was different from that of typical web applications, and the version of software that shipped with the product needed to perform flawlessly. If there was a problem caused by an obscure use case, Pommerien needed the visibility to quickly identify and fix it. Roku decided to purchase AppDynamics APM for its fine-grained performance analytics, low overhead and intuitive interface.

The AppDynamics solution not only helped Pommerien and his team locate application performance problems on Roku's end, but also provided insight into third-party issues that could affect the user experience. Pommerien was responsible for an application that relied heavily on content from third-party service providers, whose website performance was variable. Visibility into web service calls helped Pommerien ensure that these providers were meeting their SLAs. This visibility also guarded the Roku brand, which could be adversely affected by inferior third-party performance. For example, after one of Roku's third-party providers experienced a 24-hour outage and claimed a week later to be fully functional again, the AppDynamics solution showed that this wasn't entirely true. “It seemed that every once in a while an API call would simply fail,” he said. “So we put together some graphs from AppDynamics APM to show all transactions with this particular provider over the past week. Out of 10,000 API calls, 200 failed completely.”
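To put the quoted figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration and not part of AppDynamics APM: the function name `failure_rate` and the 99.9% success target are assumptions made for the example, since the case study does not state the provider's actual SLA.

```python
def failure_rate(total_calls: int, failed_calls: int) -> float:
    """Return the fraction of API calls that failed."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return failed_calls / total_calls


# Figures quoted by Pommerien: 200 failures out of 10,000 calls.
rate = failure_rate(10_000, 200)
print(f"Failure rate: {rate:.1%}")

# Hypothetical SLA target (an assumption, not a figure from the case study).
ASSUMED_SLA_SUCCESS = 0.999
meets_sla = (1 - rate) >= ASSUMED_SLA_SUCCESS
print(f"Meets assumed SLA: {meets_sla}")
```

A 2% failure rate would fall well short of a target like the one assumed here, which is the kind of evidence the graphs gave Pommerien when the provider claimed to be fully functional.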

Roku eliminated application performance problems

Excluding the occasional issues with third-party service providers, Pommerien has not had any performance problems since deploying the AppDynamics platform. The solution has also helped Roku's development team perfect their code. “Our developers love it,” Pommerien said. “Now they can see what parts of their code execute the most, or the slowest.” His developers have also used the platform to test the effectiveness of their memcached implementations by comparing total requests, cache hits and connections to the database.
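The comparison of total requests, cache hits and database connections can be sketched roughly as follows. This is a minimal illustration, not Roku's code or an AppDynamics API: the `CacheStats` class, its field names, and the sample numbers are all hypothetical, and it assumes each cache miss results in exactly one database connection.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Hypothetical counters read off a monitoring dashboard."""
    total_requests: int
    cache_hits: int
    db_connections: int

    def hit_ratio(self) -> float:
        """Fraction of requests served from the cache."""
        if self.total_requests == 0:
            return 0.0
        return self.cache_hits / self.total_requests

    def unexplained_requests(self) -> int:
        """Requests that neither hit the cache nor reached the database.

        Under the one-connection-per-miss assumption this should be zero;
        a nonzero value hints that the cache is being bypassed or that
        some requests hit the database more than once.
        """
        return self.total_requests - self.cache_hits - self.db_connections


# Made-up sample numbers for illustration only.
stats = CacheStats(total_requests=1000, cache_hits=850, db_connections=150)
print(f"Hit ratio: {stats.hit_ratio():.0%}")
print(f"Unexplained requests: {stats.unexplained_requests()}")
```

A low hit ratio, or a database connection count that does not drop as hits rise, would suggest the cache is not taking load off the database as intended.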

Even with performance optimized, Pommerien found he monitored the applications more than he did before. “Whenever I have a spare moment, I check in using the AppDynamics solution,” he said. “I'm definitely spending more time monitoring performance than before, but I don't see that as a bad thing. I know exactly what's happening in my application all the time. That's awesome.”
