Best of: Monitorama 2018
4-5 September 2018, Amsterdam. This is the second time the conference has been held in Europe and my first time attending. The previous one was in Berlin in 2015 and one of my friends referred to it as the best experience he’d had. I had to see!
Monitorama is a conference about Observability (the current term) and all the strategies and tools you can use to understand what’s happening to your systems. We are currently rebuilding our monitoring infrastructure at work to move away from check-based alerts towards more metric-driven visibility, so the topic had my extra attention!
The conference ethos #
This isn’t your regular kind of conference. Despite its popularity it remains a single-track event over 2 or 3 days. It runs every year in the US and some years it runs in Europe as well. Another particularity is that it encourages many kinds of talks, not only technical ones.
My take after attending is that you’ll get very nice talks, most of them inspiring and, if technical, very high level. Don’t expect nitty-gritty details on a setup or copy-paste technical designs, but do expect thoughtful discussion of what to take into account, and why, when creating those designs.
A possibly contentious point is that the conference encourages diversity and does so to make a point. This is surprising and can be mildly annoying at times, either by the selection of topics or by how they are presented. Nonetheless the point they want to make is clear: this is annoying because we are not used to it, because it’s not the norm and because we usually ignore it.
Being annoying is, well, annoying, but I felt quite pleased with how they did it once the immediate reaction settled down. I guess getting used to these situations is the real way to accept them as normal, as they should be.
First day talks #
Monica Sarbu - You don’t look like an Engineer #
Women in tech and her path from developer to founder, and then manager.
TL;DW Some insight and general useful pointers for management.
Rick Rackow - Change of framework does not change mindset #
Experiences from eBay Classifieds’ transition from Nagios to Sensu and now Prometheus.
TL;DW What worked, what didn’t and what they learned along the way. Not technically detailed.
Radu Gheorghe and Rafal Kuc - Is Observability good for our brains? #
The psychology of learning, adjusting your schedule, decluttering your mind and running meetings. Based on research papers.
Quintessence Anx - Unquantified Serendipity: Diversity in Development #
The benefits of diversity in development, how you can’t plan on it and her experience as a woman in the field.
TL;DW Some broad pointers on onboarding and mentoring.
Yan Cui - How to build observability into a serverless application #
Strategies for managing logs with AWS Lambda.
TL;DW Simple tracing with custom request headers is worth it. Structured logging is a must.
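To make those two takeaways concrete, here is a minimal sketch of my own (not code from the talk). It assumes a Lambda-style handler receiving an API-gateway-like `event` dict; the header name `x-correlation-id` and the helper names are hypothetical choices for illustration.

```python
import json
import sys
import time
import uuid

# Hypothetical header name; use whatever your services agree on.
CORRELATION_HEADER = "x-correlation-id"


def get_correlation_id(headers):
    """Reuse the caller's correlation ID, or start a new trace."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get(CORRELATION_HEADER) or str(uuid.uuid4())


def log(level, message, correlation_id, **fields):
    """Emit one JSON object per line so log tooling can parse and filter it."""
    record = {
        "ts": time.time(),
        "level": level,
        "message": message,
        "correlation_id": correlation_id,
        **fields,
    }
    line = json.dumps(record)
    print(line, file=sys.stdout)
    return line


def handler(event, context=None):
    """Lambda-style entry point; the shape of `event` is an assumption."""
    correlation_id = get_correlation_id(event.get("headers") or {})
    log("INFO", "request received", correlation_id, path=event.get("path"))
    # Forward the same ID on any outgoing call so the trace stays connected.
    outgoing_headers = {CORRELATION_HEADER: correlation_id}
    return {"statusCode": 200, "headers": outgoing_headers, "body": "ok"}
```

Because every log line carries the same `correlation_id`, you can search your log store for one ID and see a request’s path across functions.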
Stephen Strowes - Monitoring what you don’t own #
RIPE Atlas. Monitoring the internet’s performance and behaviour and optimizing your global routes.
Michael Kehoe, Nina Mushiana - What the NTSB teaches us about incident management & postmortems #
Methodologies on incident handling for effective post-mortems at LinkedIn derived from aviation and transport incident investigation experience.
Alexis Le-Quoc - A thousand and one postmortems: Lessons learned from running complex systems at scale #
Take-aways from analyzing 6 years of post-mortems at Datadog and why you should be doing that too.
Referenced a talk on Data-driven Postmortems
Lightning talks #
RRDTool 2018 - Fabien Wernli: A tool to live-downsample metrics, feed them to Elasticsearch and seamlessly query across all of them.
Monitoring @ Adobe - Alex Birca: Adobe’s AAM product migration to Prometheus and the plethora of other things they need to converge.
Nida - Rant on Alerts from an SRE: Define an alert as something a human must take action on. Only alert when it’s worth it, and provide some context!
Live monitoring and Auto-remediation: Monitoring blood sugar for diabetics and taking action to correct unhealthy or dangerous situations.
How to become CTO by killing your software - Mark: Actually, it’s important to kill your programs proactively on error conditions to prevent data corruption. Do it well and you may expose underlying infrastructure issues.
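As a rough illustration of that last idea (my own sketch, not code shown in the talk), the program below checks an invariant and exits with a non-zero status as soon as the state looks wrong, instead of carrying on and risking corrupted data. The account-transfer scenario and all names are made up for the example.

```python
import sys


class InvariantViolation(Exception):
    """Raised when state is inconsistent and continuing could corrupt data."""


def apply_transfer(balances, source, target, amount):
    """Move `amount` between accounts, refusing to continue on bad input or state."""
    total_before = sum(balances.values())
    if amount <= 0 or balances.get(source, 0) < amount:
        raise InvariantViolation(f"invalid transfer of {amount} from {source}")
    balances[source] -= amount
    balances[target] = balances.get(target, 0) + amount
    if sum(balances.values()) != total_before:
        raise InvariantViolation("total balance changed during transfer")
    return balances


def main(transfers, balances):
    """Process transfers; exit non-zero on the first violation instead of limping on."""
    for source, target, amount in transfers:
        try:
            apply_transfer(balances, source, target, amount)
        except InvariantViolation as exc:
            print(f"fatal: {exc}", file=sys.stderr)
            sys.exit(1)  # a supervisor can restart us from a known-good state
    return balances
```

The non-zero exit is what makes the failure visible: a supervisor or orchestrator can count restarts, and a spike in them is a signal worth alerting on.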
Second day talks #
Marcus Barczak - Prometheus for Practitioners: Migrating to Prometheus at Fastly #
Transition from Ganglia+Icinga to SaaS and to Prometheus. Challenges faced and lessons learned.
Referenced a talk at PDX 2018: Observability: The Hard Parts
Slides: https://speakerdeck.com/ickymettle/prometheus-for-practitioners-migrating-to-prometheus-at-fastly
Stephen Boak - Rethinking UX for AI-driven Monitoring tools #
Maybe the future of monitoring? And how to show the user what the data says. Very interesting but mainly UX centric.
Gregory Parker, Trevor Morgan - Building a monitoring strategy and gaining consensus #
Monitoring for a bank. High-level, management-y vision aligning a big and fractured ecosystem.
Alexey Velikiy - Self-hosted & open-source time series for your infrastructure #
A pet project that shows promise, if it can get out of its alpha state. Pattern detection, prediction and anomaly analysis as a Grafana plugin (time series DB agnostic).
Mandy Waite - Monitoring serverless things #
History of serverless turns into GCP-centric view of how to monitor with barely any details.
Dominic Wellington - How AI helps observe decentralised systems #
Joe Ross - Incremental-decremental methods for real-time monitoring #
Math-heavy talk on adequate/adaptive forecasting for anomaly detection and trend alerting with lightweight computation.
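I didn’t note down the speaker’s exact formulas, so the following is only my own minimal sketch of the general idea: keep statistics over a sliding window by incrementally adding the newest point and decrementally removing the oldest, rather than recomputing from scratch. It uses a Welford-style update for mean and variance and a plain z-score threshold, which is a stand-in and not necessarily the method from the talk; the class and parameter names are hypothetical.

```python
from collections import deque
import math


class SlidingWindowStats:
    """Mean/variance over the last `size` points with O(1) work per update."""

    def __init__(self, size):
        if size < 2:
            raise ValueError("size must be at least 2")
        self.size = size
        self.window = deque()
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def _add(self, x):
        # Incremental step: fold a new point into mean and m2.
        n = len(self.window) + 1
        delta = x - self.mean
        self.mean += delta / n
        self.m2 += delta * (x - self.mean)
        self.window.append(x)

    def _remove_oldest(self):
        # Decremental step: undo the contribution of the oldest point.
        x = self.window.popleft()
        n = len(self.window)
        if n == 0:
            self.mean, self.m2 = 0.0, 0.0
            return
        old_mean = self.mean
        self.mean = old_mean - (x - old_mean) / n
        self.m2 = max(self.m2 - (x - old_mean) * (x - self.mean), 0.0)

    def is_anomaly(self, x, threshold=3.0):
        """Score x against the current window, before x is added to it."""
        n = len(self.window)
        if n < 2:
            return False
        std = math.sqrt(self.m2 / (n - 1))
        if std == 0:
            return False  # flat window: this sketch declines to judge
        return abs(x - self.mean) / std > threshold

    def update(self, x, threshold=3.0):
        """Return whether x looks anomalous, then slide the window over it."""
        flagged = self.is_anomaly(x, threshold)
        if len(self.window) == self.size:
            self._remove_oldest()
        self._add(x)
        return flagged
```

Each call to `update` touches only the newest and the oldest point, which is what keeps the computation lightweight enough to run on every incoming sample.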
Panel on Observability - Marcus Barczak, Bram Lhodes, Rick Rackow and #
Recap #
Those are all the notes I took during the conference. I hope they help you somehow. I had a very pleasant experience and I’d be very glad to attend again if it comes to Europe for a third time, as I doubt I’ll be attending in the US.