SRE Prodcast brings Google's experience with Site Reliability Engineering together with special guests and exciting topics to discuss the present and future of reliable production engineering!
Embracing Complexity with Christina Schulman & Dr. Laura Maguire
33:59
In this episode of the Prodcast, we are joined by guests Christina Schulman (Staff SRE, Google) and Dr. Laura Maguire PhD (Principal Engineer, Trace Cognitive Engineering). They emphasize the human element of SRE and the importance of fostering a culture of collaboration, learning, and resilience in managing complex systems. They touch upon topics …
Maglev: load balancing at Google with Cody Smith and Trisha Weir
32:53
In this episode, Cody Smith (CTO and Co-founder, Camus Energy) & Trisha Weir (SRE Department Lead, Google) join hosts Steve McGhee and Jordan Greenberg to discuss their experience developing Maglev, a highly available and distributed network load balancer (NLB) that is an integral part of the cloud architecture that manages traffic that comes in t…
Profiling data with Pat Somaru and Narayan Desai
42:22
In this episode, guests Narayan Desai (Principal SRE, Google) and Pat Somaru (Senior Production Engineer, Meta) join hosts Steve McGhee and Florian Rathgeber to discuss the challenges of observability and working with profiling data. The discussion covers intriguing topics like noise reduction, workload modeling, and the need for better tools and t…
Google Public DNS (8.8.8.8) with Wilmer van der Gaast and Andy Sykes
32:07
This episode features Google engineers Wilmer van der Gaast (Production on-tall) and Andy Sykes (Senior Staff Systems Engineer, SRE), joining hosts Steve McGhee and Jordan Greenberg, to discuss the development and maintenance of Google Public DNS (8.8.8.8). They highlight the initial motivations for creating the service, technical challenges like c…
SRE in the Retail and Gaming Worlds with Jordan Chernev & Scott Bowers
33:40
Guests Jordan Chernev (Senior Technology Executive) and Scott Bowers (SRE, Gearbox Software), who hail from the retail and gaming industries, respectively, join hosts Steve McGhee and Jordan Greenberg to discuss the unique challenges of Site Reliability Engineering in their industries. They share the importance of aligning SLOs with user experience,…
Incident Response with Sarah Butt and Vrai Stacey
43:53
Sarah Butt (Principal Engineer, Centralized Incident Response, Salesforce) and Vrai Stacey (Staff Software Engineer, Google) join hosts Steve McGhee and Jordan Greenberg to dive into incident response—particularly tooling and software for reliability incidents. Tune in for an in-depth discussion on topics such as the importance of communication and…
Building Reliable Systems with Silvia Botros and Niall Murphy
42:06
Silvia Botros (SRE Architect, Twilio | Author of "High Performance MySQL, 4th edition") and Niall Murphy (Co-founder & CEO, Stanza) join hosts Steve McGhee and Jordan Greenberg to discuss cultural shifts in database engineering, rate limiting, load shedding, holistic approaches to reliability, proactive measures to build customer trust, and much m…
Creating Systems that are Safe with Liz Fong-Jones
28:40
Liz Fong-Jones (former Google SRE and current Field CTO at honeycomb.io) joins hosts Steve McGhee and Jordan Greenberg for a lively discussion centered around observability, its evolution from monitoring, and its role in modern software development. Tune in for more on the importance of observability as a spectrum, the evolving role of SREs, and ad…
Production Problems Are For All! with Ben Treynor Sloss
31:21
Ben Treynor Sloss (VP of Engineering, Google) joins hosts Steve McGhee and Dr. Jennifer Petoff (Director of Technical Infrastructure Education, Google) to share the evolution of SRE and its impact on software development, how AI and ML significantly impact SRE practices, and the future of SRE. Ben coined the term "Site Reliability Engineering" for…
There Remains a Huge Amount of Work to Do, with Healfdene Goguen
26:14
In this episode, Healfdene Goguen (Principal Engineer, Google) joins hosts Steve McGhee and Jordan Greenberg to discuss the vast amount of work to be done by SREs, and the fascinating challenges to tackle with clear real-world implications. It's a truly exciting time to be an SRE at Google! By Google Prodcast Team
SRE, a Basis of Influence, with Amy Tobey & Vladyslav Ukis
41:02
In this season of Google Prodcast, current and former SREs, both within and outside of Google, chat with hosts Steve McGhee and Jordan Greenberg to discuss software systems designed and built by SREs. For "episode zero", guests Amy Tobey (Live Services SRE, Netflix) and Dr. Vladyslav Ukis (Head of R&D, Siemens Healthineers, Author of "Establishing …
Life of An SRE: Life after Google SRE, with Carla Geisser, Cody Smith, and Laura Nolan
46:32
Former Google SREs, or "Xooglers", talk with hosts MP and Steve McGhee about site reliability engineering outside of Google. What’s the difference in scale? What skills are generally valuable? And why can’t you build “SRE in a box” that jump-starts pretty much any organization? Join Carla Geisser, Cody Smith, and Laura Nolan in their lively convers…
Sabrina Farmer, VP of Engineering at Google, talks about her career journey through Site Reliability Engineering. What does management mean? What’s involved in being an effective manager? And what’s a feasibility study? Hear some great advice on how to get what you expect out of a role, wherever on the ladder it is.…
Dave Reisner talks about his path to Staff SRE, from ArchLinux contributor through DevOps to software engineer. This episode emphasizes the value of strong mentoring and manager relationships, and the challenges of work-life balance. By Salim Virji
Explore the role and responsibilities of an SRE manager with Stephen Benjamin. By Salim Virji
Explore the role and responsibilities of a Senior SRE with Jessica Theodat, as she discusses life-work balance, the value of mentoring, and being a Black woman in SRE. By Salim Virji
Life of An SRE with Shannon Brady and Theo Klein
44:01
Explore the career paths of SREs Shannon Brady and Theo Klein as they discuss how they came to Site Reliability Engineering and found their areas of expertise. By Salim Virji
Life of An SRE with Mariuxi Vasconez and Julian Alarcon
34:30
In this episode, Mariuxi and Julian discuss their paths to SRE: what drew them initially to SRE, and what motivates them to continue developing skills. By Salim Virji
Life of An SRE Episode 1: Tom Cranitch and Megan Yin
27:14
How does one become an SRE? And what’s the career like? In this episode, Tom and Megan discuss their path to SRE. By Salim Virji
Creating the SRE Prodcast with John Reese (JTR)
10:55
Host MP English and former Google SRE John Reese (JTR) chat about the creation of the Prodcast. Visit https://sre.google/prodcast for transcripts and links to further reading. By MP English
Ayelet Sachto offers advice on creating an actionable, transparent, and blameless postmortem culture. Visit https://sre.google/prodcast for transcripts and links to further reading. By MP English, Viv
Adrienne Walcer discusses how to approach and organize incident management efforts throughout the production lifecycle. Visit https://sre.google/prodcast for transcripts and links to further reading. By Viv, MP English
On-Call Rotations with Andrew Widdowson (APW)
43:58
Andrew Widdowson (APW) shares strategies for successful on-call rotations. Visit https://sre.google/prodcast for transcripts and links to further reading. By MP English, Viv
Pierre Palatin dives into different automation strategies, how to build confidence in your system, and why designing the UI may be your biggest challenge. Visit https://sre.google/prodcast for transcripts and links to further reading. By Viv, MP English
Client-Transparent Migrations with Pavan Adharapurapu
40:28
Pavan Adharapurapu details how to approach large-scale migrations while optimizing for user experience. Visit https://sre.google/prodcast for transcripts and links to further reading. By MP English, Viv
Narayan Desai explains why SLOs can be problematic and proposes alternative methods for monitoring complex, large-scale systems. Visit https://sre.google/prodcast for transcripts and links to further reading. By Viv, MP English
Amelia Harrison advises on when and how to alert, ideal coverage, and tuning. Visit https://sre.google/prodcast for transcripts and links to further reading. By MP English, Viv
Customer-Centric Monitoring with Silvia Esparrachiari
31:05
Silvia Esparrachiari talks about the challenges of monitoring and the importance of understanding your users. Visit https://sre.google/prodcast for transcripts and links to further reading. By Viv, MP English
SRE Philosophy with Jennifer Mace (Macey)
33:04
What is SRE, anyway? Jennifer Mace (Macey) gives us her definition of "site reliability engineer," discusses how to manage risk, and shares key questions to ask developers. Visit https://sre.google/prodcast for transcripts and links to further reading. By MP English, Viv