| Page 176 | Kisaco Research

Following the MLCommons MLPerf Inference v5.1 Results on the keynote stage on the morning of Tuesday 9th September, Miro Hodak, Senior Member of Technical Staff, AI Performance Engineering at AMD, will deliver a detailed analysis of the results, followed by a Q&A session with the audience.

Author:

Miro Hodak

Senior Member of Technical Staff, AI Performance Engineering
AMD

Miro Hodak is a Principal Member of Technical Staff at AMD, where he focuses on AI performance and benchmarking. Prior to joining AMD, he served as an AI Architect at Lenovo and, before that, was a professor of physics at North Carolina State University.

Miro has been actively involved with MLPerf and MLCommons since 2020, contributing to the development of multiple MLPerf benchmarks and submitting results across several rounds of Inference and Training. Since 2023, he has served as co-chair of the MLPerf Inference Working Group.

He has authored peer-reviewed publications in fields ranging from artificial intelligence and computer science to materials science, physics, and biochemistry, with his work cited over 2,500 times.

Distributed training jobs are brittle; a single node failure can halt progress and waste expensive GPU cycles. This technical demo dives into Cluster Director, focusing on how engineers can automate resilient, large-scale GPU infrastructure. We'll start with a declarative YAML configuration to define and provision a multi-node GPU cluster, optimized with the ideal network topology for NCCL communication. The core of the demo will be a live failure simulation. You will see Cluster Director automatically detect a preempted node, perform remediation, and maintain the integrity of the running workload with minimal disruption.
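
The detect-and-remediate behaviour described above can be pictured with a small toy model. The sketch below is not Cluster Director's API or configuration format; `ToyCluster`, `Node`, `provision`, and `reconcile` are hypothetical names invented for illustration. It assumes only the general idea of a declared cluster size that a control loop keeps restoring after a node is lost.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    # Hypothetical stand-in for one GPU node in the cluster.
    name: str
    healthy: bool = True


@dataclass
class ToyCluster:
    # desired_size plays the role of the declared (e.g. YAML) cluster size.
    desired_size: int
    nodes: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def provision(self):
        """Add nodes until the cluster matches the declared size."""
        while len(self.nodes) < self.desired_size:
            self.nodes.append(Node(name=f"gpu-node-{next(self._ids)}"))

    def reconcile(self):
        """Drop unhealthy nodes, provision replacements, report what failed."""
        failed = [n.name for n in self.nodes if not n.healthy]
        self.nodes = [n for n in self.nodes if n.healthy]
        self.provision()
        return failed


cluster = ToyCluster(desired_size=4)
cluster.provision()
cluster.nodes[2].healthy = False  # simulate a preempted node
replaced = cluster.reconcile()
names = [n.name for n in cluster.nodes]
print(replaced, names)
```

A real system would also have to reschedule or resume the workload on the replacement node; the toy model only shows the size of the cluster being restored.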

Author:

Ilias Katsardis

Senior Product Manager
Google Cloud

Ilias Katsardis is a Senior Product Manager based in Sunnyvale, CA, driving the future of AI infrastructure at Google Cloud. He is responsible for Cluster Director and the Cluster Toolkit, two key components of Google's supercomputing architecture. Passionate about making large-scale AI and HPC more accessible, Ilias focuses on creating solutions that automate complex configurations and provide a seamless user experience. His work enables researchers and developers to spend less time on infrastructure management and more time on scientific breakthroughs. With a rich background that includes roles at Cray Inc. and ClusterVision, along with founding two tech startups, Ilias brings over 15 years of deep industry expertise to his role.

Author:

Abhijith Prabhudev

Product Manager
Google Cloud

Abhijith Prabhudev is a Product Manager based in Sunnyvale, CA, leading AI infrastructure observability and monitoring at Google Cloud. He is responsible for GPU infrastructure reliability, monitoring, and resiliency capabilities. His work enables researchers and developers to spend less time on infrastructure management and more time on building and training AI models. With more than 15 years of infrastructure industry experience, including leading the VMware vSphere product team and working as a full-stack engineer, Abhijith is passionate about solving infrastructure problems that hinder developer and administrator productivity.

Large language models can now power capable software agents, yet real‑world success comes from disciplined engineering rather than flashy frameworks. Most reliable agents are built from simple, composable patterns instead of heavy abstractions.

The talk will introduce patterns for adding complexity and autonomy only when they pay off. Attendees should leave with a practical decision framework for escalating from a single prompt to multi‑step agents, along with guardrails for shipping trustworthy, cost‑effective agents at scale.
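
One way to picture "escalate only when it pays off" is a rule that picks the simplest approach meeting a quality bar. The sketch below is an assumption for illustration, not the speaker's framework: the level names, the `choose_level` helper, and the scores are all hypothetical placeholders, and a real evaluation would come from a task-specific test set.

```python
from typing import Callable, Sequence


def choose_level(
    levels: Sequence[tuple[str, Callable[[], float]]],
    target: float,
) -> str:
    """Return the first (simplest) level whose evaluated score meets the target.

    `levels` must be ordered from least to most complex. If no level reaches
    the target, the most complex one is returned as a fallback.
    """
    for name, evaluate in levels:
        if evaluate() >= target:
            return name
    return levels[-1][0]


# Placeholder scores standing in for real evaluation runs.
levels = [
    ("single_prompt", lambda: 0.72),
    ("prompt_chain", lambda: 0.91),
    ("multi_step_agent", lambda: 0.95),
]

chosen = choose_level(levels, target=0.90)
print(chosen)
```

Because the levels are tried in order, the more expensive agent is never evaluated once a simpler pattern already clears the bar.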

Author:

Sushant Mehta

Research Engineer
Google DeepMind

Author:

Sherman Ikemoto

Group Director
Cadence

Sherman Ikemoto is the Sales Development Group Director at Cadence Design Systems, where he leads global business development for the innovative Reality DC Digital Twin solution. With a passion for addressing challenges in data center design, performance, and sustainability, Sherman brings extensive expertise to the forefront of this critical industry. Previously, Sherman served as Managing Director and Board Member at Future Facilities, the pioneer of the original data center Digital Twin, and as North America Sales and Marketing Director at Flomerics, where he helped introduce computational fluid dynamics modeling to electronics cooling design. During his tenure at Future Facilities, Sherman was a sought-after speaker at prominent industry events like ITW, Data Center World, Uptime Symposium, and Data Center Dynamics. Sherman holds a Bachelor of Science in Mechanical Engineering (BSME) from San Jose State University, where he was a member of the Tau Beta Pi engineering honor society, and a Master of Science in Mechanical Engineering (MSME) from Santa Clara University. His career reflects a deep commitment to advancing sustainable and efficient technologies for the data center industry.

Revterra’s Kinetic Stabilizer is engineered to handle the massive and volatile power swings demanded by large-scale AI workloads. AI is bottlenecked by infrastructure and requires a rapidly scalable, high-performance power quality solution that can be deployed without fear of supply chain disruption. Our battery-free technology provides a stable bridge between the grid and AI loads with a physically instantaneous, passive response—no power electronics required. Unlike conventional solutions, the Kinetic Stabilizer offers unmatched cost-effectiveness on a per-kW basis and a functionally infinite cycle life, free from the constraints of chemical storage.

Author:

Ben Jawdat

Founder & CEO
Revterra

Ben Jawdat is the founding CEO of Revterra, where he is working to commercialize a kinetic stabilizer solution to solve power quality challenges at AI datacenters and other commercial/industrial sites. Prior to starting Revterra, he worked on the development of new superconducting materials at the University of Houston, where he received his PhD in physics, and completed postdoctoral studies at the Air Force Research Laboratory and Rice University.
