MLPerf Inference v3.1 introduces new LLM and recommendation benchmarks
Published 12 September 2023 | https://www.artificialintelligence-news.com/2023/09/12/mlperf-inference-v3-1-new-llm-recommendation-benchmarks/

The latest release of MLPerf Inference introduces new LLM and recommendation benchmarks, marking a leap forward in the realm of AI testing.

The v3.1 iteration of the benchmark suite has seen record participation, boasting over 13,500 performance results and delivering up to a 40 percent improvement in performance. 

What sets this achievement apart is the diverse pool of 26 different submitters and over 2,000 power results, demonstrating the broad spectrum of industry players investing in AI innovation.

Among the list of submitters are tech giants like Google, Intel, and NVIDIA, as well as newcomers Connect Tech, Nutanix, Oracle, and TTA, who are participating in the MLPerf Inference benchmark for the first time.

David Kanter, Executive Director of MLCommons, highlighted the significance of this achievement:

“Submitting to MLPerf is not trivial. It’s a significant accomplishment, as this is not a simple point-and-click benchmark. It requires real engineering work and is a testament to our submitters’ commitment to AI, to their customers, and to ML.”

MLPerf Inference is a critical benchmark suite that measures the speed at which AI systems can execute models in various deployment scenarios. These scenarios span from the latest generative AI chatbots to the safety-enhancing features in vehicles, such as automatic lane-keeping and speech-to-text interfaces.
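For readers who want to see what "executing models in a deployment scenario" looks like from a submitter's side, below is a minimal sketch of a system-under-test driven by MLCommons' LoadGen Python bindings. The no-op model, sample counts, and callback bodies are placeholders, and exact signatures vary slightly between LoadGen releases; the structure follows the public reference implementations.

```python
import mlperf_loadgen as lg

def issue_query(query_samples):
    # A real system-under-test would run the model here; this sketch
    # answers each query immediately with an empty response.
    responses = [lg.QuerySampleResponse(s.id, 0, 0) for s in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass  # called when LoadGen wants outstanding work drained

def load_samples(sample_indices):
    pass  # a real QSL would stage these samples into host memory

def unload_samples(sample_indices):
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline  # or Server, SingleStream, ...
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_query, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)

lg.StartTest(sut, qsl, settings)  # writes the mlperf_log_* result files

lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```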

The spotlight of MLPerf Inference v3.1 shines on the introduction of two new benchmarks:

  • An LLM benchmark utilising the GPT-J reference model to summarise CNN news articles garnered submissions from 15 different participants, showcasing the rapid adoption of generative AI (a rough sketch of the task follows below).
  • An updated recommender benchmark – refined to align more closely with industry practices – employs the DLRM-DCNv2 reference model and larger datasets, attracting nine submissions.

These new benchmarks are designed to push the boundaries of AI and ensure that industry-standard benchmarks remain aligned with the latest trends in AI adoption, serving as a valuable guide for customers, vendors, and researchers alike.
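To give a flavour of the new LLM task, here is a rough sketch of GPT-J summarising an article with Hugging Face Transformers. It loads the base EleutherAI checkpoint rather than the fine-tuned CNN/DailyMail checkpoint the benchmark actually uses, and the prompt wording and generation parameters are assumptions based on the reference configuration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # the benchmark uses a fine-tuned variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

article = "..."  # a CNN/DailyMail-style news article goes here
prompt = f"Summarize the following news article:\n\n{article}\n\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=128,  # assumed 128-token summary cap
    num_beams=4,         # beam search, as in the reference configuration
)
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```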

Mitchelle Rasquinha, co-chair of the MLPerf Inference Working Group, commented: “The submissions for MLPerf Inference v3.1 are indicative of a wide range of accelerators being developed to serve ML workloads.

“The current benchmark suite has broad coverage among ML domains, and the most recent addition of GPT-J is a welcome contribution to the generative AI space. The results should be very helpful to users when selecting the best accelerators for their respective domains.”

MLPerf Inference benchmarks primarily focus on datacentre and edge systems. The v3.1 submissions showcase various processors and accelerators across use cases in computer vision, recommender systems, and language processing.

The benchmark suite encompasses both open and closed submissions in the performance, power, and networking categories. Closed submissions employ the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit a variety of models.

As AI continues to permeate various aspects of our lives, MLPerf’s benchmarks serve as vital tools for evaluating and shaping the future of AI technology.

Find the detailed results of MLPerf Inference v3.1 here.

(Photo by Mauro Sbicego on Unsplash)

Azure and NVIDIA deliver next-gen GPU acceleration for AI
Published 9 August 2023 | https://www.artificialintelligence-news.com/2023/08/09/azure-nvidia-deliver-next-gen-gpu-acceleration-ai/

Microsoft Azure users are now able to harness the latest advancements in NVIDIA’s accelerated computing technology, revolutionising the training and deployment of their generative AI applications.

The integration of Azure ND H100 v5 virtual machines (VMs) with NVIDIA H100 Tensor Core GPUs and Quantum-2 InfiniBand networking promises seamless scaling of generative AI and high-performance computing applications, all at the click of a button.

This cutting-edge collaboration comes at a pivotal moment when developers and researchers are actively exploring the potential of large language models (LLMs) and accelerated computing to unlock novel consumer and business use cases.

NVIDIA’s H100 GPU achieves supercomputing-class performance through an array of architectural innovations. These include fourth-generation Tensor Cores, a new Transformer Engine for enhanced LLM acceleration, and NVLink technology that propels inter-GPU communication to unprecedented speeds of 900GB/sec.
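In practice, developers reach the Transformer Engine through NVIDIA's transformer-engine library, which provides FP8-capable drop-in layers. A minimal sketch, assuming the PyTorch bindings (the exact API may differ between releases):

```python
import torch
import transformer_engine.pytorch as te

# te.Linear is a drop-in replacement for torch.nn.Linear whose matmuls
# can execute in FP8 on H100's fourth-generation Tensor Cores.
layer = te.Linear(1024, 1024).cuda()
x = torch.randn(32, 1024, device="cuda")

with te.fp8_autocast(enabled=True):  # run the forward pass in FP8
    y = layer(x)

print(y.shape)  # torch.Size([32, 1024])
```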

The integration of the NVIDIA Quantum-2 CX7 InfiniBand – boasting 3,200 Gbps cross-node bandwidth – ensures flawless performance across GPUs, even at massive scales. This capability positions the technology on par with the computational capabilities of the world’s most advanced supercomputers.

The newly introduced ND H100 v5 VMs hold immense potential for training and inferring increasingly intricate LLMs and computer vision models. These neural networks power the most complex and compute-intensive generative AI applications, spanning from question answering and code generation to audio, video, image synthesis, and speech recognition.

A standout feature of the ND H100 v5 VMs is their ability to achieve up to a 2x speedup in LLM inference, notably demonstrated by the BLOOM 175B model when compared to previous generation instances. This performance boost underscores their capacity to optimise AI applications further, fueling innovation across industries.

The synergy between NVIDIA H100 Tensor Core GPUs and Microsoft Azure empowers enterprises with unparalleled AI training and inference capabilities. This partnership also streamlines the development and deployment of production AI, bolstered by the integration of the NVIDIA AI Enterprise software suite and Azure Machine Learning for MLOps.

The combined efforts have led to groundbreaking AI performance, as validated by industry-standard MLPerf benchmarks.

The integration of the NVIDIA Omniverse platform with Azure extends the reach of this collaboration further, providing users with everything they need for industrial digitalisation and AI supercomputing.

(Image Credit: Uwe Hoh from Pixabay)

MLCommons releases latest MLPerf Training benchmark results
Published 30 June 2021 | https://www.artificialintelligence-news.com/2021/06/30/mlcommons-releases-latest-mlperf-training-benchmark-results/

Open engineering consortium MLCommons has released its latest MLPerf Training community benchmark results.

MLPerf Training is a full system benchmark that tests machine learning models, software, and hardware.

The results are split into two divisions: closed and open. Closed submissions are better for comparing like-for-like performance as they use the same reference model to ensure a level playing field. Open submissions, meanwhile, allow participants to submit a variety of models.

In the image classification benchmark, Google is the winner with its preview tpu-v4-6912 system, which uses 1,728 AMD Rome processors and 3,456 TPU v4 accelerators. Google’s system completed the benchmark in just 23 seconds.

“We showcased the record-setting performance and scalability of our fourth-generation Tensor Processing Units (TPU v4), along with the versatility of our machine learning frameworks and accompanying software stack. Best of all, these capabilities will soon be available to our cloud customers,” Google said.

“We achieved a roughly 1.7x improvement in our top-line submissions compared to last year’s results using new, large-scale TPU v4 Pods with 4,096 TPU v4 chips each. Using 3,456 TPU v4 chips in a single TPU v4 Pod slice, many models that once trained in days or weeks now train in a few seconds.”

Of the systems that are available on-premise, NVIDIA’s dgxa100_n310_ngc21.05_mxnet system came out on top, with its 620 AMD EPYC 7742 processors and 2,480 NVIDIA A100-SXM4-80GB (400W) accelerators completing the benchmark in 40 seconds.

“In the last 2.5 years since the first MLPerf training benchmark launched, NVIDIA performance has increased by up to 6.5x per GPU, increasing by up to 2.1x with A100 from the last round,” said NVIDIA.

“We demonstrated scaling to 4096 GPUs which enabled us to train all benchmarks in less than 16 minutes and 4 out of 8 in less than a minute. The NVIDIA platform excels in both performance and usability, offering a single leadership platform from data centre to edge to cloud.”

Across the board, MLCommons says that benchmark results have improved by up to 2.1x compared to the last submission round. This shows the incredible advancements that are being made in hardware, software, and system scale.

Victor Bittorf, Co-Chair of the MLPerf Training Working Group, said:

“We’re thrilled to see the continued growth and enthusiasm from the MLPerf community, especially as we’re able to measure significant improvement across the industry with the MLPerf Training benchmark suite.

Congratulations to all of our submitters in this v1.0 round – we’re excited to continue our work together, bringing transparency across machine learning system capabilities.”

For its latest round, MLCommons added two new benchmarks measuring the performance of speech-to-text and 3D medical imaging workloads. These new benchmarks leverage the following reference models:

  • Speech-to-text with RNN-T: the Recurrent Neural Network Transducer is an automatic speech recognition (ASR) model trained on a subset of LibriSpeech. Given a sequence of speech input, it predicts the corresponding text. RNN-T is MLCommons’ reference model and is commonly used in production speech-to-text systems.
  • 3D medical imaging with 3D U-Net: the 3D U-Net architecture is trained on the KiTS19 dataset to find and segment cancerous cells in the kidneys. The model identifies whether each voxel within a CT scan belongs to healthy tissue or a tumour, and is representative of many medical imaging tasks (a toy sketch of the architecture follows below).
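To make the 3D U-Net bullet concrete, here is a deliberately tiny, single-level 3D U-Net in PyTorch. It is a toy illustration of the encoder/decoder-with-skip-connection pattern, not the MLPerf reference model, which is deeper and trained on real KiTS19 CT volumes:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3D convolutions, as in each level of a U-Net encoder/decoder.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    """One-level 3D U-Net: encode, bottleneck, decode with skip connection."""
    def __init__(self, in_ch=1, n_classes=3, base=16):
        super().__init__()
        self.enc = conv_block(in_ch, base)
        self.down = nn.MaxPool3d(2)
        self.bottleneck = conv_block(base, base * 2)
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec = conv_block(base * 2, base)
        self.head = nn.Conv3d(base, n_classes, kernel_size=1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return self.head(d)  # per-voxel class logits

# One single-channel CT volume of 64^3 voxels, three classes
# (background / kidney / tumour, as in KiTS19).
net = TinyUNet3D()
logits = net(torch.randn(1, 1, 64, 64, 64))
print(logits.shape)  # torch.Size([1, 3, 64, 64, 64])
```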

“The training benchmark suite is at the centre of MLCommons’ mission to push machine learning innovation forward for everyone, and we’re incredibly pleased with the engagement from this round’s submissions,” commented John Tran, Co-Chair of the MLPerf Training Working Group.

The full MLPerf Training benchmark results can be explored here.

(Photo by Alora Griffiths on Unsplash)

NVIDIA chucks its MLPerf-leading A100 GPU into Amazon’s cloud
Published 3 November 2020 | https://www.artificialintelligence-news.com/2020/11/03/nvidia-mlperf-a100-gpu-amazon-cloud/

NVIDIA’s A100 set a new record in the MLPerf benchmark last month and now it’s accessible through Amazon’s cloud.

Amazon Web Services (AWS) first launched a GPU instance 10 years ago with the NVIDIA M2050. It’s rather poetic that, a decade on, NVIDIA is now providing AWS with the hardware to power the next generation of groundbreaking innovations.

The A100 outperformed CPUs in this year’s MLPerf by up to 237x in data centre inference. A single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA, following the benchmark results.

Businesses can access the A100 in AWS’ P4d instance. NVIDIA claims the instances reduce the time to train machine learning models by up to 3x with FP16 and up to 6x with TF32 compared to the default FP32 precision.
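Those FP16 and TF32 gains come from reduced-precision arithmetic rather than anything AWS-specific, so they are easy to sketch. A minimal PyTorch training loop enabling both on an A100-class GPU might look like this (the model and data are stand-ins):

```python
import torch

# TF32 accelerates FP32 matmuls/convolutions on A100 Tensor Cores.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # loss scaling for FP16 stability

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # FP16 mixed-precision region
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```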

Each P4d instance features eight NVIDIA A100 GPUs. If even more performance is required, customers are able to access over 4,000 GPUs at a time using AWS’ Elastic Fabric Adapter (EFA).

Dave Brown, Vice President of EC2 at AWS, said:

“The pace at which our customers have used AWS services to build, train, and deploy machine learning applications has been extraordinary. At the same time, we have heard from those customers that they want an even lower-cost way to train their massive machine learning models.

Now, with EC2 UltraClusters of P4d instances powered by NVIDIA’s latest A100 GPUs and petabit-scale networking, we’re making supercomputing-class performance available to virtually everyone, while reducing the time to train machine learning models by 3x, and lowering the cost to train by up to 60% compared to previous generation instances.”

P4d supports 400Gbps networking and makes use of NVIDIA’s technologies including NVLink, NVSwitch, NCCL, and GPUDirect RDMA to further accelerate deep learning training workloads.
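On the software side, those technologies are typically exercised through NCCL as the collective backend. A minimal sketch of multi-GPU training with PyTorch DistributedDataParallel, assuming a launch via torchrun with one process per GPU:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL rides NVLink/NVSwitch within a node and EFA/GPUDirect RDMA
    # across nodes when available; the script only picks the backend.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 512).cuda()
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(16, 512, device="cuda")
    loss = model(x).sum()
    loss.backward()  # gradients are all-reduced across every GPU

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # e.g. torchrun --nproc_per_node=8 train.py
```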

Some of AWS’ customers across various industries have already begun exploring how the P4d instance can help their business.

Karley Yoder, VP & GM of Artificial Intelligence at GE Healthcare, commented:

“Our medical imaging devices generate massive amounts of data that need to be processed by our data scientists. With previous GPU clusters, it would take days to train complex AI models, such as Progressive GANs, for simulations and view the results.

Using the new P4d instances reduced processing time from days to hours. We saw two- to three-times greater speed on training models with various image sizes while achieving better performance with increased batch size and higher productivity with a faster model development cycle.”

For an example from a different industry, the research arm of Toyota is exploring how P4d can improve their existing work in developing self-driving vehicles and groundbreaking new robotics.

“The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours,” explained Mike Garrison, Technical Lead of Infrastructure Engineering at Toyota Research Institute.

“We are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed.”

P4d instances are currently available in the US East (N. Virginia) and US West (Oregon) regions. AWS says further availability is planned soon.

You can find out more about P4d instances and how to get started here.

NVIDIA sets another AI inference record in MLPerf
Published 22 October 2020 | https://www.artificialintelligence-news.com/2020/10/22/nvidia-sets-another-ai-inference-record-mlperf/

NVIDIA has set yet another record for AI inference in MLPerf with its A100 Tensor Core GPUs.

MLPerf consists of five inference benchmarks covering three of today’s main AI applications: image classification, object detection, and translation.

“Industry-standard MLPerf benchmarks provide relevant performance data on widely used AI networks and help make informed AI platform buying decisions,” said Rangan Majumder, VP of Search and AI at Microsoft.

Last year, NVIDIA led all five benchmarks for both server and offline data centre scenarios with its Turing GPUs. A dozen companies participated.

23 companies participated in this year’s MLPerf, but NVIDIA maintained its lead with the A100 outperforming CPUs by up to 237x in data centre inference.

For perspective, NVIDIA notes that a single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA.

“The work we’ve done to achieve these results on MLPerf gives companies a new level of AI performance to improve our everyday lives.”

The widespread availability of NVIDIA’s AI platform through every major cloud and data centre infrastructure provider is unlocking huge potential for companies across various industries to improve their operations.

Nvidia comes out on top in first MLPerf inference benchmarks
Published 7 November 2019 | https://www.artificialintelligence-news.com/2019/11/07/nvidia-comes-out-on-top-in-first-mlperf-inference-benchmarks/

The first benchmark results from the MLPerf consortium have been released and Nvidia is a clear winner for inference performance.

For those unaware, inference is the stage at which a trained deep learning model processes incoming data to produce the outputs it was trained for.

MLPerf is a consortium which aims to provide “fair and useful” standardised benchmarks for inference performance. MLPerf can be thought of as doing for inference what SPEC does for benchmarking CPUs and general system performance.

The consortium has released its first benchmarking results, a painstaking effort involving over 30 companies and over 200 engineers and practitioners. MLPerf’s first call for submissions led to over 600 measurements spanning 14 companies and 44 systems. 

However, for datacentre inference, only four of the processors are commercially available:

  • Intel Xeon Platinum 9282
  • Habana Goya
  • Google TPUv3
  • Nvidia Turing

Nvidia wasted no time in boasting of its performance, which beat the three other processors across various neural networks in both server and offline scenarios.

The easiest direct comparisons are possible in the ImageNet ResNet-50 v1.6 offline scenario where the greatest number of major players and startups submitted results.

In that scenario, Nvidia once again boasted the best performance on a per-processor basis with its Titan RTX GPU. The 2x Google Cloud TPU v3-8 submission – despite also using eight Intel Skylake processors – achieved similar performance to the SCAN 3XS DBP T496X2 Fluid, which used just four Titan RTX cards (65,431.40 vs 66,250.40 inputs/second).

Ian Buck, GM and VP of Accelerated Computing at NVIDIA, said:

“AI is at a tipping point as it moves swiftly from research to large-scale deployment for real applications.

AI inference is a tremendous computational challenge. Combining the industry’s most advanced programmable accelerator, the CUDA-X suite of AI algorithms and our deep expertise in AI computing, NVIDIA can help datacentres deploy their large and growing body of complex AI models.”

However, it’s worth noting that the Titan RTX doesn’t support ECC memory so – despite its sterling performance – this omission may prevent its use in some datacentres.

Another interesting takeaway when comparing the Cloud TPU results against Nvidia is the performance difference when moving from offline to server scenarios.

  • Google Cloud TPU v3 offline: 32,716.00
  • Google Cloud TPU v3 server: 16,014.29
  • Nvidia SCAN 3XS DBP T496X2 Fluid offline: 66,250.40
  • Nvidia SCAN 3XS DBP T496X2 Fluid server: 60,030.57

As you can see, the Cloud TPU system’s performance is cut by more than half when used in a server scenario. The SCAN 3XS DBP T496X2 Fluid system’s performance drops only around 10 percent in comparison (a quick arithmetic check follows below).
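Those two claims are easy to verify from the figures above; a quick sketch in Python, with the numbers copied straight from the list:

```python
tpu_offline, tpu_server = 32716.00, 16014.29
rtx_offline, rtx_server = 66250.40, 60030.57

# Relative drop when moving from the offline to the server scenario.
print(f"Cloud TPU v3 drop:     {1 - tpu_server / tpu_offline:.1%}")  # 51.1%
print(f"Titan RTX system drop: {1 - rtx_server / rtx_offline:.1%}")  # 9.4%
```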

You can peruse MLPerf’s full benchmark results here.
