inference Archives - AI News
https://www.artificialintelligence-news.com/tag/inference/

Dave Barnett, Cloudflare: Delivering speed and security in the AI era
AI News, 13 October 2023
https://www.artificialintelligence-news.com/2023/10/13/dave-barnett-cloudflare-delivering-speed-and-security-in-ai-era/

AI News sat down with Dave Barnett, Head of SASE at Cloudflare, during Cyber Security & Cloud Expo Europe to delve into how the firm uses its cloud-native architecture to deliver speed and security in the AI era.

According to Barnett, Cloudflare’s cloud-native approach allows the company to innovate continually in the digital space. Notably, a significant portion of its services is offered to consumers for free.

“We continuously reinvent, we’re very comfortable in the digital space. We’re very proud that the vast majority of our customers actually consume our services for free because it’s our way of giving back to society,” said Barnett.

Barnett also revealed Cloudflare’s focus on AI during their anniversary week. The company aims to enable organisations to consume AI securely and make it accessible to everyone. Barnett says that Cloudflare achieves those goals in three key ways.

“One, as I mentioned, is operating AI inference engines within Cloudflare close to consumers’ eyeballs. The second area is securing the use of AI within the workplace, because, you know, AI has some incredibly positive impacts on people … but the problem is there are some data protection requirements around that,” explains Barnett.

“Finally, is the question of, ‘Could AI be used by the bad guys against the good guys?’ and that’s an area that we’re continuing to explore.”

Just a day earlier, AI News heard from Raviv Raz, Cloud Security Manager at ING, during a session at the expo that focused on the alarming potential of AI-powered cybercrime.

Regarding security models, Barnett discussed the evolution of the zero-trust concept, emphasising its practical applications in enhancing both usability and security. Cloudflare’s own journey with zero-trust began with a focus on usability, leading to the development of its own zero-trust network access products.

“We have servers everywhere and engineers everywhere that need to reboot those servers. In 2015, that involved VPNs and two-factor authentication… so we built our own zero-trust network access product for our own use that meant the user experiences for engineers rebooting servers in far-flung places was a lot better,” says Barnett.

“After 2015, the world started to realise that this approach had great security benefits so we developed that product and launched it in 2018 as Cloudflare Access.”

Cloudflare’s innovative strides also include leveraging NVIDIA GPUs to accelerate machine learning AI tasks on an edge network. This technology enables organisations to run inference tasks – such as image recognition – close to end-users, ensuring low latency and optimal performance.

“We launched Workers AI, which means that organisations around the world – in fact, individuals as well – can run their inference tasks at a very close place to where the consumers of that inference are,” explains Barnett.

“You could ask a question, ‘Cat or not cat?’, to a trained cat detection engine very close to the people that need it. We’re doing that in a way that makes it easily accessible to organisations looking to use AI to benefit their business.”
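The latency benefit of running inference close to users can be illustrated with simple propagation arithmetic. This is an illustrative sketch using generic physics, not Cloudflare measurements; the distances and the fibre signal speed (roughly two-thirds of the speed of light) are assumed round numbers, and real-world routing adds further overhead.

```python
FIBRE_SPEED_M_PER_S = 2e8  # ~0.67c, a common rule of thumb for optical fibre

def min_rtt_ms(distance_m: float) -> float:
    """Minimum round-trip time in milliseconds over the given one-way distance."""
    return 2 * distance_m / FIBRE_SPEED_M_PER_S * 1000

edge_rtt = min_rtt_ms(100e3)        # inference node ~100 km from the user
distant_rtt = min_rtt_ms(10_000e3)  # centralised datacentre ~10,000 km away

print(f"edge: {edge_rtt:.0f} ms, distant: {distant_rtt:.0f} ms")  # edge: 1 ms, distant: 100 ms
```

Even before queueing and model execution time, the propagation floor alone is two orders of magnitude lower when the inference engine sits near the "consumers' eyeballs".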

For developers interested in AI, Barnett outlined Cloudflare’s role in supporting the deployment of machine learning models. While machine learning training is typically conducted outside Cloudflare, the company excels in providing low-latency inference engines that are essential for real-time applications like image recognition.

Our conversation with Barnett shed light on Cloudflare’s commitment to cloud-native architecture, AI accessibility, and cybersecurity. As the industry continues to advance, Cloudflare remains at the forefront of delivering speed and security in the AI era.

You can watch our full interview with Dave Barnett below:

(Photo by ryan baker on Unsplash)

See also: JPMorgan CEO: AI will be used for ‘every single process’

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Cyber Security & Cloud Expo, Edge Computing Expo, and Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

MLPerf Inference v3.1 introduces new LLM and recommendation benchmarks
AI News, 12 September 2023
https://www.artificialintelligence-news.com/2023/09/12/mlperf-inference-v3-1-new-llm-recommendation-benchmarks/

The latest release of MLPerf Inference introduces new LLM and recommendation benchmarks, marking a leap forward in the realm of AI testing.

The v3.1 iteration of the benchmark suite has seen record participation, with over 13,500 performance results and up to a 40 percent performance improvement.

What sets this achievement apart is the diverse pool of 26 different submitters and over 2,000 power results, demonstrating the broad spectrum of industry players investing in AI innovation.

Among the list of submitters are tech giants like Google, Intel, and NVIDIA, as well as newcomers Connect Tech, Nutanix, Oracle, and TTA, who are participating in the MLPerf Inference benchmark for the first time.

David Kanter, Executive Director of MLCommons, highlighted the significance of this achievement:

“Submitting to MLPerf is not trivial. It’s a significant accomplishment, as this is not a simple point-and-click benchmark. It requires real engineering work and is a testament to our submitters’ commitment to AI, to their customers, and to ML.”

MLPerf Inference is a critical benchmark suite that measures the speed at which AI systems can execute models in various deployment scenarios. These scenarios range from the latest generative AI chatbots to safety-enhancing features in vehicles, such as automatic lane-keeping and speech-to-text interfaces.
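The distinction between deployment scenarios comes down to which metric is summarised from per-query latencies: batch-style throughput versus tail latency for interactive serving. The toy harness below illustrates that idea; it is a hypothetical sketch, not MLPerf's actual LoadGen, and the function and metric names are invented for illustration.

```python
import statistics
import time

def dummy_inference(x):
    # Stand-in for a real model: a trivial, fixed amount of work per query.
    return sum(i * i for i in range(1000)) + x

def measure(n_queries):
    # Record per-query latencies: the raw data that scenario metrics summarise.
    latencies = []
    for q in range(n_queries):
        start = time.perf_counter()
        dummy_inference(q)
        latencies.append(time.perf_counter() - start)
    return {
        "throughput_qps": n_queries / sum(latencies),                  # offline-style metric
        "p99_latency_s": statistics.quantiles(latencies, n=100)[-1],   # server-style metric
    }

results = measure(1000)
print(results)
```

A system can score well on raw throughput while still failing a server-style scenario if its 99th-percentile latency is poor, which is why the suite reports both kinds of result.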

The spotlight of MLPerf Inference v3.1 shines on the introduction of two new benchmarks:

  • An LLM benchmark utilising the GPT-J reference model to summarise CNN news articles, which garnered submissions from 15 different participants, showcasing the rapid adoption of generative AI.
  • An updated recommender benchmark – refined to align more closely with industry practices – employing the DLRM-DCNv2 reference model and larger datasets, which attracted nine submissions.

These new benchmarks are designed to push the boundaries of AI and ensure that industry-standard benchmarks remain aligned with the latest trends in AI adoption, serving as a valuable guide for customers, vendors, and researchers alike.

Mitchelle Rasquinha, co-chair of the MLPerf Inference Working Group, commented: “The submissions for MLPerf Inference v3.1 are indicative of a wide range of accelerators being developed to serve ML workloads.

“The current benchmark suite has broad coverage among ML domains, and the most recent addition of GPT-J is a welcome contribution to the generative AI space. The results should be very helpful to users when selecting the best accelerators for their respective domains.”

MLPerf Inference benchmarks primarily focus on datacenter and edge systems. The v3.1 submissions showcase various processors and accelerators across use cases in computer vision, recommender systems, and language processing.

The benchmark suite encompasses both open and closed submissions in the performance, power, and networking categories. Closed submissions employ the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit a variety of models.

As AI continues to permeate various aspects of our lives, MLPerf’s benchmarks serve as vital tools for evaluating and shaping the future of AI technology.

Find the detailed results of MLPerf Inference v3.1 here.

(Photo by Mauro Sbicego on Unsplash)

See also: GitLab: Developers view AI as ‘essential’ despite concerns

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

NVIDIA sets another AI inference record in MLPerf
AI News, 22 October 2020
https://www.artificialintelligence-news.com/2020/10/22/nvidia-sets-another-ai-inference-record-mlperf/

NVIDIA has set yet another record for AI inference in MLPerf with its A100 Tensor Core GPUs.

MLPerf consists of five inference benchmarks, which cover the three main AI applications today: image classification, object detection, and translation.

“Industry-standard MLPerf benchmarks provide relevant performance data on widely used AI networks and help make informed AI platform buying decisions,” said Rangan Majumder, VP of Search and AI at Microsoft.

Last year, NVIDIA led all five benchmarks for both server and offline data centre scenarios with its Turing GPUs. A dozen companies participated.

Twenty-three companies participated in this year’s MLPerf, but NVIDIA maintained its lead, with the A100 outperforming CPUs by up to 237x in data centre inference.

For perspective, NVIDIA notes that a single NVIDIA DGX A100 system – with eight A100 GPUs – provides the same performance as nearly 1,000 dual-socket CPU servers on some AI applications.
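The two figures NVIDIA quotes are roughly consistent with each other, as a quick back-of-the-envelope check shows. This is illustrative arithmetic from the numbers in this article only, not an official NVIDIA calculation.

```python
# NVIDIA's claims: one 8-GPU DGX A100 ~ 1,000 dual-socket CPU servers on some
# AI applications, and up to 237x speedup per accelerator in datacentre inference.
servers_replaced = 1000
sockets_per_server = 2
gpus_per_dgx = 8

cpu_sockets_per_gpu = servers_replaced * sockets_per_server / gpus_per_dgx
print(cpu_sockets_per_gpu)  # 250.0 -- the same order as the quoted "up to 237x"
```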

“We’re at a tipping point as every industry seeks better ways to apply AI to offer new services and grow their business,” said Ian Buck, Vice President of Accelerated Computing at NVIDIA.

“The work we’ve done to achieve these results on MLPerf gives companies a new level of AI performance to improve our everyday lives.”

The widespread availability of NVIDIA’s AI platform through every major cloud and data centre infrastructure provider is unlocking huge potential for companies across various industries to improve their operations.

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.

NVIDIA’s AI-focused Ampere GPUs are now available in Google Cloud
AI News, 8 July 2020
https://www.artificialintelligence-news.com/2020/07/08/nvidia-ai-ampere-gpus-available-google-cloud/

Google Cloud users can now harness the power of NVIDIA’s Ampere GPUs for their AI workloads.

The specific GPU added to Google Cloud is the NVIDIA A100 Tensor Core which was announced just last month. NVIDIA says the A100 “has come to the cloud faster than any NVIDIA GPU in history.”

NVIDIA claims the A100 boosts training and inference performance by up to 20x over its predecessors. Large AI models like BERT can be trained in just 37 minutes on a cluster of 1,024 A100s.

For those who enjoy their measurements in teraflops (TFLOPS), the A100 delivers around 19.5 TFLOPS in single-precision performance and 156 TFLOPS for Tensor Float 32 workloads.
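The quoted per-GPU TF32 figure implies substantial aggregate throughput for the 1,024-GPU BERT run mentioned above. This is illustrative peak arithmetic only; real training runs achieve well below peak utilisation.

```python
tf32_tflops_per_a100 = 156  # peak TF32 throughput quoted above
cluster_gpus = 1024         # size of the BERT training cluster

peak_pflops = tf32_tflops_per_a100 * cluster_gpus / 1000
print(f"{peak_pflops:.1f} PFLOPS peak TF32")  # 159.7 PFLOPS
```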

Manish Sainani, Director of Product Management at Google Cloud, said:

“Google Cloud customers often look to us to provide the latest hardware and software services to help them drive innovation on AI and scientific computing workloads.

With our new A2 VM family, we are proud to be the first major cloud provider to market NVIDIA A100 GPUs, just as we were with NVIDIA T4 GPUs. We are excited to see what our customers will do with these new capabilities.”

The announcement couldn’t have arrived at a better time – with many looking to harness AI for solutions to the COVID-19 pandemic, in addition to other global challenges such as climate change.

Aside from AI training and inference, other things customers will be able to achieve with the new capabilities include data analytics, scientific computing, genomics, edge video analytics, and 5G services.

The new Ampere-based data center GPUs are now available in Alpha on Google Cloud. Users can access instances of up to 16 A100 GPUs, which provides a total of 640GB of GPU memory and 1.3TB of system memory.
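The instance totals line up with the A100's per-GPU memory. A simple sanity check on the figures above, assuming the launch-era A100 specification of 40 GB of HBM per GPU:

```python
gpus = 16            # maximum A100s per A2 instance, per the article
hbm_per_gpu_gb = 40  # launch-era A100 HBM capacity (assumed spec)

total_gpu_memory_gb = gpus * hbm_per_gpu_gb
print(total_gpu_memory_gb)  # 640, matching the quoted 640GB of GPU memory
```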

You can register your interest for access here.

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.
