MLPerf Inference v3.1 introduces new LLM and recommendation benchmarks (AI News, Tue, 12 Sep 2023)
https://www.artificialintelligence-news.com/2023/09/12/mlperf-inference-v3-1-new-llm-recommendation-benchmarks/
The latest release of MLPerf Inference introduces new LLM and recommendation benchmarks, marking a leap forward in the realm of AI testing.

The v3.1 iteration of the benchmark suite has seen record participation, boasting over 13,500 performance results and delivering up to a 40 percent improvement in performance. 

What sets this achievement apart is the diverse pool of 26 different submitters and over 2,000 power results, demonstrating the broad spectrum of industry players investing in AI innovation.

Among the list of submitters are tech giants like Google, Intel, and NVIDIA, as well as newcomers Connect Tech, Nutanix, Oracle, and TTA, who are participating in the MLPerf Inference benchmark for the first time.

David Kanter, Executive Director of MLCommons, highlighted the significance of this achievement:

“Submitting to MLPerf is not trivial. It’s a significant accomplishment, as this is not a simple point-and-click benchmark. It requires real engineering work and is a testament to our submitters’ commitment to AI, to their customers, and to ML.”

MLPerf Inference is a critical benchmark suite that measures the speed at which AI systems can execute models in various deployment scenarios. These scenarios span from the latest generative AI chatbots to the safety-enhancing features in vehicles, such as automatic lane-keeping and speech-to-text interfaces.

The spotlight of MLPerf Inference v3.1 shines on the introduction of two new benchmarks:

  • An LLM benchmark utilising the GPT-J reference model to summarise CNN news articles, which garnered submissions from 15 different participants, showcasing the rapid adoption of generative AI.
  • An updated recommender benchmark – refined to align more closely with industry practices – which employs the DLRM-DCNv2 reference model and larger datasets, attracting nine submissions.

These new benchmarks are designed to push the boundaries of AI and ensure that industry-standard benchmarks remain aligned with the latest trends in AI adoption, serving as a valuable guide for customers, vendors, and researchers alike.

Mitchelle Rasquinha, co-chair of the MLPerf Inference Working Group, commented: “The submissions for MLPerf Inference v3.1 are indicative of a wide range of accelerators being developed to serve ML workloads.

“The current benchmark suite has broad coverage among ML domains, and the most recent addition of GPT-J is a welcome contribution to the generative AI space. The results should be very helpful to users when selecting the best accelerators for their respective domains.”

MLPerf Inference benchmarks primarily focus on datacenter and edge systems. The v3.1 submissions showcase various processors and accelerators across use cases in computer vision, recommender systems, and language processing.

The benchmark suite encompasses both open and closed submissions in the performance, power, and networking categories. Closed submissions employ the same reference model to ensure a level playing field across systems, while participants in the open division are permitted to submit a variety of models.

As AI continues to permeate various aspects of our lives, MLPerf’s benchmarks serve as vital tools for evaluating and shaping the future of AI technology.

The detailed results of MLPerf Inference v3.1 are available on the MLCommons website.

(Photo by Mauro Sbicego on Unsplash)

See also: GitLab: Developers view AI as ‘essential’ despite concerns

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

How AI can transform the way enterprises test digital experiences (AI News, Thu, 06 Apr 2023)
https://www.artificialintelligence-news.com/2023/04/06/how-ai-transform-enterprises-test-digital-experiences/
The digital world is evolving rapidly, and businesses need to keep up if they want to offer their customers high-quality digital experiences that meet their needs and expectations. Testing is a crucial component of the digital experience since it makes sure that digital goods and services meet the standards for usability, functionality, and quality.

With the ability to automate and streamline the testing process, enhance accuracy, and reduce costs and time spent on it, artificial intelligence (AI) has the potential to revolutionize how organizations test digital experiences in the coming years. Here are a few important ways AI can transform testing:

Automating the testing process

AI has transformed how businesses test their digital experiences. The application of machine learning algorithms and predictive analytics enables AI-powered testing solutions to simulate user behavior, develop test cases, and execute tests automatically. This automation assists organizations in saving time and costs, reducing errors, and increasing the accuracy of their testing operations.

AI-powered testing systems can automatically build test cases based on established rules or by studying user behavior, guaranteeing that the digital experience fulfils the needs and expectations of customers.

Another area where AI-powered testing solutions shine is test execution. By modeling user behavior, engaging with the digital product, and reporting results, AI can automate the test execution process. It can automatically detect problems, track flaws, and generate reports.

AI can also automate regression testing, which entails testing a digital product after modifications are made to ensure that the changes did not bring new faults. AI can detect portions of a digital product that require regression testing, build test cases, and automatically execute tests.
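As a purely illustrative sketch (no specific product is implied, and all names here are hypothetical), change-based regression test selection can be reduced to a coverage map linking each test to the files it exercises, so that only tests touching a changed file are re-run:

```python
# Sketch of change-based regression test selection (illustrative only).
# A coverage map records which source files each test exercises; given a
# set of changed files, we select only the tests that touch them.

def select_regression_tests(coverage_map, changed_files):
    """Return the tests whose covered files intersect the changed files."""
    changed = set(changed_files)
    return sorted(
        test for test, files in coverage_map.items()
        if changed & set(files)
    )

# Hypothetical coverage data gathered from a previous full test run.
coverage_map = {
    "test_checkout": ["cart.py", "payment.py"],
    "test_login":    ["auth.py"],
    "test_search":   ["search.py", "index.py"],
}

print(select_regression_tests(coverage_map, ["payment.py"]))
# ['test_checkout']
```

In practice the coverage map would come from instrumented test runs rather than being written by hand.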

Another area where AI can aid automated testing is performance testing. AI can automatically simulate user behavior, generate load, and monitor system performance, finding performance issues and bottlenecks.

Finally, AI can enable continuous testing of a digital product throughout development, ensuring that it keeps satisfying the required quality, functionality, and user experience criteria.

Optimizing the testing process

Test prioritization is one of the most critical ways that AI may improve the testing process. AI can assess testing data to select tests based on their importance and chance of detecting flaws. This allows organizations to concentrate their testing efforts on the most critical areas, saving time and resources.
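A minimal sketch of failure-based prioritization, assuming historical pass/fail records are available (the data and names below are invented for illustration):

```python
# Sketch: prioritize tests by historical failure rate (illustrative only).

def prioritize(history):
    """history maps test name -> list of booleans (True = failed).
    Returns test names ordered from most to least failure-prone."""
    def failure_rate(results):
        return sum(results) / len(results) if results else 0.0
    return sorted(history, key=lambda t: failure_rate(history[t]), reverse=True)

history = {
    "test_payment":  [True, False, True, True],    # fails often
    "test_homepage": [False, False, False, False],
    "test_profile":  [False, True, False, False],
}

print(prioritize(history))
# ['test_payment', 'test_profile', 'test_homepage']
```

Real schemes also weight recency of failures and code churn, but the ranking idea is the same.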

Test optimization is another way AI can improve the testing process: it can examine testing data to discover redundant tests that can be removed to enhance efficiency.

AI can automate test environment creation and configuration, ensuring that the proper environment is available at the right time. Furthermore, AI can develop synthetic test data, automate test data creation and maintenance, and assure data privacy and security.

Finally, AI may evaluate testing results to find patterns and trends, delivering insights such as areas where testing needs to be improved, recommending new tests to add to the testing suite, and suggesting process changes.

Improving accuracy

AI can increase test accuracy in a variety of ways. One way is through its ability to swiftly and accurately evaluate large amounts of data. AI can spot patterns, trends, and potential flaws in testing data that people may miss. This helps to ensure that all potential problems are recognized, lowering the chance of releasing a product with unknown flaws.

AI can also increase testing accuracy by automating the testing process itself: automation ensures that all tests are conducted consistently, lowering the possibility of unforeseen faults, while also reducing the risk of human error and the time and effort required.

Reducing time and costs

AI can drastically cut the time and expense of testing in various ways. By automating repetitive and time-consuming processes such as test case generation, execution, and defect identification, AI can free up testers to focus on more complicated duties. This can improve the testing process’s efficiency and minimize the time and expense associated with testing.

Conclusion

Businesses can benefit from AI by testing digital interactions more efficiently, precisely, and affordably. Businesses can then provide high-quality digital experiences that satisfy customers’ expectations and demands. Additionally, businesses can gain a competitive advantage in the rapidly expanding digital market by using AI-powered testing tools.

Xbox’s Matt Booty wants to see QA testers replaced by AI (AI News, Wed, 07 Sep 2022)
https://www.artificialintelligence-news.com/2022/09/07/xbox-matt-booty-qa-testers-replaced-ai/
Xbox Game Studios head Matt Booty told a live audience at PAX that he’d like to see QA (Quality Assurance) testers replaced by AI.

Gaming QA testers regularly come under fire as bug-riddled games reach the hands of disappointed gamers with seemingly increasing prevalence.

The backlash QA testers receive from frustrated gamers is often misplaced. Managers are often aware that games are full of bugs but decide to release unfinished games anyway in the hope of starting to recoup development costs.

However, proper QA testing is a time-consuming and resource-intensive process. On the PAX stage, Booty explained how a game needs to be re-tested from start to finish whenever a feature is added.

“Some of the processes that we have [currently] haven’t really kept up with how quickly we can make content. One of those is testing,” Booty said.

“There’s a lot going on with AI and ML right now, and people using AI to generate all these images. What I always say when I bump into the AI folks, is: ‘Help me figure out how to use an AI bot to go test a game.’”

If bugs can be caught earlier in the development process, they have a greater chance of being fixed prior to management feeling the financial pressure to send the game to market regardless.

The full ‘Storytime with Matt Booty’ talk is available online; at around the 01:02:00 mark, Booty explains his vision for AI in game testing.

“I would love to be able to start up ten thousand instances of a game in the cloud – so there are ten thousand copies of the game running – deploy an AI bot to spend all night testing that game, and in the morning we get a report. That would be transformational,” explains Booty.
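As a toy caricature of that vision (everything below is hypothetical; the "game" is a stub with a planted bug, not anything Microsoft has built), the overnight fleet amounts to running many seeded bot sessions and collating any crashes into a single report:

```python
# Toy sketch of the overnight-fleet idea: run many game instances with a
# random-input bot and aggregate crashes into a morning report. The "game"
# here is a stub that crashes when "save" is pressed too early.
import random

def play_session(seed, steps=100):
    """A hypothetical bot pressing random buttons; returns a crash log or None."""
    rng = random.Random(seed)
    for step in range(steps):
        action = rng.choice(["jump", "shoot", "open_menu", "save"])
        if action == "save" and step < 5:   # stand-in for a real bug
            return f"instance {seed}: crash at step {step} ({action})"
    return None

def overnight_run(instances):
    crashes = [log for seed in range(instances)
               if (log := play_session(seed)) is not None]
    return f"{instances} instances run, {len(crashes)} crashes found", crashes

summary, crashes = overnight_run(1000)
print(summary)
```

A real system would of course run actual game builds in the cloud and drive them with learned rather than purely random behaviour.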

Booty’s core idea of improving QA testing with AI is solid. However, the way it was presented – as replacing human testers – may not go down too well. Mistreatment of QA testers in the gaming industry has led to growing calls for testers to unionise.

(Photo by Jose Gil on Unsplash)


Eggplant launches AI-powered software testing in the cloud (AI News, Tue, 06 Oct 2020)
https://www.artificialintelligence-news.com/2020/10/06/eggplant-ai-powered-software-testing-cloud/
Automation specialists Eggplant have launched a new AI-powered software testing platform.

The cloud-based solution aims to help accelerate the delivery of software in a rapidly-changing world while maintaining a high bar of quality.

Gareth Smith, CTO of Eggplant, said:

“The launch of our cloud platform is a significant milestone in our mission to rid the world of bad software. In our new normal, delivering speed and agility at scale has never been more critical.

Every business can easily tap into Eggplant’s AI-powered automation platform to accelerate the pace of delivery while ensuring a high-quality digital experience.”

Enterprises have accelerated their shift to the cloud due to the pandemic and resulting increases in things such as home working.

Recent research from Centrify found that 51 percent of businesses which embraced a cloud-first model were able to handle the challenges presented by COVID-19 far more effectively.

Eggplant’s Digital Automation Intelligence (DAI) Platform features:

  • Cloud-based end-to-end automation: The scalable fusion engine provides frictionless, efficient, continuous, and parallel end-to-end testing in the cloud, for any apps and websites, on any target platforms.
  • Monitoring insights: The addition of advanced user experience (UX) data points and metrics enables customers to benchmark their applications’ UX performance. These insights into UX behaviour also help improve SEO.
  • Fully automated self-healing test assets: AI identifies the tests needed and builds and runs them automatically, under full user control. These tests are self-healing, automatically adapting as the system under test evolves.
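Eggplant has not published the internals of its self-healing mechanism, but the general pattern behind self-healing test assets, sketched here with hypothetical locator names, is to keep fallback locators for each UI element and promote whichever one still matches when the primary breaks:

```python
# Generic self-healing locator sketch (not Eggplant's actual implementation).
# Each element keeps a primary locator plus fallbacks; if the primary no
# longer matches the current UI, the first fallback that does match is
# promoted so future runs use it directly.

def find_element(ui, candidates):
    """ui: dict of locator -> element. candidates: ordered locators.
    Returns (element, healed_locator_list)."""
    for i, locator in enumerate(candidates):
        if locator in ui:
            healed = [locator] + candidates[:i] + candidates[i + 1:]
            return ui[locator], healed
    raise LookupError("no locator matched; test asset needs human attention")

# After a redesign, the old id-based locator no longer exists.
ui_after_redesign = {"css:.buy-now": "<button>", "text:Buy": "<button>"}
locators = ["id:buy-button", "css:.buy-now", "text:Buy"]

element, locators = find_element(ui_after_redesign, locators)
print(locators[0])
# css:.buy-now
```

The healed locator order persists across runs, which is what lets the tests adapt as the system under test evolves.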

The solution helps to support the “citizen developer” movement—using AI to enable no-code/low-code development for people with minimal programming knowledge.

Both cloud and AI ranked highly in a recent study (PDF) by Deloitte of the most relevant technologies “to operate in the new normal”. Cloud and cybersecurity were joint first with 80 percent of respondents, followed by cognitive and AI tools (73%) and the IoT (65%).

Eggplant’s combination of AI and cloud technologies should help businesses to deal with COVID-19’s unique challenges and beyond.

(Photo by CHUTTERSNAP on Unsplash)

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.

Applause’s new AI solution helps tackle bias and sources data at scale (AI News, Wed, 06 Nov 2019)
https://www.artificialintelligence-news.com/2019/11/06/applause-ai-tackle-bias-sources-data-scale/
Testing specialists Applause have debuted an AI solution promising to help tackle algorithmic bias while providing the scale of data needed for robust training.

Applause has built a vast global community of testers for its app testing solution which is trusted by brands including Google, Uber, PayPal, and more. The company is leveraging this relatively unique asset to help overcome some of the biggest hurdles facing AI development.

AI News spoke with Kristin Simonini, VP of Product at Applause, about the company’s new solution and what it means for the industry ahead of her keynote at AI Expo North America later this month.

“Our customers have been needing additional support from us in the area of data collection to support their AI developments, train their system, and then test the functionality,” explains Simonini. “That latter part being more in-line with what they traditionally expect from us.”

Applause has worked predominantly with companies in the voice space but also their increasing expansion into things such as gathering and labelling images and running documents through OCR.

This existing breadth of experience in areas where AI is most commonly applied today puts the company and its testers in a good position to offer truly useful feedback on where improvements can be made.

Specifically, Applause’s new solution operates across five unique types of AI engagements:

  • Voice: Source utterances to train voice-enabled devices, and test those devices to ensure they understand and respond accurately.
  • OCR (Optical Character Recognition): Provide documents and corresponding text to train algorithms to recognize text, and compare printed docs and the recognized text for accuracy.
  • Image Recognition: Deliver photos taken of predefined objects and locations, and ensure objects are being recognized and identified correctly.
  • Biometrics: Source biometric inputs like faces and fingerprints, and test whether those inputs result in an experience that’s easy to use and actually works.
  • Chatbots: Give sample questions and varying intents for chatbots to answer, and interact with chatbots to ensure they understand and respond accurately in a human-like way.
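Taken together, these engagements amount to labelled test sets. As a hedged illustration of the chatbot case (the `classify` function below is a hypothetical stand-in, not any real chatbot), the harness boils down to a loop over question and expected-intent pairs scored for accuracy:

```python
# Minimal chatbot-intent test harness sketch. `classify` is a hypothetical
# stand-in for a real chatbot's intent recognizer.

def classify(question):
    q = question.lower()
    if "refund" in q or "money back" in q:
        return "request_refund"
    if "open" in q or "hours" in q:
        return "opening_hours"
    return "unknown"

# Tester-supplied sample questions with varying phrasings per intent.
test_set = [
    ("Can I get my money back?", "request_refund"),
    ("I want a refund",          "request_refund"),
    ("What are your hours?",     "opening_hours"),
    ("When do you open?",        "opening_hours"),
    ("Tell me a joke",           "unknown"),
]

correct = sum(classify(q) == intent for q, intent in test_set)
print(f"intent accuracy: {correct}/{len(test_set)}")
# intent accuracy: 5/5
```

The value of a global tester community in this setup is precisely the variety of phrasings per intent, which a single test author would struggle to produce.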

“We have this ready global community that’s in a position to pull together whatever information an organisation might be looking for, do it at scale, and do it with that breadth and depth – in terms of locations, genders, races, devices, and all types of conditions – that make it possible to pull in a very diverse set of data to train an AI system.”

Some examples Simonini provides of the types of training data which Applause’s global testers can supply includes voice utterances, specific documents, and images which meet set criteria like “street corners” or “cats”. A lack of such niche data sets with the diversity necessary is one of the biggest obstacles faced today and one which Applause hopes to help overcome.

A significant responsibility

Everyone involved in developing emerging technologies carries a significant responsibility. AI is particularly sensitive because everyone knows it will have a huge impact across most parts of societies around the world, but no-one can really predict how.

How many jobs will AI replace? Will it be used for killer robots? Will it make decisions on whether to launch a missile? To what extent will facial recognition be used across society? These are important questions that no-one can give a guaranteed answer, but it’s certainly on the minds of a public that’s grown up around things like 1984 and Terminator.

One of the main concerns about AI is bias. Fantastic work by the likes of the Algorithmic Justice League has uncovered gross disparities between the effectiveness of facial recognition algorithms dependent on the race and gender of each individual. For example, IBM’s facial recognition algorithm was 99.7 percent accurate when used on lighter-skinned males compared to just 65.3 percent on darker-skinned females.

Simonini highlights another study she read recently where voice accuracy for white males was over 90 percent. However, for African-American females, it was more like 30 percent.
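Measuring such disparities is straightforward once predictions are labelled by demographic group. The sketch below uses invented numbers chosen only to resemble the shape of those findings, not to reproduce either study:

```python
# Sketch: per-group accuracy and the gap between best- and worst-served
# groups. All data here is invented for illustration.
from collections import defaultdict

def per_group_accuracy(records):
    """records: iterable of (group, correct: bool) pairs."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += correct
    return {g: hits[g] / totals[g] for g in totals}

records = ([("group_a", True)] * 97 + [("group_a", False)] * 3
         + [("group_b", True)] * 65 + [("group_b", False)] * 35)

acc = per_group_accuracy(records)
gap = max(acc.values()) - min(acc.values())
print(acc, f"gap: {gap:.2f}")
```

Reporting the gap alongside the headline accuracy is what exposes the kind of disparity described above, which a single aggregate number hides.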

Addressing such disparities is not only necessary to prevent things such as inadvertently automating racial profiling or giving some parts of society an advantage over others, but also to allow AI to reach its full potential.

While there are many concerns, AI has a huge amount of power for good as long as it’s developed responsibly. AI can drive efficiencies to reduce our environmental impact, free up more time to spend with loved ones, and radically improve the lives of people with disabilities.

A failure of companies to take responsibility for their developments will lead to overregulation, and overregulation leads to reduced innovation. We asked Simonini whether she believes robust testing will reduce the likelihood of overregulation.

“I think it’s certainly improved the situation. I think that there’s always going to probably be some situations where people attempt to regulate, but if you can really show that effort has been put forward to get to a high level of accuracy and depth then I think it would be less likely.”

Human testing remains essential

Applause is not the only company working to reduce bias in algorithms. IBM, for example, offers a toolkit called AI Fairness 360 that is used to scan machine learning models for signs of bias. We asked Simonini why Applause believes human testing is still necessary.

“Humans are unpredictable in how they’re going to react to something and in what manner they’re going to do it, how they choose to engage with these devices and applications,” comments Simonini. “We haven’t yet seen an advent of being able to effectively do that without the human element.”

An often highlighted challenge with voice recognition is the wide variety of languages spoken and their regional dialects. Many American voice recognition systems even struggle with my accent from the South West of England.

Simonini adds in another consideration about slang words and the need for voice services to keep up-to-date with changing vocabularies.

“Teenagers today like to, when something is hot or cool, say it’s “fire” [“lit” I believe is another one, just to prove I’m still down with the kids],” explains Simonini. “We were able to get these devices into homes and really try to understand some of those nuances.”

Simonini then further explains the challenge of understanding the context of these nuances. In her “fire” example, there’s a very clear need to understand when there’s a literal fire and when someone is just saying that something is cool.

“How do you distinguish between this being a real emergency? My volume and my tone and everything else about how I’ve used that same voice command is going to be different.”

The growth of AI apps and services

Applause established its business in traditional app testing. Given the expected growth in AI apps and services, we asked Simonini whether Applause believes its AI testing solution will become as big – or perhaps even bigger – than its current app testing business.

“We do talk about that; you know, how fast is this going to grow?” says Simonini. “I don’t want to keep talking about voice, but if you look statistically at the growth of the voice market vis-à-vis the growth and adoption of mobile; it’s happening at a much faster pace.”

“I think that it’s going to be a growing portion of our business but I don’t think it necessarily is going to replace anything given that those channels [such as mobile and desktop apps] will still be alive and complementary to one another.”

Simonini will be speaking at AI Expo North America on November 13th in a keynote titled Why The Human Element Remains Essential In Applied AI. We asked what attendees can expect from her talk.

“The angle that we chose to sort of speak about is really this intersection of the human and the AI and why we – given that it’s the business we’re in and what we see day-in, day-out – don’t believe that it becomes the replacement of but how it can work and complement one another.”

“It’s really a bit of where we landed when we went out to figure out whether you can replace an army of people with an army of robots and get the same results. And basically that no, there are still very human-focused needs from a testing perspective.”

