voice recognition Archives - AI News https://www.artificialintelligence-news.com/tag/voice-recognition/ (last updated Tue, 23 May 2023)

Meta’s open-source speech AI models support over 1,100 languages (Tue, 23 May 2023) https://www.artificialintelligence-news.com/2023/05/23/meta-open-source-speech-ai-models-support-over-1100-languages/

The post Meta’s open-source speech AI models support over 1,100 languages appeared first on AI News.

Advancements in machine learning and speech recognition technology have made information more accessible to people, particularly those who rely on voice to access information. However, the lack of labelled data for numerous languages poses a significant challenge in developing high-quality machine-learning models.

In response to this problem, the Meta-led Massively Multilingual Speech (MMS) project has made remarkable strides in expanding language coverage and improving the performance of speech recognition and synthesis models.

By combining self-supervised learning techniques with a diverse dataset of religious readings, the MMS project has expanded speech recognition coverage from the roughly 100 languages supported by existing models to over 1,100 languages.

Breaking down language barriers

To address the scarcity of labelled data for most languages, the MMS project utilised religious texts, such as the Bible, which have been translated into numerous languages.

These translations provided publicly available audio recordings of people reading the texts, enabling the creation of a dataset comprising readings of the New Testament in over 1,100 languages.

By including unlabeled recordings of other religious readings, the project expanded language coverage to recognise over 4,000 languages.

Despite the dataset’s specific domain and predominantly male speakers, the models performed equally well for male and female voices. Meta also says the data did not introduce religious bias into the models.

Overcoming challenges through self-supervised learning

With just 32 hours of labelled data per language, the dataset is far too small to train conventional supervised speech recognition models.

To overcome this limitation, the MMS project leveraged the benefits of the wav2vec 2.0 self-supervised speech representation learning technique.

By training self-supervised models on approximately 500,000 hours of speech data across 1,400 languages, the project significantly reduced the reliance on labelled data.

The resulting models were then fine-tuned for specific speech tasks, such as multilingual speech recognition and language identification.

Impressive results

Evaluation of the models trained on the MMS data revealed impressive results. In a comparison with OpenAI’s Whisper, the MMS models exhibited half the word error rate while covering 11 times more languages.
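For context, word error rate (WER), the metric behind that comparison, is the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. A minimal implementation:

```python
# Word error rate: Levenshtein distance over words, normalised by
# the reference length. Lower is better.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") across six reference words:
print(wer("the cat sat on the mat", "the cat sat on a mat"))  # -> 0.1666...
```

"Half the word error rate" therefore means the MMS models made roughly half as many word-level mistakes per reference word as Whisper on the evaluated languages.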

Furthermore, the MMS project successfully built text-to-speech systems for over 1,100 languages. Despite the limitation of having relatively few different speakers for many languages, the speech generated by these systems exhibited high quality.

While the MMS models have shown promising results, it is essential to acknowledge their imperfections. Mistranscriptions or misinterpretations by the speech-to-text model could result in offensive or inaccurate language. The MMS project emphasises collaboration across the AI community to mitigate such risks.

You can read the MMS paper here or find the project on GitHub.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Zoom receives backlash for emotion-detecting AI (Thu, 19 May 2022) https://www.artificialintelligence-news.com/2022/05/19/zoom-receives-backlash-for-emotion-detecting-ai/

The post Zoom receives backlash for emotion-detecting AI appeared first on AI News.

Zoom has caused a stir following reports that it’s developing an AI system for detecting emotions.

The system, first reported by Protocol, scans users’ faces and speech to determine their emotions.

Zoom detailed the system further in a blog post last month. The company says ‘Zoom IQ’ will be particularly useful for helping salespeople improve their pitches based on the emotions of call participants.

Naturally, the system is seen as rather dystopian and has received its fair share of criticism.

On Wednesday, over 25 rights groups sent a joint letter to Zoom CEO Eric Yuan. The letter urges Zoom to cease research on emotion-based AI.

The letter’s signatories include the American Civil Liberties Union (ACLU), Muslim Justice League, and Access Now.

One of the key concerns is that emotion-detecting AI could be used for hiring or financial decisions, such as whether to grant loans, which could deepen existing inequalities.

“Results are not intended to be used for employment decisions or other comparable decisions. All recommended ranges for metrics are based on publicly available research,” Zoom explained.

Zoom IQ tracks metrics including:

  • Talk-listen ratio
  • Talking speed
  • Filler words
  • Longest spiel (monologue)
  • Patience
  • Engaging questions
  • Next steps set up
  • Sentiment/Engagement analysis
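Most of these metrics are straightforward transcript statistics. The sketch below illustrates two of them, talk-listen ratio and filler-word count, over a hypothetical transcript-segment format; the field names and filler list are assumptions for the example, not Zoom's implementation.

```python
# Hypothetical sketch of two Zoom IQ-style metrics computed from a
# call transcript. The segment schema and filler list are invented
# for illustration; Zoom's actual pipeline is not public.

FILLERS = {"um", "uh", "like"}  # assumed single-word fillers

def talk_listen_ratio(segments, speaker):
    """Seconds the given speaker talks vs. seconds everyone else talks."""
    talk = sum(s["end"] - s["start"] for s in segments if s["speaker"] == speaker)
    listen = sum(s["end"] - s["start"] for s in segments if s["speaker"] != speaker)
    return talk / listen if listen else float("inf")

def filler_count(segments, speaker):
    """Count filler words spoken by the given speaker."""
    words = " ".join(s["text"].lower() for s in segments if s["speaker"] == speaker)
    return sum(words.split().count(f) for f in FILLERS)

calls = [
    {"speaker": "rep", "start": 0, "end": 30, "text": "um so this plan saves you money"},
    {"speaker": "customer", "start": 30, "end": 40, "text": "how much exactly"},
    {"speaker": "rep", "start": 40, "end": 50, "text": "uh roughly twenty percent"},
]

print(talk_listen_ratio(calls, "rep"))  # -> 4.0 (40s talking vs 10s listening)
print(filler_count(calls, "rep"))       # -> 2 ("um" and "uh")
```

The more contentious metrics, such as sentiment and engagement, are model predictions rather than counts, which is precisely where the rights groups' criticism is aimed.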

Esha Bhandari, Deputy Director of the ACLU Speech, Privacy, and Technology Project, called emotion-detecting AI “creepy” and “a junk science”.

(Photo by iyus sugiharto on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

DeepMind co-founder Mustafa Suleyman launches new AI venture (Wed, 09 Mar 2022) https://www.artificialintelligence-news.com/2022/03/09/deepmind-co-founder-mustafa-suleyman-launches-new-ai-venture/

The post DeepMind co-founder Mustafa Suleyman launches new AI venture appeared first on AI News.

DeepMind co-founder Mustafa Suleyman has joined two other high-profile industry figures in launching a new venture called Inflection AI.

LinkedIn co-founder Reid Hoffman is joining Suleyman on the venture.

“Reid and I are excited to announce that we are co-founding a new company, Inflection AI,” wrote Suleyman in a statement.

“Inflection will be an AI-first consumer products company, incubated at Greylock, with all the advantages and expertise that come from being part of one of the most storied venture capital firms in the world.”

Dr Karén Simonyan, another former DeepMind AI expert, will serve as Inflection AI’s chief scientist and its third co-founder.

“Karén is one of the most accomplished deep learning leaders of his generation. He completed his PhD at Oxford, where he designed VGGNet and then sold his first company to DeepMind,” continued Suleyman.

“He created and led the deep learning scaling team and played a key role in such breakthroughs as AlphaZero, AlphaFold, WaveNet, and BigGAN.”

Inflection AI will focus on machine learning and natural language processing.

“Recent advances in artificial intelligence promise to fundamentally redefine human-machine interaction,” explains Suleyman.

“We will soon have the ability to relay our thoughts and ideas to computers using the same natural, conversational language we use to communicate with people. Over time these new language capabilities will revolutionise what it means to have a digital experience.”

Interest in natural language processing is surging. This month, Microsoft completed its $19.7 billion acquisition of Nuance, creator of the speech recognition engine that originally powered Siri.

Suleyman departed Google in January 2022 following an eight-year stint at the company.

While at Google, Suleyman was placed on administrative leave following bullying allegations. During a podcast, he said that he “really screwed up” and was “very sorry about the impact that caused people and the hurt people felt.”

Suleyman joined venture capital firm Greylock after leaving Google.

“There are few people who are as visionary, knowledgeable and connected across the vast artificial intelligence landscape as Mustafa,” wrote Hoffman, a Greylock partner, in a post at the time.

“Mustafa has spent years thinking about how technological advances impact society, and he cares deeply about the ethics and governance supporting new AI systems.”

Inflection AI was incubated by Greylock. Suleyman and Hoffman will both remain venture partners at the company.

Suleyman promises that more details about Inflection AI’s product plans will be provided over the coming months.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo. The next events in the series will be held in Santa Clara on 11-12 May 2022, Amsterdam on 20-21 September 2022, and London on 1-2 December 2022.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Microsoft acquires Nuance to usher in ‘new era of outcomes-based AI’ (Tue, 08 Mar 2022) https://www.artificialintelligence-news.com/2022/03/08/microsoft-acquires-nuance-new-era-outcomes-based-ai/

The post Microsoft acquires Nuance to usher in ‘new era of outcomes-based AI’ appeared first on AI News.

Microsoft has completed its acquisition of Siri backend creator Nuance in a bumper deal that it says will usher in a “new era of outcomes-based AI”.

“Completion of this significant and strategic acquisition brings together Nuance’s best-in-class conversational AI and ambient intelligence with Microsoft’s secure and trusted industry cloud offerings,” said Scott Guthrie, Executive Vice President of the Cloud + AI Group at Microsoft. 

“This powerful combination will help providers offer more affordable, effective, and accessible healthcare, and help organisations in every industry create more personalised and meaningful customer experiences. I couldn’t be more pleased to welcome the Nuance team to our Microsoft family.”

Nuance became a household name (in techie households, anyway) for creating the speech recognition engine that powers Apple’s smart assistant, Siri. However, Nuance has been in the speech recognition business since 2001 when it was known as ScanSoft.

While it may not have made many big headlines in recent years, Nuance has continued to make some impressive advancements—which caught the attention of Microsoft.

Microsoft announced its intention to acquire Nuance for $19.7 billion last year, the company’s second-largest deal after its $26.2 billion acquisition of LinkedIn (both would be blown out of the water by Microsoft’s proposed $70 billion purchase of Activision Blizzard).

The proposed acquisition of Nuance caught the attention of global regulators. It was cleared in the US relatively quickly, while the EU’s regulator got in the festive spirit and cleared the deal just prior to last Christmas. The UK’s Competition and Markets Authority finally gave it a thumbs-up last week.

Regulators examined whether there may be anti-competition concerns in some verticals where both companies are active, such as healthcare. However, after investigation, the regulators determined that competition shouldn’t be affected by the deal.

The EU, for example, determined that “competing transcription service providers in healthcare do not depend on Microsoft for cloud computing services” and that “transcription service providers in the healthcare sector are not particularly important users of cloud computing services”.

Furthermore, the EU’s regulator concluded:

  • Microsoft-Nuance will continue to face stiff competition from rivals in the future.
  • There’d be no ability/incentive to foreclose existing market solutions.
  • Nuance can only use the data it collects for its own services.
  • The data will not provide Microsoft with an advantage to shut out competing software providers.

The companies appear keen to ensure that people are aware the deal is about more than just healthcare.

“Combining the power of Nuance’s deep vertical expertise and proven business outcomes across healthcare, financial services, retail, telecommunications, and other industries with Microsoft’s global cloud ecosystems will enable us to accelerate our innovation and deploy our solutions more quickly, more seamlessly, and at greater scale to solve our customers’ most pressing challenges,” said Mark Benjamin, CEO of Nuance.

Benjamin will remain the CEO of Nuance and will report to Guthrie.

(Photo by Omid Armin on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo. The next events in the series will be held in Santa Clara on 11-12 May 2022, Amsterdam on 20-21 September 2022, and London on 1-2 December 2022.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

EU clears $19.7B Microsoft-Nuance deal without any small print (Wed, 22 Dec 2021) https://www.artificialintelligence-news.com/2021/12/22/eu-clears-19-7b-microsoft-nuance-deal-without-small-print/

The post EU clears $19.7B Microsoft-Nuance deal without any small print appeared first on AI News.

The EU has concluded Microsoft’s $19.7 billion acquisition of Nuance doesn’t pose competition concerns.

Nuance gained renown for originally creating the backend of a little old virtual assistant called Siri (you might have heard of it).

The company has since continued to build out its speech recognition capabilities and offers solutions ranging from industry-specific products, such as those for healthcare, to general omnichannel customer experience services.

Earlier this year, Microsoft decided Nuance was worth coughing up $19.7 billion for.

As such large deals often do, the proposed acquisition caught the eyes of several global regulators. In the case of the EU, it was referred to the Commission’s regulators on 16 November.

The regulator said on Tuesday that the proposed acquisition “would raise no competition concerns” within the bloc and that “Microsoft and Nuance offer very different products” after looking at potential horizontal overlaps between the companies’ transcription solutions.

Vertical links in the healthcare space were also analysed but it was determined that “competing transcription service providers in healthcare do not depend on Microsoft for cloud computing services” and that “transcription service providers in the healthcare sector are not particularly important users of cloud computing services”.

Furthermore, the regulator concluded:

  • Microsoft-Nuance will continue to face stiff competition from rivals in the future.
  • There’d be no ability/incentive to foreclose existing market solutions.
  • Nuance can only use the data it collects for its own services.
  • The data will not provide Microsoft with an advantage to shut out competing software providers.

The EU’s decision mirrors that of regulators in the US and Australia. However, the UK’s Competition and Markets Authority (CMA) announced its own investigation earlier this month.

When it announced the deal, Microsoft said that it aims to complete its acquisition by the end of 2021. The CMA is accepting comments until 10 January 2022 so it seems that Microsoft may have to hold out a bit longer.

(Photo by Annie Spratt on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo. The next events in the series will be held in Santa Clara on 11-12 May 2022, Amsterdam on 20-21 September 2022, and London on 1-2 December 2022.

IBM enhances Watson Discovery’s natural language processing capabilities (Wed, 10 Nov 2021) https://www.artificialintelligence-news.com/2021/11/10/ibm-enhances-watson-discovery-natural-language-processing-capabilities/

The post IBM enhances Watson Discovery’s natural language processing capabilities appeared first on AI News.

IBM has announced enhancements to the natural language processing (NLP) capabilities of Watson Discovery.

Watson Discovery is an AI-powered intelligent search and text-analytics platform that can retrieve critical information buried in enterprise data.

In one case study, Woodside Energy had no way to retrieve the 30 years’ worth of valuable engineering and drilling knowledge that was buried in unstructured documentation. Using the existing NLP capabilities of Watson Discovery, the firm reportedly cut research time by more than 75 percent.

Among the new enhancements planned for Watson Discovery are:

  • Pre-trained document structure understanding: Watson Discovery’s Smart Document Understanding feature now includes a new pre-trained model designed to automatically understand the visual structure and layout of a document without additional training from a developer or data scientist.
  • Automatic text pattern detection: A new advanced pattern creation feature, available in beta, helps users quickly identify business-specific text patterns within their documents. It can start learning the underlying text patterns from as few as two examples and then refines the pattern based on user feedback.
  • Advanced NLP customisation capabilities: With a new custom entity extractor feature, IBM is simplifying the process of training NLP models to identify highly customised, business-specific words by reducing the data prep effort, simplifying labelling with active learning and bulk annotation capabilities, and enabling simple model deployment to accelerate training time.
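IBM has not published how its pattern learner works, but the general idea of inducing a text pattern from a pair of examples can be sketched by generalising each character position to a character class. Everything below is my own toy illustration of that idea, not IBM's algorithm.

```python
# Toy pattern induction from two examples: shared punctuation stays
# literal, digits generalise to \d, letters to [A-Za-z]. This is an
# illustrative sketch, not Watson Discovery's actual method.
import re

def induce_pattern(a: str, b: str) -> str:
    if len(a) != len(b):
        raise ValueError("sketch only handles same-length examples")
    out = []
    for x, y in zip(a, b):
        if x == y and not x.isalnum():
            out.append(re.escape(x))   # e.g. both examples have "-" here
        elif x.isdigit() and y.isdigit():
            out.append(r"\d")
        elif x.isalpha() and y.isalpha():
            out.append("[A-Za-z]")
        else:
            out.append(".")            # no common structure at this position
    return "".join(out)

# Two example "policy numbers" (hypothetical):
pattern = induce_pattern("PO-1234", "AB-9876")
print(bool(re.fullmatch(pattern, "XY-0001")))  # -> True
print(bool(re.fullmatch(pattern, "XY-001")))   # -> False (too short)
```

A production system would of course handle variable-length fields and refine the pattern from user feedback, as the beta feature is described as doing.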

“The stream of innovation coming to IBM Watson from IBM Research is why global businesses in the fields of financial services, insurance, and legal services turn to IBM to help detect emerging business trends, gain operational efficiency and empower their workers to uncover new insights,” said Daniel Hernandez, General Manager of Data and AI, IBM.

“The pipeline of natural language processing innovations we’re adding to Watson Discovery can continue to provide businesses with the capabilities to more easily extract the signal from the noise and better serve their customers and employees.”

(Image Credit: IBM)

Looking to revamp your digital transformation strategy? Learn more about the Digital Transformation Week event taking place in Amsterdam on 23-24 November 2021 and discover key strategies for making your digital efforts a success.

McDonald’s drive-thru AI bot may have broken privacy law (Fri, 11 Jun 2021) https://www.artificialintelligence-news.com/2021/06/11/mcdonalds-drive-thru-ai-bot-may-broken-privacy-law/

The post McDonald’s drive-thru AI bot may have broken privacy law appeared first on AI News.

McDonald’s announced earlier this month that it was deploying an AI chatbot to handle its drive-thru orders, but it turns out it might break privacy law.

The chatbot is the product of Apprente, a voice recognition company that McDonald’s snapped up in 2019 and which is now known as McD Tech Labs.

McDonald’s deployed the chatbots to ten of its restaurants in Chicago, Illinois. And therein lies the issue.

The state of Illinois has some of the strictest data privacy laws in the country. For example, the state’s Biometric Information Privacy Act (BIPA) states: “No private entity may collect, capture, purchase, receive through trade, or otherwise obtain a person’s or a customer’s biometric identifier or biometric information.”

One resident, Shannon Carpenter, has sued McDonald’s on behalf of himself and other Illinois residents—claiming the fast food biz has broken BIPA by not receiving explicit written consent from its customers to process their voice data.

“Plaintiff, like the other class members, to this day does not know the whereabouts of his voiceprint biometrics which defendant obtained,” the lawsuit states.

The software is said to not only transcribe speech into text but also process it to predict personal information about the customers such as their “age, gender, accent, nationality, and national origin.”

Furthermore, the lawsuit alleges that McDonald’s has been testing AI software at its drive-thrus since last year.

Anyone found to have had their rights under BIPA violated is eligible for up to $5,000 per violation. Given the huge number of McDonald’s customers, it’s estimated that damage payouts could exceed $5 million.

Once again, this case shows the need to be certain that any AI deployments are 100 percent compliant with increasingly strict data laws in every state and country in which they operate.

(Image Credit: Erik Mclean on Unsplash)

Find out more about Digital Transformation Week North America, taking place on November 9-10 2021, a virtual event and conference exploring advanced DTX strategies for a ‘digital everything’ world.

Researchers achieve 94% power reduction for on-device AI tasks (Thu, 17 Sep 2020) https://www.artificialintelligence-news.com/2020/09/17/researchers-achieve-power-reduction-on-device-ai-tasks/

The post Researchers achieve 94% power reduction for on-device AI tasks appeared first on AI News.

Researchers from Applied Brain Research (ABR) have achieved significantly reduced power consumption for a range of AI-powered devices.

ABR designed a new neural network called the Legendre Memory Unit (LMU). With LMU, on-device AI tasks – such as those on speech-enabled devices like wearables, smartphones, and smart speakers – can consume up to 94 percent less power.

The reduction in power consumption achieved through LMU will be particularly beneficial to smaller form-factor devices such as smartwatches, which must make do with small batteries. IoT devices that carry out AI tasks – but may have to last months, if not years, before they’re replaced – should also benefit.

LMU is described as a Recurrent Neural Network (RNN) which enables lower power and more accurate processing of time-varying signals.

ABR says the LMU can be used to build AI networks for all time-varying tasks—such as speech processing, video analysis, sensor monitoring, and control systems.

The AI industry’s current go-to model is the Long Short-Term Memory (LSTM) network. The LSTM was first proposed back in 1995 and is used by most popular speech recognition and translation services today, such as those from Google, Amazon, Facebook, and Microsoft.

Last year, researchers from the University of Waterloo debuted LMU as an alternative RNN to LSTM. Those researchers went on to form ABR, which now consists of 20 employees.

Peter Suma, co-CEO of Applied Brain Research, said in an email:

“We are a University of Waterloo spinout from the Theoretical Neuroscience Lab at UW. We looked at how the brain processes signals in time and created an algorithm based on how “time-cells” in your brain work.

We called the new AI, a Legendre-Memory-Unit (LMU) after a mathematical tool we used to model the time cells. The LMU is mathematically proven to be optimal at processing signals. You cannot do any better. Over the coming years, this will make all forms of temporal AI better.”

ABR presented a paper at the NeurIPS conference in late 2019 demonstrating that the LMU can be 1,000,000x more accurate than the LSTM while encoding 100x more time-steps.

The LMU model is also smaller, using 500 parameters versus the LSTM’s 41,000 (a 98 percent reduction in network size).
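Those figures are easy to sanity-check. A standard LSTM layer with input size m and hidden size n has 4*(n*(m+n) + n) weights (four gates, each with input, recurrent, and bias terms), and the quoted 500-versus-41,000 comparison works out to roughly a 98 percent reduction. ABR's exact m and n aren't stated, so the configuration below is only a plausible one.

```python
# Parameter counting for a standard LSTM layer: four gates, each with
# an input matrix (n x m), a recurrent matrix (n x n), and a bias (n).

def lstm_params(m: int, n: int) -> int:
    return 4 * (n * (m + n) + n)

# A 100-unit LSTM on a scalar input stream is in the right ballpark
# for the quoted 41,000 parameters (assumed configuration):
print(lstm_params(1, 100))          # -> 40800

# The headline reduction from 41,000 down to 500 parameters:
print(f"{1 - 500 / 41_000:.1%}")    # -> 98.8%
```

So the "98 percent reduction" claim is consistent with the raw parameter counts ABR reports.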

“We implemented our speech recognition with the LMU and it lowered the power used for command word processing to ~8 millionths of a watt, which is 94 percent less power than the best on the market today,” says Suma. “For full speech, we got the power down to 4 milli-watts, which is about 70 percent smaller than the best out there.”

Suma says the next step for ABR is to work on video, sensor and drone control AI processing—to also make them smaller and better.

A full whitepaper detailing LMU and its benefits can be found on preprint repository arXiv here.

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.

Esteemed consortium launch AI natural language processing benchmark (Thu, 15 Aug 2019) https://www.artificialintelligence-news.com/2019/08/15/consortium-benchmark-ai-natural-language-processing/

The post Esteemed consortium launch AI natural language processing benchmark appeared first on AI News.

A research consortium featuring some of the greatest minds in AI is launching a benchmark to measure natural language processing (NLP) abilities.

The consortium includes Google DeepMind, Facebook AI, New York University, and the University of Washington. Each of the consortium’s members believes NLP needs a more comprehensive benchmark than current solutions provide.

The result is a benchmarking platform called SuperGLUE which replaces an older platform called GLUE with a “much harder benchmark with comprehensive human baselines,” according to Facebook AI. 

SuperGLUE puts NLP abilities to the test where previous benchmarks were proving too simple for the latest systems.

In 2018, Google released BERT (Bidirectional Encoder Representations from Transformers) which Facebook calls one of the biggest breakthroughs in NLP. Facebook took Google’s open-source work and identified changes to improve its effectiveness which led to RoBERTa (Robustly Optimized BERT Pretraining Approach).

RoBERTa basically “smashed it,” as the kids would say, in commonly-used benchmarks:

“Within one year of release, several NLP models (including RoBERTa) have already surpassed human baseline performance on the GLUE benchmark. Current models have advanced a surprisingly effective recipe that combines language model pretraining on huge text data sets with simple multitask and transfer learning techniques,” Facebook explains.

For the SuperGLUE benchmark, the consortium decided on tasks which meet four criteria:

  1. Have varied formats.
  2. Use more nuanced questions.
  3. Are not yet solved using state-of-the-art methods.
  4. Can be easily solved by people.

The new benchmark includes eight diverse and challenging tasks, including a Choice of Plausible Alternatives (COPA) causal reasoning task. In COPA, the system is given a premise sentence and must determine either the cause or the effect of that premise from two possible choices. Humans achieve 100 percent accuracy on COPA while BERT achieves just 74 percent.
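The COPA format can be illustrated with a small sketch. The field names and the always-pick-choice-2 baseline below are illustrative assumptions for this article, not the official SuperGLUE schema:

```python
# Hypothetical illustration of a COPA (Choice of Plausible Alternatives) item.
# The dictionary keys are assumptions made for this sketch, not SuperGLUE's
# actual data format.

copa_item = {
    "premise": "The man broke his toe.",
    "question": "cause",  # ask for the cause (or "effect") of the premise
    "choice1": "He got a hole in his sock.",
    "choice2": "He dropped a hammer on his foot.",
    "label": 1,           # index of the correct choice (0 or 1)
}

def accuracy(predictions, labels):
    """Fraction of items where the predicted choice matches the gold label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# A trivial system that always picks choice 2, scored against three gold labels:
print(accuracy([1, 1, 1], [1, 0, 1]))  # two of three correct
```

Reported figures like BERT’s 74 percent on COPA are simply this accuracy computed over the full evaluation set.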

Across SuperGLUE’s tasks, RoBERTa is currently the leading NLP system and isn’t far behind the human baseline.

You can find a full breakdown of SuperGLUE and its various benchmarking tasks in a Facebook AI blog post here.

Interested in hearing industry leaders discuss subjects like this and their use cases? Attend the co-located AI & Big Data Expo events with upcoming shows in Silicon Valley, London, and Amsterdam to learn more. Co-located with the IoT Tech Expo, Blockchain Expo, and Cyber Security & Cloud Expo.

The post Esteemed consortium launch AI natural language processing benchmark appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/2019/08/15/consortium-benchmark-ai-natural-language-processing/feed/ 0
Google details Project Euphonia work to improve voice recognition inclusivity https://www.artificialintelligence-news.com/2019/08/14/google-project-euphonia-voice-recognition/ https://www.artificialintelligence-news.com/2019/08/14/google-project-euphonia-voice-recognition/#respond Wed, 14 Aug 2019 12:48:45 +0000 https://d3c9z94rlb3c1a.cloudfront.net/?p=5934 Google has provided details of its Project Euphonia work designed to improve the inclusivity of voice recognition for people with disabilities that impair their speech. Degenerative diseases like amyotrophic lateral sclerosis (ALS) are known for causing speech impairments. Today’s voice recognition systems often cannot recognise the speech of individuals suffering from such diseases, despite those... Read more »

The post Google details Project Euphonia work to improve voice recognition inclusivity appeared first on AI News.

]]>
Google has provided details of its Project Euphonia work designed to improve the inclusivity of voice recognition for people with disabilities that impair their speech.

Degenerative diseases like amyotrophic lateral sclerosis (ALS) are known for causing speech impairments. Today’s voice recognition systems often cannot recognise the speech of individuals suffering from such diseases, even though those individuals arguably stand to benefit most from the automation the technology offers.

Google has set out to solve the problem with Project Euphonia.

Dimitri Kanevsky, a Google researcher who himself has impaired speech, can be seen in the video below using a system called Parrotron to convert his speech into speech that Google Assistant can understand:

The researchers provide a background of Project Euphonia’s origins:

“ASR [automatic speech recognition] systems are most often trained from ‘typical’ speech, which means that underrepresented groups, such as those with speech impairments or heavy accents, don’t experience the same degree of utility.

…Current state-of-the-art ASR models can yield high word error rates (WER) for speakers with only a moderate speech impairment from ALS, effectively barring access to ASR reliant technologies.”

As the researchers highlight, part of the problem is that training sets primarily consist of ‘typical speech’ without the much-needed variety to represent all parts of society (this even includes heavy accents, to some degree).

The researchers set out to record dozens of hours of voice recordings from individuals with ALS to help train their AI. However, the resulting training set is still not ideal, as each person with ALS sounds unique depending on the progression of the disease and how it affects them.

Google was able to reduce its word error rate by using a baseline voice recognition model, experimenting with some tweaks, and training it with the new recordings.
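The word error rate (WER) the researchers measure is the standard ASR metric: the minimum number of word substitutions, insertions, and deletions needed to turn the recognised transcript into the reference, divided by the reference word count. A minimal sketch of the calculation (not Google’s implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed with a standard Levenshtein dynamic programme over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# Two substitutions over six reference words:
print(word_error_rate("i'm heading off to the pub",
                      "i'm reading off to the cub"))
```

Fine-tuning on the ALS recordings lowers exactly this number for the speakers the baseline model previously failed on.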

The method substantially improved recognition, but the researchers found it could occasionally struggle with phonemes in one of two key ways:

  1. The phoneme isn’t recognised, and therefore neither is the word containing it.
  2. The model has to guess which phoneme the speaker meant.

The second problem is fairly trivial to solve. By analysing the rest of the sentence’s context, the AI can often determine the correct phoneme. For example, if the AI hears “I’m reading off to the cub,” it can probably determine the user meant “I’m heading off to the pub”.
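One way to implement this kind of context-based disambiguation (an illustrative sketch, not Google’s actual method) is to rescore the candidate sentences with a language model and keep the most plausible one. Here a toy table of bigram counts, invented for this example, stands in for a real language model:

```python
from itertools import product

# Toy bigram counts standing in for a real language model (assumed data).
BIGRAM_COUNTS = {
    ("the", "pub"): 50, ("the", "cub"): 2,
    ("i'm", "heading"): 40, ("i'm", "reading"): 30,
    ("heading", "off"): 35, ("reading", "off"): 1,
}

def score(sentence: str) -> int:
    """Sum of bigram counts for adjacent word pairs -- higher is more plausible."""
    words = sentence.split()
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))

def disambiguate(slots):
    """Each slot is a list of candidate words (ambiguous phoneme readings);
    return the highest-scoring full sentence."""
    candidates = (" ".join(words) for words in product(*slots))
    return max(candidates, key=score)

# "reading/heading" and "cub/pub" are the ambiguous slots from the example:
slots = [["i'm"], ["reading", "heading"], ["off"], ["to"], ["the"], ["cub", "pub"]]
print(disambiguate(slots))  # picks "i'm heading off to the pub"
```

A production system would use a neural language model rather than bigram counts, but the principle is the same: the surrounding words make one phoneme hypothesis far more probable than the others.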

You can read the full paper on arXiv here ahead of its presentation at the Interspeech conference in Austria next month.


The post Google details Project Euphonia work to improve voice recognition inclusivity appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/2019/08/14/google-project-euphonia-voice-recognition/feed/ 0