Humans struggle to distinguish between real and AI-generated faces https://www.artificialintelligence-news.com/2022/02/21/humans-struggle-distinguish-real-and-ai-generated-faces/ Mon, 21 Feb 2022 18:19:36 +0000

According to a new paper, AI-generated faces have become so advanced that humans can no longer distinguish real from fake more often than not.

“Our evaluation of the photorealism of AI-synthesized faces indicates that synthesis engines have passed through the uncanny valley and are capable of creating faces that are indistinguishable—and more trustworthy—than real faces,” the researchers explained.

The researchers – Sophie J. Nightingale of the Department of Psychology, Lancaster University, and Hany Farid of the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley – highlight the worrying trend of “deepfakes” being weaponised.

Video, audio, text, and imagery generated by generative adversarial networks (GANs) are increasingly being used for nonconsensual intimate imagery, financial fraud, and disinformation campaigns.

GANs work by pitting two neural networks – a generator and a discriminator – against each other. The generator will start with random pixels and will keep improving the image to avoid penalisation from the discriminator. This process continues until the discriminator can no longer distinguish a synthesised face from a real one.
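
To make the adversarial setup concrete, here is a minimal sketch of a single GAN training step in PyTorch. The tiny fully-connected networks are illustrative stand-ins, not the StyleGAN2 architecture examined in the paper.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28

# Illustrative stand-ins for the two competing networks.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, image_dim), nn.Tanh()
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid()
)

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_training_step(real_images: torch.Tensor):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator step: learn to tell real images from generated ones.
    fake_images = generator(torch.randn(batch, latent_dim))
    d_loss = (bce(discriminator(real_images), real_labels)
              + bce(discriminator(fake_images.detach()), fake_labels))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: improve the image until the discriminator is fooled.
    g_loss = bce(discriminator(fake_images), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

# One step on a dummy batch of "real" images scaled to [-1, 1].
print(gan_training_step(torch.rand(16, image_dim) * 2 - 1))
```

Repeated over many such steps, the generator's outputs become progressively harder for the discriminator, and eventually for people, to tell apart from real photographs.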

Just as the discriminator could no longer distinguish a synthesised face from a real one, neither could human participants. In the study, the human participants identified fake images just 48.2 percent of the time. 

Accuracy was higher for correctly identifying real East Asian and White male faces than female faces. However, White faces were the least accurately identified among both male and female synthetic faces, and White male faces were identified less accurately than White female faces.

The researchers hypothesised that “White faces are more difficult to classify because they are overrepresented in the StyleGAN2 training dataset and are therefore more realistic.”

Here are the most (top and upper-middle lines) and least (bottom and lower-middle) accurately classified real (R) and synthetic (S) faces:

There’s a glimmer of hope for humans: after being given training on how to spot fakes, participants were able to distinguish real faces 59 percent of the time. That’s not a particularly comfortable figure, but it at least tips the scales towards humans spotting fakes more often than not.

What sets the alarm bells ringing again is that synthetic faces were rated more “trustworthy” than real ones. On a scale of 1 (very untrustworthy) to 7 (very trustworthy), real faces received an average rating of 4.48 compared with 4.82 for synthetic faces.

“A smiling face is more likely to be rated as trustworthy, but 65.5 per cent of our real faces and 58.8 per cent of synthetic faces are smiling, so facial expression alone cannot explain why synthetic faces are rated as more trustworthy,” wrote the researchers.

The results of the paper show the importance of developing tools that can spot the increasingly small differences between real and synthetic faces, because humans will struggle even when specifically trained.

With Western intelligence agencies calling out fake content allegedly from Russian authorities to justify an invasion of Ukraine, the increasing ease with which such media can be generated en masse poses a serious threat that’s no longer the stuff of fiction.

(Photo by NeONBRAND on Unsplash)

Related: James Cameron warns of the dangers of deepfakes

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo. The next events in the series will be held in Santa Clara on 11-12 May 2022, Amsterdam on 20-21 September 2022, and London on 1-2 December 2022.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Why AI needs human intervention https://www.artificialintelligence-news.com/2022/01/19/why-ai-needs-human-intervention/ Wed, 19 Jan 2022 17:07:47 +0000

In today’s tight labour market and hybrid work environment, organizations are increasingly turning to AI to support various functions within their business, from delivering more personalized experiences to improving operations and productivity to helping organizations make better and faster decisions. That is why the worldwide market for AI software, hardware, and services is expected to surpass $500 billion by 2024, according to IDC.

Yet, many enterprises aren’t ready to have their AI systems run independently and entirely without human intervention – nor should they do so. 

In many instances, enterprises simply don’t have sufficient expertise in the systems they use because AI technologies are extraordinarily complex. In other instances, rudimentary AI is built into enterprise software; these built-in capabilities can be fairly static and take away the control over data parameters that most organizations need. But even the most AI-savvy organizations keep humans in the equation to avoid risks and reap the maximum benefits of AI.

AI Checks and Balances

There are clear ethical, regulatory, and reputational reasons to keep humans in the loop. Inaccurate data can be introduced over time leading to poor decisions or even dire circumstances in some cases. Biases can also creep into the system whether it is introduced while training the AI model, as a result of changes in the training environment, or due to trending bias where the AI system reacts to recent activities more than previous ones. Moreover, AI is often incapable of understanding the subtleties of a moral decision. 

Take healthcare for instance. The industry perfectly illustrates how AI and humans can work together to improve outcomes or cause great harm if humans are not fully engaged in the decision-making process. For example, when diagnosing or recommending a care plan for a patient, AI is well suited to making a recommendation to the doctor, who then evaluates whether that recommendation is sound before counselling the patient.

Having a way for people to continually monitor AI responses and accuracy will help avoid flaws that could lead to harm or catastrophe, while providing a means of continuous training so the models keep improving. That’s why IDC expects more than 70% of G2000 companies to have formal programs to monitor their digital trustworthiness by 2022.

Models for Human-AI Collaboration

Human-in-the-Loop (HitL) Reinforcement Learning and Conversational AI are two examples of how human intervention supports AI systems in making better decisions.

HitL allows AI systems to leverage machine learning to learn by observing humans dealing with real-life work and use cases. HitL models are like traditional AI models except they are continuously self-developing and improving based on human feedback while, in some cases, augmenting human interactions. It provides a controlled environment that limits the inherent risk of biases—such as the bandwagon effect—that can have devastating consequences, especially in crucial decision-making processes.

We can see the value of the HitL model in industries that manufacture critical parts for vehicles or aircraft, where equipment must be up to standard. In situations like this, machine learning increases the speed and accuracy of inspections, while human oversight provides added assurance that parts are safe and secure for passengers.
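
A minimal sketch of the human-in-the-loop pattern described above, applied to the inspection example. The `predict` and `retrain` methods are hypothetical hooks standing in for whatever model is actually in use; the key point is that uncertain cases are escalated to a person and their decisions are fed back as training data.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.9  # below this, a human inspector makes the final call

@dataclass
class HumanInTheLoopInspector:
    model: object                     # assumed to expose predict(part) -> (label, confidence)
    corrections: list = field(default_factory=list)

    def classify(self, part, human_inspector):
        label, confidence = self.model.predict(part)
        if confidence < CONFIDENCE_THRESHOLD:
            # Escalate uncertain parts to the human expert and keep their
            # decision as fresh training signal for the model.
            label = human_inspector(part)
            self.corrections.append((part, label))
            if len(self.corrections) >= 100:
                self.model.retrain(self.corrections)  # hypothetical retraining hook
                self.corrections.clear()
        return label
```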

Conversational AI, on the other hand, provides near-human-like communication. It can offload simpler problems from employees while knowing when to escalate more complex issues to humans. Contact centres provide a prime example.

When a customer reaches out to a contact centre, they have the option to call, text, or chat virtually with a representative. The virtual agent listens and understands the needs of the customer and engages back and forth in a conversation. It uses machine learning and AI to decide what needs to be done based on what it has learned from prior experience. Most AI systems within contact centres generate speech to help communicate with the customer and mimic the feeling of a human doing the typing or talking.

For most situations, a virtual agent is enough to help service customers and resolve their problems. However, there are cases where the AI can stop typing or talking and make a seamless transfer to a live representative to take over the call or chat. Even in these examples, the AI system can shift from automation to augmentation by continuing to listen to the conversation and providing recommendations that aid the live representative in their decisions.

Going beyond conversational AI with cognitive AI, these systems can learn to understand the emotional state of the other party, handle complex dialogue, provide real-time translation and even adjust based on the behaviour of the other person, taking human assistance to the next level of sophistication.

Blending Automation and Human Interaction Leads to Augmented Intelligence

AI is best applied when it is both monitored by and augments people. When that happens, people move up the skills continuum, taking on more complex challenges, while the AI continually learns, improves, and is kept in check, avoiding potentially harmful effects. Using models like HitL, conversational AI, and cognitive AI in collaboration with real people who possess expertise, ingenuity, empathy and moral judgment ultimately leads to augmented intelligence and more positive outcomes.

(Photo by Arteum.ro on Unsplash)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo. The next events in the series will be held in Santa Clara on 11-12 May 2022, Amsterdam on 20-21 September 2022, and London on 1-2 December 2022.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

BT uses epidemiological modelling for new cyberattack-fighting AI https://www.artificialintelligence-news.com/2021/11/12/bt-epidemiological-modelling-new-cyberattack-fighting-ai/ Fri, 12 Nov 2021 14:58:18 +0000

BT is deploying an AI trained on epidemiological modelling to fight the increasing risk of cyberattacks.

The first mathematical epidemic model was formulated and solved by Daniel Bernoulli in 1760 to evaluate the effectiveness of variolation of healthy people with the smallpox virus. More recently, such models have guided COVID-19 responses to keep the health and economic damage from the pandemic as minimal as possible.

Now security researchers from BT Labs in Suffolk want to harness centuries of epidemiological modelling advancements to protect networks.

BT’s new epidemiology-based cybersecurity prototype is called Inflame and uses deep reinforcement learning to help enterprises automatically detect and respond to cyberattacks before they compromise a network.

Howard Watson, Chief Technology Officer at BT, said:

“We know the risk of cyberattack is higher than ever and has intensified significantly during the pandemic. Enterprises now need to look to new cybersecurity solutions that can understand the risk and consequence of an attack, and quickly respond before it’s too late.

Epidemiological testing has played a vital role in curbing the spread of infection during the pandemic, and Inflame uses the same principles to understand how current and future digital viruses spread through networks.

Inflame will play a key role in how BT’s Eagle-i platform automatically predicts and identifies cyber-attacks before they impact, protecting customers’ operations and reputation.” 

The ‘R’ rate – the estimated number of further infections per case – has moved from the lexicon of epidemiologists into public knowledge over the course of the pandemic. Alongside binge-watching Tiger King, a lockdown pastime for many of us was to check the latest R rate in the hope that it had dropped below 1—meaning the spread of COVID-19 was decreasing rather than increasing exponentially.

For its Inflame prototype, BT’s team built models that were used to test numerous scenarios based on differing R rates of cyber-infection.
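
BT has not published Inflame's internals, but the epidemiological idea behind such scenarios is easy to sketch: simulate how a compromise spreads through a population of hosts under different reproduction ('R') rates and compare the resulting curves. The toy model below is our illustration, not BT's code.

```python
def simulate_spread(r_rate: float, hosts: int = 10_000,
                    initially_infected: int = 10, steps: int = 20) -> list:
    """Toy generation-based model: each newly compromised host goes on to
    compromise roughly r_rate further hosts, scaled by how many remain clean."""
    total = initially_infected
    newly_infected = initially_infected
    curve = [total]
    for _ in range(steps):
        susceptible = hosts - total
        newly_infected = min(susceptible,
                             round(newly_infected * r_rate * susceptible / hosts))
        total += newly_infected
        curve.append(total)
    return curve

# Below R=1 an outbreak fizzles out; above it, the compromise snowballs
# until the network saturates.
for r in (0.8, 1.0, 2.5):
    print(f"R={r}: {simulate_spread(r)[:8]} ...")
```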

Inflame can automatically model and respond to a detected threat within an enterprise network thanks to its deep reinforcement learning training.

Responses are underpinned by “attack lifecycle” modelling – similar to understanding the spread of a biological virus – to determine the current stage of a cyberattack by assessing real-time security alerts against recognised patterns. The ability to predict the next stage of a cyberattack helps with determining the best steps to halt its progress.

Last month, BT announced its Eagle-i platform which uses AI for real-time threat detection and intelligent response. Eagle-i “self-learns” from every intervention to constantly improve its threat knowledge and Inflame will be a key component in further improving the platform.

(Photo by Erik Mclean on Unsplash)

Looking to revamp your digital transformation strategy? Learn more about the Digital Transformation Week event taking place in Amsterdam on 23-24 November 2021 and discover key strategies for making your digital efforts a success.

Rishabh Mehrotra, research lead, Spotify: Multi-stakeholder thinking with AI https://www.artificialintelligence-news.com/2021/09/24/rishabh-mehrotra-research-lead-spotify-multi-stakeholder-thinking-with-ai/ Fri, 24 Sep 2021 13:29:52 +0000

Streaming behemoth Spotify hosts more than seventy million songs and close to three million podcast titles on its platform.

Delivering this without artificial intelligence (AI) would be comparable to traversing the Amazon rainforest armed with nothing but a spoon.

To cut – or scoop – through this jungle of music, Spotify’s research team deploy hundreds of machine learning models that improve the user experience, all the while trying to balance the needs of users and creators.

AI News caught up with Spotify research lead Rishabh Mehrotra at the AI & Big Data Expo Global on September 7 to learn more about how AI supports the platform.

AI News: How important is AI to Spotify’s mission?

Rishabh Mehrotra: AI is at the centre of what we do. Machine learning (ML) specifically has become an indispensable tool for powering personalised music and podcast recommendations to more than 365 million users across the world. It enables us to understand user needs and intents, which then helps us to deliver personalised recommendations across various touch points on the app.

It’s not just about the actual models which we deploy in front of users but also the various AI techniques we use to adopt a data driven process around experimentation, metrics, and product decisions.

We use a broad range of AI methods to understand our listeners, creators, and content. Some of our core ML research areas include understanding user needs and intents, matching content and listeners, balancing user and creator needs, using natural language understanding and multimedia information retrieval methods, and developing models that optimise long term rewards and recommendations.

What’s more, our models power experiences across around 180 countries, so we have to consider how they are performing across markets. Striking a balance between promoting global music and still facilitating local musicians and music culture is one of our most important AI initiatives.

AN: Spotify users might be surprised to learn just how central AI is to almost every aspect of the platform’s offering. It’s so seamless that I suspect most people don’t even realise it’s there. How crucial is AI to the user experience on Spotify?

RM: If you look at Spotify as a user then you typically view it as an app which gives you the content that you’re looking for. However, if you really zoom in you see that each of these different recommendation tools are all different machine learning products. So if you look at the homepage, we have to understand user intent in a far more subtle way than we would with a search query. The homepage is about giving recommendations based on a user’s current needs and context, which is very different from a search query where users are explicitly asking for something. Even in search, users can seek open and non-focused queries like ‘relaxing music’, or you could be searching the name of a specific song.

Looking at sequential radio sessions, our models try to balance familiar music with discovery content, aimed at not only recommending content users could enjoy at the moment, but optimising for long term listener-artist connections.

A good amount of our ML models are starting to become multi-objective. Over the past two years, we have deployed a lot of models that try to fulfil listener needs whilst also enabling creators to connect with and grow their audiences.

AN: Are artists’ wants and needs a big consideration for Spotify or is the focus primarily on the user experience?

RM: Our goal is to match the creators with the fans in an enriching way. While understanding user preferences is key to the success of our recommendation models, it really is a two-sided market in a lot of ways. We have the users who want to consume audio content on one side and the creators looking to grow their audiences on the other. Thus a lot of our recommendation products have multi-stakeholder thinking baked into them to balance objectives from both sides.
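
As a rough illustration of that two-sided balancing act (not Spotify's actual ranking code), a recommender can score each candidate track on several objectives and blend them, trading a little immediate relevance for discovery and creator exposure:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    track_id: str
    relevance: float      # predicted listener satisfaction, 0..1
    discovery: float      # how unfamiliar the track is to this listener, 0..1
    creator_boost: float  # exposure value for an emerging artist, 0..1

def blended_score(c: Candidate, weights=(0.7, 0.2, 0.1)) -> float:
    """Simple scalarisation of a multi-objective ranking problem."""
    w_rel, w_disc, w_creator = weights
    return w_rel * c.relevance + w_disc * c.discovery + w_creator * c.creator_boost

candidates = [
    Candidate("familiar_hit", relevance=0.9, discovery=0.1, creator_boost=0.0),
    Candidate("new_local_artist", relevance=0.6, discovery=0.9, creator_boost=0.8),
]
ranked = sorted(candidates, key=blended_score, reverse=True)
print([c.track_id for c in ranked])
```

With even a small weight on the discovery and creator objectives, the blended score can surface an emerging artist ahead of a marginally more familiar track.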

AN: Apart from music recommendations and suggestions, does AI support Spotify in any other ways?

RM: AI plays an important role in driving our algotorial approach: expert curators with an excellent sense for what’s up and coming quite literally teach our machine learning system. Through this approach, we can create playlists that not only look at past data but also reflect cultural trends as they’re happening. Across all regions, we have editors who bring in deep domain expertise about music culture that we use proactively in our products. This allows us to develop and deploy human-in-the-loop AI techniques that can leverage editorial input to bootstrap various decisions that ML models can then scale.

AN: What about podcasts? Do you utilise AI differently when applying it to podcasts over music?

RM: Users’ podcast journeys can differ in a lot of ways compared to music. While music is a lot about the audio and acoustic properties of songs, podcasts depend on a whole different set of parameters. For one, it’s much more about content understanding – understanding speakers, types of conversations and topics of discussions.

That said, we are seeing some very interesting results using music taste for podcast recommendations too. Members of our group have recently published work that shows how our ML models can leverage users’ music preferences to recommend podcasts, and some of these results have demonstrated significant improvements, especially for new podcast users.

AN: With so many models already turning the cogs at Spotify, it’s difficult to see how new and exciting use cases could be introduced. What are Spotify’s AI plans for the coming years?

RM: We’re working on a number of ways to elevate the experience even further. Reinforcement learning will be an important focus point as we look into ways to optimise for a lifetime of fulfilling content, rather than optimising for the next stream. In a sense this isn’t just about giving users what they want right now, but about evolving their tastes and looking at their long-term trajectories.

AN: As the years go on and your models have more and more data to work with, will the AI you use naturally become more advanced?

RM: A lot of our ML investments are not only about incorporating state-of-the-art ML into our products, but also extending the state-of-the-art based on the unique challenges we face as an audio platform. We are developing advanced causal inference techniques to understand the long term impact of our algorithmic decisions. We are innovating in the multi-objective ML modelling space to balance various objectives as part of our two-sided marketplace efforts. We are gravitating towards learning from long term trajectories and optimising for long term rewards.

To make data-driven decisions across all such initiatives, we rely heavily on solid scientific experimentation techniques, which themselves rely on machine learning.

Reinforcement learning furthers the scope of longer term decisions – it brings that long term perspective into our recommendations. So a quick example would be facilitating discovery on the platform. As a marketplace platform, we want users to not only consume familiar music but to also discover new music, leveraging the value of recommendations. There are 70 million tracks on the platform and only a few thousand will be familiar to any given user, putting aside the fact that it would take an individual several lifetimes to actually go through all this content. So tapping into that remaining 69.9 million and surfacing content users would love to discover is a key long-term goal for us.

How to fulfil users’ long-term discovery needs, when to surface such discovery content, by how much, and across which sets of users and recommendations are a few examples of the higher-abstraction, long-term problems that RL approaches allow us to tackle well.

AN: Finally, considering the involvement Spotify has in directing users’ musical experiences, does the company have to factor in any ethical issues surrounding its usage of AI?

RM: Algorithmic responsibility and causal influence are topics we take very seriously and we actively work to ensure our systems operate in a fair and responsible manner, backed by focused research and internal education to prevent unintended biases.

We have a team dedicated to ensuring we approach these topics with the right research-informed rigour and we also share our learnings with the research community.

AN: Is there anything else you would like to share?

RM: On a closing note, one thing I love about Spotify is that we are very open with the wider industry and research community about the advances we are making with AI and machine learning. We actively publish at top tier venues, give tutorials, and we have released a number of large datasets to facilitate academic research on audio recommendations.

For anyone who is interested in learning more about this I would recommend checking out our Spotify Research website which discusses our papers, blogs, and datasets in greater detail.

Find out more about Digital Transformation Week North America, taking place on 9-10 November 2021, a virtual event and conference exploring advanced DTX strategies for a ‘digital everything’ world.

OpenAI’s latest model can summarise those tl;dr books https://www.artificialintelligence-news.com/2021/09/24/openais-latest-model-can-summarise-those-tldr-books/ Fri, 24 Sep 2021 13:05:14 +0000

OpenAI has unveiled a new model that tests scalable alignment techniques by summarising those tl;dr (too long; didn’t read) books.

The model works by first summarising small sections of a book before summarising those summaries into a higher-level summary. It carries on this way – hence being a great scalable alignment test – to summarise as little or as much as desired.
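
The recursive decomposition is straightforward to picture in code. The sketch below assumes a `summarise(text)` function backed by whatever language model is available; it illustrates the idea and is not OpenAI's implementation.

```python
def summarise(text: str) -> str:
    """Placeholder for a model call that condenses a passage (assumed, not OpenAI's API)."""
    raise NotImplementedError("plug a summarisation model in here")

def chunk(text: str, size: int = 2_000) -> list:
    """Split the text into fixed-size sections."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarise_book(text: str, target_length: int = 1_000) -> str:
    """Summarise each section, then summarise the summaries, until short enough."""
    while len(text) > target_length:
        section_summaries = [summarise(section) for section in chunk(text)]
        text = "\n".join(section_summaries)
    return text
```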

You can view the complete steps, along with examples of full books condensed down to short summaries, on OpenAI’s website.

To create the model, a combination of reinforcement learning and recursive text decomposition was used. The model was trained on a subset of the predominantly fiction books in GPT-3’s training dataset.

OpenAI assigned two people to read 40 of the most popular books (according to Goodreads) that were published in 2020 and write a summary of each. The participants were then asked to rate one another’s summaries in addition to that of the AI model.

On average, human-written summaries received a 6/7 rating. The model’s summaries received that rating 5 percent of the time and a 5/7 rating 15 percent of the time.

Practical uses

Many won’t even have read this article this far. Most visitors to publications spend an average of just 15 seconds reading around 20 percent of any single article. That’s a particular problem when readers then feel educated on an important topic and end up spreading misinformation.

Social media platforms have started asking users whether they really want to share an article when they’ve not opened it for any context. Using models like the one OpenAI is demonstrating, such platforms could at least offer users a decent summary.

The model was mostly successful, but OpenAI concedes in a paper (PDF) that it occasionally generated inaccurate statements. Humans still generally do a better job, but it’s an impressive showing nonetheless for an automated solution.

(Photo by Mikołaj on Unsplash)

Find out more about Digital Transformation Week North America, taking place on 9-10 November 2021, a virtual event and conference exploring advanced DTX strategies for a ‘digital everything’ world.

Janine Lloyd-Jones, Faculty: On the ethical considerations of AI and ensuring it’s a tool for positive change https://www.artificialintelligence-news.com/2021/09/06/janine-lloyd-jones-faculty-ethical-considerations-ai-ensuring-tool-positive-change/ Mon, 06 Sep 2021 13:31:41 +0000

The benefits of AI are becoming increasingly clear as deployments ramp up, but fully considering the technology’s impact must remain a priority to build public trust.

AI News caught up with Janine Lloyd-Jones, director of marketing and communication at Faculty, to discuss how the benefits of AI can be unlocked while ensuring deployments remain a tool for positive change.

AI News: What are the constraints, ethical considerations, and potential for deep reinforcement learning?

Janine Lloyd-Jones: Whilst reinforcement learning can be used in video games, robotics and chatbots, the reality is that we can’t fully unlock the power of these tools as the risks are high and models like these are hard to maintain.

As AI makes more and more critical decisions about our everyday lives, it becomes even more important to know it’s operating safely. We’ve developed a first-of-its-kind explainability tool which generates explanations quickly and incorporates causality, making it easier to improve the performance of models because users understand how decisions are made. This has been integral to our Early Warning System (EWS), allowing NHS staff to understand and interpret each forecast, which has increased adoption of the tool.

It’s our view that clearer regulation is needed to ensure AI is being used safely, but this needs to be informed by what’s practical and possible to implement. We also need to ensure we don’t stifle innovation. Any regulation needs to be context-dependent. For example, when AI is used to make decisions in a medical diagnostics context, safety becomes far more important than if an AI algorithm is trying to choose which advertisement to show you. As we acquire the right tools and regulation, it’s exciting to see what complex AI like deep reinforcement learning will achieve in our industry and society.

AN: Faculty is among the companies taking admirable steps to offset its carbon emissions. How can AI play a role in combating climate change?

JL: We’re here because we believe that AI can change the world – we want to take this technology and use it to solve real, tangible, important problems.

Like many tech companies, the biggest sources of our carbon emissions are cloud computing (the tech sector has a greater carbon footprint than the aviation industry) but sustainable AI can be part of the solution. Our work includes analysing data for Arctic Basecamp and regulating pressure on the UK gas grid. We’re expanding our sustainable AI work with environmental organisations, supporting them to tackle climate change.

AN: How quickly do you think most factories will either go entirely “dark” – as in having no or very few humans working in them – or at least have a portion of them being fully autonomous? How can the workforce prepare for such changes?

JL: AI is not universal just yet, so we don’t expect we’ll see factories going entirely dark anytime soon. Most companies are using AI to automate, save time and increase productivity, but the potential of AI is huge – it will transform industries. AI can become the unconscious mind of an organisation, processing vast volumes of data quickly, and freeing humans to focus on what they’re best at and where their input is needed; humans have a far greater appreciation for nuance and context for example.

We’ve already helped clients across industries do this: cutting a backlog of cases from four years to just four weeks, developing models which detect harmful content online with a positive rate of 94%, and helping large retailers ensure they are marketing to the customers most likely to purchase, increasing profits by 5%.

AN: The NHS was able to enhance its forecasting abilities thanks to its partnership with Faculty. What successes were achieved and was anything learnt from the experience that could improve future predictions?

JL: We’re really proud of our partnership with the NHS; our groundbreaking Early Warning System (EWS) was crucial in the NHS’ nationwide pandemic data strategy, forecasting spikes in Covid-19 cases and hospital admissions weeks in advance. These forecasts allowed the NHS to ensure there were enough staff, beds and vital equipment allocated for patients. There are over 1000 users of the model across the NHS.

Following the success of the tool, we are addressing new areas where AI forecasting can be used to improve service delivery and patient care in the NHS, including predicting A&E demand and winter pressures. The EWS uses our Operational Intelligence software, leveraging Bayesian hierarchical modelling to produce forecasts from a national level down to an individual trust level. We’ve used the same software in other scenarios where demand forecasting is needed, including for consumer goods.

AN: Faculty continues to expand rapidly and recently raised £30m that it expects to use to create 400 new jobs and accelerate its international expansion. What else is a key focus for Faculty over the coming years?

JL: We’re excited to be able to bring the power of AI to even more customers, helping them to make effective decisions with real-world impact. We are enhancing our technology offering, hiring 400 new people over the next few years and accelerating our international expansion. We are also doubling down on our AI safety research programme, so our customers have the assurance that all of our AI models are always performing safely and to the best of their ability.

AN: What will Faculty be sharing with the audience at this year’s AI Expo Global?

JL: We’re glad to be at in-person events again, and we’re looking forward to meeting fellow exhibitors and attendees. Our focus at this year’s AI Expo will be on our Customer Intelligence software – which we are predominantly using within the consumer industries to demonstrate the impact marketing has on individual customer behaviour. Millions in marketing spend are wasted each year on the wrong people. With our technology, marketers will finally have the insight to know when and on whom they should be focusing their efforts.

We’re also sharing more about our Faculty Fellowship, our in-house L&D programme where organisations looking to expand their data science teams can hire top data scientists for six weeks before they decide to hire. This is particularly critical as the UK tech industry looks to attract and hire top talent. We’ve already had some great companies take part in this programme, from Virgin Media and Vodafone through to leading startups like The Trade Desk and JustEat.

AN: It’s the 20th anniversary of the Faculty Fellowship in October – what’s the focus for the Fellowship over the coming years?

JL: Faculty began with the Fellowship, so it’s a really special milestone to be celebrating the 20th anniversary. With demand for data scientists at an all-time high – over 100,000 vacancies in 2020 alone – it’s a competitive space. We expanded the programme this year to include an additional fellowship, and we’re continuously working to ensure we are attracting top talent and making the process as easy as possible for our partner companies.

Overstretched teams are fed up with spending their time on hiring and long interview rounds—the fellowship is designed so companies only invest 2-3 hours in total, but have an elite data scientist embedded in their team within weeks.

(Photo by Clark Tibbs on Unsplash)

Faculty will be sharing their invaluable insights during this year’s AI & Big Data Expo Global which runs from 6-7 September 2021. Faculty’s stand number is 178. Find out more about the event here.

Do you even AI, bro? OpenAI Safety Gym enhances reinforcement learning https://www.artificialintelligence-news.com/2019/11/22/ai-openai-reinforcement-learning-safety-gym/ Fri, 22 Nov 2019 12:04:53 +0000

OpenAI, co-founded by Elon Musk, has opened the doors of its “Safety Gym” designed to enhance the training of reinforcement learning agents.

OpenAI describes Safety Gym as “a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.”

Basically, Safety Gym is the software equivalent of your spotter making sure you’re not going to injure yourself. And just like a good spotter, it will check your form.

“We also provide a standardised method of comparing algorithms and how well they avoid costly mistakes while learning,” says OpenAI.

“If deep reinforcement learning is applied to the real world, whether in robotics or internet-based tasks, it will be important to have algorithms that are safe even while learning—like a self-driving car that can learn to avoid accidents without actually having to experience them.”

Reinforcement learning is based on trial and error, with AIs training to get the best possible reward in the most efficient way. The problem is that this can lead to dangerous behaviour.

Taking the self-driving car example, you wouldn’t want an AI deciding to go around the roundabout the wrong way just because it’s the quickest way to the final exit.

OpenAI is promoting the use of “constrained reinforcement learning” as a possible solution. By implementing cost functions, agents consider trade-offs which still achieve defined outcomes.

In a blog post, OpenAI explains the advantages of using constrained reinforcement learning with the example of a self-driving car:

“Suppose the car earns some amount of money for every trip it completes, and has to pay a fine for every collision. In normal RL, you would pick the collision fine at the beginning of training and keep it fixed forever. The problem here is that if the pay-per-trip is high enough, the agent may not care whether it gets in lots of collisions (as long as it can still complete its trips). In fact, it may even be advantageous to drive recklessly and risk those collisions in order to get the pay. We have seen this before when training unconstrained RL agents.

By contrast, in constrained RL you would pick the acceptable collision rate at the beginning of training, and adjust the collision fine until the agent is meeting that requirement. If the car is getting in too many fender-benders, you raise the fine until that behaviour is no longer incentivised.”
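
The fine-adjustment described in that quote is essentially a Lagrangian scheme: treat the collision penalty as a multiplier and move it up or down depending on whether the measured collision rate exceeds the limit chosen up front. Below is a schematic sketch with a hypothetical agent API, not OpenAI's code.

```python
def train_constrained(agent, env, collision_limit=0.01,
                      fine=1.0, fine_lr=0.05, epochs=100):
    """Schematic constrained-RL loop: the collision fine is tuned automatically
    so the trained agent ends up meeting the acceptable collision rate."""
    for _ in range(epochs):
        trips, collisions = agent.run_episodes(env)      # hypothetical rollout API
        collision_rate = collisions / max(trips, 1)

        # Reward is pay-per-trip minus the current fine for each collision.
        agent.update(reward=trips - fine * collisions)   # hypothetical policy update

        # Raise the fine while the agent collides too often; relax it otherwise,
        # without letting it go negative.
        fine = max(0.0, fine + fine_lr * (collision_rate - collision_limit))
    return agent, fine
```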

Safety Gym environments require AI agents — three are included: Point, Car, and Doggo — to navigate cluttered environments to achieve a goal, button, or push task. There are two levels of difficulty for each task. Every time an agent performs an unsafe action, a red warning light flashes around the agent and it will incur a cost.
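
The environments follow the standard Gym interface of the time, with the safety cost reported separately from the reward in the `info` dictionary. A minimal interaction loop looks roughly like this; the environment ID follows the repository's robot-task-difficulty naming scheme as best we recall, so check the Safety Gym documentation for the exact strings.

```python
import gym
import safety_gym  # noqa: F401  -- importing registers the Safety Gym environments

# Point robot, Goal task, difficulty level 1.
env = gym.make("Safexp-PointGoal1-v0")

obs = env.reset()
total_reward, total_cost = 0.0, 0.0
for _ in range(1000):
    action = env.action_space.sample()        # random policy, purely for illustration
    obs, reward, done, info = env.step(action)
    total_reward += reward
    total_cost += info.get("cost", 0.0)       # safety violations reported separately
    if done:
        obs = env.reset()

print(f"return: {total_reward:.1f}, accumulated cost: {total_cost:.1f}")
```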

Going forward, OpenAI has identified three areas of interest to improve algorithms for constrained reinforcement learning:

  1. Improving performance on the current Safety Gym environments.
  2. Using Safety Gym tools to investigate safe transfer learning and distributional shift problems.
  3. Combining constrained RL with implicit specifications (like human preferences) for rewards and costs.

OpenAI hopes that Safety Gym can make it easier for AI developers to collaborate on safety across the industry via work on open, shared systems.

Interested in hearing industry leaders discuss subjects like this? Attend the co-located 5G Expo, IoT Tech Expo, Blockchain Expo, AI & Big Data Expo, and Cyber Security & Cloud Expo World Series with upcoming events in Silicon Valley, London, and Amsterdam.

AI enables ‘hybrid drones’ with the attributes of both planes and helicopters https://www.artificialintelligence-news.com/2019/07/15/ai-hybrid-drones-planes-helicopters/ Mon, 15 Jul 2019 15:41:36 +0000

Researchers have developed an AI system enabling ‘hybrid drones’ which combine the attributes of both planes and helicopters.

The propeller-forward designs of most drones are inefficient and reduce flight time. Researchers from MIT, Dartmouth, and the University of Washington have proposed a new hybrid design which aims to combine the perks of both helicopters and fixed-wing planes.

In order to support the new design, a new AI system was developed to switch between hovering and gliding with a single flight controller.

Speaking to VentureBeat, MIT CSAIL graduate student and project lead Jie Xu said:

 “Our method allows non-experts to design a model, wait a few hours to compute its controller, and walk away with a customised, ready-to-fly drone.

The hope is that a platform like this could make these more versatile ‘hybrid drones’ much more accessible to everyone.”

Existing fixed-wing drones require engineers to build different systems for hovering (like a helicopter) and flying horizontally (like a plane), along with controllers to switch between the two modes.

Today’s control systems are designed around simulations, causing a discrepancy when used in actual hardware in real-world scenarios.

Using reinforcement learning, the researchers trained a model which can detect potential differences between the simulation and reality. The controller is then able to use this model to transition from hovering to flying, and back again, just by updating the drone’s target velocity.

OnShape, a popular CAD platform, is used to allow users to select potential drone parts from a data set. The proposed design’s performance can then be tested in a simulator.

“We expect that this proposed solution will find application in many other domains,” wrote the researchers in the paper. It’s easy to imagine the research one day being scaled up to people-carrying ‘air taxis’ and more.

The researchers will present their paper later this month at the Siggraph conference in Los Angeles.

Interested in hearing industry leaders discuss subjects like this and their use cases? Attend the co-located AI & Big Data Expo events with upcoming shows in Silicon Valley, London, and Amsterdam to learn more. Co-located with the IoT Tech Expo, Blockchain Expo, and Cyber Security & Cloud Expo.

Uber’s AI beats troublesome games with new type of reinforcement learning https://www.artificialintelligence-news.com/2018/11/27/uber-ai-games-reinforcement-learning/ Tue, 27 Nov 2018 14:35:47 +0000

Video games have become a proving ground for AIs and Uber has shown how its new type of reinforcement learning has succeeded where others have failed.

Some of mankind’s most complex games, like Go, have failed to challenge AIs from the likes of DeepMind. Reinforcement learning trains algorithms by running scenarios repeatedly with a ‘reward’ given for successes, often a score increase.

Two classic games from the 80s – Montezuma’s Revenge and Pitfall! – have thus far been immune to a traditional reinforcement learning approach. This is because they have little in the way of notable rewards until later in the games.

Applying traditional reinforcement learning typically results in a failure to progress out of the first room in Montezuma’s Revenge, while in Pitfall! it fails completely.

One way researchers have attempted to provide the necessary rewards to incentivise the AI is by adding rewards for exploration itself, an approach called ‘intrinsic motivation’. However, this approach has shortcomings.

“We hypothesize that a major weakness of current intrinsic motivation algorithms is detachment,” wrote Uber’s researchers. “Wherein the algorithms forget about promising areas they have visited, meaning they do not return to them to see if they lead to new states.”

Uber’s AI research team in San Francisco developed a new type of reinforcement learning to overcome the challenge.

The researchers call their approach ‘Go-Explore’ whereby the AI will return to a previous task or area to assess whether it yields a better result. Supplementing with human knowledge to guide it towards notable areas sped up its progress dramatically.
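
In outline, Go-Explore keeps an archive of interesting states it has reached, repeatedly returns to one of them (for instance by restoring a saved emulator state) and explores onwards from there. The sketch below is a simplified rendering of that loop with a hypothetical environment API, not Uber's released code.

```python
import random

def go_explore(env, select_cell, explore_steps=100, iterations=10_000):
    """Simplified Go-Explore exploration phase."""
    # The archive maps a coarse state representation ("cell") to the best
    # snapshot found for reaching it, together with the score achieved.
    archive = {env.initial_cell(): (env.initial_snapshot(), 0)}

    for _ in range(iterations):
        # 1. Go: pick a promising cell and return to it by restoring its snapshot,
        #    so the agent never forgets areas it has already visited.
        cell = select_cell(archive)
        snapshot, _ = archive[cell]
        env.restore(snapshot)

        # 2. Explore: take random actions from that state and record progress.
        for _ in range(explore_steps):
            snapshot, new_cell, score = env.step(random.choice(env.actions()))
            best = archive.get(new_cell)
            if best is None or score > best[1]:
                archive[new_cell] = (snapshot, score)
    return archive
```

Domain knowledge, such as a hand-picked cell representation, is one way the human guidance mentioned above can be folded in.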

If nothing else, the research provides some comfort that we feeble humans are not yet fully redundant, and that the best results will be attained by working hand-in-binary with our virtual overlords.

 Interested in hearing industry leaders discuss subjects like this and their use cases? Attend the co-located AI & Big Data Expo events with upcoming shows in Silicon Valley, London, and Amsterdam to learn more. Co-located with the IoT Tech Expo, Blockchain Expo, and Cyber Security & Cloud Expo.

Google improves AI model training by open-sourcing framework https://www.artificialintelligence-news.com/2018/08/28/google-ai-model-open-source-framework/ Tue, 28 Aug 2018 10:32:23 +0000

Google is helping researchers seeking to train AI models by open-sourcing a reinforcement learning framework used for its own projects.

Reinforcement learning has been used for some of the most impressive AI demonstrations thus far, including those which beat human professional players at Go (via AlphaGo) and at Dota 2. Google subsidiary DeepMind uses it for its Deep Q-Network (DQN).

Building a reinforcement learning framework takes both time and significant resources. For AI to reach its full potential, it needs to become more accessible.

Starting today, Google is making Dopamine, an open-source reinforcement learning framework based on TensorFlow – its machine learning library – available on GitHub.

Pablo Samuel Castro and Marc G. Bellemare, Google Brain researchers, wrote in a blog post:

“Inspired by one of the main components in reward-motivated behavior in the brain and reflecting the strong historical connection between neuroscience and reinforcement learning research, this platform aims to enable the kind of speculative research that can drive radical discoveries.

This release also includes a set of colabs that clarify how to use our framework.”

Google’s framework was designed with three focuses: flexibility, stability, and reproducibility.

The company is providing 15 code examples for the Arcade Learning Environment — a platform which uses video games to evaluate the performance of AI technology — along with four distinct machine learning models: C51, the aforementioned DQN, Implicit Quantile Network, and the Rainbow agent.

Reinforcement learning is among the most effective methods of training. If you’re training a dog, offering treats as a reward for the desired behaviour is a key example of positive reinforcement in practice.

Training a machine is a similar concept, only the rewards are delivered or withheld as ones and zeros instead of tasty goods or a paycheck.
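
To make the numeric-reward idea concrete, here is a tiny tabular Q-learning loop against a hypothetical environment object. It is unrelated to Google's framework, which targets much larger problems, but the feedback signal is the same: a number delivered after each action.

```python
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: the only training signal is the per-step reward."""
    q = {}  # (state, action) -> estimated long-term value

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Mostly exploit the best-known action, occasionally explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q.get((state, a), 0.0))

            next_state, reward, done = env.step(action)  # hypothetical env API

            # Nudge the value estimate towards reward + discounted future value.
            best_next = max(q.get((next_state, a), 0.0) for a in env.actions)
            target = reward + gamma * best_next * (not done)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = next_state
    return q
```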

“Our hope is that our framework’s flexibility and ease-of-use will empower researchers to try out new ideas, both incremental and radical,” wrote Bellemare and Castro. “We are already actively using it for our research and finding it is giving us the flexibility to iterate quickly over many ideas.”

“We’re excited to see what the larger community can make of it.”

What are your thoughts on Google’s open-sourcing of its reinforcement learning framework? Let us know in the comments.

 Interested in hearing industry leaders discuss subjects like this and sharing their use-cases? Attend the co-located AI & Big Data Expo events with upcoming shows in Silicon Valley, London and Amsterdam to learn more. Co-located with the  IoT Tech Expo, Blockchain Expo and Cyber Security & Cloud Expo so you can explore the future of enterprise technology in one place.
