Ever since Spike Jonze’s 2013 film “Her,” in which a man falls in love with a Siri-like AI assistant named Samantha, many people have imagined a humanlike artificial assistant they could talk to. The protagonist struggles to accept that Samantha, as real as she may seem, is not human and never will be.
This is no longer science fiction. ChatGPT, Apple’s Siri and Amazon’s Alexa are digital assistants that use AI to help users get driving directions, put together grocery lists and much more. Yet automatic speech recognition systems, like Samantha, still can’t do everything a human listener can.
Have you ever had to repeat yourself when calling your bank or utility provider?
Perhaps the digital customer service bot simply couldn’t understand you. Or maybe you’ve spent time on your phone editing garbled words after dictating a note. Researchers in linguistics and computer science have found that automatic speech recognition systems misunderstand some people far more than others. They tend to make many more mistakes if you are not a native English speaker, speak with a regional accent, are a Black speaker of African American Vernacular English, code-switch, are a woman, are older, are very young or have a speech impediment.
Tin ear
Unlike you and me, automatic speech recognition systems are not “sympathetic” listeners. When they can’t make sense of what you say, they don’t try harder to understand. They either give up or make a probabilistic guess, and that guess is sometimes wrong.
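For readers who want to see that “give up or guess” behavior in code, here is a minimal, hypothetical sketch. It is not any real system’s implementation; the candidate transcriptions, probabilities and confidence threshold are all invented for illustration.

```python
# Hypothetical sketch: how a recognizer might choose between "giving up"
# (asking you to repeat yourself) and committing to its most probable guess.
# Candidates, probabilities and the threshold are made up for illustration.

CONFIDENCE_THRESHOLD = 0.6  # below this, the system gives up and re-prompts

def transcribe(candidates: dict[str, float]) -> str:
    """candidates maps possible transcriptions to the model's probability for each."""
    best_guess, best_prob = max(candidates.items(), key=lambda item: item[1])
    if best_prob < CONFIDENCE_THRESHOLD:
        return "Sorry, I didn't catch that. Could you repeat it?"  # gives up
    return best_guess  # commits to a probabilistic guess, right or wrong

# A voice the model has heard a lot in training: one candidate dominates.
print(transcribe({"pay my bill": 0.92, "play my bill": 0.05, "pain my bill": 0.03}))

# An accent the model has rarely heard: probability is spread thin, so the
# "best" guess may be wrong, or the system may simply give up.
print(transcribe({"pay my bill": 0.34, "play Mabel": 0.33, "pain my bill": 0.33}))
```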
As businesses and public agencies adopt automatic speech recognition software to cut costs, people have little choice but to interact with these systems. And they are increasingly used in critical fields, ranging from first responders and health care to education and law.
The higher the stakes, the more serious the consequences when these systems fail to understand what people are saying. Imagine that, at some point in the future, you’re injured in a car accident. You dial 911 to call for help, but instead of being connected to a human dispatcher, you get a bot designed to weed out nonemergency calls. It takes several rounds before you’re understood, wasting time and raising your anxiety at the worst possible moment.
Why does this kind of error happen? Some of the inequality these systems produce is baked into the reams of linguistic data that developers use to build large language models. Developers train artificial intelligence systems to understand and mimic human language by feeding them vast quantities of text and audio files containing real human speech. But whose speech are they feeding the systems?
If a system achieves high accuracy when speaking with affluent white Americans in their mid-30s, it is reasonable to assume it was trained on plenty of audio recordings of people who fit that profile.
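One way researchers quantify these disparities is to compare word error rates across groups of speakers. The sketch below is illustrative only: the transcripts and group labels are invented, and it uses a simple edit-distance calculation of word error rate rather than any particular study’s methodology.

```python
# Illustrative sketch: comparing word error rate (WER) across speaker groups.
# All transcripts and group labels are invented for the example.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / words in the reference,
    computed with a standard word-level edit distance (Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# (speaker group, what the person said, what the system heard) -- made-up data
samples = [
    ("Group A", "i need to refill my prescription", "i need to refill my prescription"),
    ("Group A", "transfer me to billing please", "transfer me to billing please"),
    ("Group B", "i need to refill my prescription", "i need to fill my subscription"),
    ("Group B", "transfer me to billing please", "transfer me to bowling place"),
]

totals: dict[str, list[float]] = {}
for group, said, heard in samples:
    totals.setdefault(group, []).append(word_error_rate(said, heard))

for group, rates in totals.items():
    print(f"{group}: average WER = {sum(rates) / len(rates):.2f}")
```

In this toy example the system transcribes Group A perfectly but garbles Group B, which is the pattern researchers report when a model’s training data skews toward one kind of voice.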
AI developers can reduce these errors by collecting data from more diverse sources. But building AI systems that can understand the infinite variations in human speech arising from things like gender, age, race, first versus second language, socioeconomic status and much more requires significant resources and time.
“Proper” English
The challenges are even greater for people who don’t speak English, which is to say most people in the world. Most of the world’s biggest generative AI systems were built in English, and they work far better in English than in any other language. On paper, AI holds plenty of civic potential as a translation tool that could increase access to information for people across languages. But for the time being, most languages are not well served: they have a smaller digital footprint, which makes it difficult for them to power large language models.
Even in languages that are well served by large language models, like English and Spanish, your experience will vary depending on which dialect of the language you speak.
Currently, most speech recognition systems and generative AI chatbots reflect the linguistic biases of the datasets they are trained on, including prescriptive, sometimes prejudiced notions of “correctness” in speech.
AI has also been shown to “flatten” linguistic diversity. There are now AI startups that offer to erase the accents of call center workers; their primary clients are customer service providers with call centers in countries like India and the Philippines. Such offerings perpetuate the idea that some accents are more valid than others.
The human connection
AI is likely to keep getting better at processing language and accounting for variables such as accents and code-switching. In the U.S., federal law requires public agencies to provide equal access to services for all people, regardless of the language they speak. But it is not clear whether that alone will be enough to push the tech industry toward eliminating linguistic inequities.
Many people would prefer to speak to a human being when they have a question about a bill or a medical issue, or at least to have the option of opting out of automated systems. Not that miscommunication never happens between people, but when you talk to a person, they are more likely to listen sympathetically.
With AI, at least for now, it either works or it doesn’t. If the system can understand what you’re saying, you’re good to go. If it can’t, the burden is on you to make yourself understood.
Roberto Rey Agudo, Research Assistant Professor of Spanish and Portuguese, Dartmouth College
This article is republished from The Conversation under a Creative Commons license. Read the original article.