Machine learning improves the ability to transcribe Arabic speech

Thanks to advances in speech and natural language processing, there is hope that one day you may be able to ask your virtual assistant what the best salad ingredients are. Today you can already ask your home gadget to play music or turn on with a voice command, a feature found in many devices.

If you speak Moroccan, Algerian, Egyptian, Sudanese, or any of the other dialects of Arabic, which vary immensely from region to region and are sometimes mutually unintelligible, it is a different story. If your native tongue is Arabic, Finnish, Mongolian, Navajo, or any other language with a high level of morphological complexity, you may feel left out.

These complex constructions intrigued Ahmed Ali into finding a solution. He is a chief engineer in the Arabic Language Technologies group at the Qatar Computing Research Institute (QCRI), part of Qatar Foundation’s Hamad Bin Khalifa University, and the founder of ArabicSpeech, “a community that exists for the benefit of Arabic speech and speech technologies.”

The headquarters of the Qatar Foundation

Ali was fascinated by the idea of talking to cars, appliances, and gadgets many years ago, while at IBM. “Can we build a machine capable of understanding different dialects: an Egyptian pediatrician automating a prescription, a Syrian teacher helping children grasp the core parts of their lesson, or a Moroccan chef describing the best couscous recipe?” he says. However, the algorithms that power those machines cannot sift through the approximately 30 varieties of Arabic, let alone make sense of them. Today, most speech recognition tools function only in English and a handful of other languages.

The coronavirus pandemic has further amplified an already intensifying reliance on voice technology, as natural language processing technologies have helped people comply with stay-at-home guidelines and physical distancing measures. However, as we have been using voice commands to aid e-commerce purchases and manage our households, the future holds yet more applications.

Millions of people worldwide use massive open online courses (MOOCs) for their open access and unlimited participation. Speech recognition is one of the main features in MOOCs, allowing students to search within the spoken content of courses and enabling translations via subtitles. Speech technology also makes it possible to digitize lectures and display spoken words as text in university classrooms.

Ahmed Ali, Hamad bin Khalifa University

According to a recent article in Speech Technology magazine, the voice and speech recognition market is expected to reach $26.8 billion by 2025, as millions of consumers and companies around the world come to rely on voice bots not only to interact with their appliances or cars but also to improve customer service, drive health-care innovation, and improve accessibility and inclusivity for those with hearing, speech, or motor impairments.

In a 2019 study, Capgemini predicted that by 2022, more than two out of three consumers would opt for voice assistants rather than visits to stores or bank branches; a share that could justifiably spike, given the home-based, physically distanced life and commerce that the pandemic has forced upon the world for more than a year and a half.

However, these devices fail to deliver for huge swaths of the globe. For those roughly 30 varieties of Arabic and their millions of speakers, that is a substantially missed opportunity.

Arabic for machines

English- or French-speaking voice bots are far from perfect. Yet teaching machines to understand Arabic is particularly tricky, for several reasons. Here are three commonly recognized challenges:

  1. Lack of diacritics. Arabic dialects are vernaculars, primarily spoken rather than written. Most available text is undiacritized, meaning it lacks the accent marks, such as the acute (´) or grave (`), that indicate the sound values of letters. Therefore, it is difficult to determine where the vowels go (the sketch after this list illustrates the problem).
  2. Lack of resources. There is a dearth of labeled data for the different Arabic dialects. Collectively, they also lack standardized orthographic rules that dictate how to write a language, including norms for spelling, hyphenation, word breaks, and emphasis. These resources are crucial for training computer models, and the fact that too few of them exist has hobbled the development of Arabic speech recognition.
  3. Morphological complexity. Arabic speakers engage in a lot of code switching. For example, in areas colonized by the French (in North Africa: Morocco, Algeria, and Tunisia), the dialects include many borrowed French words. Consequently, there is a large number of out-of-vocabulary words, which speech recognition technologies cannot fathom because those words are not Arabic.

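To make the first challenge concrete, here is a minimal sketch in Python of how stripping diacritics collapses distinct words into one ambiguous letter skeleton. It assumes only that the Arabic short-vowel and related marks (the harakat) occupy the Unicode range U+064B through U+0652; the example words are standard textbook ones, used purely for illustration.

```python
import re

# Arabic short vowels and related marks (fatha, damma, kasra, tanwin,
# shadda, sukun) occupy the Unicode range U+064B..U+0652.
HARAKAT = re.compile(r"[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Remove harakat, leaving the bare consonantal skeleton."""
    return HARAKAT.sub("", text)

# kataba "he wrote", kutiba "it was written", kutub "books":
# three different words that all reduce to the same undiacritized form.
for word in ["كَتَبَ", "كُتِبَ", "كُتُبٌ"]:
    print(word, "->", strip_diacritics(word))  # every output is كتب
```

This is why a recognizer trained on ordinary written text must infer the vowels, and often the word itself, from context alone.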
“But the field is moving at lightning speed,” says Ali, thanks to a collaborative effort among many researchers to make it move even faster. Ali’s Arabic Language Technologies lab is leading the ArabicSpeech project to bring together Arabic translations with the dialects native to each region. For example, Arabic dialects can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, given that dialects do not comply with boundaries, this can go as fine-grained as one dialect per city; for example, a native Egyptian speaker can distinguish the Alexandrian dialect of one fellow citizen from the Aswan dialect of another (a distance of 1,000 kilometers on the map).

Building a technological future for all

At this point, machines are almost as accurate as human transcribers, thanks in great part to advances in deep neural networks, a subfield of machine learning in artificial intelligence that relies on algorithms inspired by how the human brain works, biologically and functionally. However, until recently, speech recognition was somewhat hacked together: the technology has a history of relying on separate modules for acoustic modeling, building pronunciation lexicons, and language modeling, all of which must be trained separately. More recently, researchers have been training models that convert acoustic features directly into text transcriptions, potentially optimizing all parts for the end task.
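
To illustrate what the end-to-end approach looks like in practice, here is a minimal sketch using the Hugging Face transformers pipeline: a single model maps audio directly to characters, so there is no separately trained acoustic model, pronunciation lexicon, or language model. The checkpoint name and audio file below are hypothetical placeholders, not QCRI’s actual system; any CTC-style model fine-tuned on Arabic speech would fill the same role.

```python
from transformers import pipeline

# One model, trained end to end: audio features in, text out.
asr = pipeline(
    "automatic-speech-recognition",
    model="my-org/wav2vec2-arabic",  # hypothetical Arabic-fine-tuned checkpoint
)

# Transcribe an audio clip in one call; the pipeline handles decoding
# the audio and mapping model outputs to characters, so no hand-built
# pronunciation lexicon is needed.
result = asr("broadcast_clip.wav")  # hypothetical file
print(result["text"])
```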

Even with these advancements, Ali still cannot give a voice command to most devices in his native Arabic. “It’s 2021, and I still cannot speak to many machines in my dialect,” he remarks. “I mean, now I have a device that can understand my English, but machine recognition of multi-dialect Arabic speech hasn’t happened yet.”

Making this happen is the focus of Ali’s work, which culminated in the first transformer for Arabic speech recognition and its dialects, one that has achieved hitherto unmatched performance. Dubbed the QCRI Advanced Transcription System, the technology is currently being used by the broadcasters Al Jazeera, DW, and the BBC to transcribe online content.

There are a few reasons Ali and his team have succeeded at building these speech engines right now. First, he says, “there is a need to have resources across all of the dialects. We need to build up the resources so we can then train the model.” Advances in computer processing mean that computationally intensive machine learning can now happen on graphics processing units, which can rapidly process and display complex graphics. As Ali says, “We have a great architecture, good modules, and we have data that represents reality.”

Researchers from QCRI and Kanari AI recently built models that achieve human parity in Arabic broadcast news. The system demonstrates the impact of subtitling Al Jazeera’s daily reports. While the English human error rate (HER) is about 5.6%, the study revealed that the Arabic HER is significantly higher and can reach 10%, owing to the morphological complexity of the language and the lack of standard orthographic rules in dialectal Arabic. Thanks to recent advances in deep learning and end-to-end architectures, the Arabic speech recognition engine manages to outperform native speakers on broadcast news.
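
For context on how figures like those error rates are conventionally computed: word error rate is the number of word-level substitutions, insertions, and deletions needed to turn a transcript into the reference, divided by the reference length. A minimal sketch, with invented English sentences standing in for real transcripts:

```python
# Word error rate (WER) via word-level edit distance.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five gives a 20% WER; the quoted human
# and machine rates come from the same arithmetic applied at scale.
print(word_error_rate("the news airs every day",
                      "the news airs every night"))  # 0.2
```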

While speech recognition for Modern Standard Arabic seems to work well, researchers at QCRI and Kanari AI are engrossed in testing the limits of dialectal processing and achieving strong results. Since nobody speaks Modern Standard Arabic at home, attention to dialect is what we need to enable our voice assistants to understand us.

This content was written by the Qatar Computing Research Institute, Hamad Bin Khalifa University, a member of the Qatar Foundation. It was not written by MIT Technology Review’s editorial staff.
