After ChatGPT caused a ‘fever’, the world turned its attention to generative AI (artificial intelligence), in which algorithms generate content such as text, images, audio or video. These systems must go through a training process in which they ‘learn’ from a huge volume of data. They operate by predicting the next words or pixels.
LLMs (large language models) use generative AI algorithms that capture different levels of complexity in natural language and generate new content based on the data they were trained on.
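The idea of 'predicting the next word' can be illustrated with a toy sketch. This is not how ChatGPT or any real LLM is built: real models use neural networks trained on vastly more data, while the example below only counts which word follows which in a tiny made-up sentence. The function names (`train_bigram`, `predict_next`, `generate`) and the sample text are hypothetical, chosen purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """'Learn' from text by counting which word follows each word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_words=5):
    """Generate text by repeatedly predicting the next word."""
    output = [start.lower()]
    for _ in range(max_words):
        following = predict_next(counts, output[-1])
        if following is None:
            break
        output.append(following)
    return " ".join(output)


# Made-up training text (an assumption, for illustration only).
corpus = (
    "the model reads text . the model predicts the next word . "
    "the model learns"
)
counts = train_bigram(corpus)
print(predict_next(counts, "the"))
print(generate(counts, "next", max_words=2))
```

A model like this can only repeat patterns found in its training text, which is one way to see why a system trained mostly on English text has little to draw on when asked to work in Vietnamese.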
ChatGPT created by OpenAI and Cortana by Microsoft are examples of generative AI. However, most LLMs available are created by foreign companies and do not support the Vietnamese language.
Scientists have predicted that generative AI will create a large market valued at $16 trillion by 2030. Statista Market Insights predicted that the generative AI market in Vietnam could be worth $100.2 million in 2023.
Generative AI is expected to play an important role in accelerating digital transformation and increasing productivity in many business fields, especially banking, manufacturing, retail, and agriculture.
A survey conducted by Finastra found that Vietnam has a high level of interest in generative AI. About 91 percent of Vietnamese respondents showed enthusiasm about the positive value of generative AI.
The development of LLMs in the Vietnamese language will improve the quality of language processing apps. An LLM in Vietnamese will help with machine translation, speech recognition, question answering, and document summarization in Vietnamese, all at higher quality.
There are high hopes that Make in Vietnam virtual assistants in the future will use LLMs in Vietnamese.
Great challenges
Nguyen Tuan Khang of IBM Vietnam said there are a limited number of LLMs available in the world. The LLMs developed by foreign companies do not include data from Vietnamese people and do not support Vietnamese. In general, large models only answer questions in English, and the answers are then translated into Vietnamese. Because of this complicated process (an intermediary language is needed), the quality of the answers is sometimes not as high as expected.
Explaining the modest number of LLMs, Khang said it is very costly to develop an LLM, and investors have to spend hundreds of millions of dollars.
“This is very costly, which explains why LLMs are mostly developed by technology giants such as IBM, Facebook and Google,” he said.
Asked if Vietnamese firms could create an LLM, Khang commented that the Vietnamese language is a problem in AI development.
Khang said the ‘T’ in ChatGPT stands for Transformer. The Transformer model converts data from one form to another. In order to build LLMs in the Vietnamese language, the ‘T’ must have that ‘Transformer’ capability with Vietnamese data.
“And in order to do that, they will have to learn very much,” Khang said.
Khang said that it won’t be easy to build an LLM in Vietnamese. Vietnamese technology firms and institutes are aware of the challenge, but they still want to develop such LLMs for Vietnamese.
Tran Manh Quan from Viettel Cyberspace said that it is necessary to build an LLM in the Vietnamese language to prevent data and information leakage abroad.
Current LLMs learn from everywhere on the internet, collecting information from many sources, both good and bad, and it is impossible to control that information. It is necessary to build a Make in Vietnam LLM for the Vietnamese people, he said.
Trong Dat