Recent advances in artificial intelligence have produced systems with a richer and more articulate understanding of language than ever before. The performance of large language models (LLMs) such as GPT-3, T5, and PaLM has improved dramatically. These models imitate humans by learning to read, summarize, and generate text. Several extensive studies indicate that LLMs perform better as they grow in size, so a model's capabilities can be extended by increasing its scale. However, larger models trained on huge amounts of data are also more expensive to run, which can make them less practical to use.
What is Confident Adaptive Language Modeling?
To make text generation faster and cheaper, researchers at Google AI have introduced “Confident Adaptive Language Modeling” (CALM). CALM was presented at NeurIPS 2022, which was held as a hybrid conference, and is reported to speed up text generation by up to three times. Text consists of many kinds of sentences. For some, predicting the next word is almost trivial, while for others it is hard. CALM allocates additional compute only when the prediction at hand is difficult. Thus, unlike a standard LLM, which spends the same amount of compute on every prediction, CALM speeds up text generation while retaining output quality. It works by dynamically allocating a different amount of compute to each input and each generation timestep.
In their recently published paper, Confident Adaptive Language Modeling, the team lists the following contributions:
- Confident Adaptive Language Modeling (CALM) is a framework that substantially speeds up text generation.
- The effectiveness of CALM is demonstrated on three diverse text generation datasets.
- The team builds on the concept of early exiting, for which CALM uses an efficient class of confidence measures and threshold functions.
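The idea of a threshold function in the last point can be illustrated with a minimal sketch. The article does not give the formula, so the decaying form below, the function names, and the `temperature` parameter are assumptions made for illustration: the threshold starts strict for early tokens, where a mistake affects everything generated afterwards, and relaxes toward the end of the sequence.

```python
import math


def decaying_threshold(base: float, step: int, max_steps: int,
                       temperature: float = 4.0) -> float:
    """Confidence threshold that is stricter early and relaxes over time.

    Assumed form (not confirmed by this article):
        clip(0.9 * base + 0.1 * exp(-temperature * step / max_steps), 0, 1)
    """
    value = 0.9 * base + 0.1 * math.exp(-temperature * step / max_steps)
    return min(1.0, max(0.0, value))


def should_exit(confidence: float, threshold: float) -> bool:
    """Exit early once the confidence measure reaches the threshold."""
    return confidence >= threshold


if __name__ == "__main__":
    # Print how the threshold relaxes across a hypothetical 10-token output.
    for step in (0, 5, 10):
        print(step, round(decaying_threshold(0.8, step, max_steps=10), 4))
```

A constant threshold is the simplest alternative; the decaying variant only changes how strict the exit rule is at each position.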
The team carried out an in-depth analysis on three datasets:
- Text summarization dataset – CNN/DM (https://github.com/abisee/cnn-dailymail)
- Machine Translation Dataset – WMT (https://www.statmt.org/wmt15/translation-task.html)
- Question answering dataset – SQuAD 2.0, the Stanford Question Answering Dataset (https://rajpurkar.github.io/SQuAD-explorer/)
To run experiments on these datasets, the team used an eight-layer encoder-decoder T5 architecture. First, the encoder converts the input text into a dense representation. The decoder then outputs predictions one token at a time. Each Transformer layer contains attention and feedforward modules, which involve many matrix multiplications. CALM predicts the next word before all decoder layers have been computed, skipping the remaining computation for some predictions.
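The early-exit step described above can be sketched as a toy loop. This is not the researchers' implementation: the "layers" are plain Python functions, the hidden state doubles as the logits, and the confidence is simply the largest softmax probability, all of which are simplifying assumptions chosen to keep the example self-contained.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a small list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_token_with_early_exit(
    state: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    threshold: float,
) -> Tuple[int, int]:
    """Return (predicted_token_id, layers_used) for one decoding step."""
    probs = softmax(state)
    for depth, layer in enumerate(layers, start=1):
        state = layer(state)
        probs = softmax(state)  # toy assumption: state is used as logits
        if max(probs) >= threshold:  # stand-in confidence measure
            return probs.index(max(probs)), depth
    return probs.index(max(probs)), len(layers)


def sharpen(state: Vector) -> Vector:
    """Hypothetical layer that makes the prediction more peaked."""
    return [1.5 * x for x in state]


if __name__ == "__main__":
    toy_layers = [sharpen] * 8  # eight decoder layers, as in the article
    easy_state = [4.0, 0.0, 0.0]  # one token clearly dominates
    hard_state = [0.4, 0.3, 0.0]  # two tokens are close
    print(decode_token_with_early_exit(easy_state, toy_layers, 0.9))
    print(decode_token_with_early_exit(hard_state, toy_layers, 0.9))
```

In this toy setup the clear-cut input exits after the first layer, while the ambiguous one needs the full stack, which mirrors the idea of spending compute only where the prediction is hard.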
The model was evaluated with three confidence measures: the softmax score, state propagation, and an early-exit classifier.
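The article only names the three measures, so the definitions in the following sketch are assumptions about how such measures are commonly computed: the softmax score as the gap between the two most likely tokens, state propagation as the cosine similarity between the hidden states of consecutive layers, and the early-exit classifier as a small linear probe whose `weights` and `bias` (hypothetical names) would be learned in practice.

```python
import math
from typing import Sequence


def softmax_score(logits: Sequence[float]) -> float:
    """Gap between the two largest softmax probabilities (needs >= 2 logits)."""
    peak = max(logits)
    exps = sorted((math.exp(x - peak) for x in logits), reverse=True)
    return (exps[0] - exps[1]) / sum(exps)


def state_propagation_score(prev_state: Sequence[float],
                            curr_state: Sequence[float]) -> float:
    """Cosine similarity between hidden states of consecutive layers."""
    dot = sum(a * b for a, b in zip(prev_state, curr_state))
    norm = (math.sqrt(sum(a * a for a in prev_state))
            * math.sqrt(sum(b * b for b in curr_state)))
    return dot / norm if norm else 0.0


def early_exit_classifier_score(state: Sequence[float],
                                weights: Sequence[float],
                                bias: float) -> float:
    """Sigmoid of a linear probe on the hidden state (weights are learned)."""
    z = sum(w * s for w, s in zip(weights, state)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

All three return a score where higher means "more confident", so any of them can be compared against the same kind of threshold. They differ in cost: the softmax score needs a projection onto the full vocabulary, while the other two work directly on the hidden state.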
It has been stated that a standard technique uses the same number of layers for every prediction, a number that can vary from 3 to 5. The local oracle confidence measure, used to make predictions in CALM, shows satisfactory performance using only 1.5 decoder layers on average.
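To make the 1.5-layer figure concrete, the arithmetic can be sketched as follows. The per-token exit depths are invented for illustration and are not results from the paper; the resulting ratio counts decoder layer calls only, so it should not be read as a wall-clock speedup, since the encoder and the confidence checks also cost time.

```python
from typing import Sequence


def average_layers(exit_depths: Sequence[int]) -> float:
    """Mean number of decoder layers used per generated token."""
    return sum(exit_depths) / len(exit_depths)


def decoder_layer_reduction(total_layers: int, avg_layers_used: float) -> float:
    """How many times fewer decoder layer calls are made than the full model."""
    return total_layers / avg_layers_used


if __name__ == "__main__":
    # Hypothetical exit depths for an eight-token output.
    depths = [1, 1, 2, 1, 2, 1, 3, 1]
    avg = average_layers(depths)
    print(avg, decoder_layer_reduction(8, avg))
```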
Confident Adaptive Language Modeling is therefore a notable advance in language modeling. It maintains high performance and output quality while increasing text generation speed. It also reduces the model's heavy computational load, making it a very efficient solution.
Check out the paper, code, and reference article. All credit for this research goes to the researchers on this project.
Tanya Malhotra is a final year undergraduate at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical thinking, as well as a keen interest in learning new skills, leading groups and managing work in an organized manner.