This AI Paper from Huawei Introduces a Theoretical Framework Focused on the Memorization Process and Performance Dynamics of Transformer-based Language Models (LMs)

By CryptoExpert | May 19, 2024

Transformer-based neural networks handle a wide range of tasks, including text generation, editing, and question answering. In many cases, models with more parameters perform better, as measured by lower perplexity and higher accuracy on end tasks, which is the main reason industry keeps developing larger models. However, a larger model does not always perform better: for example, the 2B model MiniCPM exhibits capabilities comparable to larger language models such as Llama2-7B, Mistral-7B, Gemma-7B, and Llama-13B. Moreover, the amount of high-quality data available may not keep pace with the growth in computational resources for training larger models.

Existing lines of work relevant to these shortcomings include scaling laws, energy-based models, and Hopfield models. Scaling laws describe how model performance improves as model size and the volume of training data are scaled up. Energy-based models have been a fundamental modeling tool across many areas of machine learning over the past few decades; the main idea is to express a parameterized probability density function in terms of a learnable energy function. Finally, Hopfield models build on the classical Hopfield network, which was developed as an example of associative memory.
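To make the associative-memory idea concrete, here is a minimal sketch of a classical Hopfield network in plain Python. It is a textbook toy, not the model analyzed in the paper: the Hebbian storage rule, the synchronous update, the two stored patterns, and all function names are assumptions chosen for illustration.

```python
def train_hebbian(patterns):
    """Build a Hopfield weight matrix from +/-1 patterns with the Hebbian rule."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def energy(weights, state):
    """Hopfield energy E(s) = -1/2 * sum_ij W_ij * s_i * s_j."""
    n = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n)
    )


def recall(weights, state, max_steps=10):
    """Apply synchronous sign updates until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        fields = [sum(w * s for w, s in zip(row, state)) for row in weights]
        new_state = [1 if h >= 0 else -1 for h in fields]
        if new_state == state:
            break
        state = new_state
    return state


# Two orthogonal patterns to memorize (hypothetical toy data).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hebbian(stored)

# Corrupt the first pattern by flipping its first bit, then recall it.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recovered = recall(weights, noisy)
print(recovered == stored[0])
print(energy(weights, noisy), energy(weights, recovered))
```

In this toy case the corrupted input settles back onto the stored pattern, which sits at a lower energy than the corrupted state. An energy-based model would go one step further and turn such an energy into a probability, with p(s) proportional to exp(-E(s)).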

Researchers from the Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd. introduced a theoretical framework focused on the memorization process and performance dynamics of transformer-based language models (LMs). They carried out a series of experiments using GPT-2 across different data sizes to overcome the signs of saturation and, in parallel, trained vanilla Transformer models on a dataset of 2M tokens. The experiments validated the theoretical results, offering insights on the optimal cross-entropy loss that can guide decision-making in model training.

A 12-layer transformer LM is trained with the GPT-2 small tokenizer and architecture on the OpenWebText dataset. OpenWebText is similar to the WebText dataset used to train the original GPT-2 model and contains 9B tokens from 8,013,769 documents. Three models are trained on different amounts of data: the full dataset, a subset containing the first 1% (90M tokens), and a subset containing the first 0.1% (9M tokens). In addition, vanilla Transformer models are trained on a small amount of high-quality data: pairs of English sentences in declarative form. The data is context-free, with a vocabulary of 68 words, and the task is to convert declarative sentences into questions.
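The subset sizes quoted above follow from simple arithmetic on the 9B-token total. The sketch below checks the numbers and shows one possible way to take a leading fraction of a token sequence; slicing by token position and the helper name `first_fraction` are assumptions for illustration, since the article does not say how the subsets were cut.

```python
def first_fraction(tokens, numerator, denominator):
    """Return the leading numerator/denominator share of a token sequence."""
    cutoff = len(tokens) * numerator // denominator
    return tokens[:cutoff]


TOTAL_TOKENS = 9_000_000_000  # approximate OpenWebText size quoted above

# Integer arithmetic avoids floating-point rounding on large counts.
subset_sizes = {
    "1%": TOTAL_TOKENS * 1 // 100,
    "0.1%": TOTAL_TOKENS * 1 // 1000,
}
print(subset_sizes)

# Toy stand-in for a tokenized corpus (hypothetical data).
toy_tokens = list(range(1000))
print(len(first_fraction(toy_tokens, 1, 100)))
print(len(first_fraction(toy_tokens, 1, 1000)))
```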

Training on 0.1% (9M tokens) of the OpenWebText data shows over-fitting: the training loss vanishes over the iterations. This happens because the training samples are not well separated, so the model energy reduces to a sum of delta functions. When the model size is of order O(D^2) and the model is trained on 90M tokens, it achieves training and validation losses similar to those in the 9B-token setting. Two vanilla Transformers with 6 and 10 layers are trained with a batch size of 8, and their training losses stabilize at a value of around 1, as predicted by a proposition in the paper.
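For a sense of scale, the sketch below computes the cross-entropy of a single next-token prediction and the matching perplexity. It assumes the loss is measured in nats (natural logarithm), which the article does not state. The logits are made up, and only the vocabulary size of 68 comes from the text above.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy in nats for one prediction: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target_index]


def perplexity(mean_loss):
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(mean_loss)


VOCAB_SIZE = 68  # vocabulary size of the declarative-to-question task

# A model that knows nothing spreads probability evenly over the vocabulary.
uniform_loss = cross_entropy([0.0] * VOCAB_SIZE, target_index=0)

# A model that has memorized the sample puts almost all mass on the target.
memorized_loss = cross_entropy([10.0] + [0.0] * (VOCAB_SIZE - 1), target_index=0)

# Probability assigned to the correct token when the loss equals 1 nat.
prob_at_loss_one = math.exp(-1.0)

print(uniform_loss, memorized_loss, prob_at_loss_one, perplexity(1.0))
```

Under the nats assumption, a loss near 1 corresponds to giving the correct token a probability of about 0.37 on (geometric) average, far better than the uniform baseline of log(68) but well short of the near-zero loss seen under memorization.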

In conclusion, the researchers presented a theoretical framework focused on the memorization process and performance dynamics of transformer-based language models (LMs). The paper models transformer-based networks using associative memory and analyzes the cross-entropy loss with respect to model and data sizes. Experiments are carried out by (a) training GPT-2 on different data sizes and (b) training vanilla Transformer models on a dataset of 2M tokens. Finally, a global energy function is constructed for the layered structure of transformer models using the majorization-minimization technique.
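For readers unfamiliar with majorization-minimization (MM), the sketch below applies the technique to a toy one-dimensional problem. It does not reproduce the paper's energy function: the objective (a smoothed sum of absolute deviations), the data points, and the function names are all assumptions for illustration. Each step replaces the objective with a quadratic upper bound that touches it at the current point, then minimizes that bound in closed form.

```python
import math


def objective(x, points, delta):
    """Smoothed sum of absolute deviations: sum_i sqrt((x - a_i)^2 + delta)."""
    return sum(math.sqrt((x - a) ** 2 + delta) for a in points)


def mm_step(x, points, delta):
    """Minimize the quadratic surrogate that touches the objective at x.

    Since sqrt is concave, sqrt(u) <= sqrt(u_k) + (u - u_k) / (2 * sqrt(u_k)),
    which gives a weighted least-squares bound with a closed-form minimizer.
    """
    weights = [1.0 / math.sqrt((x - a) ** 2 + delta) for a in points]
    return sum(w * a for w, a in zip(weights, points)) / sum(weights)


def minimize_mm(points, delta=1e-6, steps=100):
    """Run MM from the mean and record the objective after every step."""
    x = sum(points) / len(points)
    history = [objective(x, points, delta)]
    for _ in range(steps):
        x = mm_step(x, points, delta)
        history.append(objective(x, points, delta))
    return x, history


points = [1.0, 2.0, 3.0, 10.0, 50.0]  # hypothetical data; the median is 3
x_star, history = minimize_mm(points)
print(x_star, history[0], history[-1])
```

Because every surrogate lies above the objective and agrees with it at the current iterate, the objective cannot increase from one step to the next, which is the property that makes MM convenient for constructing and analyzing energy functions.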

Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI, with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
