    DeepMind makes big jump toward interpreting LLMs with sparse autoencoders

    By CryptoExpert, July 26, 2024

    Large language models (LLMs) have made remarkable progress in recent years. But understanding how they work remains a challenge, and scientists at artificial intelligence labs are trying to peer into the black box.

    One promising approach is the sparse autoencoder (SAE), a deep learning architecture that breaks down the complex activations of a neural network into smaller, understandable components that can be associated with human-readable concepts.

    In a new paper, researchers at Google DeepMind introduce JumpReLU SAE, a new architecture that improves the performance and interpretability of SAEs for LLMs. JumpReLU makes it easier to identify and track individual features in LLM activations, which can be a step toward understanding how LLMs learn and reason.

    The challenge of interpreting LLMs

    The fundamental building blocks of a neural network are individual neurons, tiny mathematical functions that process and transform data. During training, neurons are tuned to become active when they encounter specific patterns in the data.

    However, individual neurons don’t necessarily correspond to specific concepts. A single neuron might activate for thousands of different concepts, and a single concept might activate a broad range of neurons across the network. This makes it very difficult to understand what each neuron represents and how it contributes to the overall behavior of the model. 

    This problem is especially pronounced in LLMs, which have billions of parameters and are trained on massive datasets. As a result, the activation patterns of neurons in LLMs are extremely complex and difficult to interpret.

    Sparse autoencoders

    Autoencoders are neural networks that learn to encode one type of input into an intermediate representation, and then decode it back to its original form. Autoencoders come in different flavors and are used for different applications, including compression, image denoising, and style transfer.

    Sparse autoencoders (SAEs) build on the autoencoder concept with a slight modification. During the encoding phase, the SAE is forced to activate only a small number of the neurons in the intermediate representation.

    This mechanism forces the SAE to represent each input with only a small number of active intermediate neurons. During training, the SAE receives activations from layers within the target LLM as input.

    The SAE tries to encode these dense activations through a layer of sparse features. Then it tries to decode the learned sparse features and reconstruct the original activations. The goal is to minimize the difference between the original activations and the reconstructed activations while using the smallest possible number of intermediate features.
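
The encode, decode, and score steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the weights are hand-picked toy values, the names `sae_forward`, `sae_loss`, and `sparsity_coeff` are hypothetical, and the L1 penalty is an assumed stand-in for "using as few features as possible".

```python
def relu(values):
    """Zero out negative entries and keep positive ones."""
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode activations x into sparse features, then reconstruct x."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    features = relu(pre)
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, sparsity_coeff):
    """Squared reconstruction error plus an (assumed) L1 sparsity penalty."""
    recon_err = sum((a - b) ** 2 for a, b in zip(x, recon))
    l1 = sum(abs(f) for f in features)
    return recon_err + sparsity_coeff * l1


# Toy setup (hypothetical values): 2 dense activations, 4 sparse features.
X = [1.0, -2.0]
W_ENC = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]
W_DEC = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
B_DEC = [0.0, 0.0]

FEATURES, RECON = sae_forward(X, W_ENC, B_ENC, W_DEC, B_DEC)
LOSS = sae_loss(X, FEATURES, RECON, sparsity_coeff=0.1)
print(FEATURES, RECON, LOSS)
```

In this toy case only two of the four features are active and the reconstruction is exact, so the whole loss comes from the sparsity term. A real SAE would learn its weights by gradient descent, typically with an array library rather than Python lists.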

    The challenge of SAEs is to find the right balance between sparsity and reconstruction fidelity. If the SAE is too sparse, it won’t be able to capture all the important information in the activations. Conversely, if the SAE is not sparse enough, it will be just as difficult to interpret as the original activations.

    JumpReLU SAE

    SAEs use an “activation function” to enforce sparsity in their intermediate layer. The original SAE architecture uses the rectified linear unit (ReLU) function, which zeroes out every feature whose value is below zero and passes positive values through unchanged. The problem with ReLU is that it can harm sparsity by preserving irrelevant features whose values are positive but very small.
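
A small sketch of the ReLU behavior described above, using made-up feature values: anything negative is dropped, but tiny positive values survive and still count as active features.

```python
def relu(values):
    """Standard ReLU: negative inputs become zero, positive inputs pass through."""
    return [max(0.0, v) for v in values]


# Hypothetical pre-activation values for five features.
relu_inputs = [2.5, 0.01, -0.7, 0.003, 1.2]
relu_outputs = relu(relu_inputs)

# Count how many features remain non-zero after ReLU.
relu_active = sum(1 for f in relu_outputs if f > 0)
print(relu_outputs, relu_active)
```

Four of the five features stay active here, even though two of them (0.01 and 0.003) carry almost no signal.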

    DeepMind’s JumpReLU SAE aims to address the limitations of previous SAE techniques by making a small change to the activation function. Instead of using a global threshold value, JumpReLU learns a separate threshold for each feature in the sparse feature vector and zeroes out any feature whose value falls below its own threshold.
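
As a rough sketch of the per-feature threshold idea, assuming the usual formulation in which a value passes through unchanged when it exceeds its threshold and becomes zero otherwise: the function name `jump_relu` and the threshold values below are illustrative only, since in a trained SAE the thresholds would be learned.

```python
def jump_relu(values, thresholds):
    """Keep each value unchanged if it exceeds its own threshold, else output zero."""
    return [v if v > t else 0.0 for v, t in zip(values, thresholds)]


# Hypothetical pre-activation values and per-feature thresholds.
jump_inputs = [2.5, 0.01, -0.7, 0.003, 1.2]
jump_thresholds = [1.0, 0.05, 0.2, 0.001, 1.5]

jump_outputs = jump_relu(jump_inputs, jump_thresholds)
jump_active = sum(1 for f in jump_outputs if f > 0)
print(jump_outputs, jump_active)
```

Because each feature has its own cutoff, a small value can still pass when its threshold is tiny (0.003 against 0.001), while a larger value is dropped when its threshold is high (1.2 against 1.5).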

    This dynamic feature selection makes the training of the JumpReLU SAE a bit more complicated but enables it to find a better balance between sparsity and reconstruction fidelity.

    JumpReLU vs other activation functions (source: arXiv)

    The researchers evaluated JumpReLU SAE on DeepMind’s Gemma 2 9B LLM. They compared the performance of JumpReLU SAE against two other state-of-the-art SAE architectures, DeepMind’s own Gated SAE and OpenAI’s TopK SAE. They trained the SAEs on the residual stream, attention output, and dense layer outputs of different layers of the model.

    The results show that across different sparsity levels, the reconstruction fidelity of JumpReLU SAE is superior to Gated SAE and at least as good as TopK SAE. JumpReLU SAE was also very effective at minimizing “dead features” that are never activated. It also minimizes features that are too active and fail to provide a signal on specific concepts that the LLM has learned.
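
One simple way to spot dead and overly active features is to measure how often each feature fires across a batch of inputs. The sketch below is an illustration under assumptions, not the paper's evaluation code: the batch is made up, and the `dense_cutoff` of 0.5 is an arbitrary hypothetical threshold.

```python
def firing_frequencies(feature_batch):
    """Fraction of inputs on which each feature is non-zero."""
    n_inputs = len(feature_batch)
    n_features = len(feature_batch[0])
    return [
        sum(1 for row in feature_batch if row[j] != 0.0) / n_inputs
        for j in range(n_features)
    ]


def classify_features(freqs, dense_cutoff=0.5):
    """Return indices of dead features (never fire) and dense ones (fire too often)."""
    dead = [j for j, f in enumerate(freqs) if f == 0.0]
    dense = [j for j, f in enumerate(freqs) if f > dense_cutoff]
    return dead, dense


# Hypothetical sparse feature vectors for four inputs (rows) and four features.
batch = [
    [0.9, 0.0, 0.0, 0.2],
    [1.1, 0.0, 0.0, 0.0],
    [0.7, 0.0, 0.4, 0.0],
    [1.3, 0.0, 0.0, 0.0],
]

freqs = firing_frequencies(batch)
dead, dense = classify_features(freqs)
print(freqs, dead, dense)
```

Here feature 1 never fires and feature 0 fires on every input, so neither says much about any specific concept.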

    In their experiments, the researchers found that the features of JumpReLU SAE were as interpretable as those of other state-of-the-art architectures, which is crucial for making sense of the inner workings of LLMs.

    Furthermore, JumpReLU SAE was very efficient to train, making it practical to apply to large language models. 

    Understanding and steering LLM behavior

    SAEs can provide a more accurate and efficient way to decompose LLM activations and help researchers identify and understand the features that LLMs use to process and generate language. This can open the door to developing techniques to steer LLM behavior in desired directions and mitigate some of their shortcomings, such as bias and toxicity. 

    For example, a recent study by Anthropic found that SAEs trained on the activations of Claude Sonnet could find features that activate on text and images related to the Golden Gate Bridge and popular tourist attractions. This kind of visibility into concepts can enable scientists to develop techniques that prevent the model from generating harmful content, such as malicious code, even when users manage to circumvent prompt safeguards through jailbreaks.

    SAEs can also give more granular control over the responses of the model. For example, by changing the sparse activations and decoding them back into the model, users might be able to control aspects of the output, such as making the responses funnier, easier to read, or more technical. Studying the activations of LLMs has turned into a vibrant field of research, and there is still a lot to learn.
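
The idea of editing sparse activations and decoding them can be sketched as follows. Everything here is a toy assumption: the decoder weights are made up, the helper names `decode` and `steer` are hypothetical, and feature index 2 stands in for some interpretable concept. Feeding the edited activations back into the LLM's forward pass is not shown.

```python
def decode(features, w_dec, b_dec):
    """Map sparse features back to the model's activation space."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(w_dec, b_dec)
    ]


def steer(features, index, value):
    """Return a copy of the features with one feature set to a chosen value."""
    edited = list(features)
    edited[index] = value
    return edited


# Toy decoder (hypothetical values): 3 sparse features, 2 activation dimensions.
STEER_W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
STEER_B_DEC = [0.0, 0.0]

original_features = [1.0, 2.0, 0.0]
steered_features = steer(original_features, index=2, value=4.0)

original_acts = decode(original_features, STEER_W_DEC, STEER_B_DEC)
steered_acts = decode(steered_features, STEER_W_DEC, STEER_B_DEC)
print(original_acts, steered_acts)
```

Turning up a single feature shifts the decoded activations along that feature's decoder direction, which is the basic mechanism behind this kind of steering.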
