
    How LLMs are learning to differentiate spatial sounds

    By CryptoExpert, February 13, 2024


    Humans have unique sensory functions, among them binaural hearing: we can identify types of sound, tell what direction a sound is coming from and how far away it is, and distinguish multiple sound sources occurring at once.
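    One classic cue behind the direction part of binaural hearing is the interaural time difference (ITD), the tiny delay between a sound reaching one ear and then the other. The sketch below estimates that delay by cross-correlating the two channels and converts it to an azimuth with a simple far-field model. It is a textbook illustration, not how BAT works; the ear spacing, the speed-of-sound value, the function names and the sign convention (positive angles toward the left) are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, roughly the value in air at 20 °C
EAR_SPACING = 0.18      # metres; an assumed, simplified head width


def best_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the right channel is delayed, i.e. the sound
    reached the left ear first.
    """
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag], skipping out-of-range samples.
        score = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < len(right))
        if score > best_score:
            best, best_score = lag, score
    return best


def estimate_azimuth(left, right, sample_rate):
    """Estimate azimuth in degrees from the ITD (positive = toward the left ear)."""
    # The largest physically possible delay is ear spacing / speed of sound.
    max_lag = math.ceil(EAR_SPACING / SPEED_OF_SOUND * sample_rate)
    itd = best_lag(left, right, max_lag) / sample_rate
    # Far-field model: itd = EAR_SPACING * sin(azimuth) / SPEED_OF_SOUND.
    s = max(-1.0, min(1.0, itd * SPEED_OF_SOUND / EAR_SPACING))
    return math.degrees(math.asin(s))


# Toy example: a click that reaches the right ear 5 samples after the left ear.
sr = 16_000
left = [0.0] * 64
right = [0.0] * 64
left[20] = 1.0
right[25] = 1.0
print(estimate_azimuth(left, right, sr))  # positive, so the source is on the left
```

    ITD alone cannot tell front from back (a source 30 degrees in front and one 150 degrees behind produce the same delay), which is one reason localization in real, reverberant rooms is much harder than this toy suggests.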

    While large language models (LLMs) are impressive in their ability to perform audio question answering and speech recognition, translation and synthesis, they have yet to handle such “in-the-wild” spatial audio input. 

    A group of researchers is finally starting to crack that code, introducing BAT, which they call the first spatial audio-based LLM that can reason about sounds in a 3-D environment.

    The model shows impressive precision in classifying types of audio (such as laughter, heartbeat and splashing water), sound direction (right, left, below) and sound distance (anywhere from 1 to 10 feet). It also reasons well about spatial relationships in scenarios where two different sounds overlap.


    “The integration of spatial audio into LLMs represents a significant step towards truly multimodal AI systems,” researchers write. 

    The complexities of spatial audio

    Spatial audio, sometimes referred to as ‘virtual surround sound,’ creates the illusion of sound sources in a 3-D space. It is used in virtual reality (VR), advanced theater systems and other emerging areas such as the metaverse.

    But spatial audio is challenging for AI and machine learning (ML), as intelligent agents in 3-D spaces struggle to localize and interpret sound sources. Scientists have tried to address this by developing acoustic simulation techniques and datasets that incorporate spatial audio information (such as YouTube-360 and STARSS23).

    However, BAT’s developers point out that these resources are often inconsistent in quality and lack “crucial ground truth labels” such as source distance and direction. Similarly, Sound Event Localization and Detection (SELD), which fuses sound source localization with sound event detection (SED), often focuses on “shallow spatial audio perception,” the researchers note.

    Other models in the audio domain include AudioGPT, which uses ChatGPT to handle a wide range of audio and speech tasks; LTU, which is trained to reason about and answer questions on the sounds in a clip; and Qwen-Audio, which enables universal audio understanding.

    “However, despite their impressive performance in the audio domain, none of these models have the capability to perceive and reason about spatial audio that is situated in diverse, reverberant, and complex 3-D environments,” researchers assert. 

    Questions on sound type, direction, distance and spatial reasoning

    BAT seems to upend this, demonstrating strong spatial reasoning over mixed sounds and sources with an accuracy of nearly 77%.

    Its underlying spatial audio encoder, meanwhile, achieved a Mean Average Precision of more than 50% in identifying sound type, a Mean Angular Error of nearly 18 degrees for sound direction and, for distance estimation, a Distance Error Rate of 32.54% using a tolerance of 1.64 feet (0.5 meters).
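    The direction and distance metrics are simple to compute once predictions and ground truth are available. The sketch below shows one plausible reading: angular error as the angle between the predicted and true directions, averaged over clips, and distance error rate as the share of estimates that miss by more than a tolerance (0.5 meters, about 1.64 feet). The exact definitions in the BAT paper, the azimuth/elevation convention and the function names here are assumptions, and mean average precision for sound classification is left out.

```python
import math


def direction_vector(azimuth_deg, elevation_deg):
    """Unit vector for a direction given in degrees (assumed convention)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_error_deg(pred, true):
    """Angle in degrees between predicted and true (azimuth, elevation) pairs."""
    u, v = direction_vector(*pred), direction_vector(*true)
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(dot))


def mean_angular_error(preds, trues):
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)


def distance_error_rate(pred_dist, true_dist, tolerance_m=0.5):
    """Fraction of distance estimates that miss the truth by more than the tolerance."""
    misses = sum(1 for p, t in zip(pred_dist, true_dist) if abs(p - t) > tolerance_m)
    return misses / len(true_dist)


print(mean_angular_error([(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]))
print(distance_error_rate([1.0, 2.0, 3.0], [1.2, 2.8, 3.0]))  # one miss out of three
```

    Measuring the angle between 3-D direction vectors, rather than subtracting azimuths, avoids penalizing a 359-degree versus 1-degree prediction as a 358-degree error.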

    The researchers, from the University of Texas in the USA and the Department of Computer Science and Engineering at Shanghai Jiao Tong University in China, began by developing a Spatial Audio Spectrogram Transformer (SPATIAL-AST), which is capable of sound event detection, spatial localization and distance perception; and SPATIALSOUNDQA, a collection of spatial question-answering tasks.

    BAT itself then integrates SPATIAL-AST with the LLaMA-2 LLM.
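    The article does not say exactly how the encoder and the LLM are connected. A common pattern in audio LLMs is to project the audio encoder's output tokens into the language model's embedding space and feed them in ahead of the text prompt. The toy sketch below shows that pattern with made-up dimensions and random weights; `LinearProjector` and `build_llm_input` are hypothetical names, and this is not BAT's actual code.

```python
import random
from typing import List

Vector = List[float]


class LinearProjector:
    """Hypothetical linear layer mapping audio-encoder features to the LLM's embedding size."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Small random weights stand in for parameters that would be learned in training.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        # y = W x + b, one output value per row of W.
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.weight, self.bias)]


def build_llm_input(audio_tokens: List[Vector], text_embeddings: List[Vector],
                    projector: LinearProjector) -> List[Vector]:
    """Project each audio token and place the audio tokens before the text prompt."""
    projected = [projector(tok) for tok in audio_tokens]
    return projected + text_embeddings


# Toy sizes; a real setup would use the LLM's hidden size (4,096 for LLaMA-2 7B).
AUDIO_DIM, LLM_DIM = 8, 16
projector = LinearProjector(AUDIO_DIM, LLM_DIM)
audio_tokens = [[0.1 * (i + j) for j in range(AUDIO_DIM)] for i in range(3)]
text_embeddings = [[0.0] * LLM_DIM for _ in range(5)]
sequence = build_llm_input(audio_tokens, text_embeddings, projector)
print(len(sequence), len(sequence[0]))  # 8 16
```

    Keeping the projection separate from both models means the encoder and the LLM can stay largely unchanged while the projector learns to translate between them.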

    The model was asked questions in categories including sound type, what direction the sound was coming from and how far away it was. Lastly, it was tasked with spatial reasoning, in which two concurrent sounds came from entirely different distances and directions. 

    Because previous spatial audio datasets are often limited to music, speech and basic domestic sounds, the researchers curated a binaural dataset covering 355 audio event labels using AudioSet and SoundSpaces. For their environmental meshes, they relied on the large-scale RGB-D dataset Matterport3D, which covers 90 complete buildings, each with an average of 24.5 rooms across roughly two-and-a-half floors and about 5,550 square feet of floor space.
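    Binaural training audio of this kind is typically produced by convolution: a dry, mono recording is convolved with a room impulse response for each ear, which a simulator such as SoundSpaces can generate for a given source and listener position inside a scanned building. The sketch below assumes those impulse responses already exist (tiny made-up ones stand in here) and also mixes two rendered sources, as in the overlapping-sound questions. It illustrates the general technique, not the researchers' actual data pipeline, and the function names are hypothetical.

```python
from typing import List, Tuple

Signal = List[float]


def convolve(x: Signal, h: Signal) -> Signal:
    """Full linear convolution of a dry signal x with an impulse response h."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def render_binaural(mono: Signal, ir_left: Signal, ir_right: Signal) -> Tuple[Signal, Signal]:
    """Place a mono source in a room by convolving it with each ear's impulse response."""
    return convolve(mono, ir_left), convolve(mono, ir_right)


def mix(*channels: Signal) -> Signal:
    """Sum several signals sample by sample, padding shorter ones with silence."""
    out = [0.0] * max(len(c) for c in channels)
    for c in channels:
        for i, v in enumerate(c):
            out[i] += v
    return out


# Made-up impulse responses: source A is nearer the left ear, source B nearer the right.
source_a = [1.0, 0.5]
source_b = [0.8, -0.4]
a_left, a_right = render_binaural(source_a, [1.0], [0.0, 0.0, 0.6])
b_left, b_right = render_binaural(source_b, [0.0, 0.0, 0.6], [1.0])
left_channel = mix(a_left, b_left)
right_channel = mix(a_right, b_right)
```

    Real room impulse responses are thousands of samples long, so production code would use FFT-based convolution rather than this direct double loop.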

    Questions on sound type

    Q: What sound events can you detect in the recording?

    A: A baby’s laughter

    Q: What are the distinct sounds present in this audio clip?

    A: Heartbeat

    Q: Identify the sound events in the audio clip coming from the right, front, below, approximately 9 feet away. 

    A: Splashing; speech

    Q: What sound events can you detect in the audio recording emanating from the left, behind, above, roughly a foot-and-a-half away?

    A: Music; musical instrument; steel pan

    Questions on direction and distance

    Q: In which direction and how far away is the source of the heart sounds?

    A: Left, behind, below; 3 feet away

    Q: Where is the sound of the music coming from? 

    A: Left, behind, below; 10 feet away
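    Answers such as "Left, behind, below; 3 feet away" look like a continuous direction and distance mapped onto coarse labels. A minimal sketch of such a mapping follows, assuming azimuth in degrees from straight ahead with positive angles to the left and elevation positive upward; `direction_labels` and `format_answer` are hypothetical helpers, and the dataset's actual convention and thresholds are not described in the article.

```python
def direction_labels(azimuth_deg: float, elevation_deg: float) -> list:
    """Map a direction to coarse left/right, front/behind, above/below labels.

    Assumed convention: azimuth 0 = straight ahead, positive = toward the left;
    elevation positive = above ear level. A direction exactly on an axis
    boundary gets no label for that axis.
    """
    az = (azimuth_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    labels = []
    if 0.0 < az < 180.0:
        labels.append("left")
    elif -180.0 < az < 0.0:
        labels.append("right")
    if abs(az) < 90.0:
        labels.append("front")
    elif abs(az) > 90.0:
        labels.append("behind")
    if elevation_deg > 0.0:
        labels.append("above")
    elif elevation_deg < 0.0:
        labels.append("below")
    return labels


def format_answer(azimuth_deg: float, elevation_deg: float, distance_ft: float) -> str:
    """Render labels and distance in the style of the sample answers."""
    direction = ", ".join(direction_labels(azimuth_deg, elevation_deg)).capitalize()
    return f"{direction}; {distance_ft:g} feet away"


print(format_answer(135.0, -20.0, 3))  # Left, behind, below; 3 feet away
```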

    Questions on spatial reasoning

    Q: Is the wheezing sound closer than the sound from bird flight/flapping wings? 

    A: No

    Q: Is the source of both the explosion sounds and speech sounds on your left side?

    A: Yes

    Q: Does the sound of an electric shaver occur behind the sound of the waterfall?

    A: Yes

    Q: Can you estimate the distance from the sound of the speech to the sound of the dog? 

    A: 1.64 feet

    Q: What is the sound on the above side of the sound of the vibration? 

    A: Croak; frog

    Q: Could you determine whether the singing’s sound is to the left or right of the steam’s sound?

    A: Left

    “This task demands both perception and complex reasoning,” researchers write of the spatial reasoning questions. “The model must implicitly separate the sound sources based on their unique classes, spatially localize each source and then analyze the relationship between the sources in the context of the question.”
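    The researchers say the model handles these steps implicitly. Written out explicitly, once each source has a class, a direction and a distance, the sample questions reduce to simple geometry. The sketch below is an illustration under assumptions (the coordinate convention, the `Source` type and the helper names are all hypothetical), not a description of BAT's internals; it answers closer/farther, left/right, above and source-to-source distance questions like the samples.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    """A localized sound source relative to the listener (hypothetical representation)."""
    label: str
    azimuth_deg: float    # 0 = straight ahead, positive = toward the left (assumed)
    elevation_deg: float  # positive = above ear level (assumed)
    distance: float       # any consistent unit, e.g. feet

    def position(self):
        """Convert to Cartesian coordinates (x forward, y left, z up)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        return (self.distance * math.cos(el) * math.cos(az),
                self.distance * math.cos(el) * math.sin(az),
                self.distance * math.sin(el))


def is_closer(a: Source, b: Source) -> bool:
    return a.distance < b.distance


def left_or_right(a: Source, b: Source) -> str:
    """Is `a` to the left or right of `b`, judged along the listener's left axis?"""
    ya, yb = a.position()[1], b.position()[1]
    if ya > yb:
        return "Left"
    if ya < yb:
        return "Right"
    return "Neither"


def is_above(a: Source, b: Source) -> bool:
    return a.position()[2] > b.position()[2]


def distance_between(a: Source, b: Source) -> float:
    return math.dist(a.position(), b.position())


speech = Source("speech", 0.0, 0.0, 3.0)
dog = Source("dog", 0.0, 0.0, 4.64)
singing = Source("singing", 90.0, 0.0, 2.0)
steam = Source("steam", -90.0, 0.0, 2.0)
print(left_or_right(singing, steam))            # Left
print(round(distance_between(speech, dog), 2))  # 1.64
```

    The hard part for BAT is everything before these functions: separating overlapping sources from a single binaural mixture and estimating each one's direction and distance.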

    Spatial audio capabilities open up a multitude of possibilities

    Developing LLMs for spatial audio opens up new possibilities in virtual reality, gaming, audio engineering and more.

    “This can lead to more immersive and realistic experiences in these domains,” researchers write. 

    The ability to interpret and reason about spatial sounds can also enhance embodied AI systems such as robots and autonomous vehicles. Further work on ambisonics, a full-sphere surround sound format that also captures sources above and below the listener, could make these experiences even more immersive and realistic.

    The researchers conclude: “We are confident that BAT will significantly contribute to the development of spatial audio perception and reasoning, as well as multimodal LLMs.”

    VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.


