
    THRONE: Advancing the Evaluation of Hallucinations in Vision-Language Models

    By CryptoExpert | May 12, 2024 | 3 Mins Read


    Understanding and mitigating hallucinations in vision-language models (VLVMs) is an emerging research area concerned with coherent but factually incorrect responses from these systems. As VLVMs increasingly combine text and visual inputs to generate responses, the accuracy of their outputs becomes crucial, especially in settings where precision is paramount, such as medical diagnostics or autonomous driving.

    Hallucinations in VLVMs typically manifest as plausible yet incorrect details generated about an image. These inaccuracies pose significant risks, potentially misinforming decisions in critical applications. The challenge lies in detecting these errors and developing methods to mitigate them effectively, ensuring the reliability of VLVM outputs.

    Most existing benchmarks for evaluating hallucinations in VLVMs focus on responses to constrained query formats, such as yes/no questions about specific objects or attributes within an image. These benchmarks often fail to measure more complex, open-ended hallucinations that can occur in varied real-world applications. As a result, there is a significant gap in the ability to fully understand and mitigate the broader spectrum of hallucinations that VLVMs can produce.

    Researchers from the University of Oxford and AWS AI Labs introduced a new framework called THRONE (Text-from-image Hallucination Recognition with Object-probes for open-ended Evaluation) to address this gap. THRONE is designed to assess Type I hallucinations, meaning those that occur in response to open-ended prompts requiring detailed image descriptions. Unlike previous methods, THRONE uses publicly available language models to evaluate hallucinations in the free-form responses generated by various VLVMs, offering a more comprehensive and rigorous approach.


    THRONE uses several metrics to quantify hallucinations across different VLVMs. It reports precision and recall alongside a class-wise F0.5 score, which weights precision more heavily than recall (recall counts half as much as precision). This weighting suits scenarios where false positives (plausible but incorrect mentions) are more harmful than false negatives.
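
    To make the metric concrete, here is a minimal sketch of per-class precision, recall, and F-beta with beta = 0.5, using the standard formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The function names, the set-based input format, and the per-class counting are assumptions made for illustration; the article does not specify how THRONE aggregates its scores.

```python
from collections import defaultdict


def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def classwise_scores(predicted, ground_truth, beta=0.5):
    """Per-class precision, recall, and F-beta.

    predicted / ground_truth: one set of object class names per image
    (a hypothetical input format, not taken from the paper).
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for pred, truth in zip(predicted, ground_truth):
        for c in pred & truth:
            tp[c] += 1  # mentioned and actually present
        for c in pred - truth:
            fp[c] += 1  # mentioned but absent: a hallucination
        for c in truth - pred:
            fn[c] += 1  # present but never mentioned

    scores = {}
    for c in set(tp) | set(fp) | set(fn):
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = {"precision": p, "recall": r, "f_beta": f_beta(p, r, beta)}
    return scores


# Toy data: two images, objects mentioned by a model vs. objects annotated.
predicted = [{"dog", "frisbee"}, {"dog", "car"}]
ground_truth = [{"dog"}, {"dog", "car", "person"}]
scores = classwise_scores(predicted, ground_truth)
```

    With beta = 0.5, a class with precision 0.8 and recall 0.5 scores higher than one with precision 0.5 and recall 0.8, which is the asymmetry the metric is chosen for.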

    Applying THRONE to current VLVMs showed that hallucinations remain common. For some of the evaluated models, about 20% of the objects mentioned in their responses were hallucinated. This rate underscores the persistent challenge of reducing hallucinations and improving the reliability of VLVM outputs.
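
    As a rough illustration of what a figure like 20% means, the sketch below computes the share of mentioned objects that do not appear in an image's annotations. The function name, input format, and toy data are hypothetical, and this simple ratio is an assumption rather than THRONE's exact procedure.

```python
def hallucination_rate(mentioned, present):
    """Fraction of mentioned objects that are absent from the annotations.

    mentioned / present: one set of object names per image
    (hypothetical format chosen for this example).
    """
    total = sum(len(m) for m in mentioned)
    hallucinated = sum(len(m - p) for m, p in zip(mentioned, present))
    return hallucinated / total if total else 0.0


# Toy example: five objects mentioned, one of them ("person") not in the image.
mentioned = [{"dog", "ball", "tree", "bench", "person"}]
present = [{"dog", "ball", "tree", "bench"}]
rate = hallucination_rate(mentioned, present)
```

    Here one of five mentioned objects is absent, so the rate is 0.2, which corresponds to a precision of 0.8 over mentioned objects.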

    In conclusion, THRONE is a significant step forward in evaluating hallucinations in vision-language models, particularly Type I hallucinations in free-form responses. Where existing benchmarks struggle to measure these more nuanced errors, THRONE combines publicly available language models with precision, recall, and class-wise F0.5 metrics. Even so, the high rate of detected hallucinations, around 20% of mentioned objects in some models, shows that further research is needed to improve the accuracy and reliability of VLVMs in practical applications.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
