    Researchers at Intel Labs Introduce LLaVA-Gemma: A Compact Vision-Language Model Leveraging the Gemma Large Language Model in Two Variants (Gemma-2B and Gemma-7B)

By CryptoExpert · April 7, 2024 · 3 min read
    Recent advancements in large language models (LLMs) and Multimodal Foundation Models (MMFMs) have spurred interest in large multimodal models (LMMs). Models like GPT-4, LLaVA, and their derivatives have shown remarkable performance in vision-language tasks such as Visual Question Answering and image captioning. However, their high computational demands have prompted exploration into smaller-scale LMMs.

Researchers from the Cognitive AI group at Intel Labs introduce LLaVA-Gemma, a suite of vision-language assistants trained from the Gemma LLM variants Gemma-2B and Gemma-7B, inspired by progress in small yet capable vision-language models (VLMs) such as LLaVA-Phi. Because it ships in two variants with different parameter counts, LLaVA-Gemma lets researchers investigate the trade-off between computational efficiency and the richness of visual and linguistic understanding. The researchers also examine how Gemma's massively increased token vocabulary affects multimodal performance.

LLaVA-Gemma follows the LLaVA framework with a few modifications, combining a pretrained vision encoder (such as CLIP) and a pretrained language model (here, Gemma) via an MLP connector. Training proceeds in two stages: the MLP connector is first pretrained on a custom dataset, and the language model and connector are then jointly finetuned on multimodal instruction-tuning examples. Deviations from the standard recipe include using Gemma models as the language backbone, employing the larger DINOv2 image encoder for vision, and testing whether skipping the initial connector-pretraining stage improves results; finetuning runs are therefore carried out both with and without that initial pretraining.
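To make the wiring above concrete, here is a minimal sketch of the LLaVA-style pattern: a pretrained vision encoder produces patch features, an MLP connector projects them into the language model's embedding space, and the projected "visual tokens" are prepended to the text embeddings. This is not the authors' code; it assumes the Hugging Face transformers library, and the checkpoint names (openai/clip-vit-large-patch14, google/gemma-2b) and placeholder inputs are purely illustrative.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, AutoModelForCausalLM

class MLPConnector(nn.Module):
    """Two-layer MLP that projects vision-encoder patch features into the LLM embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_features)

# Pretrained backbones (illustrative checkpoints; Gemma weights are gated on the Hub).
vision_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
language_model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

connector = MLPConnector(
    vision_dim=vision_encoder.config.hidden_size,
    llm_dim=language_model.config.hidden_size,
)

# Forward pass: image patches -> projected visual tokens -> prepended to text embeddings.
pixel_values = torch.randn(1, 3, 224, 224)                      # placeholder image batch
patch_features = vision_encoder(pixel_values).last_hidden_state
visual_tokens = connector(patch_features)                        # (1, num_patches, llm_dim)

text_ids = torch.tensor([[2, 106, 1645]])                        # placeholder token ids
text_embeds = language_model.get_input_embeddings()(text_ids)
inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)

outputs = language_model(inputs_embeds=inputs_embeds)            # standard causal-LM forward
```

In this scheme, stage one of the recipe would train only the connector with both backbones frozen, and stage two would unfreeze the language model and finetune it jointly with the connector on instruction-tuning data.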

For the 2B backbone, DINOv2 variants outperform CLIP variants on all benchmarks except POPE-F1 and MMVP. Comparing training and evaluation speed for the two model sizes, training the Gemma-2B model on 8 Intel Gaudi 2 AI accelerators took 4 hours, while the larger Gemma-7B model required 16 hours under the same conditions. The Gemma-7B model, with its larger parameter count, therefore takes roughly four times longer to train, giving it a relative speed of 0.25x compared with the Gemma-2B model. These results highlight the trade-off between model size and training efficiency: larger models demand significantly more compute and time.
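The relative-speed figure follows directly from the two reported training times; a quick check:

```python
# Numbers taken from the article; 8x Intel Gaudi 2 accelerators in both runs.
gemma_2b_hours = 4
gemma_7b_hours = 16

slowdown = gemma_7b_hours / gemma_2b_hours        # 4.0  -> ~4x longer wall-clock training
relative_speed = gemma_2b_hours / gemma_7b_hours  # 0.25 -> Gemma-7B trains at 0.25x the speed

print(f"Gemma-7B: {slowdown:.0f}x longer to train, {relative_speed:.2f}x relative speed")
```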

    Contributions to this research are as follows:

    1. Researchers introduce LLaVA-Gemma, an MMFM leveraging compact, powerful Gemma language models for efficient multimodal interactions. 

    2. They extensively evaluate Gemma-2B and Gemma-7B model variants, providing valuable insights into the tradeoffs between computational efficiency and the richness of visual and linguistic understanding in LLMs.

3. They present a deep exploration of alternative design choices and visualize attention with relevancy maps to better understand the model's behavior and where it attends (one common relevancy-style technique is sketched just after this list).
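For readers unfamiliar with relevancy-style visualization, the sketch below shows one common approximation, attention rollout over self-attention layers. The paper's exact relevancy method may differ, so treat this purely as an illustration of the general idea.

```python
import torch

def attention_rollout(attentions: list[torch.Tensor]) -> torch.Tensor:
    """Aggregate per-layer self-attention into a single relevancy-style map.

    attentions: list of (batch, heads, seq, seq) tensors, one per layer
    (e.g. outputs.attentions from a forward pass with output_attentions=True).
    Returns a (batch, seq, seq) map of how strongly each position attends to each other.
    """
    rollout = None
    for attn in attentions:
        avg = attn.mean(dim=1)                                    # average over heads
        eye = torch.eye(avg.size(-1), device=avg.device).unsqueeze(0)
        layer = 0.5 * avg + 0.5 * eye                             # account for residual connections
        layer = layer / layer.sum(dim=-1, keepdim=True)           # re-normalize rows
        rollout = layer if rollout is None else torch.bmm(layer, rollout)
    return rollout
```

In a multimodal model, slicing the rows for generated text tokens against the columns for visual tokens gives a rough picture of which image patches the answer attended to.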

In conclusion, the research introduces LLaVA-Gemma, a compact vision-language model built on the Gemma LLM in two variants, Gemma-2B and Gemma-7B. The work gives researchers a unique opportunity to explore the trade-offs between computational efficiency and multimodal understanding in small-scale models. Evaluations demonstrate the versatility and effectiveness of LLaVA-Gemma across a range of datasets, highlighting its potential as a benchmark for future research in small-scale vision-language models.

Check out the Paper and HF Page. All credit for this research goes to the researchers of this project.


Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in mechanical engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.
