Close Menu
    Facebook X (Twitter) Instagram
    Facebook Instagram YouTube
    Crypto Go Lore News
    Subscribe
    Wednesday, May 27
    • Home
    • Market Analysis
    • Latest
      • Bitcoin News
      • Ethereum News
      • Altcoin News
      • Blockchain News
      • NFT News
      • Market Analysis
      • Mining News
      • Technology
      • Videos
    • Trending Cryptos
    • AI News
    • Market Cap List
    • Mining
    • Trading
    • Contact
    Crypto Go Lore News
    Home»AI News»Embeddings or LLMs: What’s Best for Detecting Code Clones Across Languages?
    AI News

    Embeddings or LLMs: What’s Best for Detecting Code Clones Across Languages?

    CryptoExpertBy CryptoExpertAugust 14, 2024No Comments4 Mins Read
    Share Facebook Twitter Pinterest Copy Link LinkedIn Tumblr Email VKontakte Telegram
    Embeddings or LLMs: What’s Best for Detecting Code Clones Across Languages?
    Share
    Facebook Twitter Pinterest Email Copy Link
    Blockonomics


    Cross-lingual code cloning has become an important and difficult job due to the rising complexity of modern software development, where numerous programming languages are typically employed inside a single project. The term ‘cross-lingual code clone detection’ describes the process of finding identical or nearly identical code segments in several computer languages. 

    Recent advances in Artificial Intelligence and Machine Learning have made tremendous progress in handling many computing jobs possible, especially with the introduction of Large Language Models (LLMs). Due to their exceptional Natural Language Processing skills, LLMs have garnered attention for their possible use in code-related tasks like code clone identification. Building on these advancements, in recent research, a team of researchers from the University of Luxembourg has re-examined the problem of cross-lingual code clone detection and studied the effectiveness of both LLMs and pre-trained embedding models in this field.

    The research assesses the performance of four different LLMs in conjunction with eight unique prompts intended to support the detection of cross-lingual code clones. It evaluates the usefulness of a pre-trained embedding model that produces vector representations of code excerpts. Following this, pairs of code fragments are categorized as clones or non-clones based on these representations. Two popular cross-lingual datasets have been used for the evaluations, which are CodeNet and XLCoST.

    The study’s findings have demonstrated the benefits and drawbacks of LLMs in this situation. When working with simple programming examples like those in the XLCoST dataset, the LLMs showed that they could attain high F1 scores, up to 0.98. However, when presented with more difficult programming tasks, their performance suffered. This decline raises the possibility that LLMs will find it difficult to completely appreciate the subtle meaning of code clones, especially in a cross-lingual context where it is crucial to comprehend the functional equivalency of code between languages.

    okex

    However, the research has shown that embedding models, which represent code fragments from many programming languages within a single vector space, offer a stronger basis for identifying cross-lingual code clones. With an improvement of about two percentage points on the XLCoST dataset and about 24 percentage points on the more complicated CodeNet dataset, the researchers could attain results that surpassed all evaluated LLMs by training a basic classifier using these embeddings.

    The team has summarized their primary contributions as follows.

    The work broadly analyzes LLM capacities to identify cross-lingual code clones, with a particular emphasis on Java combined with ten distinct programming languages. This work applies several LLMs to a wide range of cross-lingual datasets and assesses the effects of several quick engineering methods, providing a distinct viewpoint in contrast to previous research.

    The study offers insightful information about how well LLM performs in code clone identification. It emphasizes how much the closeness of two programming languages influences LLMs’ capacity to identify clones, particularly when given straightforward cues. The effects of programming language differences are lessened when prompts focus on reasoning and logic. The generalisability and universal effectiveness of LLMs in cross-lingual code clone detection tasks have also been discussed.

    The study contrasts LLM performance with traditional ML techniques using learned code representations as a basis. The experiment’s findings have indicated that LLMs might not fully understand the meaning of clones in the context of code clone detection, suggesting that conventional techniques may still be superior in this regard.

    In conclusion, the results imply that while LLMs are highly capable, especially when it comes to handling simple code examples, they might not be the most effective method for cross-lingual code clone detection, especially when dealing with more complicated circumstances. On the other hand, embedding models are more appropriate for attaining state-of-the-art performance in this domain since they provide consistent and language-neutral representations of code. 

    Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter..

    Don’t Forget to join our 48k+ ML SubReddit

    Find Upcoming AI Webinars here

    Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.



    Source link

    Betfury
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Telegram Copy Link
    CryptoExpert
    • Website

    Related Posts

    AI News

    AI Trading Bots Explained (Pocket Option Guide)

    April 9, 2026
    AI News

    How is AI reshaping opportunities for students? #news #ai #trending #opportunity #shorts

    April 3, 2026
    AI News

    Create Stunning AI Videos in Minutes! LunaBloomAI Full Tutorial for Beginners (2024)

    December 16, 2025
    AI News

    Glimmering Labs of 2050 AI Shaping Tomorrow’s Materials

    December 15, 2025
    AI News

    Sunday Funny Comic #google #AI News #War #Dogs Virals memes #stockmarket #news #crypto #shorts

    December 14, 2025
    AI News

    ✨ What I Noticed About AI Today 🤖 | Simple Tip for Beginners #shorts

    December 13, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Recommended
    Editors Picks

    Ethereum Sees 56.9% Jump in Transfers as Adoption Gains Ground

    April 12, 2026

    Polymarket Briefly Appears in Google News Before Being Removed

    April 12, 2026

    The Bitcoin miner sell-off looks close to exhaustion marking impending reversal in market pressure

    April 9, 2026

    Uniswap price outlook as Ethereum’s Vitalik Buterin offloads UNI tokens

    April 9, 2026
    Latest Posts

    We are a leading platform dedicated to delivering authoritative insights, news, and resources on cryptocurrencies and blockchain technology. At Crypto Go Lore News, our mission is to empower individuals and businesses with reliable, actionable, and up-to-date information about the cryptocurrency ecosystem. We aim to bridge the gap between complex blockchain technology and practical understanding, fostering a more informed global community.

    Latest Posts

    Ethereum Sees 56.9% Jump in Transfers as Adoption Gains Ground

    April 12, 2026

    Polymarket Briefly Appears in Google News Before Being Removed

    April 12, 2026

    The Bitcoin miner sell-off looks close to exhaustion marking impending reversal in market pressure

    April 9, 2026
    Newsletter

    Subscribe to Updates

    Get the latest Crypto news from Crypto Golore News about crypto around the world.

    Facebook Instagram YouTube
    • Contact
    • Privacy Policy
    • Terms Of Service
    • Social Media Disclaimer
    • DMCA Compliance
    • Anti-Spam Policy
    © 2026 CryptoGoLoreNews. All rights reserved by CryptoGoLoreNews.

    Type above and press Enter to search. Press Esc to cancel.

    bitcoin
    Bitcoin (BTC) $ 74,320.00
    ethereum
    Ethereum (ETH) $ 2,022.84
    tether
    Tether (USDT) $ 0.998525
    bnb
    BNB (BNB) $ 647.69
    xrp
    XRP (XRP) $ 1.31
    usd-coin
    USDC (USDC) $ 0.999632
    solana
    Solana (SOL) $ 82.44
    tron
    TRON (TRX) $ 0.367979
    figure-heloc
    Figure Heloc (FIGR_HELOC) $ 1.03
    staked-ether
    Lido Staked Ether (STETH) $ 2,265.05