Episodes

  • TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest
    Mar 6 2025

    In this episode, we delve into the paper "TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest". This research introduces TransAct, a novel Transformer-based model designed to enhance Pinterest's recommendation system by capturing users' short-term preferences through their real-time activities.

    Research Paper Link - arxiv.org

    🔹 What’s Inside?

    • Hybrid Ranking Approach – Combines real-time user behavior with long-term embeddings for better recommendations (see the sketch below).
    • Production Deployment – Powers multiple Pinterest surfaces like Homefeed, Search, and Notifications.
    • Proven Impact – A/B tests show improved recommendation quality and engagement.

    Tune in to learn how TransAct balances real-time responsiveness with efficiency in large-scale AI-driven personalization. 🚀
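    As a rough illustration of the hybrid ranking idea above, here is a minimal PyTorch sketch: a small Transformer encodes the user's recent action sequence, and the pooled result is combined with a precomputed long-term user embedding and a candidate embedding in an MLP scoring head. All names, dimensions, and layer sizes are illustrative assumptions, not Pinterest's production configuration.

```python
# Minimal sketch of a hybrid real-time + long-term ranking model (illustrative only).
import torch
import torch.nn as nn


class HybridRanker(nn.Module):
    def __init__(self, num_action_types=20, dim=64, seq_len=50):
        super().__init__()
        self.action_type_emb = nn.Embedding(num_action_types, dim)
        self.pos_emb = nn.Embedding(seq_len, dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Scoring head consumes: pooled short-term signal + long-term user
        # embedding + candidate item embedding.
        self.head = nn.Sequential(nn.Linear(dim * 3, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, action_item_emb, action_type_ids, long_term_user_emb, candidate_emb):
        # action_item_emb:    (B, T, dim) embeddings of recently engaged items
        # action_type_ids:    (B, T)      id of the action taken (click, save, ...)
        # long_term_user_emb: (B, dim)    precomputed long-term user representation
        # candidate_emb:      (B, dim)    embedding of the candidate to score
        B, T, _ = action_item_emb.shape
        positions = torch.arange(T, device=action_item_emb.device).expand(B, T)
        x = action_item_emb + self.action_type_emb(action_type_ids) + self.pos_emb(positions)
        h = self.encoder(x)            # contextualized real-time action sequence
        short_term = h.mean(dim=1)     # simple pooling of the short-term signal
        features = torch.cat([short_term, long_term_user_emb, candidate_emb], dim=-1)
        return self.head(features).squeeze(-1)  # engagement score per candidate


# Example: score 8 candidates for a user with 50 recent actions.
model = HybridRanker()
scores = model(
    torch.randn(8, 50, 64), torch.randint(0, 20, (8, 50)),
    torch.randn(8, 64), torch.randn(8, 64),
)
print(scores.shape)  # torch.Size([8])
```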

    17 min
  • Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
    Feb 20 2025

    In today’s episode, we’re diving into the fascinating world of model merging—a technique that allows multiple AI models to be combined, often enhancing their capabilities without the need for costly retraining. Our focus? A groundbreaking paper titled "Do Merged Models Copy or Compose? Evaluating the Transfer of Capabilities in Model Merging" by researchers exploring the inner workings of this emerging technique.

    We'll be discussing:

    🔹 What is model merging? Why it's gaining traction in AI research.

    🔹 Do merged models simply copy knowledge, or can they create something new? (A toy merging sketch follows this list.)

    🔹 How does merging affect generalization, robustness, and performance?

    🔹 Real-world implications—from adapting models across different domains to fine-tuning AI with fewer resources.
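    To make the copy-vs-compose question concrete, here is a toy weight-space merge: simple linear interpolation of two fine-tuned models' parameters, in the spirit of model soups / task arithmetic. The architecture, merge coefficient, and task names are illustrative assumptions; the paper discussed in the episode evaluates far larger, more capable merges.

```python
# Toy weight-space merging via linear interpolation of parameters (illustrative only).
import torch
import torch.nn as nn


def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Interpolate two compatible state dicts: alpha * A + (1 - alpha) * B."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}


def make_model():
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))


# Stand-ins for two models fine-tuned from a shared base (here just random inits).
model_task_a = make_model()   # e.g. fine-tuned on task A
model_task_b = make_model()   # e.g. fine-tuned on task B

merged = make_model()
merged.load_state_dict(
    merge_state_dicts(model_task_a.state_dict(), model_task_b.state_dict(), alpha=0.5)
)

# The empirical question the paper asks: does `merged` merely retain (copy) what each
# parent could do, or can it handle inputs that require composing both skills?
x = torch.randn(2, 16)
print(merged(x).shape)  # torch.Size([2, 4])
```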

    22 min
  • Modern Recommender Systems Using Generative Models (Gen-RecSys)
    Feb 16 2025

    In this episode, we delve into the transformative impact of Generative Models on modern Recommender Systems (RS), as detailed in the comprehensive survey titled "A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)". This multidisciplinary study explores how traditional RS, which primarily relied on user-item rating histories, are evolving through the integration of advanced generative techniques.

    Key Discussion Points:

    • Interaction-Driven Generative Models: We examine how models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are utilized to capture complex user-item interactions, enabling the generation of personalized recommendations beyond historical data (see the sketch at the end of these notes).
    • Large Language Models (LLMs) in Natural Language Recommendations: The episode highlights the role of LLMs, such as ChatGPT and Gemini, in understanding and generating human-like text, facilitating conversational recommendations and enhancing user engagement through natural language interfaces.
    • Multimodal Models for Rich Content Integration: We discuss the incorporation of multimodal data—text, images, and videos—into RS, allowing for a more holistic understanding of user preferences and the ability to recommend diverse content types.
    • Evaluation Paradigms and Ethical Considerations: The survey emphasizes the importance of developing new evaluation frameworks to assess the performance and societal impact of Gen-RecSys, addressing challenges such as bias, fairness, and user privacy.

    Join us as we explore these advancements, shedding light on the future directions of recommender systems in the era of generative AI.
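    As a small, self-contained example of one building block mentioned above, here is a sketch of a variational autoencoder over a user's item-interaction vector, in the spirit of Mult-VAE-style collaborative filtering. Catalog size, latent dimension, and the KL weighting are illustrative assumptions, not values taken from the survey.

```python
# Minimal VAE over user-item interaction vectors (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F


class InteractionVAE(nn.Module):
    def __init__(self, num_items=1000, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(num_items, 2 * latent_dim)  # -> mean and log-variance
        self.decoder = nn.Linear(latent_dim, num_items)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        logits = self.decoder(z)   # scores over the full item catalog
        return logits, mu, logvar


def loss_fn(logits, x, mu, logvar, beta=0.2):
    # Multinomial reconstruction of observed interactions + KL regularizer.
    recon = -(F.log_softmax(logits, dim=-1) * x).sum(dim=-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return recon + beta * kl


# One forward pass on a batch of binary interaction vectors; recommendations would be
# the highest-scoring items the user has not interacted with yet.
model = InteractionVAE()
x = (torch.rand(4, 1000) < 0.05).float()
logits, mu, logvar = model(x)
print(loss_fn(logits, x, mu, logvar).item())
```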

    19 min
  • LLM Query Scheduling with Prefix Reuse and Latency Constraints
    Feb 11 2025

    Research paper: https://arxiv.org/pdf/2502.04677

    Authors: Gregory Dexter, Shao Tang, Ata Fatahi Baarzi, Qingquan Song, Tejas Dharamsi, and Aman Gupta

    Introduction

    In this episode, we explore the challenge of efficiently deploying large language models (LLMs) in online settings, where strict latency constraints—such as time-to-first-token (TTFT) and time-per-output-token (TPOT)—must be met. As demand for AI-generated content grows, optimizing inference performance becomes a critical bottleneck.

    Key Topics Covered

    • The Challenge of Query Scheduling: Existing scheduling strategies like First-Come-First-Serve (FCFS) and Longest-Prefix-Match (LPM) struggle to balance efficiency and latency.
    • Prefix Reuse with RadixAttention: A technique that stores and reuses shared prefixes across queries using a radix tree structure, reducing computational overhead.
    • The NP-Hard Nature of Scheduling: The paper establishes that optimizing scheduling under TTFT constraints is computationally challenging.
    • Introducing k-LPM: A novel scheduling algorithm that balances prefix reuse and fairness, outperforming existing methods in reducing TTFT (a simplified sketch follows the conclusion).
    • Empirical Validation: Real-world evaluations show that k-LPM significantly reduces P99 TTFT, making it a promising solution for large-scale LLM inference.

    Conclusion

    This research highlights the need for advanced scheduling strategies to improve LLM efficiency in real-world applications. Tune in to learn how k-LPM is pushing the boundaries of AI inference optimization!
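    To give a feel for the trade-off, here is a toy scheduler: it mostly picks the waiting query with the longest shared prefix relative to the previously served prompt (to maximize cache reuse), but forces the oldest waiting query every k-th pick so tail TTFT stays bounded. This interleaving rule is a simplification for illustration only, not the paper's exact k-LPM algorithm, and the prefix matching here is plain string comparison rather than a radix tree.

```python
# Toy scheduler contrasting prefix-greedy selection with FCFS fairness (illustrative only).
from collections import deque


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def schedule(queries, k=4):
    """Return an execution order over queries, given as a list of prompt strings."""
    waiting = deque(enumerate(queries))   # arrival (FCFS) order preserved in the deque
    order, last_prompt, picks = [], "", 0
    while waiting:
        picks += 1
        if picks % k == 0:
            chosen = waiting.popleft()    # protect the oldest request's TTFT
        else:
            chosen = max(waiting, key=lambda q: shared_prefix_len(q[1], last_prompt))
            waiting.remove(chosen)        # greedy longest-prefix-match pick
        order.append(chosen[0])
        last_prompt = chosen[1]
    return order


prompts = [
    "SYSTEM: legal assistant. Summarize:",
    "SYSTEM: legal assistant. Translate:",
    "SYSTEM: coding helper. Fix this bug:",
    "SYSTEM: legal assistant. Draft a clause:",
    "SYSTEM: coding helper. Write a test:",
]
print(schedule(prompts, k=3))
```

    In a real serving stack, the same idea would operate over tokenized prompts and the engine's radix-tree prefix cache rather than raw strings.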

    13 min
  • Mutation-Guided LLM-based Test Generation at Meta
    Feb 10 2025

    In this episode, we explore Meta's ACH system, a novel mutation-guided test generation approach that leverages LLMs (Large Language Models) to enhance software robustness. Unlike traditional mutation testing, which generates numerous random faults, ACH focuses on identifying undetected faults related to specific concerns, such as privacy vulnerabilities.

    🔍 Key Highlights:

    • Targeted Mutant Generation: Instead of mass-producing mutants, ACH intelligently identifies meaningful faults that could otherwise go unnoticed.
    • LLM-Driven Test Generation: ACH automates the creation of test cases to detect and eliminate these faults, effectively hardening software against regressions (sketched below).
    • Real-World Deployment: Applied to 10,795 Android Kotlin classes across 7 Meta platforms, ACH generated 9,095 mutants and 571 privacy-focused test cases.
    • Equivalent Mutant Detection: ACH integrates an LLM-based detection agent achieving up to 0.95 precision and 0.96 recall with preprocessing.
    • Industry Validation: In Messenger & WhatsApp test-a-thons, engineers accepted 73% of ACH-generated tests, with 36% deemed privacy-relevant.

    🔎 Why It Matters: ACH represents a paradigm shift in mutation testing, using AI to pinpoint real-world vulnerabilities instead of generating irrelevant noise. This approach not only improves software reliability but also streamlines engineering workflows by focusing on actionable test cases.
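    Below is a minimal sketch of the mutation-guided loop described above: ask an LLM for a concern-specific mutant of a class, keep only mutants that survive the existing test suite, and then ask the LLM for a test that kills the survivor. The generate_with_llm and run_tests functions are hypothetical placeholders for a model API and a test runner; they are not Meta's internal interfaces.

```python
# Sketch of a mutation-guided, LLM-based test generation loop (illustrative only).

def generate_with_llm(prompt: str) -> str:
    # Hypothetical placeholder: call whatever LLM endpoint the team uses.
    raise NotImplementedError


def run_tests(source: str, tests: list[str]) -> bool:
    # Hypothetical placeholder: build `source` and run `tests`; True means all pass.
    raise NotImplementedError


def harden_class(class_source: str, existing_tests: list[str], concern: str = "privacy") -> list[str]:
    """Generate a concern-specific mutant and, if it survives, a test that kills it."""
    mutant = generate_with_llm(
        f"Introduce a realistic {concern}-related fault into this class:\n{class_source}"
    )
    # A mutant that still passes the existing suite represents an undetected fault.
    if not run_tests(mutant, existing_tests):
        return []  # existing tests already catch this fault

    candidate_test = generate_with_llm(
        "Write a unit test that fails on the mutated class but passes on the original.\n"
        f"ORIGINAL:\n{class_source}\nMUTATED:\n{mutant}"
    )
    # Accept the test only if it passes on the original code and kills the mutant.
    kills_mutant = not run_tests(mutant, existing_tests + [candidate_test])
    passes_original = run_tests(class_source, existing_tests + [candidate_test])
    return [candidate_test] if (kills_mutant and passes_original) else []
```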

    🔗 Reference Paper: 📄 Meta’s ACH System for Mutation-Guided LLM-Based Test Generation – Read here

    📢 Tune in as we break down how ACH is redefining software testing, enhancing privacy safeguards, and paving the way for AI-driven quality assurance! 🚀

    8 min
  • 360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation
    Feb 3 2025

    Ranking and recommendation systems are the foundation for numerous online experiences, ranging from search results to personalized content delivery. These systems have evolved into complex, multilayered architectures that leverage vast datasets and often incorporate thousands of predictive models. Maintaining and enhancing these models is a labor-intensive process that requires extensive feature engineering. This approach not only exacerbates technical debt but also hampers innovation in extending these systems to emerging problem domains.

    In this report, we present our research to address these challenges by utilizing a large foundation model with a textual interface for ranking and recommendation tasks. We illustrate several key advantages of our approach: (1) a single model can manage multiple predictive tasks involved in ranking and recommendation, (2) decoder models with a textual interface, owing to their comprehension and reasoning capabilities, can generalize to new recommendation surfaces and out-of-domain problems, and (3) by employing natural language interfaces for task definitions and verbalizing member behaviors and their social connections, we eliminate the need for feature engineering and the maintenance of complex directed acyclic graphs of model dependencies.

    We introduce our research pre-production model, 360Brew V1.0, a 150B-parameter, decoder-only model that has been trained and fine-tuned on LinkedIn's data and tasks. This model is capable of solving over 30 predictive tasks across various segments of the LinkedIn platform, achieving performance levels comparable to or exceeding those of current production systems based on offline metrics, without task-specific fine-tuning. Notably, each of these tasks is conventionally addressed by dedicated models that have been developed and maintained over multiple years by teams of a similar or larger size than our own.
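    To make the "textual interface" idea concrete, here is a small sketch of how a ranking task can be verbalized into a single prompt for a decoder-only model, rather than being expressed through engineered features. The prompt wording and the score_with_model placeholder are illustrative assumptions, not LinkedIn's actual 360Brew interface.

```python
# Sketch of verbalizing a member's behavior and a candidate item into a ranking prompt.

def build_prompt(member_profile: str, past_actions: list[str], candidate: str, task: str) -> str:
    history = "\n".join(f"- {a}" for a in past_actions)
    return (
        f"Task: {task}\n"
        f"Member profile: {member_profile}\n"
        f"Recent activity:\n{history}\n"
        f"Candidate item: {candidate}\n"
        "Question: Will the member engage with this item? Answer Yes or No."
    )


def score_with_model(prompt: str) -> float:
    # Hypothetical placeholder: return P("Yes") from a decoder-only LLM, e.g. by
    # comparing the log-probabilities of the "Yes" and "No" continuations.
    raise NotImplementedError


prompt = build_prompt(
    member_profile="Machine learning engineer in Berlin, 7 years of experience",
    past_actions=[
        "applied to a 'Senior ML Engineer' job posting",
        "liked a post about recommender systems",
    ],
    candidate="Job posting: Staff ML Engineer, recommendations team",
    task="job recommendation ranking",
)
print(prompt)
```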

    20 min