◆ Industry Talk 1
Title: The Progress and Next Steps of Multimodal Foundation Models
Speaker: Nan Duan, VP of JD.COM and the Head of Vision and Multimodal Lab at JD Explore Academy
Short Bio: Dr. Nan Duan is the VP of JD.COM and the Head of Vision and Multimodal Lab at JD Explore Academy, where he leads a research team focused on vision and multimodal foundation models. Prior to this, he served as a Technical Fellow at StepFun and as a Senior Principal Researcher and Research Manager in the Natural Language Computing Group at Microsoft Research Asia. Dr. Duan’s research spans natural language processing, code intelligence, multimodal foundation models, and AI agents. He has authored over 200 research papers in top-tier conferences and journals, accumulating more than 28,000 citations (h-index 76), and he holds over 20 patents. He is an adjunct professor and Ph.D. supervisor at the University of Science and Technology of China, Xi’an Jiaotong University, and Tianjin University.
Abstract: This talk will begin with a systematic review of the development of multimodal foundation models, covering visual representation models, multimodal understanding models, multimodal generation models, and unified multimodal understanding-generation models. It will highlight the unique characteristics of each model type and the challenges each faces. Motivated by these challenges, the talk will then explore future directions for multimodal foundation models, with particular emphasis on two key areas: video world models and unified multimodal understanding-generation models.
◆ Industry Talk 2
Title: Introduction to Ling LLM's Model Architecture and Intelligent Evolution
Speaker: Lei Liang, Technical Director, NextEvo Department, Ant Group
Short Bio:
Abstract: This talk will first introduce the training process of the Ling large language model and its key technical explorations, including heterogeneous hardware-aware training strategies for a cross-platform Mixture-of-Experts (MoE) architecture, hierarchical corpus pretraining strategies, and efficient attention mechanisms. Next, it will present the technical architecture of Ant Group's omni multimodal LLM. Finally, it will introduce Ant Group's innovation-driven open-source strategy for advancing intelligent evolution.
◆ Industry Talk 3
Title: Innovative Practices of Lenovo Lexiang Super Intelligent Agent
Speaker: Peng Wang, the Head of the Large Reasoning Models (LRM) Team and a Principal Researcher at Lenovo's Artificial Intelligence Lab
Short Bio: Dr. Peng Wang is the Head of the Large Reasoning Models (LRM) Team and a Principal Researcher at Lenovo's Artificial Intelligence Lab. He leads the team in the research and development of advanced AI algorithms and systems, focusing on large reasoning models, multimodal models, and next-generation intelligent agents. His work supports Lenovo's Super Agent initiative, on-device AI models and agent products for AI PCs, and enterprise-grade AI agent solutions.
Abstract: With the rapid advancement of generative AI and multi-agent collaboration technologies, Lenovo has upgraded its "Lenovo Lexiang Super Intelligent Agent" from an "Enterprise Digital Twin" concept into a unified intelligent interface connecting customers, ecosystem partners, and employees. This presentation will showcase Lenovo's latest achievements across product applications, strategic technology roadmaps, and forward-looking research. On the product side, Lenovo's super intelligent agent has been successfully deployed across critical business scenarios such as advanced consulting, digital human livestreaming, proactive customer service, device lifecycle management, self-service ordering, and real-time interactive experiences. Together, these capabilities deliver a unified user interface, consistent intent recognition, and an integrated AI platform experience. On the technology side, Lenovo has built a robust foundational architecture comprising the "Optimus 4.0" enterprise knowledge-data platform, the Agent Intelligent Platform, and the Multi-modal Model Platform. This architecture supports essential capabilities such as complex intent understanding, hybrid memory systems, and multi-agent coordination. Lenovo is also actively investing in its Large Reasoning Models (LRM) technology stack, focusing on deep reasoning, multimodal fusion, inference optimization, and adaptive thinking techniques to continuously strengthen its technical leadership.