| Time | Topic | Speaker |
| --- | --- | --- |
| 16:00-16:25 | AI Foundation Models at Huawei Noah’s Ark Lab | Linlin Li |
| 16:25-16:50 | Qwen: Open Foundation, Human-Aligned and Specialist Models | Junyang Lin |
| 16:50-17:15 | JittorLLM: An efficient large model training and inference framework based on Jittor | Dun Liang |
| 17:15-17:40 | Enhancing Large Language Models with Legal Knowledge | Yansong Feng |
| 17:40-18:05 | Machine Translation and Large Language Models: Universal Translation and AGI | Deyi Xiong |
Time: 16:00-16:25
Topic: AI Foundation Models at Huawei Noah’s Ark Lab
Abstract: The latest progress on LLMs has drawn enormous research interest in large language modelling within the artificial intelligence community. In this talk, we introduce our latest progress on LLM-based foundation models, including Pangu-alpha, Pangu-sigma, and related multimodal approaches. We also introduce how we integrate knowledge via information retrieval and how we extend the capabilities of raw LLMs by integrating various tools. The talk starts with an introduction to the foundation model Pangu-alpha, followed by Pangu-sigma, a mixture-of-experts (MoE) version of the LLM aimed at compute-efficient scaling. We then introduce our work on incorporating web/domain search components and various tools/plugins into our model to solve complex tasks. In the last part, we give a brief introduction to multimodal approaches that process not only text but also image and audio content.
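The retrieval-based knowledge integration mentioned in the abstract follows the general retrieval-augmented prompting pattern; below is a minimal, hypothetical sketch of that pattern, not the Pangu system itself (the `retrieve` and `generate` callables are placeholders for a search component and an LLM call).

```python
# Illustrative pattern only (not the Pangu implementation): retrieval-augmented
# prompting inserts retrieved passages into the LLM prompt so the model can
# ground its answer in external knowledge. retrieve() and generate() are
# hypothetical placeholders supplied by the caller.
def answer_with_retrieval(question, retrieve, generate, k=3):
    passages = retrieve(question, top_k=k)          # web/domain search component
    context = "\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using the passages below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)                         # raw LLM extended with retrieved knowledge
```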
Short Bio: Linlin Li is a senior research scientist at Huawei Noah’s Ark Lab. Her major research interest is in the field of Natural Language Processing. Linlin Li received her Ph.D. degree in Computer Science from Saarland University, Germany, with a dissertation titled Computational Modelling of Lexical Ambiguity. She joined Microsoft soon after graduation and worked in Europe for several years before moving back to China and joining Alibaba in Hangzhou. She joined Huawei in 2019 and has been working on NLP-related projects there ever since. She has published over 30 papers in top NLP conferences and journals such as ACL, EMNLP, NAACL and Computational Linguistics.
Time: 16:25-16:50
Topic: Qwen: Open Foundation, Human-Aligned and Specialist Models
Abstract: LLMs have remarkable potential to use external tools and to understand multimodal data as humans do. Recently we publicly released Qwen-7B and Qwen-14B, as well as their assistant models, Qwen-7B-Chat and Qwen-14B-Chat. These models demonstrate strong performance and significantly outperform the baselines on a series of benchmarks. We reinforce the models on tool use, so they have the potential to become powerful AI agents for downstream applications. Additionally, based on the Qwen models, we continue pretraining on code and math data to produce the specialist models Code-Qwen and Math-Qwen, which pave the way towards training domain-specific experts. In this talk, I will give a brief introduction to our models and the techniques that make a difference in building a strong LLM.
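Since the Qwen base and Chat models are publicly released, they can be loaded with standard tooling. The sketch below assumes the Hugging Face release of Qwen-7B-Chat and its `chat()` helper; exact model IDs and interfaces may differ across versions.

```python
# Minimal sketch: loading the released Qwen-7B-Chat weights via Hugging Face
# Transformers (model ID and the chat() helper follow the public release;
# details may vary across versions).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

# Single-turn query; chat() returns the reply and the updated dialogue history.
response, history = model.chat(tokenizer, "Briefly introduce yourself.", history=None)
print(response)
```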
Short Bio: Junyang Lin is a staff engineer at Alibaba Group. He graduated from Peking University. His research interests are in natural language processing and multimodal representation learning, with a focus on large-scale pretraining. He has published articles at NeurIPS, ICML, ACL, and other venues. Previously, he developed the extremely large-scale pretrained model M6, the unified multimodal multitask model OFA, and the cross-modal representation model Chinese CLIP, among others. He is currently leading the development of the large language model Qwen, working on pretraining, alignment, multimodal integration and AI agents.
Time: 16:50-17:15
Topic: JittorLLM: An efficient large model training and inference framework based on Jittor
Abstract: The Jittor Large Model Engine (JittorLLM) supports model training, fine-tuning and inference, making it easy for everyone to train large models. This talk will introduce the latest technical advances of the domestic deep learning framework Jittor in large model inference and training. Jittor is a deep learning framework built entirely on just-in-time compilation, with innovative meta operators and a unified computation graph. Meta operators are as easy to use as NumPy but go beyond NumPy in enabling more complex and efficient operations. The unified computation graph combines the many advantages of static and dynamic graphs, providing high-performance optimization while remaining easy to use. Deep learning models built with meta operators can be optimized in real time by Jittor and run on specified heterogeneous computing devices such as CPUs, GPUs and TPUs.
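As a rough illustration of the NumPy-like meta-operator interface described above, here is a minimal training loop written against Jittor's public Python API (module, optimizer and function names are assumed from that API and may differ across versions).

```python
# Minimal sketch of Jittor's NumPy-like interface (module and optimizer
# names are assumed from the public `jittor` Python API).
import numpy as np
import jittor as jt
from jittor import nn

class TinyNet(nn.Module):
    def __init__(self):
        self.fc = nn.Linear(4, 1)

    # Jittor modules define execute() instead of forward().
    def execute(self, x):
        return self.fc(x)

model = TinyNet()
optimizer = nn.SGD(model.parameters(), 0.1)

x = jt.array(np.random.rand(8, 4).astype("float32"))
y = jt.array(np.random.rand(8, 1).astype("float32"))
for _ in range(100):
    pred = model(x)
    loss = ((pred - y) ** 2).mean()   # meta operators compose like NumPy expressions
    optimizer.step(loss)              # Jittor optimizers take the loss directly
print(loss.item())
```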
Short Bio: Dun Liang is a postdoctoral researcher at the Department of Computer Science and Technology, Tsinghua University. He received his PhD from the Computer Graphics Lab in the same department. He is a core member of the Jittor development team. In recent years, he has published papers in major conferences and journals such as Science China Information Sciences, CVPR and TOG. He was also the champion of the International Student Supercomputer Competition.
Time: 17:15-17:40
Topic: Enhancing Large Language Models with Legal Knowledge
Abstract: Large Language Models (LLMs) have exhibited remarkable performance across various natural language understanding tasks. Nevertheless, when deployed to specific domains such as law or medicine, LLMs still face the challenge of a deficiency in domain-specific knowledge and an inadequate capability to leverage that knowledge. In this talk, I will introduce a new framework to adapt LLMs to specific domains and, specifically, how we build the Lawyer LLaMA, a legal domain LLM, based on this framework.
Short Bio: Yansong Feng is an associate professor in the Wangxuan Institute of Computer Technology at Peking University. Before that, he obtained his PhD from ICCS (now ILCC) at the University of Edinburgh. His current research interests include using neural methods to distill knowledge from large volumes of natural language texts, and supporting intelligent human-computer interfaces. He has served as Senior Action Editor or Area Chair for ACL ARR and *ACL conferences. Yansong received the IBM Faculty Award in 2014 and 2015, and the IBM Global Shared University Research Award in 2016.
Time: 17:40-18:05
Topic: Machine Translation and Large Language Models: Universal Translation and AGI
Abstract: There has been a remarkable increase in the scale of models in both machine translation and language modeling in recent years. In this talk, I will introduce our recent work on scaling MT with large language models, including studies on knowledge transfer from pretrained LMs or MT models to NMT via modularization, structured MoE for very large NMT, sparsifying multilingual language models for MT, and efficient in-context learning for LLM-based MT. To conclude the talk, I will discuss the limitations of LLMs and their relation to universal translation as well as AGI.
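As a generic illustration of in-context learning for MT (not the specific methods in this talk), a few-shot translation prompt can be assembled from demonstration pairs and passed to an instruction-following LLM; the example pairs and wording below are hypothetical.

```python
# Illustrative sketch only: few-shot in-context learning for MT builds a prompt
# from translation demonstrations and asks an LLM to continue it.
# The demonstration pairs and prompt wording are hypothetical.
demos = [
    ("Machine translation is a core NLP task.", "机器翻译是自然语言处理的核心任务。"),
    ("Large language models can translate without parallel training data.",
     "大语言模型无需平行训练数据也能进行翻译。"),
]
source = "Universal translation remains an open challenge."

prompt = "Translate English to Chinese.\n\n"
for en, zh in demos:
    prompt += f"English: {en}\nChinese: {zh}\n\n"
prompt += f"English: {source}\nChinese:"

print(prompt)  # pass this prompt to any instruction-following LLM
```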
Short Bio: Deyi Xiong is a Professor of Computer Science at Tianjin University (TJU), Director of both the Natural Language Processing Laboratory at the College of Intelligence and Computing, TJU, and the International Joint Research Center of Language Intelligence and Technology at TJU. Prior to joining TJU, he was a professor at Soochow University and a research scientist at the Institute for Infocomm Research, Singapore. His research focuses on natural language processing, specifically machine translation, dialogue, large language models and commonsense reasoning. He has published over 150 papers in prestigious journals and conferences, including Computational Linguistics, IEEE TPAMI, IEEE TASLP, Artificial Intelligence, AAAI, IJCAI, ACL, and EMNLP. He is the first author of the book "Linguistically Motivated Statistical Machine Translation: Models and Algorithms" published by Springer and the Chinese book "Neural Machine Translation: Foundations, Principles, Practices and Frontiers". He was the program co-chair of IALP 2021 and CWMT 2017, and co-sponsorship chair of AACL 2022 and EMNLP 2023. He has also served as a (senior) area chair of conferences including ACL, EMNLP, NAACL and COLING. He was the founder and co-organizer of multiple ACL/EMNLP/NAACL-affiliated workshops such as S2MT 2015, SedMT 2016 and DiscoMT 2019. He is a member of the standing committee of reviewers of CL, an action editor of both TACL and ARR, and an associate editor of ACM TALLIP and DiB.