Since the pre-trained language model BERT was proposed in 2018, pre-trained models have achieved state-of-the-art results on various Natural Language Processing (NLP) tasks. In particular, the GPT-3 model, with 175 billion parameters, demonstrates strong task-agnostic zero-shot/few-shot learning capabilities. Recently, some researchers proposed the concept of the "Foundation Model" based on this efficient "pre-training + fine-tuning" paradigm, believing that pre-trained models will profoundly change all aspects of human society.
Seizing the opportunity presented by the rapid development of pre-trained models, this workshop invites distinguished scholars and experts to give keynote speeches and share their cutting-edge progress and practices in pre-trained models. The latest advances in pre-trained models will be presented in accessible terms. The speakers will also show the audience practical examples of applying pre-trained models in real applications. Finally, future directions and open problems in the area of pre-trained models will be discussed in depth.
Oct. 15 | Speaker | Affiliation | Title
14:00-14:10 | Yu Sun | Distinguished Architect of Baidu, Leader of Baidu ERNIE platform | Opening Speech
14:10-14:40 | Pengfei Liu | Postdoc at the Language Technologies Institute of Carnegie Mellon University | The Fourth Paradigm of Modern Natural Language Processing Techniques
14:40-14:45 | Q&A
14:45-15:15 | Fuli Luo | Core member of AliceMind, Machine Intelligence Laboratory of Alibaba Damo Academy | AliceMind: Alibaba's Collection of Encoder-decoders from Machine Intelligence of Damo Lab
15:15-15:20 | Q&A
15:20-15:50 | Shuohuan Wang | Staff Software Engineer of Baidu NLP department | The Algorithm, Platform and Application of ERNIE
15:50-15:55 | Q&A
15:55-16:25 | Lingxiang Wu | Postdoc at Institute of Automation, Chinese Academy of Sciences | Exploration and Practice of Multimodal Pretrained Models
16:25-16:30 | Q&A
16:30-17:00 | Adjourn/Networking
Bio: Yu Sun is a Distinguished Architect of Baidu and the leader of the Baidu ERNIE platform. His main research topics cover natural language understanding, dialogue systems, deep learning, and information retrieval. He leads the research and development of Baidu's language understanding technology and platform Wenxin (ERNIE), a key technology behind the Baidu search engine and other globally leading technologies. Yu's work is widely used in search engines, smart speakers, maps, and other products, significantly improving the user experience for hundreds of millions of users. He has won more than 10 world championships, including SemEval, the world's largest semantic evaluation campaign; published dozens of papers at ACL, NAACL, and AAAI; and obtained more than 70 patents. He has also received the SAIL award (the highest award) of the World Artificial Intelligence Conference (WAIC), the Excellent Scientific and Technological Achievement Award of the Chinese Association for Artificial Intelligence (CAAI), and the First Prize of the Scientific and Technological Progress Award of the Chinese Institute of Electronics (CIE).
Time: 14:10-14:40, 15th October, 2021
Title: The Fourth Paradigm of Modern Natural Language Processing Techniques
Abstract: In this talk, Pengfei will summarize and analyze four paradigms in the development of statistical natural language processing techniques, and argue that prompt-based learning is a promising new paradigm that may represent another major change in the way we look at NLP. He will then try to organize the current state of knowledge in this rapidly developing field by providing an overview and formal definition of prompting methods, and highlight some potential challenges to motivate future work.
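To make the core idea of this talk concrete, here is a minimal, hypothetical sketch of prompt-based learning: a classification task is recast as a cloze-style fill-in problem via a template and a verbalizer, so that a language model can solve it without a task-specific head. The template text, the verbalizer entries, and the toy lexical-cue "model" below are all illustrative assumptions, not part of any real system discussed in the talk; a real implementation would score fillers with a pre-trained masked language model.

```python
import re

def apply_template(text: str) -> str:
    """Recast a review-classification input as a cloze prompt with a [MASK] slot."""
    return f"{text} Overall, it was a [MASK] movie."

# Verbalizer: map each class label to a filler word for the [MASK] slot.
VERBALIZER = {"positive": "great", "negative": "terrible"}

# Toy stand-in for a masked LM: crude lexical cues that make a filler plausible.
CUES = {"great": {"loved", "wonderful", "fun"},
        "terrible": {"boring", "awful", "waste"}}

def score_filler(prompt: str, filler: str) -> int:
    """Count cue words in the prompt that support this filler."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return len(words & CUES[filler])

def classify(text: str) -> str:
    """Pick the label whose verbalized filler the 'model' finds most plausible."""
    prompt = apply_template(text)
    return max(VERBALIZER, key=lambda label: score_filler(prompt, VERBALIZER[label]))

print(classify("I loved it, such a fun ride."))    # positive
print(classify("A boring, awful waste of time."))  # negative
```

The point of the sketch is the reformulation itself: the same pre-trained model can serve many tasks simply by changing the template and verbalizer, which is what distinguishes this paradigm from per-task fine-tuned heads.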
Speaker: Pengfei Liu is a postdoc at the Language Technologies Institute of Carnegie Mellon University and serves as a co-lecturer in the CMU Natural Language Processing course. His research currently focuses on information extraction, text generation, and NLP system evaluation. He has served as an area chair for NAACL, EMNLP, NeurIPS, ACL Rolling Review, etc., and won the Best Demo Paper Award at ACL 2021. He has also received the CAAI Outstanding Doctoral Dissertation Award, a Baidu Scholarship, and other honors.
Time: 14:45-15:15, 15th October, 2021
Title: AliceMind: Alibaba's Collection of Encoder-decoders from Machine Intelligence of Damo Lab
Abstract: AliceMind, i.e., ALIbaba's Collection of Encoder-decoders from MinD (Machine intelligence of Damo), is a leading and systematic framework for pre-trained language models. AliceMind includes the language understanding model StructBERT (ICLR 2020), the generative language model PALM (EMNLP 2020), the cross-lingual language model VECO (ACL 2021), the cross-modal language model StructVBERT (CVPR 2021 VQA Challenge Champion), and the structural language model StructuralLM (ACL 2021). Recently, AliceMind released PLUG, a large-scale Chinese language understanding and generation model with 21 billion parameters, and opened an API for testing. The AliceMind team has published more than 20 papers. At present, the pre-trained models within AliceMind lead the leaderboards of major international benchmarks, including GLUE, CLUE, XTREME, VQA Challenge, DocVQA, MS MARCO, etc. AliceMind is highly competitive as one of the few large-scale pre-trained language model frameworks that cover technology, platform, application, ecosystem, and more.
Speaker: Fuli Luo received her master's degree from Peking University. Her research focuses on natural language understanding and generation, including language model pre-training, text style transfer, sentiment-controllable text generation, and word sense disambiguation. Currently, she works on language model pre-training, model compression, and fine-tuning. She now works at Alibaba and is a core member of AliceMind, a leading framework for pre-trained language models created by the Machine Intelligence Laboratory of Alibaba Damo Academy. She has published more than 20 research papers, most of them in top-tier conferences and journals. She has served as a Program Committee member of various international conferences, including ACL, EMNLP, AAAI, IJCAI, etc.
Time: 15:20-16:10, 15th October, 2021
Title: The Algorithm, Platform and Application of ERNIE
Abstract: This talk focuses on the algorithm, platform, and application of ERNIE, the pre-training technology of Baidu NLP. ERNIE has achieved a series of breakthroughs in fields such as language understanding, language generation, and cross-lingual and cross-modal understanding. ERNIE also plays an important role in a series of products such as Baidu Search, the Baidu App, and Xiaodu. In July 2021, ERNIE 3.0, a large-scale knowledge-enhanced pre-training framework for language understanding and generation, achieved SoTA results on 54 Chinese NLP tasks and took first place on the SuperGLUE leaderboard.
Speaker: Shuohuan Wang is a Staff Software Engineer in the Baidu NLP department. He received his Master's degree from Peking University in 2015 and his Bachelor's degree from Beihang University in 2012. His primary work is on pre-training in NLP, and this work has achieved SoTA results on several global benchmarks such as GLUE, SuperGLUE, XTREME, and SemEval. The related papers have been published at AAAI, ACL, EMNLP, etc. He has more than 70 patents filed or granted.
Time: 16:15-16:45, 15th October, 2021
Title: Exploration and practice of multimodal pretrained models
Abstract: Pre-trained models have attracted intense attention since Google proposed BERT in 2018. With "big data + large models", the unsupervised training paradigm has demonstrated impressive performance on several downstream tasks. Zidongtaichu, proposed by the Institute of Automation, Chinese Academy of Sciences, consists of three single-modal base models and a series of cross-modal encoders/decoders. On the single-modal side, object-level training and efficient transformer structures are explored for vision tasks. In vision-language pretraining, knowledge is embedded to achieve better semantic understanding and generalization capability. In the three-modal work, the first pre-trained model that connects the three modalities of text, vision, and audio in a unified framework is proposed. Through token-, modality-, and sample-level pretext tasks, it can effectively adapt to a range of cross-modal understanding and generation tasks given single-, two-, or three-modal inputs. Zidongtaichu supports practical applications, including a cross-modal chatbot, generalized representation, image-to-audio generation, image generation, etc. Intensive computing power enables large models and incubates various applications. Multi-modal pretraining is a key step toward more general artificial intelligence, and it is expected to become a platform-level technology in the future. In this talk, I will introduce Zidongtaichu in detail and offer an outlook on the future development of multi-modal pretraining.
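As background for the token-level pretext tasks the abstract mentions, here is a minimal sketch of the general masked-reconstruction idea: randomly corrupt some tokens in a sequence and record the originals as prediction targets for the model. The function name, the masking rate, and the example sequence are illustrative assumptions for this sketch, not details of Zidongtaichu itself.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rate=0.15, rng=None):
    """Randomly replace tokens with [MASK]; return (corrupted, targets),
    where targets maps masked positions to the original tokens the model
    must reconstruct."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets

tokens = "a photo of a dog playing in the park".split()
corrupted, targets = mask_tokens(tokens, rate=0.3)
print(corrupted, targets)
```

Modality- and sample-level pretext tasks generalize the same recipe: instead of hiding individual tokens, the training objective hides an entire modality (e.g., the audio stream) or asks whether modalities from the same sample match.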
Speaker: Lingxiang Wu is a postdoc at the Institute of Automation, Chinese Academy of Sciences, supervised by Jinqiao Wang. She received her Ph.D. from the University of Technology Sydney and primarily works on vision-language pretraining in multi-modal scenarios. Her works have been published in TMM, TCSVT, PR, etc. Before joining the Institute of Automation, Chinese Academy of Sciences, she worked as a data scientist for Woolworths Group Limited and subsequently as a senior R&D engineer at Baidu.
Yu Sun, Baidu
Shikun Feng, Baidu