◆ Ehud Reiter
Title: High Quality Human Evaluation of Generated Texts by LLMs
Time: November 2nd 09:15 – 10:15AM
Abstract: Evaluating the quality of texts generated by modern large language models is difficult. I will discuss some general evaluation challenges and then focus specifically on the role of human evaluations. Human evaluations are the best way to evaluate more subtle aspects of LLMs, such as task appropriateness and real-world impact, but only if they are done rigorously. I will summarise our work in replicating and identifying weaknesses in existing human evaluations and in designing improved evaluation protocols. I'll conclude with advice about conducting high-quality human evaluations.
Bio: Ehud Reiter is a Professor of Computing Science at the University of Aberdeen, and was formerly Chief Scientist of Arria NLG (a spinout he cofounded). He has been working on Natural Language Generation for 35 years, and in recent years has focused on evaluation of language generation; he also has a longstanding interest in healthcare applications. He is one of the most cited and respected researchers in NLG, and his awards include an INLG Test of Time award for his work on data-to-text. He writes a widely read blog on NLG and evaluation (ehudreiter.com).
◆ Minlie Huang
Title: The Super Alignment of Large Language Models
Time: November 2nd 10:45 – 11:45AM
Abstract: With the rapid development of large language models such as ChatGPT and GPT-4, artificial general intelligence is drawing ever closer. But "the more intelligent AI is, the more dangerous it is": it therefore becomes increasingly crucial to ensure that LLMs are truly aligned and safe for our societies. In this talk, the speaker will present super alignment technologies for superintelligence, including alignment algorithms, training-free model extrapolation, and a conceptual framework for automatic refinement. He will also discuss weak-to-strong generalization, scalable oversight, automatic red-teaming, exact optimization algorithms for alignment, and more.
Bio: Dr. Minlie Huang is a professor at Tsinghua University, deputy director of the Intelligent Technology and Systems Laboratory, and deputy director of the Foundation Model Center of Tsinghua University. He has been supported by the National Distinguished Young Scholar project and NSFC key projects. His research fields include large-scale language models, dialogue systems, and language generation. He authored the Chinese book "Modern Natural Language Generation". He has published more than 150 papers in premier conferences and journals, with more than 20,000 Google Scholar citations, and has been selected as one of Elsevier China's Highly Cited Scholars since 2022 and for the AI 2000 list of the world's most influential AI scholars since 2020. He has won several best paper awards or nominations at major international conferences (IJCAI, ACL, SIGDIAL, NLPCC, etc.). He led the development of several pretrained models, including CDial-GPT, EVA, OPD, CPM, and CharacterGLM. He serves as an associate editor for TNNLS, TACL, CL, and TBD, and has served as a senior area chair of ACL/EMNLP/IJCAI/AAAI more than 10 times. His homepage is located at http://coai.cs.tsinghua.edu.cn/hml/.
◆ Kam-Fai Wong
Title: The Forgettability, Reliability, Adaptability, Multiplicity, and Explainability of Large Models
Time: November 3rd 09:00 – 10:00AM
Abstract: In recent years, large models like ChatGPT and GPT-4 have driven significant advancements in the field of artificial intelligence, revolutionizing various research domains. However, these models also present challenges such as privacy leakage, black-box nature, and poor reliability. This talk addresses these issues by introducing methods developed by our research team, focusing on enhancing the forgettability, reliability, adaptability, multiplicity, and explainability of large models.
Bio: Kam-fai Wong is the Associate Dean (External Affairs) of the Faculty of Engineering, Professor in the Department of Systems Engineering and Engineering Management, and Director of the Centre for Innovation and Technology at The Chinese University of Hong Kong. Prof. Wong's research interests primarily revolve around Chinese computing, databases, and information retrieval. He is an ACL Fellow and is very active in professional and public service. He serves as a Member of the 13th & 14th National Committee of the CPPCC, Member of the 7th Legislative Council of the HKSAR, Advisor of Our Hong Kong Foundation, Vice-Chairman of the Hong Kong Professionals and Senior Executives Association, Vice Chairman & Secretary General of the Hong Kong Alliance of Technology and Innovation, Director of the Finance Dispute Resolution Centre, Executive Member of the Council for the Promotion of Guangdong-Hong Kong-Macao Cooperation, Member of the Standing Committee of the Shenzhen Association for Science and Technology, and Advisor of the Guangzhou Association for Science and Technology, among others. Prof. Wong was awarded the Medal of Honour (MH) by the HKSAR Government in 2011 for his contributions to IT development in Hong Kong.
◆ Fei Liu
Title: LLM Numerical Reasoning and Decision-Making: New Benchmarks and Frameworks
Time: November 3rd 10:30 – 11:30AM
Abstract: Large language models hold significant potential for integrating various data types, such as text documents and database records, for advanced analytics. However, blending text and numerical data presents considerable challenges for LLMs in complex reasoning and decision-making tasks. This talk is divided into two main parts. The first part addresses numerical reasoning in sports analytics. I will introduce a new benchmark, SportsMetrics, for evaluating the numerical reasoning and information fusion capabilities of LLMs. It involves providing LLMs with detailed, play-by-play sports game descriptions, then challenging them with adversarial scenarios such as new game rules, longer durations, and scrambled narratives, as well as with analyzing key statistics in game summaries. The second part of the talk delves into LLM decision-making. LLMs are ideal for decision-making due to their ability to reason over long contexts and identify critical factors. However, challenges arise when processing transcripts of spoken speech describing complex scenarios. These transcripts often contain ungrammatical or incomplete sentences, repetitions, hedging, and vagueness. It is crucial for LLMs to incorporate this uncertainty systematically when making decisions. We introduce DeFine, a new framework that constructs probabilistic factor profiles from complex scenarios. DeFine then integrates these profiles with analogical reasoning, leveraging insights from similar past experiences to guide LLMs in making critical decisions in novel situations. To conclude, I will summarize our key findings and outline promising future directions.
Bio: Dr. Fei Liu is an Associate Professor in the Computer Science Department at Emory University. Her areas of expertise include natural language processing, deep learning, large language models, and artificial intelligence. Dr. Liu is committed to advancing the state of the art in natural language understanding and generation by developing innovative model architectures, training methodologies, and robust evaluation metrics. With the vast amount of information available from various sources, Dr. Liu's research plays a vital role in devising efficient techniques to process and make sense of this data. Dr. Liu held a postdoctoral fellowship at Carnegie Mellon University, where she was a member of Noah's ARK. She also worked as a senior scientist at Bosch Research in Palo Alto, California. Bosch is one of the largest German companies and a leading provider of intelligent car systems and home appliances. Dr. Liu received her Ph.D. in computer science from the University of Texas at Dallas, supported by the Erik Jonsson Distinguished Research Fellowship, and holds bachelor's and master's degrees in computer science from Fudan University. Dr. Liu has published over 80 peer-reviewed papers in leading conferences and journals, and she regularly serves on the program committees of major international conferences. In 2015, she was selected for the "MIT Rising Stars in EECS" program. Her research has been recognized with several awards, including a Best Paper Award Finalist at WWW 2016, an Area Chair Favorite Paper at COLING 2018, an Amazon AWS Machine Learning Research award in 2020, and NSF's CAREER award in 2022.