LIC 2021 (Language and Intelligence Challenge) invites you to participate in the competition, which will start in March 2021 as an event of the Language & Intelligence Summit, initiated by the China Computer Federation (CCF) and the Chinese Information Processing Society of China (CIPS). The summit and competition have been held since 2016, providing opportunities for researchers and experts from both academia and industry to discuss developments and issues in the field of Language and Intelligence.
Jointly organized by Baidu Inc., the Technical Committee on Natural Language Processing of CCF (CCF TCNLP), and the Committee on Evaluation of CIPS (CIPS CE), LIC 2021 comprises three tasks (machine reading comprehension, dialogue, and information extraction) and provides large-scale Chinese datasets. Unlike previous competition tasks, which each focused on a single dataset and a single evaluation metric, this year's competition is based on "LUGE" (https://www.luge.ai), an open-source project of Chinese NLP benchmarks. Each task contains richer datasets and multiple evaluation metrics. The goal is to evaluate technologies comprehensively from the perspectives of multi-task performance, robustness, and generalization, so that they adapt better to industrial applications. The winning teams will share a total prize pool of 300,000 RMB. The competition forum and award ceremony will be held at the Sixth Language & Intelligence Summit.
LIC 2021 features the following 3 tasks:
No | Task | Description |
1 | Machine Reading Comprehension | Requires a machine to read a text and then answer questions about it. The task challenges the machine's language understanding ability, measured with multiple fine-grained metrics. |
2 | Multi-skill Dialog | Requires a machine to blend multiple dialog skills effectively and naturally within a unified learning architecture, so that it can converse as a human does. |
3 | Multi-format Information Extraction | Requires a machine to extract entities, relations, and events from plain text, enabling it to automatically extract structured knowledge from massive unstructured text. |
◇ Task1 - Machine Reading Comprehension:
Most existing machine reading comprehension datasets use a single metric to evaluate the overall performance of a model. The lack of fine-grained, multi-dimensional metrics makes it difficult to find and fix a model's specific defects. To address this issue, we have built a dataset that systematically evaluates a model along multiple fine-grained aspects, including the understanding of vocabulary, phrases, and semantic roles, as well as logical reasoning. The samples in the dataset come from real applications; they are difficult and pose many challenges.
◇ Task2 - Multi-skill Dialog:
Enabling machines to conduct natural and engaging dialogs with humans is both important and challenging. With this aim, we focus on a dialog modeling task that involves multiple skills, such as knowledge dialogs, recommendation dialogs, and persona dialogs. In this task, the machine should blend these skills effectively and naturally to make dialogs more engaging, for example by combining knowledge dialogs and persona dialogs in a single dialog session. The task provides a dataset of dialogs that covers various skills, domains, and user profiles, and that has potential applications in real-world scenarios.
◇ Task3 - Multi-format Information Extraction:
Information extraction (IE) aims to extract structured knowledge, such as entities, relations, and events, from unstructured natural language text. Most current research focuses on extracting information in a single format, and a unified evaluation platform for IE across different formats is lacking. The competition therefore sets up a multi-format IE task, designed to evaluate IE comprehensively from different dimensions. The competition will release the largest Chinese dataset for this task, covering the extraction of structured knowledge in the form of entities, relations, and events, from natural language text at both the sentence level and the document level.
This competition will award 1 First Prize, 2 Second Prizes, and 3 Third Prizes for each task. The winners will receive award certificates issued by CCF and CIPS. Baidu will sponsor the prizes and the travel grants for attending the competition forum and award ceremony.
◇ First prize: 30,000 RMB + award certificate
◇ Second prize: 20,000 RMB + award certificate
◇ Third prize: 10,000 RMB + award certificate
◇ Mar 10, 2021: Open Registration
◇ Mar 31, 2021: Dataset Ready for Download; Open Leaderboards
◇ May 12, 2021: Registration Deadline
◇ May 13, 2021: Release the Final Datasets
◇ May 20, 2021: Results Submission Deadline
◇ Jun 5, 2021: Winners Notification
◇ Jun 30, 2021: Camera-ready Submission Deadline
◇ Aug 2021: Competition Forum and Award Ceremony at the Language & Intelligence Summit
Official registration: Registration opens on Mar 10, 2021 and closes on May 12, 2021. All registered participants who submit valid results will receive customized T-shirts.
Website Link: http://lic2021.ccf.org.cn
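Welcome to the 2021 Language and Intelligence Challenge!
From 2016 to 2020, the China Computer Federation (CCF) and the Chinese Information Processing Society of China (CIPS) successfully held five editions of the Language & Intelligence Summit, inviting well-known experts and scholars from academia and industry, in China and abroad, to discuss new developments and new technologies in the field of language and intelligence. The Language and Intelligence Challenge is an evaluation competition held under the summit and jointly organized by Baidu Inc., the Technical Committee on Natural Language Processing of CCF, and the Committee on Evaluation of CIPS. Past editions have organized evaluation tasks such as reading comprehension, human-machine dialogue, semantic parsing, and information extraction, covering important and highly challenging frontier topics in natural language processing and artificial intelligence. The competition plays an important role in advancing research on these technologies and the development of AI applications such as intelligent search, intelligent recommendation, and intelligent interaction.
LIC 2021 sets up three tasks drawn from real application needs and provides large-scale Chinese datasets. Unlike previous competition tasks, which focused on a single dataset and a single evaluation dimension, this year's competition is based on the "千言" (LUGE) datasets. Each task offers richer datasets and evaluation dimensions, with the aim of evaluating technologies comprehensively from the perspectives of multi-task performance, robustness, and generalization, and thereby helping them adapt better to industrial applications across many domains and scenarios. The competition hopes to provide researchers with a platform for academic exchange and to further advance research and applications in language understanding and artificial intelligence. The winning teams will share a total prize pool of 300,000 RMB, and a technical exchange session and award ceremony will be held at the Sixth Language & Intelligence Summit.
The competition features 3 popular tasks that are both challenging and practical, described below: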
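No | Task | Description |
1 | Machine Reading Comprehension | Requires a machine to read a text and then answer related questions, challenging its language understanding ability along multiple fine-grained dimensions. |
2 | Multi-skill Dialog | Requires a machine to blend different dialog skills well, while also testing the model's generality in cross-skill scenarios. |
3 | Multi-format Information Extraction | Requires a machine to extract knowledge in multiple formats, such as entities, relations, and events, from natural language text, with the aim of enabling machines to automatically extract structured knowledge from massive unstructured text. |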
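◇ Task1 - Machine Reading Comprehension:
Most current machine reading comprehension datasets use only a single metric to judge model quality and lack fine-grained, multi-dimensional evaluation of a model's language understanding ability, which makes a model's specific defects hard to find and fix. To address this problem, we have built a fine-grained, multi-dimensional evaluation dataset that probes model weaknesses along several dimensions, including vocabulary understanding, phrase understanding, semantic role understanding, and logical reasoning, thereby moving reading comprehension evaluation into a "fine-grained" era. All samples in the dataset come from real application scenarios; they are difficult, test a rich set of abilities, and cover many problems that remain hard to solve in real applications.
◇ Task2 - Multi-skill Dialog:
Real-world human-machine interaction involves multiple dialog skills at once; for example, chitchat may include persona chat. Blending multiple dialog skills naturally is therefore an important challenge. To tackle it, this competition focuses on the multi-skill dialog task and extends the range of skills beyond those in previous years' datasets, now covering knowledge dialogs, recommendation dialogs, persona chat, chitchat, and other skills. The dialog data for this task spans diverse skills and domains, and many of the interaction scenarios come from real applications.
◇ Task3 - Multi-format Information Extraction:
Information extraction aims to extract structured knowledge, such as entities, relations, and events, from unstructured natural language text. At present, most research focuses on extraction quality for a single type of information and lacks a unified evaluation across different types of information extraction tasks. This competition therefore sets up a multi-format information extraction task, aiming to evaluate structured knowledge extraction comprehensively from different dimensions. The competition will publicly release the industry's largest Chinese multi-format information extraction dataset, which covers the extraction of structured knowledge in different formats, such as entities, relations, and events, and includes natural language text at both sentence and document granularity. We hope that this competition and the open large-scale Chinese dataset will help further advance information extraction technology.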
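For each task, the competition will award 1 First Prize, 2 Second Prizes, and 3 Third Prizes. The hosts, the China Computer Federation (CCF) and the Chinese Information Processing Society of China (CIPS), will issue award certificates to the winners, and Baidu will sponsor the prize money and the winners' attendance at the forum.
◇ First prize: 30,000 RMB + award certificate
◇ Second prize: 20,000 RMB + award certificate
◇ Third prize: 10,000 RMB + award certificate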
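◇ Mar 10, 2021: Registration opens; sample data released
◇ Mar 31, 2021: Full training data and the first batch of test data released to registered participants; evaluation entry and online leaderboards open
◇ May 12, 2021: Registration deadline
◇ May 13, 2021: Final test data released
◇ May 20, 2021: System results submission deadline
◇ Jun 5, 2021: Competition results announced; system reports and papers accepted
◇ Jun 30, 2021: Paper submission deadline
◇ Aug 2021: Exchange session and award ceremony at the Language & Intelligence Summit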
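For task details and registration instructions for LIC 2021, see the official website: http://lic2021.ccf.org.cn
Registration officially opens on Mar 10, 2021. We sincerely invite researchers and developers from academia and industry to take part in this competition!
Note: Every member of a team that registers and ultimately submits valid results will receive 1 customized competition T-shirt (one per person, even when participating in multiple tasks).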