LIC 2020 invites you to participate in the competition, which opens in March 2020 as an event of the Language & Intelligence Summit. The Summit, initiated by the Chinese Information Processing Society of China (CIPS) and the China Computer Federation (CCF), has been held four times since 2016 and brings together researchers and experts from academia and industry to discuss technologies and developments in the field of language and intelligence.
Jointly organized by Baidu Inc., the Technical Committee on Chinese Information Technology of CCF (CCF TCCI) and the Committee on Evaluation of CIPS (CIPS CE), LIC 2020 includes five tasks, each of which provides a large-scale Chinese dataset. The winning teams will share a total prize pool of 350,000 RMB. The competition forum and award ceremony will be held at the Fifth Language & Intelligence Summit.
LIC 2020 features the following 5 tasks:
No | Task | Description |
1 | Machine Reading Comprehension | Answering questions about given texts |
2 | Conversational Recommendation | Leading a conversation into recommending an item |
3 | Semantic Parsing | Converting natural language questions to programming language (e.g., SQL) queries |
4 | Relation Extraction | Extracting knowledge such as entities, attributes and relations from natural language texts |
5 | Event Extraction | Extracting events from plain text, identifying event types and arguments |
◇ Task1 - Machine Reading Comprehension:
With the development of NLP technology, recent MRC models have achieved decent results on some datasets. However, their poor robustness limits their application in the real world. This year's task focuses on the robustness of MRC models. A dataset named DuReader_robust is specially designed to evaluate robustness from multiple perspectives, including over-sensitivity, over-stability and generalization ability.
◇ Task2 - Conversational Recommendation:
Enabling machines to hold free conversations with humans is important and challenging. To this end, we introduce a new task of conversational recommendation. In this task, the machine proactively and naturally leads a conversation from non-recommendation dialogue (e.g., Q&A) to a recommendation based on the user's interests, and continues until the user accepts the recommendation. The task provides a dataset containing dialogues of various types, the domain knowledge used in the dialogues, and the profiles of users.
◇ Task3 - Semantic Parsing:
This task is to convert natural language questions into programming language (e.g., SQL) queries that can be executed efficiently against a database. Most existing Chinese Text-to-SQL datasets contain only single-table databases and questions with simple patterns. The DuSQL dataset, which will be released for the first time in this competition, includes 200 databases across 164 domains and covers question types that commonly occur in applications, such as matching, calculation, and reasoning. Each question is associated with one or more tables in a database. The task is designed to evaluate a model's calculation and reasoning abilities.
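As a rough illustration of what a multi-table Text-to-SQL target can look like, the sketch below builds a tiny in-memory SQLite database and runs one query of the kind a parser would have to produce for a question that combines matching, calculation, and a join. The schema, the question, and the SQL are all invented for illustration and are not taken from DuSQL.

```python
import sqlite3

# Hypothetical two-table schema and data; not taken from DuSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE revenue (company_id INTEGER, year INTEGER, amount REAL);
    INSERT INTO company VALUES (1, 'Alpha', 'Beijing'), (2, 'Beta', 'Shanghai');
    INSERT INTO revenue VALUES (1, 2018, 10.0), (1, 2019, 14.0),
                               (2, 2018, 20.0), (2, 2019, 21.0);
""")

# Illustrative natural language input.
question = "Which Beijing companies grew their revenue from 2018 to 2019, and by how much?"

# The kind of SQL a parser would need to output: joins across tables
# (multi-table), a WHERE filter (matching), and arithmetic between
# two rows of the same table (calculation).
predicted_sql = """
    SELECT c.name, r19.amount - r18.amount AS growth
    FROM company AS c
    JOIN revenue AS r18 ON r18.company_id = c.id AND r18.year = 2018
    JOIN revenue AS r19 ON r19.company_id = c.id AND r19.year = 2019
    WHERE c.city = 'Beijing' AND r19.amount > r18.amount
"""

rows = conn.execute(predicted_sql).fetchall()
print(rows)
```

Executing the predicted query against the database, as done here, is one simple way to check a parser's output on a toy example; how the competition scores submissions is not specified in this announcement.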
◇ Task4 - Relation Extraction:
This task is to extract knowledge such as entities, attributes and relations from natural language texts. This year’s task is an upgrade of last year’s, more challenging and practical in that it includes complex relations on top of simple SPO relations and introduces colloquial texts from Baidu Tieba.
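To make the distinction concrete, here is one possible way to represent a simple SPO (subject, predicate, object) triple next to a complex relation whose object carries additional named slots. The class, its field names, and the example facts are assumptions chosen for illustration; the competition's actual data format is not described in this announcement.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Relation:
    """One extracted relation (hypothetical representation, not the official format)."""
    subject: str
    predicate: str
    obj: str
    # Extra named slots that qualify the relation, e.g. a year or a role.
    extra_slots: Dict[str, str] = field(default_factory=dict)

    def is_complex(self) -> bool:
        # In this sketch a relation counts as "complex" when it carries
        # slots beyond the single main object value.
        return bool(self.extra_slots)


simple = Relation("Marie Curie", "birthplace", "Warsaw")
complex_rel = Relation(
    "Marie Curie", "award", "Nobel Prize in Physics", {"year": "1903"}
)
print(simple.is_complex(), complex_rel.is_complex())
```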
◇ Task5 - Event Extraction:
This task is to extract events from plain text and identify event types and arguments. We will release the largest Chinese event extraction dataset containing 17k event-describing sentences of 65 event types.
For each task, the competition will award one First Prize, one Second Prize, one Third Prize and two Honorable Mentions. Winners will receive award certificates issued by CIPS and CCF. The prizes, as well as travel grants for attending the competition forum and award ceremony, will be sponsored by Baidu.
◇ First Prize: 30,000 RMB + award certificate
◇ Second Prize: 20,000 RMB + award certificate
◇ Third Prize: 10,000 RMB + award certificate
◇ Honorable Mention: 5,000 RMB + award certificate
◇ Mar 10, 2020: Open Registration
◇ Mar 31, 2020: Dataset Ready for Download; Open Leaderboards
◇ May 12, 2020: Registration Deadline
◇ May 13, 2020: Release of the Final Datasets
◇ May 20, 2020: Results Submission Deadline
◇ May 31, 2020: Winners Notification
◇ Jun 30, 2020: Camera-ready Submission Deadline
◇ Jul, 2020: Competition Forum and Award Ceremony at the Language & Intelligence Summit
Official registration: Registration opens on Mar 10, 2020 and closes on May 12, 2020. All registered participants who submit valid results will receive customized T-shirts.
Website Link: http://lic2020.cipsc.org.cn
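Welcome to LIC 2020, the 2020 Language and Intelligence Technology Competition!
From 2016 to 2019, the Chinese Information Processing Society of China (CIPS) and the China Computer Federation (CCF) successfully held four editions of the Language & Intelligence Summit, inviting renowned experts and scholars from academia and industry in China and abroad to discuss new developments and technologies in the field of language and intelligence. The Language and Intelligence Technology Competition is an evaluation held under the Summit and jointly organized by Baidu Inc., the Committee on Evaluation of CIPS (CIPS CE) and the Technical Committee on Chinese Information Technology of CCF (CCF TCCI). It covers complex technologies such as language understanding, human-machine dialogue, and knowledge extraction. Research on these tasks matters for AI applications such as intelligent search, intelligent recommendation, and intelligent interaction, and the tasks are challenging frontier topics in natural language processing and artificial intelligence.
The 2020 competition sets up five tasks drawn from real application needs and provides large-scale Chinese datasets, offering researchers a platform for academic exchange and further advancing research and applications in language understanding and artificial intelligence. The winning teams will share a total prize pool of 350,000 RMB, and a technical exchange session and award ceremony will be held at the Fifth Language & Intelligence Summit.
The competition comprises 5 tasks. Building on the 2019 competition, it raises the difficulty of the reading comprehension, dialogue, and relation extraction tasks, and adds two popular new tasks: semantic parsing and event extraction.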
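No | Task | Description |
1 | Machine Reading Comprehension | Answering questions related to a given text after reading it |
2 | Conversational Recommendation | Proactively and naturally leading a conversation from non-recommendation dialogue to recommendation dialogue |
3 | Semantic Parsing | Converting user questions into a programming language that can operate on a database |
4 | Relation Extraction | Extracting knowledge such as entities, attributes and relations from text |
5 | Event Extraction | Extracting events from text and identifying event types and their arguments |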
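◇ Task1 - Machine Reading Comprehension:
With advances in NLP technology, current reading comprehension models achieve good results on some test sets. In real applications, however, they still show poor robustness, which greatly limits their deployment in practical scenarios. The machine reading comprehension task in this competition therefore focuses on the robustness of reading comprehension models in real application scenarios. To this end, the DuReader_robust dataset was built to examine model robustness along several dimensions, including over-sensitivity, over-stability, and generalization ability. All samples in the dataset come from real application scenarios; they are difficult, test a wide range of abilities, and cover many problems that remain hard to solve in real applications.
◇ Task2 - Conversational Recommendation:
Real-world human-machine interaction involves several types of dialogue at once, and fusing them naturally is an important challenge. To address it, this competition proposes a new task: conversational recommendation within multi-type dialogue. The machine is expected to proactively and naturally lead the conversation from non-recommendation dialogue (e.g., Q&A) to recommendation dialogue, and then to reach the final recommendation goal over multiple turns, based on the user interests it has collected and the user's real-time feedback. The task also provides a dialogue dataset that spans diverse dialogue types and domains and incorporates user profile information, keeping it close to real application scenarios.
◇ Task3 - Semantic Parsing:
This task aims to enable machines to automatically convert natural language questions entered by users (Text) into a programming language that a database can execute (e.g., SQL), lowering the barrier and cost of using structured data. Databases in current Chinese Text-to-SQL datasets mostly consist of a single table, and their question patterns are fairly simple, covering only part of the questions that arise in real applications. This competition releases the DuSQL dataset for the first time. It contains 200 databases across 164 domains and covers question forms common in real applications, such as matching, calculation, and reasoning; each question is associated with one or more tables in a database. The dataset is closer to real application scenarios and places higher demands on a model's ability to handle domain independence, question independence, and calculation and reasoning questions.
◇ Task4 - Relation Extraction:
Relation extraction is the technology of extracting entities and the relations between them from natural language text. This competition upgrades the 2019 information extraction task in two ways. First, it adds complex relation types on top of simple SPO relations, to describe the complex relations that are widespread in the real world. Second, it introduces colloquial text from Baidu Tieba, whose semantics are freer and closer to everyday spoken expression, making the relation extraction evaluation more challenging and more valuable in practice.
◇ Task5 - Event Extraction:
Event extraction is the technology of extracting events from natural language text and identifying their event types and event arguments. It is an important and challenging task in natural language understanding. In this competition we will publicly release the largest Chinese event extraction dataset in the industry to date, containing 65 event types and 17,000 sentences with event information. We hope that this competition and the open large-scale Chinese dataset will help advance event extraction technology.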
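For each task, the competition will award one First Prize, one Second Prize, one Third Prize and two Honorable Mentions. The hosts, the Chinese Information Processing Society of China (CIPS) and the China Computer Federation (CCF), will issue award certificates to the winners, and Baidu Inc. will sponsor the prizes and the winners' attendance at the forum.
◇ First Prize: 30,000 RMB + award certificate
◇ Second Prize: 20,000 RMB + award certificate
◇ Third Prize: 10,000 RMB + award certificate
◇ Honorable Mention: 5,000 RMB + award certificate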
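◇ Mar 10, 2020: Registration opens; sample data released
◇ Mar 31, 2020: Full training data and the first batch of test data released to registered participants; evaluation portal and online leaderboards open
◇ May 12, 2020: Registration deadline
◇ May 13, 2020: Final test data released
◇ May 20, 2020: System results submission deadline
◇ May 30, 2020: Competition results announced; system reports and papers accepted
◇ Jun 30, 2020: Paper submission deadline
◇ Jul, 2020: Exchange session and award ceremony at the Language & Intelligence Summit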
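For task details and registration instructions for the 2020 Language and Intelligence Technology Competition, see the official website: http://lic2020.cipsc.org.cn
Registration officially opens on March 10, 2020. We sincerely invite researchers and developers from academia and industry to take part in this competition!
Note: Every member of a team that registers and submits valid final results will receive one customized competition T-shirt (one per person, even when participating in multiple tasks).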