LIC 2022 (Language and Intelligence Challenge) invites you to participate in the competition, which will start in March 2022 as an event of the Language & Intelligence Summit, initiated by the Chinese Information Processing Society of China (CIPS) and the China Computer Federation (CCF). The summit and competition have been held since 2016, providing opportunities for researchers and experts from both academia and industry to discuss developments and open issues in the field of language and intelligence.
Jointly organized by Baidu Inc., CIPS Committee on Evaluation (CIPS CE) and CCF Technical Committee on Natural Language Processing (CCF TCNLP), LIC 2022 organizes four tasks designed around three cutting-edge topics: cross-modal understanding, knowledge-driven modelling and trustworthy learning. The winning teams will share a total prize pool of 200,000 RMB. The competition forum and award ceremony will be held at the Seventh Language & Intelligence Summit.
LIC 2022 features the following four tasks:
No | Task | Description |
1 | Passage Retrieval | Retrieving, from a collection of documents, the part of a document that is most relevant to a search query |
2 | Knowledge Grounded Dialog Generation | Open domain dialog generation using external knowledge from a web search engine |
3 | Interpretable Sentiment Analysis | A task to test whether a sentiment analysis model is interpretable from the perspectives of plausibility and faithfulness |
4 | Video Semantic Understanding | Requires machines to analyze and understand video content. Building on perceptual content analysis, the task integrates multimodal information such as knowledge, language, vision, and voice, and combines knowledge computation and reasoning to generate semantic tags for videos. |
◇ Task1 - Passage Retrieval:
Finding relevant passages in a large collection is an important task in natural language processing and information retrieval. Traditional retrieval systems use term-based sparse representations (e.g. BM25) to find candidate passages that match the query, so they cannot handle term mismatch between the query and semantically relevant passages. Recent studies have shown that dense retrieval based on pre-trained language models can effectively deal with this problem by using dense semantic representations of queries and passages. Dense retrieval performs better in many applications, including question answering. To promote research in dense retrieval, we present DuReader_retrieval, the first large-scale Chinese dataset for passage retrieval, which contains queries and passages from real search users and covers many challenges found in real-world applications.
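The term-mismatch problem described above can be illustrated with a minimal sketch. It is not part of the task materials: the overlap score is a crude stand-in for a term-based method such as BM25, and the two-dimensional "dense" vectors are hand-written toy values chosen for illustration, not the output of any pre-trained model.

```python
import math

def term_overlap(query, passage):
    """Count query terms that literally appear in the passage (a crude sparse score)."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query = "how to fix a flat tyre"
passages = [
    "repairing a punctured bicycle wheel step by step",  # relevant, different words
    "how to bake a flat bread",                          # irrelevant, shared words
]

# Hypothetical embeddings (assumption): axis 0 ~ "bike repair", axis 1 ~ "baking".
query_vec = [0.9, 0.1]
passage_vecs = [[0.8, 0.2], [0.1, 0.9]]

sparse_scores = [term_overlap(query, p) for p in passages]
dense_scores = [cosine(query_vec, v) for v in passage_vecs]

# The sparse score prefers the irrelevant passage because it shares surface terms;
# the toy dense score prefers the semantically relevant one.
print("sparse:", sparse_scores)
print("dense: ", [round(s, 3) for s in dense_scores])
```

In a real system the vectors would come from a trained query and passage encoder, and the nearest passages would typically be found with an approximate nearest-neighbour index rather than a full scan.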
◇ Task2 - Knowledge Grounded Dialog Generation:
Pre-trained dialogue generation models can learn to compress the knowledge in dialog corpora into model parameters. However, they tend to generate inaccurate responses when the conversation requires fine-grained knowledge. Moreover, it is hard to update the knowledge embedded in these models, which results in poor performance in conversations about time-sensitive topics. To address these issues, we propose the task of response generation using external knowledge acquired in real time from a web search engine. Specifically, a dialog model is required to produce an appropriate search query based on a given dialogue context, acquire relevant knowledge from the search engine, and generate an appropriate and informative response using that knowledge. To help build such capabilities, we provide a dataset of dialogues grounded in web search knowledge.
◇ Task3 - Interpretable Sentiment Analysis:
Deep learning models are often criticized for being “black boxes”: they provide no way to understand their inner decision processes, which makes them hard to apply in medicine, law and other fields that involve high-stakes decision-making. Recent years have witnessed greater attention to model interpretability. Following this trend, we propose this task to evaluate the interpretability of sentiment analysis models and the performance of rationale extraction methods. The dataset we provide contains a set of opinion texts, human-annotated sentiment labels, token-level rationales for the labels and perturbations of the opinion texts. It is designed to evaluate model interpretability from the perspectives of plausibility and faithfulness.
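As an illustration of how token-level rationales can be compared, the sketch below computes a token-level F1 between a model's extracted rationale and a human-annotated one. This is only one common way to approximate plausibility and is an assumption made here for illustration; it is not presented as the official evaluation metric of the task, and the function name and example tokens are hypothetical.

```python
def rationale_f1(predicted, gold):
    """Token-level F1 between predicted and human rationale token positions."""
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    if overlap == 0:
        # Covers empty inputs and rationales with no token in common.
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

tokens = ["the", "food", "was", "absolutely", "delicious"]
gold_rationale = {3, 4}   # human annotators marked "absolutely", "delicious"
model_rationale = {1, 4}  # the model highlighted "food", "delicious"

score = rationale_f1(model_rationale, gold_rationale)
print(f"token-level F1: {score:.2f}")
```

Faithfulness is usually probed differently, for example by checking whether the rationale a model gives stays consistent when the input text is perturbed, which is why the dataset also includes perturbed versions of the opinion texts.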
◇ Task4 - Video Semantic Understanding:
Traditional perception-based video content analysis is not really an “understanding” task. In our video semantic understanding task, models take Internet videos as input and output semantic labels that describe the video content by combining perceptual content analysis (such as face recognition, OCR and speech recognition) with knowledge graph computation and reasoning. This task is inspired by real application requirements and provides both a large-scale Chinese video semantic understanding dataset and video-related knowledge graphs.
This competition will award 1 First Prize, 1 Second Prize and 2 Third Prizes for each task. The winners will receive award certificates issued by CIPS & CCF. The prizes and the travel grants for attending the competition forum and award ceremony will be sponsored by Baidu.
◇ First prize: 20,000 RMB + award certificate
◇ Second prize: 15,000 RMB + award certificate
◇ Third prize: 10,000 RMB + award certificate
◇ Mar 30, 2022: Open Registration
◇ Mar 31, 2022: Dataset Ready for Download; Open Leaderboards
◇ May 30, 2022: Registration Deadline
◇ Jun 1, 2022: Release the Final Datasets
◇ Jun 6, 2022: Results Submission Deadline
◇ Jul 5, 2022: Winners Notification
◇ Jul 30, 2022: Camera-ready Submission Deadline
◇ Aug 2022: Competition Forum and Award Ceremony at the Language & Intelligence Summit
Official registration: Registration opens on Mar 30, 2022 and closes on May 30, 2022. All registered participants who submit valid results will receive customized T-shirts.
Website Link: http://lic2022.cipsc.org.cn