In recent years, significant breakthroughs have been made in artificial intelligence (AI) areas that involve a single modality, such as speech, natural language, and vision. On many single-modality tasks, such as object recognition, image generation, speech recognition, and machine translation, AI systems have performed at a level comparable to humans on specific data sets, and these research results have been transformed into important real-world applications. On the other hand, as the underlying problems of single-modality AI are being solved, researchers have realized that higher-level AI tasks often involve more complex information processing across multiple modalities, and that research limited to a single modality often fails to take full advantage of cross-modal information. It is therefore of great interest to study multimodal modeling and learning approaches that span more than one modality. The goal of this workshop is to provide a forum for the community to exchange ideas and the latest advances in multimodal-assisted natural language processing.
Zhengzhou, China | October 16, 2020
◇ 1:30 PM – 1:40 PM Opening
◇ 1:40 PM – 3:00 PM Keynote Speech
◇ 3:00 PM – 3:15 PM Tea Break
◇ 3:15 PM – 4:30 PM Oral Session
◇ 4:30 PM – 5:30 PM Poster Session
Topics of interest include (but are not limited to):
◇ Fundamental problems and methods for processing multimodal data across text, speech, image, video, code, and heterogeneous signals
- Multimodal representation learning and pre-training
- Multimodal alignment, fusion, and embedding
- Multi-task learning, few-shot learning, and reinforcement learning in multimedia
- Semi-supervised and unsupervised learning for multimodal problems
◇ Multimodal modeling in dialogue systems
- Multimodal semantic analysis and prediction
- Multimodal sentiment analysis and emotion recognition
- Multimodal dialogue response generation
◇ Multimodal methods for classical NLP tasks
- Multimodal neural machine translation
- Multimodal summarization
◇ Deep learning methods and new applications for cross-modalities
- Image/video captioning
- Visual question answering
- Visual storytelling
- Comment generation
- Text-to-image/video retrieval
- Text-to-image synthesis
- Vision-and-language navigation
- Text-to-code retrieval
- Code-to-text generation
◇ Multimodal and multimedia resources
◇ Evaluation metrics of multimodal applications
All submissions must be written in English. Accepted submissions will be presented at the workshop either orally or as a poster, and will be published as a volume in the Springer LNAI series (EI & ISTP indexed, for English papers) or in the ACTA Scientiarum Naturalium Universitatis Pekinensis (EI & Scopus indexed, for Chinese papers). Submissions are limited to a total of 12 (twelve) pages for English papers and 8 (eight) pages for Chinese papers, including all content and references, and must be in PDF format. The submission website is https://www.softconf.com/nlpcc/mmnlp-2020. Submissions must conform to the specifications of the NLPCC 2020 call for papers regarding multiple submissions and preparation of papers for the double-blind review process (http://tcci.ccf.org.cn/conference/2020/cfp.php). Papers that violate these specifications will be desk rejected.
◇ Submission Deadline: July 15, 2020
◇ Notification of Acceptance: Aug 5, 2020
◇ Camera ready: Aug 15, 2020
◇ MMNLP workshop date: October 16, 2020
◇ Xiaodong He, JD.COM
◇ Youzheng Wu, JD.COM
◇ Nan Duan, Microsoft Research Asia
◇ Jia Jia, Tsinghua University
◇ Qin Jin, Renmin University of China
◇ Qun Liu, Huawei Noah's Ark Lab
◇ Guodong Zhou, Soochow University
◇ Ting Yao, JD.COM
◇ Nan Zhao, JD.COM
◇ Haoyang Huang, MSRA
◇ Lei Ji, MSRA
◇ Jiajun Zhang, CASIA
◇ Kang Liu, CASIA
◇ Please contact wuyouzheng1@jd.com or zhaonan8@jd.com if you have any questions.