Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Abstract: Evaluating natural language generation (NLG) systems is inherently challenging. While human evaluation remains the gold standard, it is difficult to scale and often suffers from inconsistencies and demographic biases. LLM-based evaluation offers a scalable alternative but is highly sensitive to prompt design, where small variations can lead to significant discrepancies. In this talk, I will introduce an inversion learning method that learns effective reverse mappings from model outputs back to their input instructions, enabling the automatic generation of highly effective, model-specific evaluation prompts. This method is simple, requires only a single evaluation sample, and eliminates the need for manual prompt engineering, thereby improving both the efficiency and robustness of LLM-based evaluation.

Bio: Chenghua Lin is a Full Professor and Chair in Natural Language Processing in the Department of Computer Science at The University of Manchester. His research lies at the intersection of machine learning and natural language processing, with a focus on language generation, multimodal LLMs, and evaluation methods. He currently serves as Chair of the ACL SIGGEN Board, a member of the IEEE Speech and Language Processing Technical Committee, and Associate Editor for Computer Speech and Language. He has received several prizes and awards for his research and academic leadership, including the CIKM Test-of-Time Award, the INLG Best Paper Runner-up Award, and an Honourable Mention for the Scottish Informatics and Computer Science Alliance (SICSA) Supervisor of the Year Award. He has also held numerous programme and chairing roles for *ACL conferences, including Documentation Chair for ACL’25, Publication Chair for ACL’23, Workshop Chair for AACL-IJCNLP’22, Program Chair for INLG’19, and Senior Area Chair for EMNLP’20, ACL’22–’23, EACL’23, NAACL’25, and AACL’25.