2016 Baidu Cup CCF NLP Challenge (CFP)

Developing more intelligent search engines is a long-term goal shared by academia and industry. An intelligent search engine should satisfy user requirements whether they are precise or vague, returning results that are semantically relevant to the query rather than only those that match it literally.

In particular, the technologies behind modern personal digital assistants and robots must accurately understand users’ requirements and intents and retrieve answers automatically from the web. At the 2015 Baidu World Summit, Baidu unveiled its new virtual assistant ‘Duer’ (度秘), which will be integrated into its latest mobile search app. The company’s CEO, Robin Li, pointed out that Duer involves three core components: integration, indexing and delivery. Here, indexing means indexing all of the integrated information and services in order to provide more intelligent services to users. In other words, we need better ways to model all this information and these services. For instance, people may search for “restaurants with good environment suitable for kids’ birthday parties”. To meet the user’s intent, the search engine needs to understand every restaurant, check which of them can be tagged as “with good environment” and “suitable for kids’ birthday parties”, and index the restaurants with these tags for retrieval.

This year, Baidu Cup 2016 challenges participants to tackle the problem of entity search in the scenario of Duer.

Task Description

Given an entity search query q and a set of candidate entities E, the participating system should automatically retrieve entities that match the query q from E.

Key Concepts:

Entity: An entity is something that exists in itself. Entities include common entities such as animals, plants and foods, and named entities such as people, organizations, movies and songs.

Entity search: search behavior that targets entities, such as restaurants with an elegant environment, movies recommended for lovers, or yellow-flowered trees.

Query: keywords or keyphrases used by users to express their intent.

Input/Output:

Input: an entity search query q, a set of candidate entities E.

Output: a ranked list of entities E’ that match q, where E’ should be a subset of E.
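The input/output contract above can be illustrated with a minimal sketch. The names rank_entities and char_overlap are hypothetical and are not part of any official baseline, and the character-overlap scorer is only a placeholder assumption. A real system would substitute a scorer that captures semantic relevance, since literal matching alone is what the task aims to move beyond.

```python
from typing import Callable, List


def rank_entities(query: str,
                  candidates: List[str],
                  score: Callable[[str, str], float],
                  threshold: float = 0.0) -> List[str]:
    """Return the candidates judged to match `query`, best first.

    The result only ever contains items taken from `candidates`,
    so the output E' is a subset of the input set E.
    """
    scored = [(score(query, entity), entity) for entity in candidates]
    # Keep only candidates whose score is strictly above the threshold.
    kept = [(s, e) for s, e in scored if s > threshold]
    # Sort by score, highest first; ties keep their input order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [entity for _, entity in kept]


def char_overlap(query: str, entity: str) -> float:
    """Placeholder scorer (an assumption, not a recommended method):
    fraction of the entity's distinct characters that occur in the query."""
    chars = set(entity)
    if not chars:
        return 0.0
    return len(chars & set(query)) / len(chars)
```

For example, rank_entities("yellow tree", ["tree", "xyz", "qqq"], char_overlap) keeps "tree" and "xyz" in that order and drops "qqq", whose score is zero.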

Dataset:

The dataset contains four specified types of entities including restaurants, movies, TV shows and celebrities. (The specific types are subject to change.)

For every type there are 1,000 entity queries along with 50-100 candidate entities for each query.

All retrieved candidate entities will be collected and manually annotated to construct the gold-standard entity sets for each query.

Evaluation Metric:

MAP (Mean Average Precision) based on the gold-standard entity set.
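As a rough illustration of the metric, the sketch below computes MAP under the commonly used definition: for each query, precision is taken at every rank where a gold-standard entity appears, these values are summed and divided by the size of the gold-standard set, and the per-query scores are then averaged. The function names are hypothetical, and the exact normalization and tie handling used by the evaluation platform are not specified here, so this should be read as an assumption rather than the official scorer. It also assumes each ranked list contains no duplicate entities.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Average precision of one ranked list against its gold-standard set."""
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranked, start=1):
        if entity in gold:
            hits += 1
            # Precision at this rank: relevant entities seen so far / rank.
            precision_sum += hits / rank
    # Normalize by the number of gold entities (assumed convention).
    return precision_sum / len(gold)


def mean_average_precision(runs: Dict[str, List[str]],
                           gold: Dict[str, Set[str]]) -> float:
    """Mean of average precision over all queries in `gold`.

    A query with no submitted ranking contributes a score of 0.
    """
    if not gold:
        return 0.0
    total = sum(average_precision(runs.get(query, []), entities)
                for query, entities in gold.items())
    return total / len(gold)
```

With a ranking ["a", "b", "c"] against the gold set {"a", "c"}, the average precision is (1/1 + 2/3) / 2 = 5/6.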

Rules:

Right after the competition begins, 40% of the data will be released as the development set for all participants.

All participants can submit results for the development set to the evaluation platform. The platform will automatically evaluate and rank all submitted results every day.

The remaining 60% of the data will be released two days before the submission deadline. All participants should finish processing the full dataset (including the development set) and submit their final results to the evaluation platform.

Competition Rules:

Each participant may submit to the evaluation platform at most once per day. Multiple submissions in one day will be treated as a violation of the rules.

Prizes

One First Prize, one or two Second Prizes, and several Third Prizes.

The Top Prize is RMB 25,000.00!

Timeline

Mar 4, 2016: Registration opens;

Mar 15, 2016: Competition begins (dataset ready for the public);

June 15, 2016: Competition ends (submission deadline);

July 6, 2016: Winners notified.

Organization

◆ Host: China Computer Federation (CCF)

◆ Organizer:

◇ Technical Committee on Chinese Information Technology (CCF TCCI)

◇ Baidu Netcom Science and Technology Co., Ltd.

◆ Co-organizer:

◇ Beijing Institute of Big Data Research (BIBDR)

◆ Steering Committee:

◇ Weinan E, Peking University

◇ Haifeng Wang, Baidu

◇ Bing Liu, UIC

◆ Co-Chairs:

◇ Xiaojun Wan, Peking University

◇ Shiqi Zhao, Baidu

◆ Organizing Committee:

◇ Xiaojun Wan, Peking University

◇ Shiqi Zhao, Baidu

◇ Yansong Feng, Peking University

◇ Ke Sun, Baidu

◇ Yajuan Lyu, Baidu

◆ Contact Email: nlpcc@pku.edu.cn

◆ Web Page: http://tcci.ccf.org.cn/conference/2016/index.html
