Current retrieval systems handle phrase-level retrieval well for simple fact- and entity-centric needs. This track encourages research on answering more complex information needs with longer answers. Much like Wikipedia pages synthesize knowledge that is globally distributed, we envision systems that collect relevant information from an entire corpus, creating synthetically structured documents by collating retrieved results.
Join our mailing list: https://groups.google.com/d/forum/trec-car
As a brief motivating example, consider a user wondering about the latest advances in mobile technology. She has heard that there is a new iPhone on the market and is looking for a summary of its features or issues. With this intention in mind, she enters the query
iPhone 7 new features and issues. A possible answer she would be very happy to receive could look like this:
Despite the inclusion of an adapter, the removal of the headphone jack was met with criticism. Criticism was based primarily on the following arguments:
Digital output (as opposed to analog output from the 3.5 mm headphone jack) does not show any notable improvement in sound quality;
the inability to charge the iPhone 7 and listen to music simultaneously without Bluetooth;
and the inconvenience of having to carry around an adapter for what is purely a mobile device, diminishing its utility.
In particular, Apple’s vice president Phillip Schiller, who announced the change, was mocked extensively online for stating that removing the headphone jack took ‘courage’.
While this example was taken from the Wikipedia article IPhone_7, it should be possible to identify such a list of issues from a Web collection with passage retrieval, consolidation, and organization. Of course, one might envision other responses that would satisfy the information need equally well.
Given an article stub \(Q\), retrieve for each of its sections \(H_i\), a ranking of relevant entity-passage tuples \((E,P)\). The passage \(P\) is taken from a provided passage corpus. The entity \(E\) refers to an entry in the provided knowledge base. We define a passage or entity as relevant if the passage content or entity should be mentioned in the knowledge article. Different degrees of “could be mentioned”, “should be mentioned”, or “must be mentioned” are represented through a PEGFB annotation scale during manual assessment.
Two tasks are offered: Passage ranking and Entity ranking.
In the passage ranking task, entities are omitted.
In the entity ranking task, the entity must be given, and optionally, complemented with a passage that serves as provenance for why the entity is relevant. If the passage is omitted, it will be replaced with the first paragraph from the entity’s article.
Details on the submission format are given at the end of this page.
To support the neural IR community, we provide a large (yet approximate) training set based on a large collection of knowledge articles from Wikipedia that are divided into content passages and outlines. Across all articles, all extracted paragraphs are provided as one large passage corpus (see paragraphs release). We ensure that entity links remain intact. These are gathered from links to Wikipedia (and in future might be augmented with the FACC1 collection and the TagMe toolkit).
From all stubs obtained in this process, we will select 1000 stubs for which the original stub association is revealed as a training set. Additionally, we open up 50% of a recent Wikipedia dump for training (see halfwiki release).
For each section \(H_i\) in training stubs \(Q\) we indicate which passages originated from the section and which knowledge base entities are mentioned therein. We provide three variations of
trec_eval compatible qrels files, using the section hierarchy, only top-level sections, and only the article.
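The relationship between the three qrels variants can be sketched as follows. This is only an illustration under stated assumptions: the in-memory input (a section path tuple plus a paragraph id per passage) is hypothetical, and joining path components with "/" to form a query id is an assumption made for this sketch, not the official identifier format.

```python
from typing import Iterable, List, Tuple

# Hypothetical input: one (section_path, paragraph_id) pair per passage, where
# section_path is (page_id, heading_id, heading_id, ...). The "/" separator
# used to build query ids is an assumption made for this sketch only.
Passage = Tuple[Tuple[str, ...], str]


def qrels_lines(passages: Iterable[Passage], variant: str) -> List[str]:
    """Build trec_eval-style qrels lines for one of three granularities."""
    # None keeps the full path; 2 keeps page + top-level section; 1 keeps the page.
    depth = {"hierarchical": None, "toplevel": 2, "article": 1}[variant]
    seen = set()
    lines = []
    for path, paragraph_id in passages:
        query_id = "/".join(path[:depth])
        if (query_id, paragraph_id) not in seen:  # avoid duplicate judgments
            seen.add((query_id, paragraph_id))
            lines.append(f"{query_id} 0 {paragraph_id} 1")
    return lines
```

Truncating the section path is what turns a passage that is relevant for a nested section into one that is relevant for its top-level section or for the whole article.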
Passage retrieval tasks are often criticized for leading to test collections that are not reusable. We are aware of this problem and intend to address it through legal span sets and optional candidate sets.
We enact both an automatic binary and a fine-grained manual evaluation procedure on 40 test stubs.
Analogously to the training signals, an automatic (yet approximate) test signal is derived from the passages of the original article as well as entities mentioned therein. However, this test signal is not necessarily reliable, as even the best article has room for improvement. Especially for complex answers, there are often different ways to provide an equally good answer.
Consequently, for a subset of test stubs, contributed entity-passage rankings will additionally be manually assessed for relevance, reflecting different levels of relevance. This manual assessment also provides a reference for the quality of the automatic evaluation procedure.
See the data releases page for data download.
Articles we provide are encoded with the following grammar. Terminal nodes are indicated by
Page -> $pageName $pageId [PageSkeleton]
PageSkeleton -> Section | Para | Image
Section -> $sectionHeading $sectionHeadingId [PageSkeleton]
Para -> Paragraph
Paragraph -> $paragraphId, [ParaBody]
Image -> $imageURL [PageSkeleton]
ParaBody -> ParaText | ParaLink
ParaText -> $text
ParaLink -> $targetPage $targetPageId $linkSection $anchorText
Support tools for reading this format are provided in the trec-car-tools repository.
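To make the grammar concrete, here is a minimal sketch that mirrors it as Python dataclasses. It is not the trec-car-tools API: all class and field names are chosen for illustration, the Para and Paragraph rules are collapsed into one class, and the helper `paragraph_text` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ParaText:
    text: str


@dataclass
class ParaLink:
    target_page: str
    target_page_id: str
    link_section: str
    anchor_text: str


@dataclass
class Paragraph:  # covers both the Para and Paragraph rules
    paragraph_id: str
    bodies: List[Union[ParaText, ParaLink]] = field(default_factory=list)


@dataclass
class Image:
    image_url: str
    children: List["PageSkeleton"] = field(default_factory=list)


@dataclass
class Section:
    heading: str
    heading_id: str
    children: List["PageSkeleton"] = field(default_factory=list)


PageSkeleton = Union[Section, Paragraph, Image]


@dataclass
class Page:
    page_name: str
    page_id: str
    skeleton: List[PageSkeleton] = field(default_factory=list)


def paragraph_text(p: Paragraph) -> str:
    """Concatenate plain text and link anchor text of a paragraph."""
    return "".join(
        b.text if isinstance(b, ParaText) else b.anchor_text for b in p.bodies
    )
```

For example, a paragraph built from `ParaText("See ")`, a `ParaLink` with anchor text "the iPhone 7", and `ParaText(".")` yields the string "See the iPhone 7." while the link target stays available for entity annotations.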
All passages that are cut out of Wikipedia articles are assigned a unique (SHA256 hashed)
paragraphId that will not reveal the originating article. Pairs of
paragraphContent are provided as passage corpus.
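As a rough illustration of content-derived identifiers, the sketch below hashes paragraph text with SHA-256. Which bytes the organizers actually hash (raw markup, normalized text, or something else) is not stated here, so hashing the UTF-8 encoded text is an assumption, and `paragraph_id` is a hypothetical helper.

```python
import hashlib


def paragraph_id(content: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded paragraph content (assumed input)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Build a toy passage corpus keyed by id; identical content maps to one id,
# and the id itself carries no information about the originating article.
corpus = {}
for text in ["First passage.", "Second passage.", "First passage."]:
    corpus[paragraph_id(text)] = text
```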
Each extracted page outline provides the
pageName (e.g., Wikipedia title) and
pageId, and a nested list of children. Each section is represented by a
headingId, and a list of its children in order. Children can be sections or paragraphs.
Each section in the corpus is uniquely identified by a section path of the form
To avoid UTF-8 encoding and whitespace issues with existing evaluation tools like trec_eval, pageId and
headingId are projected to ASCII characters using a variant of URL percent encoding.
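The idea of projecting identifiers to ASCII can be sketched with standard URL percent encoding. The track uses a variant of percent encoding whose exact rules are not given here, so Python's `urllib.parse.quote` is only a stand-in, and `encode_id` and `decode_id` are hypothetical helper names.

```python
from urllib.parse import quote, unquote


def encode_id(name: str) -> str:
    """Percent-encode a title or heading into a whitespace-free ASCII token."""
    # safe="" also encodes "/", so the result is a single opaque token.
    return quote(name, safe="")


def decode_id(token: str) -> str:
    """Invert encode_id, recovering the original Unicode string."""
    return unquote(token)
```

Because the encoded token contains neither whitespace nor non-ASCII characters, it can be used as a field in whitespace-separated qrels and run files.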
We provide training/evaluation annotations in the
trec_eval compatible Qrels format.
We provide qrels for passage and entity rankings as well as different variations on the section path:
Format for passages:
$sectionId 0 $paragraphId $isRelevant
Format for entities (where the entity is represented through the Wikipedia article it represents - using percent encoding):
$sectionId 0 $pageId $isRelevant
For automatic training/test data
$isRelevant is either 1 or 0 (only entries with 1 are reported).
For manual test data
$isRelevant is a grading on the PEGFB-scale (4-0) according to
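The qrels line formats above can be read with a few lines of code. The following is a minimal sketch, not an official tool: `read_qrels` and `relevance` are hypothetical names, and treating entries missing from the file as grade 0 reflects the note that automatic qrels only report relevant entries.

```python
from collections import defaultdict
from typing import Dict, Iterable


def read_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map each section id to {paragraph or page id: relevance grade}."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        section_id, _unused, doc_id, rel = line.split()
        qrels[section_id][doc_id] = int(rel)
    return qrels


def relevance(qrels: Dict[str, Dict[str, int]], section_id: str, doc_id: str) -> int:
    """Entries missing from the qrels are treated as grade 0."""
    return qrels.get(section_id, {}).get(doc_id, 0)
```

The same reader works for passage and entity qrels, since both use four whitespace-separated fields and differ only in whether the third field is a $paragraphId or a $pageId.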
We also provide training data not just as stubs and Qrels but also (alternatively) as annotated articles.
These follow a format similar to that of stubs, but also include the passages with their IDs inside their sections.
Details on the data selection process are available here.
Retrieval results are uploaded through the official NIST submission page, using a format compatible with
.run files. Each ranking entry is represented by a single line in the format below. Please submit a separate run file for every approach, but include rankings for all queries.
Notice that the format differs slightly between tasks. A validation script for your runs will be provided soon.
Per task and per team, up to three submissions will be accepted.
For every section, a ranking of paragraphs is to be retrieved (as identified by the
$sectionId Q0 $paragraphId $rank $score $teamname-$methodname
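A passage ranking in this line format could be produced as sketched below. The function name, the example team and method strings, and the choice to start ranks at 1 and sort by descending score are assumptions for illustration; check the official validation script for the authoritative requirements.

```python
from typing import Iterable, List, Tuple


def run_lines(section_id: str, scored: Iterable[Tuple[str, float]],
              team: str, method: str) -> List[str]:
    """Format (paragraphId, score) pairs as run lines, best score first."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [
        f"{section_id} Q0 {paragraph_id} {rank} {score} {team}-{method}"
        for rank, (paragraph_id, score) in enumerate(ranked, start=1)
    ]
```

Calling `run_lines` once per section and concatenating the results gives a single run file that includes rankings for all queries, as requested above.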
For every section, a ranking of entities is to be retrieved (as identified by the
$entityId which coincides with
$pageIds – the percent-encoded Wikipedia title)
Optionally, a paragraph from the corpus can be provided as provenance for why the entity is relevant for the section. If the paragraph is omitted, the first paragraph from the entity’s page will be used as default provenance.
The format is as follows (first: with paragraph, second: default)
$sectionId Q0 $paragraphId/$entityId $rank $score $teamname-$methodname
$sectionId Q0 $entityId $rank $score $teamname-$methodname
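Both forms of the entity run line can be handled by one parser, sketched here under stated assumptions: `EntityEntry` and `parse_entity_line` are hypothetical names, paragraph ids are assumed to contain no "/", and any literal "/" in an entity id is assumed to be percent-encoded.

```python
from typing import NamedTuple, Optional


class EntityEntry(NamedTuple):
    section_id: str
    entity_id: str
    paragraph_id: Optional[str]  # None means default provenance applies
    rank: int
    score: float
    run_name: str


def parse_entity_line(line: str) -> EntityEntry:
    """Parse one entity run line, with or without a provenance paragraph."""
    section_id, _q0, doc, rank, score, run_name = line.split()
    # Split on the first "/" only; assumes paragraph ids contain no "/" and
    # that a literal "/" inside an entity id is percent-encoded.
    paragraph_id, sep, entity_id = doc.partition("/")
    if not sep:  # second form: entity only
        paragraph_id, entity_id = None, doc
    return EntityEntry(section_id, entity_id, paragraph_id,
                       int(rank), float(score), run_name)
```

An entry whose `paragraph_id` is `None` corresponds to the default case, where the first paragraph of the entity's page is used as provenance.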
As formats may change slightly between releases, make sure you check out support tools from the corresponding branch in trec-car-tools.
Previous releases - do not use!
v1.4-release (Jan 24, 2017): release of the paragraph collection, spritzer, and halfwiki
v1.2-release (Jan 17, 2017): release of the paragraph collection, spritzer, and halfwiki
v1.1-release (Jan 11, 2017): release of the paragraph collection, spritzer, and halfwiki
2016-11-v1.0-release: Pre-release to straighten out issues in the data format – Format change, not supported any longer.
A collection of paragraphs extracted from Wikipedia articles that meet the selection criteria. The links between articles are preserved (see format definition above).
The official test topics for benchmarkY1 are released as benchmarkY1test.
The selection process is as follows:
Half of the Wikipedia articles are dedicated to training. A minimally processed version of this data is available as unprocessed training data (formerly called halfwiki).
Excluding articles that were already studied in the literature, we derive a subset of pages matching the list of selection criteria as an initial training corpus. The data is provided as five folds, which are intended for cross-validation, as release.
Additionally, a manual selection of pages with characteristics, types, and structure similar to those of the test topics is provided for training:
See data releases for more information.
TREC-CAR Dataset by Laura Dietz, Ben Gamari is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at www.wikipedia.org.