TREC CAR

Data Releases (latest, v2.0)

We provide the following data sets under a Creative Commons Attribution-ShareAlike 3.0 Unported License. They are based on content extracted from Wikipedia, which is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

Please cite this data set as:

Laura Dietz, Ben Gamari. 
"TREC CAR 2.0: A Data Set for Complex Answer Retrieval". 
Version 2.0, 2018. 
http://trec-car.cs.unh.edu

Note that there are significant changes between this dataset and v1.5.

Data sets

All archives use XZ compression; the stated data size refers to the uncompressed data.
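As a quick sanity check after downloading, the sketch below tests whether a file starts with the six-byte XZ stream signature (`FD 37 7A 58 5A 00`). It only inspects the header and does not decompress anything. The class name `XzMagicCheck` and the command-line path argument are illustrative assumptions, not part of the TREC CAR tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of trec-car-tools-java.
public class XzMagicCheck {
    // The six-byte signature that begins every .xz stream.
    private static final byte[] XZ_MAGIC = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0x00};

    /** Returns true if the given bytes begin with the XZ signature. */
    public static boolean looksLikeXz(byte[] header) {
        return header.length >= XZ_MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, XZ_MAGIC.length), XZ_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a downloaded archive, supplied by the caller.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] header = new byte[XZ_MAGIC.length];
            // readNBytes (Java 9+) keeps reading until the buffer is full or EOF.
            int n = in.readNBytes(header, 0, header.length);
            boolean ok = looksLikeXz(Arrays.copyOf(header, n));
            System.out.println(args[0] + (ok ? " looks like" : " does not look like") + " an XZ archive");
        }
    }
}
```

A passing check only confirms the file type; an interrupted download can still start with a valid signature.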

Old Releases

For discontinued data releases see:

Support Tools

Support tools for reading this data can be found here: https://github.com/TREMA-UNH/trec-car-tools-java.

Don’t copy the code; use it as a Maven dependency:

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

    <dependencies>
...
        <dependency>
            <groupId>com.github.TREMA-UNH</groupId>
            <artifactId>trec-car-tools-java</artifactId>
            <version>9</version>
        </dependency>
...

See examples on how to use the tools here: https://github.com/TREMA-UNH/trec-car-tools/.

You may be able to load v1.5 data files with this code, but we can’t guarantee correctness.

Contents

The following kinds of data derivatives are provided:

Data:

Qrels (trec_eval-compatible qrels files), which are automatically derived from Articles (to be complemented by human judgments).
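For readers new to the trec_eval qrels format, each line holds four whitespace-separated columns: query id, an iteration field that is conventionally ignored, document id, and relevance grade. The sketch below parses such lines using only the Java standard library. The class `QrelsExample` and the sample ids are made up for illustration and are not taken from the released files or from trec-car-tools-java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of trec-car-tools-java.
public class QrelsExample {
    /** One relevance judgment: query id, document id, relevance grade. */
    public static final class Judgment {
        public final String queryId;
        public final String docId;
        public final int relevance;

        Judgment(String queryId, String docId, int relevance) {
            this.queryId = queryId;
            this.docId = docId;
            this.relevance = relevance;
        }
    }

    /** Parses one qrels line of the form "queryId iteration docId relevance". */
    public static Judgment parseLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException(
                    "Expected 4 columns, got " + fields.length + ": " + line);
        }
        // fields[1] is the iteration column, which is skipped here.
        return new Judgment(fields[0], fields[2], Integer.parseInt(fields[3]));
    }

    /** Parses many lines, skipping blank ones. */
    public static List<Judgment> parseAll(Iterable<String> lines) {
        List<Judgment> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                result.add(parseLine(line));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample line; real ids come from the released qrels files.
        Judgment j = parseLine("query-1 0 doc-42 1");
        System.out.println(j.queryId + " -> " + j.docId + " (relevance " + j.relevance + ")");
    }
}
```

To process a whole file, the lines from `java.nio.file.Files.readAllLines` can be passed to `parseAll`.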

Test topics

For evaluation, only a *cbor.outlines file will be distributed, and a decision will have been made as to which of the three qrels files will be used.

Issues

Please submit any issues with trec-car-tools or the provided data sets using the issue tracker on GitHub.

Creative Commons License
TREC-CAR Dataset by Laura Dietz, Ben Gamari is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at www.wikipedia.org.