All ids (page squid, run id, section path, etc.) must be set to non-empty ASCII strings.
Additionally, a paragraph_id must be a 40-character hexadecimal string that is contained in paragraphCorpus.cbor.
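The two id rules above can be sketched as a syntactic pre-check. This is only an illustration, not the official validator: the function names are hypothetical, accepting both upper- and lower-case hex digits is an assumption, and membership in paragraphCorpus.cbor is not checked here.

```python
import string

# Assumption: both upper- and lower-case hex digits are accepted.
HEX_DIGITS = set(string.hexdigits)


def is_valid_id(value) -> bool:
    """True if value is a non-empty, ASCII-only string."""
    return isinstance(value, str) and value != "" and value.isascii()


def is_valid_paragraph_id_syntax(paragraph_id) -> bool:
    """True if paragraph_id is exactly 40 hexadecimal characters.

    This checks syntax only; whether the id actually occurs in
    paragraphCorpus.cbor has to be verified separately.
    """
    return (
        is_valid_id(paragraph_id)
        and len(paragraph_id) == 40
        and all(ch in HEX_DIGITS for ch in paragraph_id)
    )
```

A full check would additionally look the id up in a set of ids read from the corpus file.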
The paragraphs for a page must be a non-empty list.
The minimal representation of a paragraph is the paragraph_id. The para_body element is optional, but if given, it must be correct and agree with the representation in paragraphCorpus.cbor. It cannot be set to an empty list; instead, the entry must not appear in the JSON.
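One way to respect the "omit instead of empty list" rule is to strip empty-list fields just before serializing. The helper below is a hypothetical sketch; the key names simply follow the field names used in this list, and the id value is a placeholder.

```python
import json


def drop_empty_lists(obj: dict) -> dict:
    """Return a copy of obj without fields whose value is an empty list."""
    return {key: value for key, value in obj.items() if value != []}


# Placeholder paragraph: a real paragraph_id must come from the corpus.
paragraph = {"paragraph_id": "a" * 40, "para_body": []}
serialized = json.dumps(drop_empty_lists(paragraph))
```

Here `serialized` contains only the paragraph_id, which is the minimal representation described above.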
A page’s paragraph_origins are optional, but if given, they must be correct according to the following definition, with a valid paragraph id and a float-valued rank_score. They cannot be set to an empty list; instead, the field must not appear in the JSON.
The section_path must refer to a valid heading id of the page; only headings of the page outlines are allowed. These are to be given in the format “squid/heading id”. It is strongly recommended to include paragraphs for all headings.
Up to 20 paragraphs are allowed per heading. (We strongly encourage including exactly 20 paragraphs per heading.)
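The section_path format and the per-heading limit could be checked roughly as follows. This sketch assumes that each paragraph_origins entry carries one section_path and that the heading ids of the page outline are already available as a list; the function name and the example ids are hypothetical.

```python
from collections import Counter

MAX_PER_HEADING = 20


def section_path_errors(squid, heading_ids, section_paths):
    """Return a list of problems found in the given section_path values.

    squid         -- the page squid
    heading_ids   -- heading ids taken from the page outline
    section_paths -- list with one section_path per paragraph_origins entry
    """
    # A valid section_path has the form "squid/heading id".
    allowed = {f"{squid}/{heading_id}" for heading_id in heading_ids}
    errors = [
        f"unknown section_path: {path}"
        for path in section_paths
        if path not in allowed
    ]
    counts = Counter(section_paths)
    errors += [
        f"more than {MAX_PER_HEADING} paragraphs for {path}"
        for path, count in counts.items()
        if path in allowed and count > MAX_PER_HEADING
    ]
    return errors
```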
The rank field is optional, but if given, it must agree with the sort order of the rank_score. Also, the lowest valid number for rank is 1 (i.e., the highest rank is 1). Ranks must be unique (i.e., no ties).
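The rank rules can be illustrated with a small consistency check. It is a sketch under two assumptions: a higher rank_score means a better paragraph (so rank 1 goes with the highest score), and the entries passed in all belong to the same ranking. The function name is hypothetical.

```python
def rank_errors(entries):
    """Return a list of problems for entries that carry a 'rank' field.

    Each entry is a dict with a float 'rank_score' and an optional
    integer 'rank'. Entries without 'rank' are ignored.
    """
    ranked = [entry for entry in entries if "rank" in entry]
    ranks = [entry["rank"] for entry in ranked]
    errors = []
    if any(rank < 1 for rank in ranks):
        errors.append("rank below 1")
    if len(set(ranks)) != len(ranks):
        errors.append("duplicate rank")
    # Walking from rank 1 downwards, scores must never increase.
    by_rank = sorted(ranked, key=lambda entry: entry["rank"])
    scores = [entry["rank_score"] for entry in by_rank]
    if any(earlier < later for earlier, later in zip(scores, scores[1:])):
        errors.append("rank disagrees with rank_score order")
    return errors
```

Entries with equal rank_score may appear in either order, as long as their ranks are distinct.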
All page squids must start with the proper namespace, i.e., “tqa2:”. They cannot contain %20 symbols, because these were only used in Y1 and Y2, not in Y3!
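A minimal sketch of the squid rules, combined with the non-empty ASCII requirement for all ids. The function name and the example squids in the usage note are hypothetical.

```python
SQUID_NAMESPACE = "tqa2:"


def is_valid_squid(squid) -> bool:
    """True if squid is ASCII, starts with 'tqa2:' and contains no '%20'."""
    return (
        isinstance(squid, str)
        and squid.isascii()
        and squid.startswith(SQUID_NAMESPACE)
        and "%20" not in squid
    )
```

For example, `is_valid_squid("tqa2:example")` passes, while `is_valid_squid("tqa2:some%20page")` and `is_valid_squid("enwiki:example")` do not.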
Run ids must not contain more than 15 alphanumeric characters, including “_”, “-”, and “.”, but cannot start with “.”. (Please include an abbreviation of your team name!)
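The run id rule maps naturally onto a regular expression. The pattern below reflects one reading of the rule (1 to 15 characters in total, drawn from letters, digits, "_", "-", and ".", with no leading "."); the names are hypothetical.

```python
import re

# First character: letter, digit, "_" or "-" (no leading ".").
# Up to 14 further characters, which may also include ".".
RUN_ID_PATTERN = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9_.-]{0,14}")


def is_valid_run_id(run_id: str) -> bool:
    """True if run_id matches the run id pattern in full."""
    return RUN_ID_PATTERN.fullmatch(run_id) is not None
```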
At most 20 paragraphs can be given. We strongly encourage providing exactly 20 paragraphs!
Code for format validation (and conversion of TREC rankings into the Y3 submission format) is available.