OLAC Record oai:www.ldc.upenn.edu:LDC2025T15 |
| Metadata | ||
| Title: | KAIROS Phase 2 Quizlet | |
| Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
| Bibliographic Citation: | Chen, Song, et al. KAIROS Phase 2 Quizlet LDC2025T15. Web Download. Philadelphia: Linguistic Data Consortium, 2025 | |
| Contributor: | Chen, Song | |
| Bies, Ann | ||
| Caruso, Christopher | ||
| Tracey, Jennifer | ||
| Strassel, Stephanie | ||
| Date (W3CDTF): | 2025 | |
| Date Issued (W3CDTF): | 2025-10-15 | |
| Description: | *Introduction* KAIROS Phase 2 Quizlet was developed by the Linguistic Data Consortium (LDC). It contains English and Spanish text, video and image data and annotations used for pre-evaluation research and system development during Phase 2 of the DARPA KAIROS program. KAIROS Quizlets were a series of narrowly defined tasks designed to explore specific evaluation objectives enabling KAIROS system developers to exercise individual system components on a small data set prior to the full program evaluation. This corpus contains the complete set of Quizlet data used in Phase 2 which focused on five real-world complex events (CEs) within the Disease Outbreak (DO) scenario: * CE2002: Clostridium perfringens, Chipotle restaurant, Ohio, 2018 * CE2004: Salmonella, from peanut butter, originated from Georgia peanut factory 2008 * CE2011: 2011 E. coli linked to contact with livestock at fair, North Carolina * CE2019: 2017 Botulism from nacho cheese sauce, California * CE2039: 1976 Philadelphia Legionnaires' disease outbreak The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large multilingual, multimedia corpus. *Data* Five quizlets were developed in Phase 2 (Quizlets 5 - 9). 
In addition to the source documents, this release contains the contents of Quizlet 6 (source documents and manual annotation), Quizlet 7 (source documents, updated annotation and graph G), Quizlet 8 (source documents, updated annotation, and graph G), and Quizlet 9 (source documents, manual annotation, and graph G). Quizlet 5 (schema representation development) did not require data or annotation and is not included in this release. Source data was collected from the web by LDC; 66 root web pages were collected and processed, yielding 65 text data files, 890 image files, and 10 video files. Annotation steps included labeling scenario-relevant events and relations for each document to develop a structured representation of temporally-ordered events, relations and arguments; generating a reference knowledge graph; and linking labeled entries to a knowledge base derived from a Wikidata-based ontology. Source data is presented in various formats: .gif, .jpg, .ltf, .mp4, .png, .psm, and .svg. Annotations are presented as tab-separated files (.tab) for temporal ordering, relations, events, and arguments. Software tools are also included in this release. The tools recreate original source data from the processed XML material. * ltf2rsd.perl -- convert ltf.xml files to rsd.txt (raw-source-data) * ltfzip2rsd.perl -- extract and convert ltf.xml files from zip archives *Samples* Please view these samples: * Argument Annotations (.tab) * Graph G (.json) * PSM (.xml) * LTF (.xml) *Sponsorship* KAIROS was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-19-S-0014. *Updates* No updates at this time. | |
| Extent: | Corpus size: 370688 KB | |
| Identifier: | LDC2025T15 | |
| https://catalog.ldc.upenn.edu/LDC2025T15 | ||
| ISLRN: 655-957-812-959-9 | ||
| DOI: 10.35111/n6td-nn51 | ||
| Language: | Spanish | |
| English | ||
| Language (ISO639): | spa | |
| eng | ||
| License: | LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | |
| Medium: | Distribution: Web Download | |
| Publisher: | Linguistic Data Consortium | |
| Publisher (URI): | https://www.ldc.upenn.edu | |
| Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2025T15 | |
| Rights Holder: | Portions © 2021 20 Minutos Editora, SL, © 2021 ABC News, © 2018-2020 A&E Television Networks, LLC, © 2020 Cable News Network. A Warner Bros. Discovery Company., © 2021 Capitol Broadcasting Company, Inc., © 2011 CBS Broadcasting Inc., © 2021 Chicago Tribune, © 2021 CNBC LLC, © 2016 ElPeriodicodeMexico.com., © 2021 Encyclopedia of Greater Philadelphia, © 2021 EnsembleIQ, © 2019 Entravision, © 2021 Google LLC, © 2020 Gray Television, Inc., © 2021 Inside Edition Inc. and CBS Interactive Inc., © 2017-2020 Insider Inc., © 2021 Metro Corp., © 2019 Marler Clark, Inc., © 2017 MERCADOTECNIA PUBLICIDAD MARKETING NOTICIAS, © 2019 NBCUniversal Media, LLC, © 2021 News. Policy. Trends. North Carolina., © 2022 npr, © 2021 PacBio, © 2021 Public Broadcasting Service (PBS), © 2021 Regents of the University of Minnesota, © 2019 Reuters, © 2021 Siegel Brill, P.A., © 2021 Sinclair, Inc., © 2021 The Chestnut Hill Local, © 2018, 2019 The Philadelphia Inquirer, LLC, © 2017, 2019, 2020 Univision Communications Inc., © 2021 Vimeo.com, Inc., © 2022 Vox Media, LLC, © 2021, 2022, 2025 Trustees of the University of Pennsylvania | |
| Subject: | English language | |
| Spanish language | ||
| Subject (ISO639): | eng | |
| spa | ||
| Type (DCMI): | Image | |
| MovingImage | ||
| Software | ||
| Sound | ||
| StillImage | ||
| Text | ||
| Type (OLAC): | primary_text | |
OLAC Info |
||
| Archive: | The LDC Corpus Catalog | |
| Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2025T15 | |
| DateStamp: | 2025-10-15 | |
| GetRecord: | OAI-PMH request for simple DC format | |
Search Info | ||
| Citation: | Chen, Song; Bies, Ann; Caruso, Christopher; Tracey, Jennifer; Strassel, Stephanie. 2025. Linguistic Data Consortium. | |
| Terms: | area_Europe country_ES country_GB dcmi_Image dcmi_MovingImage dcmi_Software dcmi_Sound dcmi_StillImage dcmi_Text iso639_eng iso639_spa olac_primary_text | |
Inferred Metadata | ||
| Country: | Spain | |
| United Kingdom | ||
| Area: | Europe | |
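
The GetRecord rows under OLAC Info and OAI Info refer to OAI-PMH requests for this record in two metadata formats. As a rough sketch of what such a request looks like, the snippet below builds GetRecord URLs for the identifier `oai:www.ldc.upenn.edu:LDC2025T15`. The base URL is a placeholder, since this record does not state the repository endpoint, and the helper name `get_record_url` is hypothetical. The `oai_dc` prefix is the standard OAI-PMH name for simple Dublin Core; `olac` is assumed here to be the prefix the repository uses for the OLAC format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the record above does not give the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Identifier taken from the OaiIdentifier row of this record.
OAI_IDENTIFIER = "oai:www.ldc.upenn.edu:LDC2025T15"


def get_record_url(identifier: str, metadata_prefix: str, base_url: str = BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


# "olac" is an assumed prefix for the OLAC format; "oai_dc" is simple Dublin Core.
olac_url = get_record_url(OAI_IDENTIFIER, "olac")
dc_url = get_record_url(OAI_IDENTIFIER, "oai_dc")

print(olac_url)
print(dc_url)
```

Replacing `BASE_URL` with the repository's actual endpoint would be needed before sending either request; the response is an XML document containing the record in the requested format.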