OLAC Record oai:www.ldc.upenn.edu:LDC2016T22 |
| Metadata | ||
| Title: | Chinese-English Parallel Sentences Extracted from Patents | |
| Access Rights: | Licensing Instructions for Subscription & Standard Members, and Non-Members: http://www.ldc.upenn.edu/language-resources/data/obtaining | |
| Bibliographic Citation: | Tsou, Benjamin, Bin Lu, and Kapo Chow. Chinese-English Parallel Sentences Extracted from Patents LDC2016T22. Web Download. Philadelphia: Linguistic Data Consortium, 2016 | |
| Contributor: | Tsou, Benjamin | |
| Lu, Bin | ||
| Chow, Kapo | ||
| Date (W3CDTF): | 2016 | |
| Date Issued (W3CDTF): | 2016-10-19 | |
| Description: | *Introduction* Chinese-English Parallel Sentences Extracted from Patents was developed by Chilin (HK) Limited and contains 500,000 sentence pairs of Chinese-English parallel text. This resource is based on the training corpus and test sets developed for the Tokyo-based NTCIR 2009 & 2010 tasks on Patent Machine Translation. *Data* The sentences in this release were selected from a larger corpus of more than 300,000 Chinese-English parallel patents in different fields according to a number of filtering parameters including word alignment, sentence length and language modeling. They were then automatically segmented and aligned. All text is encoded as UTF-8. *Samples* Please view this Chinese sample and English sample. *Updates* None at this time. *Pricing* Not-for-profit organizations may license this data set for US$25.00 under the LDC Not-for-Profit Membership Agreement or under the LDC User Agreement for Non-Members for use in linguistic research, education and non-commercial technology development. For-profit organizations may license this data for US$5000, discounted to US$4000 for LDC for-profit members, under the Commercial License Agreement for Chinese-English Parallel Sentences Extracted from Patents (LDC2016T22). Current fees in this catalog entry reflect those pertaining to a for-profit organization license. Not-for-profit organizations should contact LDC's Membership Office to license this data set. | |
| Extent: | Corpus size: 214864 KB | |
| Identifier: | LDC2016T22 | |
| https://catalog.ldc.upenn.edu/LDC2016T22 | ||
| ISBN: 1-58563-770-X | ||
| ISLRN: 280-113-850-942-8 | ||
| DOI: 10.35111/td6z-pv16 | ||
| Language: | English | |
| Chinese | ||
| Language (ISO639): | eng | |
| zho | ||
| License: | Chinese-English Parallel Sentences Extracted from Patents Agreement (For-profit): https://catalog.ldc.upenn.edu/license/chinese-english-parallel-sentences-extracted-from-patents-agreement-for-profit.pdf | |
| LDC User Agreement for Non-Members: https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | ||
| Medium: | Distribution: Web Download | |
| Publisher: | Linguistic Data Consortium | |
| Publisher (URI): | https://www.ldc.upenn.edu | |
| Relation (URI): | https://catalog.ldc.upenn.edu/docs/LDC2016T22 | |
| Rights Holder: | Portions © 2016 Chilin (HK) Limited, © 2016 Trustees of the University of Pennsylvania | |
| Type (DCMI): | Text | |
| Type (OLAC): | primary_text | |
OLAC Info |
||
| Archive: | The LDC Corpus Catalog | |
| Description: | http://www.language-archives.org/archive/www.ldc.upenn.edu | |
| GetRecord: | OAI-PMH request for OLAC format | |
| GetRecord: | Pre-generated XML file | |
OAI Info |
||
| OaiIdentifier: | oai:www.ldc.upenn.edu:LDC2016T22 | |
| DateStamp: | 2020-11-30 | |
| GetRecord: | OAI-PMH request for simple DC format | |
Search Info |
||
| Citation: | Tsou, Benjamin; Lu, Bin; Chow, Kapo. 2016. Linguistic Data Consortium. | |
| Terms: | area_Europe country_GB dcmi_Text iso639_eng iso639_zho olac_primary_text | |