MCSCSet is a large-scale, specialist-annotated dataset of about 200k samples, designed for the task of Medical-domain Chinese Spelling Correction. MCSCSet comprises: i) extensive real-world medical queries collected from Tencent Yidian, and ii) the corresponding misspelled sentences, manually annotated by medical specialists.
Source: MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction