Word Segmentation for Chinese Judicial Documents

Linxia Yao, Jidong Ge, Chuanyi Li, Yuan Yao, Zhenhao Li, Jin Zeng, Bin Luo, Victor Chang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Word segmentation is an integral step in many knowledge discovery applications. However, existing word segmentation methods run into problems when applied to Chinese judicial documents: (1) they rely on large-scale labeled data, which is typically unavailable for judicial documents, and (2) judicial documents have their own language features and writing formats. This paper proposes a word segmentation method for Chinese judicial documents. The method consists of two steps: (1) automatically generating labeled data to serve as legal dictionaries, and (2) applying a hybrid multi-layer neural network that incorporates the legal dictionaries to perform word segmentation. Experiments on a dataset of Chinese judicial documents show that the proposed model achieves better results than existing methods.
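
One common way to feed a dictionary into a character-level neural segmenter is to turn dictionary matches into per-character feature vectors. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `dict_features`, the `max_len` parameter, and the three-entry lexicon are all hypothetical, chosen for illustration.

```python
from typing import List, Set


def dict_features(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[List[int]]:
    """Build binary dictionary-match features for every character.

    For each word length k in 2..max_len, two flags are emitted per character:
    whether a lexicon word of length k ENDS at this character, and whether a
    lexicon word of length k STARTS at this character.
    """
    n = len(sentence)
    feats: List[List[int]] = []
    for i in range(n):
        row: List[int] = []
        for k in range(2, max_len + 1):
            # Candidate word of length k whose last character is sentence[i].
            ends = i - k + 1 >= 0 and sentence[i - k + 1:i + 1] in lexicon
            # Candidate word of length k whose first character is sentence[i].
            starts = i + k <= n and sentence[i:i + k] in lexicon
            row.extend([int(ends), int(starts)])
        feats.append(row)
    return feats


# Hypothetical legal lexicon (assumption: in practice it would be generated
# automatically from judicial documents, as the abstract describes).
LEGAL_LEXICON = {"被告人", "有期徒刑", "判处"}

features = dict_features("被告人被判处有期徒刑", LEGAL_LEXICON)
for char, row in zip("被告人被判处有期徒刑", features):
    print(char, row)
```

In a setup like this, each row would typically be concatenated with the character's embedding before the sequence-labeling layers, so the network can use dictionary evidence without being forced to follow it.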

Original language: English
Title of host publication: Data Science - 5th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2019, Proceedings
Editors: Xiaohui Cheng, Weipeng Jing, Xianhua Song, Zeguang Lu
Publisher: Springer-Verlag
Pages: 466-478
Number of pages: 13
ISBN (Print): 9789811501173
Publication status: Published - 13 Sep 2019
Event: 5th International Conference of Pioneer Computer Scientists, Engineers and Educators - Guilin, China
Duration: 20 Sep 2019 - 23 Sep 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1058
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 5th International Conference of Pioneer Computer Scientists, Engineers and Educators
Abbreviated title: ICPCSEE 2019
Country/Territory: China
City: Guilin
Period: 20/09/19 - 23/09/19
