#cc-cedict #builder #morphological #dictionary #chinese

lindera-cc-cedict-builder

A Chinese morphological dictionary builder for CC-CEDICT

40 releases (21 breaking)

0.32.3 Mar 18, 2025
0.32.2 Jun 29, 2024
0.31.0 May 28, 2024
0.29.0 Mar 18, 2024
0.12.2 Mar 23, 2022

#869 in Text processing


15,054 downloads per month
Used in 13 crates (2 directly)

MIT license

150KB
3K SLoC

Lindera CC-CEDICT Builder


CC-CEDICT dictionary builder for Lindera.

Dictionary format

Refer to the manual for details on the dictionary format and part-of-speech tags.

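| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | Major POS classification | |
| 5 | 词类1 | Middle POS classification | |
| 6 | 词类2 | Small POS classification | |
| 7 | 词类3 | Fine POS classification | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |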
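
To make the column layout above concrete, here is a minimal sketch that reads one dictionary row into a struct. It is not part of the crate's API: the `Entry` type, the `parse_entry` function, the `i32` integer widths, and the sample row in the usage note are all assumptions made for illustration. It also assumes fields are unquoted, so a plain comma split is enough.

```rust
/// One dictionary row, following the 12-column layout in the table above.
/// (Hypothetical type for illustration; not a type exported by the crate.)
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    surface: &'a str,
    left_context_id: i32,
    right_context_id: i32,
    cost: i32,
    /// Columns 4 to 7: major, middle, small, and fine POS classification.
    pos: [&'a str; 4],
    pinyin: &'a str,
    traditional: &'a str,
    simplified: &'a str,
    definition: &'a str,
}

/// Parse one CSV row. Assumes unquoted fields; `splitn(12, ',')` stops
/// splitting at the 12th column, so commas inside the definition survive.
fn parse_entry(line: &str) -> Result<Entry<'_>, String> {
    let cols: Vec<&str> = line.trim_end().splitn(12, ',').collect();
    if cols.len() != 12 {
        return Err(format!("expected 12 columns, found {}", cols.len()));
    }
    // Helper that parses an integer column and names it in the error.
    let int = |idx: usize, name: &str| -> Result<i32, String> {
        cols[idx]
            .parse::<i32>()
            .map_err(|_| format!("column {} ({}) is not an integer", idx, name))
    };
    Ok(Entry {
        surface: cols[0],
        left_context_id: int(1, "left context ID")?,
        right_context_id: int(2, "right context ID")?,
        cost: int(3, "cost")?,
        pos: [cols[4], cols[5], cols[6], cols[7]],
        pinyin: cols[8],
        traditional: cols[9],
        simplified: cols[10],
        definition: cols[11],
    })
}
```

For example, calling `parse_entry` on the made-up row `你好,0,0,0,*,*,*,*,ni3 hao3,你好,你好,hello, hi` yields an entry whose `definition` is `hello, hi`, because only the first eleven commas are treated as separators.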

User dictionary format (CSV)

Simple version

| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 词类 | Major POS classification | |
| 2 | 拼音 | Pinyin | |
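
As an illustration of the simple format, the sketch below renders three-column rows from in-memory entries. The `SimpleEntry` type and `to_simple_csv` function are hypothetical helpers, not crate APIs, and the sample word and POS tag in the usage note are invented. Because quoting rules are not described here, the sketch rejects fields that contain a comma or line break instead of trying to escape them.

```rust
/// A simple-format user dictionary entry: surface, POS, pinyin.
/// (Hypothetical type for illustration.)
struct SimpleEntry<'a> {
    surface: &'a str,
    pos: &'a str,
    pinyin: &'a str,
}

/// Render entries as CSV text, one `surface,pos,pinyin` row per line.
/// Empty fields and fields containing a comma or line break are rejected.
fn to_simple_csv(entries: &[SimpleEntry]) -> Result<String, String> {
    let mut out = String::new();
    for e in entries {
        for field in [e.surface, e.pos, e.pinyin] {
            let has_separator = field.contains(|c: char| matches!(c, ',' | '\n' | '\r'));
            if field.is_empty() || has_separator {
                return Err(format!("invalid field {:?}", field));
            }
        }
        out.push_str(&format!("{},{},{}\n", e.surface, e.pos, e.pinyin));
    }
    Ok(out)
}
```

Passing a single entry with surface `东京塔`, POS `名词`, and pinyin `dong1 jing1 ta3` produces the one-line file `东京塔,名词,dong1 jing1 ta3`.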

Detailed version

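| Index | Name (Chinese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 表面形式 | Surface | |
| 1 | 左语境ID | Left context ID | |
| 2 | 右语境ID | Right context ID | |
| 3 | 成本 | Cost | |
| 4 | 词类 | POS | |
| 5 | 词类1 | POS subcategory 1 | |
| 6 | 词类2 | POS subcategory 2 | |
| 7 | 词类3 | POS subcategory 3 | |
| 8 | 拼音 | Pinyin | |
| 9 | 繁体字 | Traditional | |
| 10 | 简体字 | Simplified | |
| 11 | 定义 | Definition | |
| 12 | - | - | Columns from index 12 onward can be added freely. |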
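
Since the simple version has 3 columns and the detailed version has 12 or more, a row's layout can be inferred from its column count. The sketch below shows one way to do that; whether Lindera itself distinguishes the two versions this way is an assumption, and `RowKind` and `classify_row` are hypothetical names. It assumes no field contains a comma, so a definition with embedded commas would be miscounted as extra columns.

```rust
/// Which user dictionary layout a CSV row appears to use.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
enum RowKind {
    /// Exactly 3 columns: surface, POS, pinyin.
    Simple,
    /// The 12 standard columns plus `extra` user-defined trailing columns.
    Detailed { extra: usize },
}

/// Classify a row by its column count (assumes unquoted, comma-free fields).
fn classify_row(line: &str) -> Result<RowKind, String> {
    let n = line.trim_end().split(',').count();
    match n {
        3 => Ok(RowKind::Simple),
        n if n >= 12 => Ok(RowKind::Detailed { extra: n - 12 }),
        n => Err(format!("expected 3 or at least 12 columns, found {}", n)),
    }
}
```

A row with 14 columns is classified as `Detailed { extra: 2 }`, while a row with 4 columns is rejected because it fits neither version.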

How to use CC-CEDICT dictionary

For more details about the lindera command, please refer to the following URL:

API reference

The API reference is available. Please see the following URL:

Dependencies

~9MB
~212K SLoC