Inference Group

How does Chinese Dasher work?

As of Wednesday 30/8/06, there is a working Chinese Dasher, written by Will Zou (yz246@cam). It will be released as part of Dasher version 4.2 (around September 2006).

Here are screenshots showing this prototype in use. They show a long multi-phrase sentence being entered all in one go: pinyin first, then Chinese characters. We also aim to make it equally easy, in Dasher, to enter the text one phrase at a time (about 2 to 4 Chinese characters converted at a time). (There are both theoretical and computational reasons for preferring to enter Chinese one phrase at a time.)

1. Starting to write 'I hope to be a volunteer for the Beijing Olympics.'
The pinyin (PY) string includes a separation symbol, an apostrophe ('), between "beijing" and "aoyun" (Olympics), as an example of disambiguation.
2. A typical choice between two frequently used phrases.
3. Writing 'I hope to be a volunteer for the Beijing Olympics.'
4. Writing 'I hope to be a volunteer for the Beijing Olympics.'
5. Writing 'I hope to be a volunteer for the Beijing Olympics.'
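The separation symbol matters because an unsegmented pinyin string can often be split into syllables in more than one way. The Python sketch below enumerates every split of a string over a syllable inventory, treating the apostrophe as a forced boundary. The inventory SYLLABLES and the functions split_chunk and segmentations are illustrative assumptions, not Dasher's own code, and the inventory is deliberately limited to the few syllables this example needs.

```python
from functools import lru_cache
from itertools import product

# Illustrative syllable inventory (an assumption), limited to this example.
# A full toneless pinyin inventory has roughly 400 syllables.
SYLLABLES = {"bei", "jin", "jing", "gao", "ao", "yun"}


def split_chunk(chunk: str, syllables: set[str]) -> list[tuple[str, ...]]:
    """All ways to split an apostrophe-free chunk into known syllables."""

    @lru_cache(maxsize=None)
    def parses(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(chunk):
            return ((),)  # exactly one way to split the empty remainder
        found = []
        for end in range(start + 1, len(chunk) + 1):
            piece = chunk[start:end]
            if piece in syllables:
                # Prefix this syllable to every parse of the rest.
                found.extend((piece,) + rest for rest in parses(end))
        return tuple(found)

    return list(parses(0))


def segmentations(pinyin: str, syllables: set[str] = SYLLABLES) -> list[tuple[str, ...]]:
    """All syllable splits of `pinyin`; an apostrophe forces a boundary."""
    per_chunk = [split_chunk(chunk, syllables) for chunk in pinyin.split("'")]
    # Take one parse from each chunk; the result is empty if any chunk fails.
    return [sum(combo, ()) for combo in product(*per_chunk)]


if __name__ == "__main__":
    print(segmentations("beijingaoyun"))
    print(segmentations("beijing'aoyun"))
```

Without the apostrophe the sketch finds two readings, bei jin gao yun and bei jing ao yun; with it, only bei jing ao yun remains.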

Here are David MacKay's old notes on Chinese Dasher.

Chinese Dasher wiki

Some progress has now been made on Chinese Dasher. David MacKay has obtained a pinyin corpus, so you can write in pinyin, and Tian Li and Kaburagi are working on the phonetic-to-ideogram conversion software.

We would not try to select the ideograms directly, since there are too many of them. Instead, we build up sentences using a sequence of symbols, each of which has a small information content.
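To make "small information content" concrete: selecting a symbol of probability p conveys log2(1/p) bits, and in Dasher a symbol's box takes up a share of the screen proportional to p. The sketch below assumes, purely for illustration, a uniform distribution, a 1000-pixel-tall screen, a 27-symbol pinyin alphabet (26 letters plus the apostrophe) and a hypothetical inventory of 6000 characters. Real Dasher uses a language model, so actual box sizes vary with context.

```python
import math


def bits(p: float) -> float:
    """Information content, in bits, of choosing an outcome of probability p."""
    return -math.log2(p)


def uniform_choice(n_symbols: int, screen_pixels: float = 1000.0) -> dict:
    """Bits per selection and box height when all n_symbols are equally likely.

    The uniform distribution and the 1000-pixel screen are illustrative assumptions.
    """
    p = 1.0 / n_symbols
    return {"bits": bits(p), "box_pixels": screen_pixels * p}


if __name__ == "__main__":
    # 27 = assumed pinyin alphabet (26 letters + apostrophe);
    # 6000 = hypothetical character inventory size.
    for n in (27, 6000):
        s = uniform_choice(n)
        print(f"{n:5d} symbols: {s['bits']:.2f} bits each, box {s['box_pixels']:.2f} px")
```

Under these assumptions a letter box is about 37 pixels tall (4.75 bits), while a character box is well under one pixel (12.55 bits), which is too small to steer into; spelling the same information through a few larger letter boxes keeps every decision readable.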

We can imagine two possible approaches.

  1. Use a phonetic approach. Pinyin is a standard method for writing Chinese phonetically. Having obtained a pinyin sequence, we could then, if required, have a probabilistic pronunciation-to-ideogram mapping also implemented within Dasher. This is how I envisage Japanese Dasher working (hiragana first, then kanji if required), and it is exactly how Japanese people write Japanese in JWord or Jemacs.
  2. Use a stroke-based approach. Have the writer build up the ideogram stroke by stroke, in the standard sequence that every Chinese or Japanese child has drilled into them when young. The glyph displayed on the screen could be "the character drawn thus far", "the next stroke", or perhaps both on top of each other.
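The probabilistic pronunciation-to-ideogram mapping in approach 1 can be sketched as follows: collect the candidate words that share a pinyin spelling and normalise their frequencies into probabilities, which Dasher could then use to size the candidates' boxes. The lexicon LEXICON_COUNTS and the function conversion_distribution are hypothetical, and the counts are invented for illustration (北京 "Beijing" and 背景 "background" are both read bei jing). A real converter would also condition on the preceding text, which this sketch ignores.

```python
# Hypothetical toy lexicon: (toneless pinyin syllables, word) -> invented count.
LEXICON_COUNTS: dict[tuple[tuple[str, ...], str], int] = {
    (("bei", "jing"), "北京"): 900,  # Beijing
    (("bei", "jing"), "背景"): 300,  # background
    (("ao", "yun"), "奥运"): 500,    # Olympics
}


def conversion_distribution(pinyin, counts=LEXICON_COUNTS) -> dict[str, float]:
    """Return P(word | pinyin) for every word spelt `pinyin`, most probable first.

    Probabilities are relative frequencies among the matching candidates only;
    an unknown spelling yields an empty dict.
    """
    key = tuple(pinyin)
    candidates = {word: c for (py, word), c in counts.items() if py == key}
    total = sum(candidates.values())
    if total == 0:
        return {}
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return {word: c / total for word, c in ranked}


if __name__ == "__main__":
    print(conversion_distribution(["bei", "jing"]))
```

With these made-up counts, 北京 would receive three quarters of the space allotted to bei jing and 背景 the remaining quarter.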
This screenshot shows me writing "Beijing city" in Dasher 3.2, the version that does not yet have a proper language model for Chinese.

The Dasher project is supported by the Gatsby Charitable Foundation
David MacKay
Site last modified Fri Oct 1 10:33:28 BST 2010