How does Chinese Dasher work?
As of Wed 30/8/06, there is now a working Chinese Dasher, written by
Will Zou (yz246@cam). It will be
released as part of Dasher version 4.2 (about 09/2006).
Here are screenshots showing this prototype in use.
These screenshots show a long multi-phrase sentence being entered
all in one go: pinyin first, then Chinese characters.
We also aim to make it equally easy, in Dasher, to enter the text
one phrase at a time (about 2 to 4 Chinese characters converted at a time).
(There are both theoretical and computational reasons for preferring
to enter Chinese a single phrase at a time.)
1. Starting to write 'I hope to be a volunteer for the Beijing Olympics.'
There is a separation symbol <'> included in the pinyin (PY) string, between
"beijing" and "aoyun" (Olympics), as an example of disambiguation.
|
2. A typical choice of two frequently used phrases.
|
3. Writing 'I hope to be a volunteer for the Beijing Olympics.'
There is a separation symbol <'> included in the pinyin (PY) string, between
"beijing" and "aoyun" (Olympics), as an example of disambiguation.
|
4. Writing 'I hope to be a volunteer for the Beijing Olympics.'
There is a separation symbol <'> included in the pinyin (PY) string, between
"beijing" and "aoyun" (Olympics), as an example of disambiguation.
|
5. Writing 'I hope to be a volunteer for the Beijing Olympics.'
There is a separation symbol <'> included in the pinyin (PY) string, between
"beijing" and "aoyun" (Olympics), as an example of disambiguation.
|
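The role of the separation symbol <'> in the captions above can be illustrated with a small sketch. It is not the code used in Dasher; the function name `segmentations` and the tiny syllable set are assumptions made for this example only. Without the separator, the unspaced string "beijingaoyun" can be split into syllables in more than one way; with the separator after "beijing", only one split remains.

```python
# Toy syllable inventory: only the syllables needed for this example.
# (Assumption: a real system would use the full table of valid pinyin syllables.)
SYLLABLES = {"bei", "jin", "jing", "ao", "gao", "yun"}


def segmentations(pinyin, syllables=SYLLABLES):
    """Return every way to split an unspaced pinyin string into syllables.

    An apostrophe in the input is treated as a forced syllable boundary.
    """
    if pinyin == "":
        return [[]]
    if pinyin[0] == "'":
        # The separator itself carries no sound; skip it and continue.
        return segmentations(pinyin[1:], syllables)
    results = []
    for end in range(1, len(pinyin) + 1):
        head = pinyin[:end]
        # A head containing an apostrophe is never a valid syllable,
        # so no syllable can straddle the separator.
        if head in syllables:
            for rest in segmentations(pinyin[end:], syllables):
                results.append([head] + rest)
    return results


ambiguous = segmentations("beijingaoyun")
resolved = segmentations("beijing'aoyun")
print(ambiguous)  # two candidate splits: "jin gao" versus "jing ao"
print(resolved)   # a single split once the boundary is forced
```

The exhaustive recursion is fine for short phrases like this one; it is meant to show the ambiguity, not to be efficient.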
Here are David MacKay's old notes on Chinese Dasher
Chinese Dasher wiki
Some progress has now been made on Chinese Dasher.
David MacKay has obtained a pinyin corpus, so you can write in pinyin.
And Tian Li and Kaburagi are working on the phonetic-to-ideogram
conversion software.
We would not go directly for the ideograms, since there are too many
of them. We have to build up sentences using a sequence of symbols,
each of which has a small information content.
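As a rough illustration of "small information content", the sketch below compares the bits carried by one symbol from a small alphabet with one symbol from a large ideogram inventory. It assumes, unrealistically, that all symbols are equally likely, and the inventory size of 6000 ideograms is a round figure chosen for this example, not a number taken from the Dasher project.

```python
import math


def bits_per_symbol(alphabet_size):
    """Information content, in bits, of one symbol drawn uniformly
    from an alphabet of the given size."""
    return math.log2(alphabet_size)


# 26 Latin letters, as used to spell pinyin (tone marks ignored here).
letter_bits = bits_per_symbol(26)

# Hypothetical inventory of 6000 ideograms (illustrative assumption).
ideogram_bits = bits_per_symbol(6000)

print(f"one letter:   {letter_bits:.2f} bits")
print(f"one ideogram: {ideogram_bits:.2f} bits")
```

Under these assumptions a single ideogram carries well over twice the information of a single letter, which is why selecting ideograms directly would mean choosing among far more alternatives at each step.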
We can imagine two possible approaches.
- Use a phonetic approach.
  Pinyin is a standard method for writing Chinese phonetically.
  Having obtained a pinyin sequence, we could
  then, if required, have a probabilistic pronunciation-to-ideogram
  mapping also implemented within Dasher. This is how I envisage
  Japanese Dasher working (with hiragana first, then kanji if required).
  This is exactly how Japanese people write Japanese in JWord or Jemacs.
- Use a stroke-based approach. Have the writer build up
  the ideogram stroke by stroke, in the standard sequence that
  every Chinese / Japanese child has drilled into them when they are young.
  The displayed glyph on the screen could be "the character drawn thus far"
  or could be "the next stroke", or perhaps both on top of each other.
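For the phonetic approach, a probabilistic pronunciation-to-ideogram mapping could look something like the following minimal sketch. It is not the conversion software mentioned above: the name `candidates` and the counts in `TOY_COUNTS` are invented for illustration, where a real table would be estimated from a corpus and would take the surrounding phrase into account.

```python
# Invented toy counts of how often each ideogram is written for a syllable.
# (Assumption: these numbers are made up purely to illustrate the idea.)
TOY_COUNTS = {
    "bei": {"北": 8, "被": 6, "杯": 2},
    "jing": {"京": 5, "经": 9, "精": 4},
}


def candidates(syllable, counts=TOY_COUNTS):
    """Return (ideogram, probability) pairs for a pinyin syllable,
    most probable first. Unknown syllables give an empty list."""
    table = counts.get(syllable, {})
    total = sum(table.values())
    ranked = sorted(table.items(), key=lambda item: item[1], reverse=True)
    return [(ideogram, count / total) for ideogram, count in ranked]


for ideogram, probability in candidates("bei"):
    print(ideogram, round(probability, 3))
```

In Dasher, more probable continuations are given more screen area, so a ranking of this kind would let the likeliest ideograms be the easiest to select.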
This screenshot shows me writing "Beijing city" in Dasher 3.2,
the version that does not yet have a proper language model for Chinese.
|
The Dasher project is supported by the Gatsby Foundation and by the European Commission in the context of the AEGIS project (open Accessibility Everywhere: Groundwork, Infrastructure, Standards).
David MacKay
Site last modified Fri Oct 1 10:33:24 BST 2010