<HTML>
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=Unicode-2-0">
<TITLE>A Neural Network for Disambiguating Pinyin Chinese Input</TITLE>
</HEAD>
<BODY bgcolor="ffffff" text="000000">
<i><font size="-1">Reproduced here with permission from the <a href="http://www.calico.org">Computer Assisted Language Instruction Consortium</a>, from the Proceedings of the CALICO '94 Annual Symposium, (March 14-18, 1994, Northern Arizona University)<br></font></i>
<center>
<H1><B>A Neural Network for Disambiguating Pinyin Chinese Input</B>
</H1>

<P>
<B>Mei Yuan, Richard A. Kunst, and Frank L. Borchardt</B>
</center>
<P>
<B>Introduction</B> 
<P>
The most user-friendly way of typing Chinese on a personal computer
is to enter it phonetically, as one would speak, by typing
in a standard Roman transcription such as <I>pinyin</I> and letting
the computer look up the <I>pinyin</I> words
in an internal dictionary of correspondences between
<I>pinyin</I> and Chinese Hanzi characters, then convert them
instantly into the correct characters. This phonetic conversion
method is the main Chinese input method for both courseware authors
and students in the <B>WinCALIS</B> 2.0 Computer Assisted Language
Learning (CALL) authoring system for Windows. The drawback of
<I>pinyin</I>-based typing of Chinese is that there are many
homophones, words which sound alike even
when tones are taken into account. In such cases, the computer
can only present the typist with a selection list and ask him
to choose the desired word. (If Chinese were English, the
typist would enter the phonetic transcription &quot;tu&quot;
and be presented with the homophone selection list &quot;to,&quot;
&quot;two,&quot; and &quot;too.&quot;)
<P>
Sometimes the typist must search and choose from among dozens
of Chinese characters. Homophones are most numerous for
the single-syllable words which form the high-frequency core vocabulary
of any language. Choosing from a list of homophones is obviously
inefficient. Some efforts have been made to deal with this
problem, such as maintaining frequency lists and typing in longer
contexts of combinations of syllables. These techniques are also used
extensively in <B>WinCALIS</B> 2.0. Here we introduce
a new approach for cases where these other approaches
fail: a neural network which predicts the most likely
word.
<P>
Because our software is used to assist in language teaching,
it is natural to imagine that syntax could help with prediction. The
problem is that natural language is so flexible that it cannot
be defined as precisely as a computer language. The concept
of the neural network has been discussed for several years
and has been explained in various ways. Here we
explain it by drawing an analogy. There are
two ways to learn a language: native language learning and
foreign language learning. The difference between
them is like that between the neural network and the traditional
computer. Foreign language learning, by which we mean learning
at school, starts with characters, pronunciation, sentence structure,
and so on. The teacher knows all the details
and teaches what to
do and how to do it. This is like the traditional computer
algorithm: it does what people already know exactly how to do, with
a known result. The neural network, by contrast, is designed to simulate
the intelligence of human beings and to tell us how to do something
after it learns by itself. It is similar to native language
learning. Parents are hardly concerned with syntax at all,
but the child comes to speak perfectly after listening and listening.
Parents never talk about any rules of language, but the child
follows these rules even though he never notices them.
Similarly, we provide the neural network only with samples of
Chinese text, not rules, and after learning by itself, it
knows what to do. It knows which homophone is the right
one to fit a certain context, just as the English speaker knows
which form of /tu/ fits a certain context. Thus a neural
network appears very suitable for the problem
of Chinese input.
<P>
<B>Structure</B> 
<P>
The structure of the neural network in <B>WinCALIS</B> is shown
in Figure 1.
<P>
<center><img src="../neural/1fig1.gif" width=300 height=355 alt=""><BR>
<I>Figure 1. Structure of the Neural Network
</I></center>
<P>
There are 23 nodes in both the input layer and the output layer.
Each of the nodes stands for one word class category, like &quot;noun,&quot;
&quot;verb,&quot; &quot;adverb,&quot; etc. There are 7 nodes in
the hidden layer. The input is a single word class, and the
output indicates the probability of each of the 23
word classes appearing after that word class.
<P>
<B>Training</B> 
<P>
To train the neural network is to find an appropriate weight matrix
according to the training material. We used the generalized
delta rule as the learning algorithm for the network. The
training material consists of many sample sentences. While
the typists who compiled the training material typed Chinese,
a file containing all the word class information was automatically
created. For example, when the following sentence was typed:
<P><blockquote>
Wǒ yǒu sān bǎ yàoshi.<br>
我有三把钥匙。<br>
I have three (measure for keys) keys.
<P></blockquote>
the corresponding word class string &quot;PR Vt NU ME NO EP&quot;
(which stands for pronoun, verb-transitive, number, measure, noun,
and period) was also saved.
<P>
When the neural network was trained, any two adjacent word classes
were used as a pair of Input and Target. Thus in the sentence
above, the pairs of Inputs and Targets are (PR Vt) (Vt NU) (NU
ME) (ME NO) (NO EP). When the neural network gets a word class
input, it converts the word class to a vector I = (i<sub>1</sub>, i<sub>2</sub>, i<sub>3</sub> ...
i<sub>23</sub>) by giving a high value to the node corresponding to that
word class and a low value to all the others. So
if the input word class is AD (adverb), the input vector is I
= (0.999, 0.001, 0.001 ... 0.001). The target is converted
in the same way.
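<P>
The pairing and encoding just described can be sketched in Python as follows. This is only an illustration, not the WinCALIS code: the function names are hypothetical, and the node order is assumed to follow the order of the rows in the table of prediction weights (Figure 2), which agrees with AD being the first node in the example above.

```python
# Word-class codes, in the row order of Figure 2 (node order is an assumption).
WORD_CLASSES = ["AD", "AS", "AJ", "CV", "IN", "MA", "ME", "NO", "NU", "PA",
                "PO", "PS", "PT", "PU", "SP", "SV", "VA", "VE", "VC", "VO",
                "EP", "EC", "PH"]
HIGH, LOW = 0.999, 0.001  # high and low node values quoted in the text


def adjacent_pairs(wordclass_string):
    """Split a saved word class string into (Input, Target) pairs."""
    tags = wordclass_string.split()
    return list(zip(tags, tags[1:]))


def encode(word_class):
    """Return the 23-element vector for one word class."""
    return [HIGH if code == word_class else LOW for code in WORD_CLASSES]


pairs = adjacent_pairs("PR Vt NU ME NO EP")
vector = encode("AD")
print(pairs)
print(vector[:3])
```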
<P>
After it gets its input, the neural network calculates step by
step from the input layer to the output layer.
<P>
<P><blockquote><b>
H =W<sup>h</sup>.I ;
<br>
O =W<sup>o</sup>.H ; <br>
(Here, H is Hidden, I is Input, O is Output)</b></blockquote>
<P>
Because the output will be the probability, the value of which
ranges from 0 to 1, we used the sigmoid function f(x) = 1/(1+e<sup>-x</sup>).
<P><blockquote><b>
H = (h<sub>1</sub>, h<sub>2</sub>, ..., h<sub>7</sub>)<sup>T</sup> = f(W<sup>h</sup>.I)<br>
O = (o<sub>1</sub>, o<sub>2</sub>, ..., o<sub>23</sub>)<sup>T</sup> = f(W<sup>o</sup>.H)<p></b></blockquote>
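<P>
The forward calculation can be sketched in Python as follows. The layer sizes (23, 7, 23) and the sigmoid come from the text; the random initial weights, the seed, and all names are assumptions made only so that the sketch runs.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 23, 7, 23  # layer sizes given in the text


def sigmoid(x):
    """f(x) = 1 / (1 + e^-x), which keeps every output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def forward(w_h, w_o, i_vec):
    """Compute H = f(Wh.I) and then O = f(Wo.H)."""
    h_vec = [sigmoid(x) for x in mat_vec(w_h, i_vec)]
    o_vec = [sigmoid(x) for x in mat_vec(w_o, h_vec)]
    return h_vec, o_vec


# Untrained weights, chosen at random for illustration only.
rng = random.Random(0)
w_h = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
w_o = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]

i_vec = [0.999] + [0.001] * (N_IN - 1)  # the AD (adverb) input from the text
h_vec, o_vec = forward(w_h, w_o, i_vec)
print(len(h_vec), len(o_vec))
```
<P>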
Then it compares the output with the target and gets the errors.
Using these errors it back-propagates step by step from the
output layer back to the input layer, in order to adjust the weight
matrices.
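<P><blockquote><b>
&delta; = (&delta;<sub>1</sub>, &delta;<sub>2</sub>, ..., &delta;<sub>23</sub>)<sup>T</sup> = (T - O).f'(O)<br>
f'(x) = (1/(1+e<sup>-x</sup>))' = f(x).(1-f(x))<br>
&delta;<sub>h</sub> = (&delta;<sub>h1</sub>, &delta;<sub>h2</sub>, ..., &delta;<sub>h7</sub>)<sup>T</sup> = &delta;.W<sup>o</sup>.f'(H)<br>
W(t+1) = W(t) + &eta;.&delta;.(I)<sup>T</sup> + &alpha;.&Delta;W(t-1)<P></b></blockquote>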
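<P>
One back-propagation step can be sketched in Python as follows. The sketch reads the weight update as the usual generalized delta rule with a learning rate and a momentum term; the values of both, the random initial weights, the node indices (taken from the row order of Figure 2), and all names are assumptions, since the paper does not state them.

```python
import math
import random

ETA = 0.25    # learning rate (hypothetical value)
ALPHA = 0.5   # momentum coefficient (hypothetical value)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_h, w_o, i_vec):
    h_vec = [sigmoid(sum(w * x for w, x in zip(row, i_vec))) for row in w_h]
    o_vec = [sigmoid(sum(w * h for w, h in zip(row, h_vec))) for row in w_o]
    return h_vec, o_vec


def train_step(w_h, w_o, dw_h, dw_o, i_vec, t_vec):
    """Adjust both weight matrices for one Input-Target pair.

    Returns the squared error measured before the adjustment.
    """
    h_vec, o_vec = forward(w_h, w_o, i_vec)
    # Output error terms: (T - O) times the sigmoid derivative o(1 - o).
    delta_o = [(t - o) * o * (1.0 - o) for t, o in zip(t_vec, o_vec)]
    # Hidden error terms: output errors propagated back through Wo.
    delta_h = [h * (1.0 - h) * sum(d * w_o[k][j] for k, d in enumerate(delta_o))
               for j, h in enumerate(h_vec)]
    # New change = ETA * delta * input + ALPHA * previous change.
    for k, d in enumerate(delta_o):
        for j, h in enumerate(h_vec):
            dw_o[k][j] = ETA * d * h + ALPHA * dw_o[k][j]
            w_o[k][j] += dw_o[k][j]
    for j, d in enumerate(delta_h):
        for n, x in enumerate(i_vec):
            dw_h[j][n] = ETA * d * x + ALPHA * dw_h[j][n]
            w_h[j][n] += dw_h[j][n]
    return sum((t - o) ** 2 for t, o in zip(t_vec, o_vec))


rng = random.Random(0)
w_h = [[rng.uniform(-0.5, 0.5) for _ in range(23)] for _ in range(7)]
w_o = [[rng.uniform(-0.5, 0.5) for _ in range(7)] for _ in range(23)]
dw_h = [[0.0] * 23 for _ in range(7)]
dw_o = [[0.0] * 7 for _ in range(23)]

# A single pair (NU ME); indices 8 and 6 assume the row order of Figure 2.
i_vec = [0.999 if n == 8 else 0.001 for n in range(23)]
t_vec = [0.999 if n == 6 else 0.001 for n in range(23)]
errors = [train_step(w_h, w_o, dw_h, dw_o, i_vec, t_vec) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the step on one pair, as here, only shows that the error shrinks; real training cycles through all the pairs in the material.
<P>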
After completing those steps for a single Input-Target pair, the
next Input-Target pair is used to repeat the above steps, then
the next, then the next until the whole material is learned (about
60,000 word class tokens for the whole material).
<P>
We did not use masses of training material, so the neural network
occasionally had to re-learn the same material. When one more
training pass changed only a few entries in the table resulting from a
trial run, we considered that the weight matrices had
converged sufficiently. Below is the table of prediction weights
which <B>WinCALIS</B> uses now.
<P>
<center><table border>
<caption><i>Figure 2. Table of Neural Network Prediction Weights</i></caption>
<tr><td>Adverb</td><td>AD ( VE 0.6700 VA 0.0826 CV 0.0769 AD 0.0713 )</td></tr>
<tr><td>Adstative</td><td>AS ( AJ 0.9750 VE 0.1478 NO 0.0594 )</td></tr>
<tr><td>Adjective</td><td>AJ ( NO 0.3135 VE 0.1979 PS 0.1131 EC 0.0885 EP 0.0802 )</td></tr>
<tr><td>Coverb (Prep.)</td><td>CV ( NO 0.5748 VE 0.1640 )</td></tr>
<tr><td>Interjection</td><td>IN ( NO 0.4567 VE 0.1553 AJ 0.0780 )</td></tr>
<tr><td>Movable. Adv.</td><td>MA ( NO 0.5776 VE 0.2537 AJ 0.0594 )</td></tr>
<tr><td>Measure</td><td>ME ( NO 0.4664 VE 0.1407 AJ 0.0801 )</td></tr>
<tr><td>Noun</td><td>NO ( NO 0.3253 VE 0.1812 EC 0.0741 EP 0.0686 PS 0.0592 AD 0.0569 )</td></tr>
<tr><td>Number</td><td>NU ( ME 0.5017 NU 0.3380 VE 0.1672 NO 0.0751 )</td></tr>
<tr><td>Particle</td><td>PA ( EP 0.3095 NO 0.1819 NU 0.1483 EC 0.1085 VE 0.0513 )</td></tr>
<tr><td>Ordinaliz. Part.</td><td>PO ( NU 0.8621 VE 0.1375 )</td></tr>
<tr><td>Subord. Particle</td><td>PS ( NO 0.6858 AJ 0.0894 )</td></tr>
<tr><td>Prevrb.Sub.Part.</td><td>PT ( VE 0.5069 AJ 0.2129 NO 0.1655 CV 0.1092 AD 0.0529 PS 0.0504 )</td></tr>
<tr><td>Pstvrb.Sub.Part.</td><td>PU ( AJ 0.5234 VE 0.1836 NO 0.1256 AD 0.1092 )</td></tr>
<tr><td>Specifier</td><td>SP ( ME 0.6236 NU 0.1553 NO 0.0875 )</td></tr>
<tr><td>Stative Verb</td><td>SV ( NO 0.2955 VE 0.1879 )</td></tr>
<tr><td>Aux. Verb</td><td>VA ( VE 0.6698 AJ 0.1123 CV 0.0839 AD 0.0517 )</td></tr>
<tr><td>Verb</td><td>VE ( NO 0.3315 VE 0.1636 NU 0.0706 EC 0.0624 )</td></tr>
<tr><td>Verb-Complmnt.</td><td>VC ( VE 0.2961 NO 0.1656 )</td></tr>
<tr><td>Verb-Obj.Cmpd.</td><td>VO ( VE 0.2391 )</td></tr>
<tr><td>Period</td><td>EP ( NO 0.6524 VE 0.1531 AD 0.0588 MA 0.0563 )</td></tr>
<tr><td>Comma</td><td>EC ( VE 0.2786 NO 0.2205 AD 0.1555 AJ 0.0731 MA 0.0559 )</td></tr>
<tr><td>Phrase</td><td>PH ( VE 0.4510 AD 0.0750 NO 0.0581 )</td></tr>
</table></center>
<p>
This table indicates the probability of one word class being followed
by another. For example:
<ul>
<li>after AD (adverb), VE (verb) is the most probable word class, occurring 67% of the time;
<li>the second most probable one is VA (auxiliary verb), occurring 8% of the time;
<li>CV (coverb, similar to the English preposition) and AD (adverb) are next, occurring 8% and 7% of the time, respectively;
<li>and the other 19 word classes have such a low probability that they are neglected.
</ul>
<B>Operation</B> 
<P>
The neural network comes into play when the typist, having finished
typing a word in <I>pinyin</I> transcription, presses the space
bar, which serves as a convert key, and a search of the
<B>WinCALIS</B> internal dictionary of <I>pinyin</I>-Chinese
character correspondences finds homophones. The network gets the word class of the
preceding, already converted word from the text buffer, does its
calculations, and produces the list of probable following word classes.
Then the word classes of the homophones are compared with
this list. If only one word has a word class belonging
to the list, it is taken to be the one the typist wanted and
is automatically converted without any user intervention. For
example, in the sentence cited above,
<P><blockquote>
Wǒ yǒu sān bǎ yàoshi.<br>
我有三把钥匙。<br>
I have three (measure for keys) keys.
<P></blockquote>
in the <B>WinCALIS</B> internal Chinese dictionary, <I>yàoshi</I>
has the entries:
<P><blockquote>
要是 &quot;if&quot; MA (movable adverb)
<br>
钥匙 &quot;key&quot; NO (noun).
<P></blockquote>
Bǎ is a measure which may be followed by NO (noun), VE (verb), AJ
(adjective), but not MA (movable adverb), so the neural network
predicts it is the <I>yàoshi</I> meaning &quot;key&quot; 
钥匙 .
<P>
If more than one match is found, the most probable word will be
compared with the second most probable but different word. If
the difference in probability is significant, the neural network
will be sure that the most probable word is the one the typist
wants. Consider the word<I> jìn</I> in the following example:
<P><blockquote>
hěn jìn <br>
很 jìn<br>
<b>very - ? </b>
<P></blockquote>
In the dictionary, the syllable <I>jìn</I> has the word
entries:
<P><blockquote>
进 &quot;come in&quot; VE
<br>
近 &quot;near&quot; AJ
<br>
劲 &quot;energy&quot; NO
<p></blockquote>
Hěn is AS (adstative), so AJ gets the probability of 0.975, but VE
and NO only get 0.1478 and 0.0594, respectively. The difference
is very large, so the neural network is sure it is the <I>jìn</I>
meaning &quot;near&quot; 近 .
<P>
But in another example,
<P><blockquote>
qī qī<br>
七 qī<br>
<b>seven - ?</b>
<P></blockquote>
in the <B>WinCALIS</B> internal dictionary, qī has the word entries:
<P><blockquote>
七 &quot;seven&quot; NU<br>
期 &quot;period&quot; ME
<P></blockquote>
Qī 七 is a NU (number), so it may be followed by either another NU (number)
or an ME (measure), and the difference in probability between ME
and NU (which is 0.5017 - 0.3380 = 0.1637) is considered insignificant.
The neural network will not predict in this situation.
In fact, both qī qī <I>(nián)</I> 七七年  &quot;(the year) seventy-seven&quot;
and <I>(dì) qī qī 第七期 </I>&quot;seven(th) period&quot; are possible.
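<P>
The selection procedure of this section can be sketched in Python as follows. The three table rows are copied from Figure 2, but the threshold <code>MIN_GAP</code> is a hypothetical value: the paper says only that a gap of 0.1637 is insignificant and that the gap in the <I>hěn jìn</I> example is large enough. The function name and data layout are likewise assumptions.

```python
# Rows copied from Figure 2; only the three needed for the examples.
FOLLOWERS = {
    "ME": {"NO": 0.4664, "VE": 0.1407, "AJ": 0.0801},
    "AS": {"AJ": 0.9750, "VE": 0.1478, "NO": 0.0594},
    "NU": {"ME": 0.5017, "NU": 0.3380, "VE": 0.1672, "NO": 0.0751},
}
MIN_GAP = 0.5  # hypothetical threshold for a "significant" difference


def choose(prev_class, candidates):
    """Pick a homophone given the word class of the preceding word.

    candidates is a list of (hanzi, word_class) dictionary entries.
    Returns the chosen hanzi, or None when the typist must choose.
    """
    probs = FOLLOWERS.get(prev_class, {})
    matches = sorted(((probs[wc], hanzi) for hanzi, wc in candidates
                      if wc in probs), reverse=True)
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0][1]           # only one word class is on the list
    if matches[0][0] - matches[1][0] >= MIN_GAP:
        return matches[0][1]           # clear winner
    return None                        # too close: no prediction


print(choose("ME", [("要是", "MA"), ("钥匙", "NO")]))
print(choose("AS", [("进", "VE"), ("近", "AJ"), ("劲", "NO")]))
print(choose("NU", [("七", "NU"), ("期", "ME")]))
```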
<P>
<B>Enhancements</B> 
<P>
Analyzing the neural network's training and operation
also suggested that we regroup the word classes. We
had originally combined all the verbs, Vi (verb-intransitive),
Vt (verb-transitive), Va (verb-auxiliary), and Vr (verb-resultative),
in one category VE. Because the word class NO (noun) was
more likely to follow this mega-word class VE (verb) than any
other, when one typed an auxiliary verb + main verb phrase like
<I>néng tīng</I> &quot;can listen,&quot; the neural network
always favored the noun for the second word, choosing the nonsensical 能厅 &quot;can hall&quot; instead of the desired 能听 &quot;can listen.&quot; We considered the difference between Va (auxiliary verbs)
and regular verbs and decided to separate Va from the VE category.
Gratifyingly, the neural network confirms that the lists
of probable word classes after regular VE and Va are quite different.
The same is true of the word class AS (adstative), newly split off
from the more general word class AD (adverb), and of PT (preverbal
subordinating particle de 地) and PU (postverbal subordinating particle
de 得), differentiated from PS (subordinating particle de 的).
<P>
<B>Conclusions</B> 
<P>
The network usually selects correctly by providing the output
with the highest probability in that context. If first
place is in error, the likelihood that it will offer a correct choice
in second or third place approaches 100%. Considering that the system
allows for twenty other outputs, this result is highly significant.
The network has &quot;learned&quot; the allowable and unallowable
sequences in Chinese syntax and applies that &quot;learning&quot;
to the actual task of typing Chinese in phonetic representation.
<P>
<B>Presenters' Biodata </B>
<P>
Mei Yuan is a visiting researcher at the Humanities Computing
Facility of Duke University.  She received a B.E. degree in Computer
Science from Zhejiang University and an M.S. degree in Computer
Assisted Design from the Shanghai Maritime Institute.  She is
investigating the design and application of back-propagation networks
in Chinese natural language processing.
<P>
Frank L. Borchardt, Ph.D., is Professor of German at Duke University
and Executive Director of <a href="http://agoralang.com:2410/calico.html">CALICO</a>.
<P>
Richard A. Kunst, Ph.D., is a Research Associate in the Duke University
Computer Assisted Language Learning (DUCALL) Project, where he
is developing full support for all the languages of the world
in <B>WinCALIS</B> 2.0.
<P>
<B>Contact Information</B>
<br>The Authors c/o
<p>

</body>
</html>