This paper describes the advantages of Unicode, the WinCALIS 2.0 authoring system, and the capabilities of the Unicode-based text editor that forms a main part of the WinCALIS 2.0 product.
Unicode and Unicode-based applications:
Based on current systems, four classes of language support are possible: English-only, variable monolingual, restricted multilingual, and unrestricted multilingual. The last is truly transparent to language and can support all languages. A system or application of this type need not support every particular language, but it has the capability of doing so. The goal of Unicode is to enable this class of language support.
Today, most localized or internationalized systems support a restricted number of languages. This has resulted in an expanding profusion of character sets, each limited to one or a small number of languages. The choice of a particular character set becomes problematic when a number of languages are to be supported at once: if one wishes to support English, French, and Arabic at the same time, then none of the character sets depicted here except Unicode can represent all three. Another important design goal of Unicode is to represent all languages adequately and simultaneously: not just modern written languages but also, as space allows, archaic languages.
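The character-set problem described above can be illustrated with a short sketch. It uses Python's standard codecs as stand-ins for legacy character sets; the sample string and the choice of codecs (`ascii`, `latin_1`, `iso8859_6`) are assumptions made for illustration and are not the character sets depicted in this paper.

```python
# Sample text mixing English, French and Arabic (an illustrative assumption).
SAMPLE = "hello, caf\u00e9, \u0633\u0644\u0627\u0645"


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Three single-byte legacy sets and one 16-bit Unicode encoding form.
CODECS = ["ascii", "latin_1", "iso8859_6", "utf_16_be"]
results = {codec: can_encode(SAMPLE, codec) for codec in CODECS}

for codec, ok in results.items():
    print(f"{codec:10s} {'can' if ok else 'cannot'} represent the sample")
```

Each legacy set covers part of the sample (Latin-1 handles the French, ISO 8859-6 handles the Arabic), but only the Unicode encoding covers all three languages in one string.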
Unicode provides a uniform, fixed-width character encoding of 16 bits per character, which overcomes some serious shortcomings of variable-width encoding techniques. Moreover, in contrast to character encoding techniques that rely on state information to determine which character a particular encoding represents, Unicode associates each code with one and only one character. Consequently, given a particular encoding value, its identity as a Unicode character is never in question.
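A minimal sketch of these two properties follows. It assumes Python's `utf_16_be` codec as the 16-bit form and restricts itself to characters that fit in a single 16-bit code; `shift_jis` and `iso2022_jp` are chosen here only as familiar examples of a variable-width and a state-dependent encoding, and the helper `char_at` is hypothetical.

```python
def char_at(data: bytes, index: int) -> str:
    """Fetch the character at `index` from 16-bit big-endian codes.

    With a fixed width, the byte offset is simply 2 * index.
    """
    start = 2 * index
    unit = int.from_bytes(data[start:start + 2], "big")
    return chr(unit)


# Fixed width: every character occupies exactly two bytes.
text = "A\u00e9\u65e5\u672c"          # 'A', e-acute, and two Han characters
fixed = text.encode("utf_16_be")
third = char_at(fixed, 2)             # direct indexing, no scanning

# Variable width: one byte for 'A', two bytes for each Han character,
# so the offset of the n-th character depends on what precedes it.
variable = "A\u65e5\u672c".encode("shift_jis")

# State-dependent: the two payload bytes are wrapped in escape sequences,
# and the same two bytes read as plain ASCII mean something else.
stateful = "\u65e5".encode("iso2022_jp")
payload = stateful[3:5]

print(len(fixed), third, len(variable), stateful, payload.decode("ascii"))
```

In the fixed-width case a code value can be interpreted on its own, whereas the payload bytes of the state-dependent encoding cannot be identified without knowing which escape sequence preceded them.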
Because of its flat, linear 16-bit encoding space, Unicode can assign up to 65,536 character codes. The Unicode encoding space may be divided into four general linear sections, or zones, which follow one another. The first zone contains all general alphabetic, punctuation, and symbol characters. The ideographic zone (I-zone), which immediately follows, contains the Han ideographic characters. Following this are the O-zone, reserved for future open use, and the R-zone, restricted to private and compatibility characters.
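The zone layout can be sketched as a simple range lookup. The paper does not give zone boundaries, so the starting values below are assumptions for illustration and should be checked against the edition of the Standard in use; the label "general" for the first zone and the function `zone_of` are likewise hypothetical.

```python
from bisect import bisect_right

# A flat 16-bit space holds 2**16 codes, 0x0000 through 0xFFFF.
CODE_SPACE_SIZE = 2 ** 16

# ASSUMED zone starting points (not stated in the paper).
ZONE_STARTS = [0x0000, 0x4000, 0xA000, 0xE000]
ZONE_NAMES = ["general", "I-zone", "O-zone", "R-zone"]


def zone_of(code: int) -> str:
    """Return the name of the zone containing the 16-bit code `code`."""
    if not 0 <= code < CODE_SPACE_SIZE:
        raise ValueError(f"{code:#x} is outside the 16-bit encoding space")
    # The zones are contiguous, so the last start <= code identifies the zone.
    return ZONE_NAMES[bisect_right(ZONE_STARTS, code) - 1]


for code in (0x0041, 0x65E5, 0xE000):
    print(f"{code:#06x} -> {zone_of(code)}")
```

Because the zones follow one another without gaps, a sorted list of starting points is enough to classify any code.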
The largest section of the Unicode Standard consists of block-by-block descriptions of characters. Each description provides the encoding value, a representative image of the character when rendered visible, a formal name, and additional annotations which identify alternative names or which indicate cross references to other similar (or dissimilar) Unicode characters.
To summarize, Unicode is designed to serve as the primary mechanism for representing all the world's text. To accomplish this, it defines a standard repertoire of characters, an encoding for these characters, and a plain text format to be used to represent content alone. Unicode can be used both for external interchange and for internal processing, thus obviating the need for multiple string formats. The use of a single file format is very important for supporting both efficient processing and interchange of textual information.
A Unicode-based system or application will use Unicode characters to represent text, perhaps in both plain and rich text formats. These representations will serve as the fundamental core of text processing for such systems. Surrounding this core will be input, output (display), processing, and interchange components. The WinCALIS 2.0 application, the topic of the next section, is such a system.
CALIS and WinCALIS 2.0
CALIS, a text-based Computer Assisted Language Instruction System, combines the most innovative concepts of Computer Assisted Learning with the soundest pedagogical principles to equip language teachers with a stimulating educational tool. With CALIS, one can prepare language lessons which supplement classroom learning, i.e. create exercises to be administered, corrected and scored individually for each student.
WinCALIS, a Microsoft Windows-based CALIS, takes full advantage of what Windows 3.x does well: mouse input, color graphics, multimedia, and comprehensive online help. It thus makes successful language education just a mouse click away.
WinCALIS 2.0, the latest and yet-to-be-released version of WinCALIS, belongs to the unrestricted multilingual class of language support and hence provides access to a rich collection of languages and scripts. Moreover, being Unicode-based, WinCALIS 2.0 inherits all the advantages of Unicode.
WinCALIS 2.0 comprises two main modules: the WinCALIS program itself and the WinCALIS Author. The WinCALIS component is the part of the product that actually runs the lessons. The lessons are based on a scripting language called the CALIS programming language. The real power of the authoring program, WinCALIS Author, is the ease and speed with which an author can make scripts without directly using the CALIS language at all. An author provides WinCALIS Author with the text, directions, questions, screen layout, etc., and it automatically writes the script for him or her. Of course, very advanced users may still want to tinker with a script, using one of the multilingual editors in the Author program, to make changes or additions. In the previous version of WinCALIS, the scripts consisted of pure ASCII text written in the CALIS language. WinCALIS 2.0 uses the same CALIS language, but the scripts are made up of Unicode text.
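The relationship between an ASCII script and its Unicode form can be sketched as follows. This is not the WinCALIS conversion routine; the function name, the big-endian byte order, and the placeholder line (which is not real CALIS syntax) are all assumptions for illustration.

```python
def ascii_script_to_unicode(script_bytes: bytes) -> bytes:
    """Widen a pure-ASCII script to 16-bit codes (big-endian assumed)."""
    # decode() raises UnicodeDecodeError if any byte is outside ASCII.
    text = script_bytes.decode("ascii")
    return text.encode("utf_16_be")


# Placeholder content only; not an actual CALIS statement.
legacy = b"placeholder script line"
converted = ascii_script_to_unicode(legacy)

print(len(legacy), len(converted))
```

Each ASCII character keeps its numeric value and is simply extended to 16 bits, so the converted script is twice as long and its low-order bytes reproduce the original.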
Most contemporary text editors are based on the ASCII file format, which limits the number of languages that can be supported at once to a very few. The new 2-byte editor, called the Multilingual Editor, is an integral part of the WinCALIS 2.0 product and overcomes this limitation, giving authors the flexibility, capability, and sophistication to create and edit scripts containing characters from multiple languages. This flexibility follows from the editor's reliance on the Unicode character set.
The multilingual editor is just like any other editor when it comes to the normal editing features, such as cut, copy, paste, and selection of text, but it has much more to offer when it comes to dealing with multiple languages. The language menu at the top of the WinCALIS program (both modules) makes dynamic switching of languages very easy. Because of its open-ended design, the editor can handle any number of languages. Authors therefore have an entire repertoire of languages at their fingertips when scripting and are no longer limited to just one or two. The best part of the multilingual editor is that it can deal with both left-to-right and right-to-left languages. It automatically takes care of some semantic details of most languages and has elaborate features for entering accented characters, for treating non-spacing marks, and so on. Many software packages can handle these special characters, but none of them, except the WinCALIS 2.0 multilingual editor, can handle them in Unicode within an authoring environment.
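The kind of character information an editor needs for non-spacing marks and text direction can be sketched with Python's standard `unicodedata` module. This is an illustration only and says nothing about how the WinCALIS editor is implemented; the helper names are hypothetical.

```python
import unicodedata


def is_nonspacing_mark(ch: str) -> bool:
    """True if `ch` is a non-spacing mark (general category Mn)."""
    return unicodedata.category(ch) == "Mn"


def is_right_to_left(ch: str) -> bool:
    """True if `ch` has a right-to-left bidirectional class (R or AL)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


# An accented character entered as a base letter plus a non-spacing mark...
decomposed = "e\u0301"                      # 'e' followed by combining acute
# ...can be composed into the single precomposed character e-acute.
composed = unicodedata.normalize("NFC", decomposed)

for ch in ("A", "\u0301", "\u0627", "\u05d0"):
    print(f"U+{ord(ch):04X}",
          "non-spacing" if is_nonspacing_mark(ch) else "spacing",
          "RTL" if is_right_to_left(ch) else "not RTL")
```

An editor can use properties like these to keep a mark attached to its base character during cursor movement and to decide the display order of mixed-direction text.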
Nagendra Revanur was formerly Associate in Research at the Humanities Computing Facility, Duke University, North Carolina.