Introduction
This paper describes the advantages of Unicode, the WinCALIS
2.0 authoring system, and the capabilities of the Unicode-based
text editor, a main part of the WinCALIS 2.0 product.
Unicode and Unicode-based applications
Based on current systems, four different classes of language support
are possible: English-only, variable monolingual, restricted
multilingual and unrestricted multilingual. The last one is truly
transparent to language and provides support for all languages.
A system or application of this type might not necessarily support
some particular language; however, it will have the capability
of doing so. The goal of Unicode is to make this class of language
support possible.
Today, most localized or internationalized systems support a restricted number of languages. This has resulted in an expanding profusion of character sets, each limited to one or a small number of languages. The choice of a particular character set becomes problematic when a number of languages are to be supported at once: if one wishes to support English, French and Arabic at the same time, then none of the character sets depicted here, except Unicode, can represent all three at once. Another important design goal of Unicode is to represent all languages adequately and simultaneously: not just modern written languages but also, as space allows, archaic languages.
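The English, French and Arabic case can be checked directly. The following is a minimal Python sketch and not part of the original system: the sample string and the choice of legacy character sets (ASCII, ISO 8859-1 for Western European languages, ISO 8859-6 for Arabic) are assumptions made for illustration, and the 16-bit Unicode form is approximated with Python's `utf-16-be` codec.

```python
# Sample text mixing English, French and Arabic (an illustrative assumption).
SAMPLE = "Hello, caf\u00e9, \u0633\u0644\u0627\u0645"

# Character sets to try; the first three are legacy sets chosen as examples.
CODECS = ["ascii", "latin-1", "iso8859_6", "utf-16-be"]


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Map each character set to whether it can hold the whole sample.
support = {codec: can_encode(SAMPLE, codec) for codec in CODECS}

for codec, ok in support.items():
    print(f"{codec:10s} {'can' if ok else 'cannot'} represent the sample")
```

Each legacy set covers part of the sample (ISO 8859-1 has the accented letter, ISO 8859-6 has the Arabic letters), but only the Unicode encoding covers all of it.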
Unicode provides a uniform, fixed-width character encoding of 16 bits per character, which overcomes some serious shortcomings of variable-width encoding techniques. Moreover, in contrast to character encoding techniques which rely on state information in order to determine what character a particular encoding represents, Unicode associates each code with one and only one character. Consequently, given a particular encoding value, its identity as a Unicode character is never in question.
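One practical consequence of a fixed width is that the n-th character of a buffer sits at a known byte offset, with no scanning and no state. The sketch below illustrates this under stated assumptions: the helper `char_at` is hypothetical, the sample text is limited to characters that fit in a single 16-bit code, and Shift-JIS is used only as a familiar example of a variable-width encoding. Present-day UTF-16 represents characters outside the 16-bit range with pairs of codes, so the example holds only for the 16-bit repertoire the paper describes.

```python
def char_at(buf: bytes, index: int) -> str:
    """Return character `index` from a buffer of 16-bit big-endian codes.

    Because every character occupies exactly two bytes, the offset is
    simply 2 * index; nothing before it has to be examined.
    """
    unit = buf[2 * index : 2 * index + 2]
    return chr(int.from_bytes(unit, "big"))


# Latin, accented Latin, Han and Arabic characters, each one 16-bit code.
text = "A\u00e9\u65e5\u0639"
fixed = text.encode("utf-16-be")

# Variable-width contrast: in Shift-JIS, 'A' takes one byte and the Han
# character takes two, so byte offsets no longer map directly to characters.
variable = "A\u65e5".encode("shift_jis")

print(len(fixed), "bytes for", len(text), "characters")
print(len(variable), "bytes for 2 characters in Shift-JIS")
```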
Because of its flat, linear 16-bit encoding space, Unicode can assign up to 65,536 character codes. The Unicode encoding space may be divided into four general linear sections or zones, which follow one another. The first zone contains all general alphabetic, punctuation and symbol characters. The ideographic zone (I-zone), which immediately follows, contains the Han ideographic characters. Following this are the O-zone, reserved for future open use, and the R-zone, restricted to private and compatibility characters.
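The figure of 65,536 is simply 2 to the power 16, and because the zones are consecutive ranges, finding the zone of a code is a lookup on range starts. The following sketch shows that idea. The boundary values in `ZONES` are assumptions for illustration only, since the text above does not state where each zone begins, and the function name `zone_of` is hypothetical.

```python
from bisect import bisect_right

# Size of a flat 16-bit code space.
CODE_SPACE = 2 ** 16  # 65,536 codes: 0x0000 through 0xFFFF

# (first code, description) for each zone, in order.
# NOTE: these starting values are assumed for illustration.
ZONES = [
    (0x0000, "general zone (alphabets, punctuation, symbols)"),
    (0x4000, "I-zone (Han ideographs)"),
    (0xA000, "O-zone (reserved for future open use)"),
    (0xE000, "R-zone (private and compatibility characters)"),
]
STARTS = [start for start, _ in ZONES]


def zone_of(code: int) -> str:
    """Return the description of the zone containing a 16-bit code."""
    if not 0 <= code < CODE_SPACE:
        raise ValueError(f"{code:#x} is outside the 16-bit code space")
    # bisect_right finds the last zone whose start is <= code.
    return ZONES[bisect_right(STARTS, code) - 1][1]


for code in (0x0041, 0x65E5, 0xE000):
    print(f"U+{code:04X}: {zone_of(code)}")
```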
The largest section of the Unicode Standard consists of block-by-block descriptions of characters. Each description provides the encoding value, a representative image of the character when rendered visible, a formal name, and additional annotations which identify alternative names or which indicate cross references to other similar (or dissimilar) Unicode characters.
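Two of the fields in each description, the encoding value and the formal name, can be looked up programmatically. The sketch below uses Python's standard `unicodedata` module, which reflects a later version of the standard than the one described here; the helper `describe` is a hypothetical name introduced for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character's encoding value and formal name on one line."""
    return f"U+{ord(ch):04X}  {unicodedata.name(ch)}"


# A Latin letter, an accented Latin letter, an Arabic letter and a Han
# ideograph, given by code value to keep the listing plain ASCII.
for ch in ("\u0041", "\u00e9", "\u0639", "\u65e5"):
    print(describe(ch))
```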
To summarize, Unicode is designed to serve as the primary mechanism for representing all the world's text. To accomplish this, it defines a standard repertoire of characters, an encoding for these characters, and a plain text format to be used to represent content alone. Unicode can be used both for external interchange and for internal processing, thus obviating the need for multiple string formats. The use of a single file format is very important for supporting both efficient processing and interchange of textual information.
A Unicode-based system or application will use Unicode characters to represent text, perhaps in both plain and rich text formats. These representations will serve as the fundamental core of text processing for these systems. Surrounding this core will be input, output (display), processing, and interchange components. The WinCALIS 2.0 application, the topic of the next section, is such a system.
CALIS and WinCALIS 2.0
CALIS, a text-based Computer Assisted Language Instruction System,
combines the most innovative concepts of Computer Assisted Learning
with the soundest pedagogical principles to equip language teachers
with a stimulating educational tool. With CALIS, one can prepare
language lessons which supplement classroom learning, i.e. create
exercises to be administered, corrected and scored individually
for each student.
WinCALIS, a Microsoft Windows-based CALIS, takes full advantage of what Windows 3.x does well: mouse input, color graphics, multimedia and comprehensive online help. This makes successful language education just a mouse click away.
WinCALIS 2.0, the latest and yet-to-be-released version of WinCALIS, belongs to the unrestricted multilingual class of language support and hence provides access to a rich collection of languages and scripts. Moreover, being Unicode-based, WinCALIS 2.0 inherits all the advantages of Unicode.
WinCALIS 2.0 comprises two main modules: the WinCALIS program itself and the WinCALIS Author. The WinCALIS component is the part of the WinCALIS product which actually runs the lessons. The lessons are based on a scripting language called the CALIS programming language. The real power of the WinCALIS authoring program, WinCALIS Author, is the ease and speed with which an author can make scripts without directly using the CALIS language at all. An author provides WinCALIS Author with the text, directions, questions, screen layout, etc., and it automatically writes the script for him or her. Of course, very advanced users may still want to tinker with a script, using one of the multilingual editors in the Author program, to make changes or additions. In the previous version of WinCALIS, the scripts consisted of pure ASCII text written in the CALIS language. WinCALIS 2.0 uses the same CALIS language, but the scripts are made up of Unicode text.
Multilingual Editor
Most contemporary text editors are based on the ASCII file format,
which limits the number of languages that can be supported at
once to a very few. The new 2-byte editor, called the Multilingual
Editor, an integral part of the WinCALIS 2.0 product, will overcome
this limitation, thereby giving authors the flexibility, capability
and sophistication to create and edit scripts having characters
from multiple languages. This kind of flexibility is obtained
because the editor depends functionally on the Unicode
character set.
The multilingual editor is just like any other editor when it comes to the normal editing features, such as cut, copy, paste and selection of text, but it has much more to offer when it comes to dealing with multiple languages. The language menu at the top of the WinCALIS program (in both modules) makes dynamic switching of languages very easy. Because of its open-ended design, the editor can handle any number of languages. Authors therefore have an entire repertoire of languages at their fingertips when scripting and are no longer limited to just one or two. The best part of the multilingual editor is that it can deal with both left-to-right and right-to-left languages. It automatically takes care of some semantic details of most languages and has elaborate features for entering accented characters, for the treatment of non-spacing marks, and so on. There are many software packages capable of handling these special characters, but none of them, except the WinCALIS 2.0 multilingual editor, is capable of handling them in Unicode in an authoring environment.
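The kinds of per-character information an editor needs for these features, namely writing direction and whether a character is a non-spacing mark, are properties attached to each Unicode character. The sketch below is not the WinCALIS implementation, which is not described here; it only illustrates the idea using Python's standard `unicodedata` module. The function names are hypothetical, and the composition step relies on normalization forms that were defined in later versions of the standard.

```python
import unicodedata


def direction(ch: str) -> str:
    """Classify a character's inherent writing direction."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
        return "right-to-left"
    if bidi == "L":
        return "left-to-right"
    return "neutral or other"


def is_nonspacing_mark(ch: str) -> bool:
    """True for marks that attach to a base character without advancing."""
    return unicodedata.category(ch) == "Mn"


def compose(text: str) -> str:
    """Combine base letters and following marks where a single code exists."""
    return unicodedata.normalize("NFC", text)


print(direction("A"), "/", direction("\u05d0"), "/", direction("\u0639"))
print(is_nonspacing_mark("\u0301"))    # COMBINING ACUTE ACCENT
print(compose("e\u0301") == "\u00e9")  # 'e' + acute accent vs. precomposed letter
```

An editor that keeps such properties in tables indexed by character code can treat every script uniformly, which is one way to read the open-ended design described above.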
Presenter's biodata
Nagendra Revanur was formerly Associate in Research at the Humanities Computing
Facility, Duke University, North Carolina.