In this article the IM module structure of xcin and the way to add new IM
modules are described.

----------------------------------------
A. Internal structure (include/module.h, include/imodule.h,
   include/cinput.h, module.c):

The design goal of the module system is to make it very easy for
programmers to add new input methods to xcin. It is separated into 2 parts
as much as possible:

    1. programmer interface (module.h)
    2. internal implementation (imodule.h)

Programmers can ignore the details of the internal implementation
entirely. All they have to do is fill in all the fields of the "module_t"
data structure defined in module.h. Some readers may still be interested
in the internal implementation, so it is briefly described in this
section.

"Loading a module" or "using a module for an input method" in fact
involves two stages, as the following example shows.

Suppose that a user presses ctrl+alt+1 to start the cj (Changjie) input
method. xcin will then look the input method up in its internal "cinput"
table in order to start it. The definition of "cinput" is:

=========================================================================
typedef struct cinput_s {
    char *modname;
    char *objname;
    imodule_t *inpmod;
} cinput_t;

cinput_t cinput[MAX_INP_ENTRY];
=========================================================================

Suppose that, according to the rcfile configuration, the cj input method
is registered to use the gen_inp module with the "setkey" value 1, but it
has not been loaded yet. Then the contents of "cinput" will be

    cinput[1].modname = "gen_inp";
    cinput[1].objname = "cj";
    cinput[1].inpmod  = NULL;

where "inpmod" is the loaded IM module that the cj input method actually
uses. If its value is not NULL, the input method is already loaded and
can be used, so xcin uses it immediately. Otherwise xcin performs the
following loading procedure.
First, xcin looks for the "gen_inp" module in the list
"tmodule_t *mod_templet". The "tmodule_t" type is defined in "imodule.h",
which is part of the internal implementation. If the module is found, it
is used: xcin creates an "imodule_t" structure (defined in imodule.h) from
the information in the "tmodule_t". The created "imodule_t" has its own
"conf" data area and the object name "cj", so the cj input method is ready
for use. Finally, "cinput[1].inpmod" is set to point to the newly created
"imodule_t" structure.

If the "gen_inp" module is not found in "mod_templet", xcin calls dlopen()
to load the file "gen_inp.so" from disk. The module is added to the
"mod_templet" list, and the "imodule_t" for the "cj" input method is then
created for use.

The important point is this: a module is not an object to operate on. It
is only a template that input methods can use, and one module can be used
by many different input methods. Therefore, a module's private data area
does not exist in its initial definition. Only when the module is used by
an input method (i.e., when an imodule_t is created) does the system
allocate a data area for it with malloc(). The rcfile configuration is
loaded into this data area at that stage (see below).

----------------------------------------
B. The data structure (include/module.h):

Both include/module.h and include/xcintool.h are needed for developing
new xcin IM modules. include/module.h defines the xcin module interface,
and include/xcintool.h declares the xcin tool functions. To encourage
everyone to write new IM modules, we relax the license restriction for
these 2 header files. The xcin package as a whole is licensed under the
GPL. However, anyone who writes new IM modules for xcin is free to use
these 2 header files in any way, and doing so will not affect the license
status of their work. See the declaration in CopyRight for details.
To add a new IM module, you only have to fill in the following data
structure and write the corresponding functions:

================================= module.h ==============================
typedef struct module_s  module_t;
struct module_s {
    char *name;
    char *version;
    char *comments;
    char **valid_objname;
    enum mtype module_type;

    int conf_size;
    int (*init) (void *conf, char *objname, xcin_rc_t *xc);
                /* called when IM first loaded & initialized. */
    int (*xim_init) (void *conf, inpinfo_t *inpinfo);
                /* called when trigger key occurs to switch IM. */
    unsigned (*xim_end) (void *conf, inpinfo_t *inpinfo);
                /* called just before xim_init() to leave IM, not necessary */
    unsigned (*keystroke) (void *conf, inpinfo_t *inpinfo, keyinfo_t *keyinfo);
                /* called to input key code, and output chinese char. */
    int (*show_keystroke) (void *conf, simdinfo_t *simdinfo);
                /* called to show the key stroke */
    int (*terminate) (void *conf);
                /* called when xcin is going to exit. */
};
=============================================================================

Please note that the module is *used* by the input methods. The input
method status of each IC (Input Context; see the "structer" document for
details) is also independent of every other IC. The input methods in xcin
therefore behave like servers: they wait for requests from xcin and
respond appropriately. As a result, every IM module works with 2 different
data areas:

  - the configuration of the input method, which enters the module
    functions via the "void *conf" argument;
  - the status information of each IC.

We do not pass the whole IC into the module functions. The IC structure
contains many XIM-related details, and we want the IM modules to be
independent of them. Hence we pass only "inpinfo_t *inpinfo" into the
module functions (for the function "show_keystroke()" the structure passed
is "simdinfo_t *simdinfo").
The "inpinfo" structure contains the current status of xcin and the
information related to the input method. It is the communication medium
between xcin and the IM modules.

1. Configurations of the input methods:

These configurations are read by the rcfile reading system. For example:

============================================================================
(define cj
    '((SETKEY          1)
      (AUTO_COMPOSE    YES)
      (AUTO_UPCHAR     YES)
      (AUTO_FULLUP     NO)
      (SPACE_AUTOUP    NO)
      (SELKEY_SHIFT    NO)
      (SPACE_IGNORE    NO)
      (SPACE_RESET     YES)
      (AUTO_RESET      NO)
      (END_KEY         NO)
      (WILD_ENABLE     NO)
      (AUTO_SELECT     NO)
      (SINMD_IN_LINE1  NO)
      (BEEP_WRONG      YES)
      (BEEP_DUPCHAR    YES)))
============================================================================

These configurations are common to all ICs. However, the "cj" input
method uses the "gen_inp" module, and this module can also be used by
other input methods whose configurations will differ from that of "cj".
The configuration of each input method must therefore be kept separate.
This is why the "module_t" structure has no field for the configuration
area itself. Instead it has a "size of the configuration area" field,
"int conf_size". The programmer must set this field, since the
configuration area may differ from one IM module to another. xcin does not
allocate a configuration area of "conf_size" bytes for the module (the
input method object) until the module is actually used.

But how do the module functions use this configuration area? Each function
has a "void *conf" argument, and the configuration area is passed in
through that argument. In short, to write a new module we can follow this
example:

============================================================================
typedef struct {                /* the configuration area structure specified */
    char *inpname;              /* by the IM module */
    int setkey;
    ...........
} my_module_datastr_t;

int my_module_init(void *conf, char *objname, xcin_rc_t *xc)
                                /* the init() module function */
{
    my_module_datastr_t *cf = (my_module_datastr_t *)conf;

    cf->inpname = .....;
    cf->setkey = .....;
    .................
    return True;                /* report successful initialization */
}

.................

module_t module_ptr = {
    ......
    sizeof(my_module_datastr_t),        /* the "conf_size" field */
    ......
    my_module_init,                     /* the "init" field */
};
============================================================================

Please note that the name "module_ptr" is special and must always be used.
When xcin loads the module via dlopen(), it reaches every field of
"module_t" through "module_ptr".

2. The input method status of each IC:

The input method status of each IC is maintained jointly by the IM modules
and the xcin system. For each X window ready to accept input from xcin,
xcin creates an IC. That IC contains an "inpinfo_t" data structure that
holds its input method status:

=============================================================================
typedef struct inpinfo_s  inpinfo_t;
struct inpinfo_s {
    int imid;                   /* ID of current IM Context */
    void *iccf;                 /* Internal data of IM for each IC */

    char *inp_cname;            /* IM Chinese name */
    char *inp_ename;            /* IM English name */
    ubyte_t area3_len;          /* Length of area 3 of window (n_char) */
    ubyte_t zh_ascii;           /* The zh_ascii mode */
    unsigned short xcin_wlen;   /* xcin window length */
    unsigned guimode;           /* GUI mode flag */

    ubyte_t keystroke_len;      /* # chars of keystroke */
    wch_t *s_keystroke;         /* keystroke printed in area 3 */
    wch_t *suggest_skeystroke;  /* keystroke printed in area 3 */

    ubyte_t n_selkey;           /* # of selection keys */
    wch_t *s_selkey;            /* the displayed select keys */
    unsigned short n_mcch;      /* # of chars with the same keystroke */
    wch_t *mcch;                /* multi-char list */
    ubyte_t *mcch_grouping;     /* grouping of mcch list */
    byte_t mcch_pgstate;        /* page state of multi-char */

    unsigned short n_lcch;      /* # of composed cch list. */
    wch_t *lcch;                /* composed cch list. */
    unsigned short edit_pos;    /* editing position in lcch list. */
    ubyte_t *lcch_grouping;     /* grouping of lcch list */

    wch_t cch_publish;          /* A published cch. */
    char *cch;                  /* the string for commit. */
};
=============================================================================

This structure is passed into some of the module functions (e.g., the
keystroke() function) so that the IM module can help maintain it. The
fields mean the following:

imid:
    The ID of the IMC that uses this IM module.

iccf:
    Sometimes the IM module needs to keep a data structure for each IC.
    Strictly, this is per IMC, since each IC can have its own IMC or all
    the ICs can share the same IMC; see the "structer" document for
    details. There are 2 approaches:

    - Maintain a list of data structures, one per IC, in the configuration
      area (see item 1 above). The module then uses the value of "imid" to
      find out which IMC is currently being operated on.
    - A simpler way is to point "iccf" at the data structure belonging to
      the current IMC. Whenever an IMC is being operated on, its
      corresponding data structure is then available through "iccf".

    Please note that xcin does not maintain the structure pointed to by
    "iccf" for you. Whenever you use this pointer, you must make sure
    yourself that it actually points to the data structure you want.
    "inpinfo" is also an interface shared by xcin and all the input
    methods, so you must make sure that "iccf" still points to the correct
    data structure every time the input method is switched. A simple way
    to do this is to malloc() the data structure for "iccf" in xim_init()
    (see below) and free it in xim_end().

inp_cname:
    The Chinese name of the input method.

inp_ename:
    The English name of the input method.

area3_len:
    The size of the preediting area, measured in English (single-width)
    characters.

zh_ascii:
    Whether xcin is currently in the wide ASCII input mode.
    If so, its value is 1; otherwise it is 0.

xcin_wlen:
    The current length of the xcin window. It is set by xcin.

guimode:
    The IM modules can use this to specify the GUI status for display:

    GUIMOD_SELKEYSPOT:
        During multi-character selection, this flag tells the GUI system
        to display the selection keys in the spotlight color.

    GUIMOD_SINMDLINE1:
        This flag tells the GUI system whether to display the "recalling
        keystrokes" in their original position or in the first line of the
        xcin main window (the bigger main window).

    GUIMOD_LISTCHAR:
        If this is on, the GUI system prints the contents of
        "inpinfo->lcch" at the appropriate position in the xcin window,
        and the cursor appears at the position given by
        "inpinfo->edit_pos". This is a special design for the bimsphone
        module. When typing with this module, the characters do not go to
        the XIM client immediately. They remain in the xcin window
        instead, and the cursor shows the current input position. If this
        is off, the same area of the xcin window is used for the
        multi-character (phrase) selection list, and any contents of
        "inpinfo->mcch" at that moment are printed there.

keystroke_len:
    The length of the keystrokes entered so far.

s_keystroke:
    The keystrokes entered so far. They are displayed in the preedit area
    of the xcin window. Please note that xcin does not maintain the
    contents of this buffer, so the IM module must maintain it itself.
    See the description of "iccf".

suggest_skeystroke:
    The recalling keystroke suggested by the IM module. This field is
    optional. When preediting is completed, the IM module can fill the
    keystroke into this buffer. xcin then displays the recalling keystroke
    from this buffer instead of calling show_keystroke(), provided that
    the IM module currently used for recalling keystroke display is the
    same as the IM module in operation. Please note that xcin does not
    maintain the contents of this buffer.
    So the IM module must maintain it itself. See the description of
    "iccf".

n_selkey:
    The number of selection keys for multi-character selection.

s_selkey:
    The list of selection keys. Please note that xcin does not maintain
    the contents of this buffer, so the IM module must maintain it itself.
    See the description of "iccf".

n_mcch:
    The number of characters in the buffer "mcch".

mcch:
    The multi-character selection list. The length of this list should
    not be larger than the value of "n_selkey". Please note that xcin does
    not maintain the contents of this buffer, so the IM module must
    maintain it itself.

mcch_grouping:
    The grouping of "mcch".

    If it is NULL, each character in "mcch" is a separate selection item,
    and "n_mcch" is the number of characters in "mcch".

    If it is not NULL, the first element of "mcch_grouping" is the total
    number of selectable items. Each following element is the number of
    characters in the corresponding selectable item of "mcch". For
    example:

        n_mcch = 9;
        mcch_grouping[5] = {4, 2, 2, 1, 4};
        mcch = {ABCDEFGHI};

    According to mcch_grouping[0] there are 4 selectable items in total,
    and the items are:

        1.AB  2.CD  3.E  4.FGHI

    If "mcch" and "n_mcch" are unchanged but "mcch_grouping" becomes NULL,
    the selections will be

        1.A  2.B  3.C  4.D  5.E  6.F  7.G  8.H  9.I

    Therefore, for multi-character selection you do not need
    "mcch_grouping" and can set it to NULL. For multi-phrase selection you
    can fill the phrases into "mcch" and use "mcch_grouping" to separate
    them.

mcch_pgstate:
    The current page state of the multi-character selection. "One page"
    means as much as the xcin window can display at once. The possible
    values are:

    MCCH_ONEPG:
        All the multi-characters fit into one page.
    MCCH_BEGIN:
        The multi-characters do not fit into one page, and the 1st page is
        currently shown.
    MCCH_MIDDLE:
        The multi-characters do not fit into one page, and a page between
        the 1st and the last page is currently shown.
    MCCH_END:
        The multi-characters do not fit into one page, and the last page
        is currently shown.

n_lcch:
    The number of characters in the buffer "lcch".

lcch:
    The bimsphone module and similar modules can store the composed
    characters in this buffer. See the description of "guimode ->
    GUIMOD_LISTCHAR". Please note that xcin does not maintain this buffer,
    so the IM module must maintain it itself.

edit_pos:
    The cursor position in "lcch". See the description of "guimode ->
    GUIMOD_LISTCHAR".

lcch_grouping:
    The grouping list of "lcch", completely analogous to "mcch_grouping".
    In the bimsphone module and similar modules, the grouping information
    can be used to underline the composed characters in the "lcch" buffer,
    one underline per phrase. For example:

        A B C D E F G H I
        --- ---   -------

    The underlines above are drawn if the contents of "lcch_grouping" are
    {4,2,2,1,4} and n_lcch=9. If "lcch_grouping" is NULL, no underlines
    are drawn.

cch_publish:
    The character that has been composed successfully and can be
    "published". It is used for the recalling keystroke display; see
    below.

cch:
    The string that is ready to be committed to the XIM client. Please
    note that xcin does not maintain this buffer, so the IM module must
    maintain it itself.

----------------------------------------
C. The description of "module_t":

Fields marked with (*) below must be set. The others are optional (i.e.,
their values can be 0 or NULL).

1. name (*):
   Module name.

2. version (*):
   The module version. Please note that the module version of the system
   is a date string, e.g., "19990217". When the system loads a module, it
   checks whether the module version matches that of the system. If not,
   the module is not loaded, because the module structure may have
   changed.
   Therefore, if you want to add a new module to xcin, please refer to the
   module definition of the specific xcin version you are targeting.

3. comments:
   A brief description of this module. It can be printed via

       xcin -m <module name>

   See the Usage file for details.

4. valid_objname:
   The list of input method names that can use this module. The last item
   of this list must be NULL. If it is not set, the system assumes that
   the module can be used only by the input method with the same name as
   the module. The wildcard characters "*" and "?" can be used in the name
   list, for example:

       {"my_inp", "my_inp_ext_*", "my_inp_ver??", NULL}

   This means that input methods named "my_inp", "my_inp_ext_style1",
   "my_inp_ext_power", "my_inp_ver99", .... can use this module.

5. module_type (*):
   Currently xcin defines only one module_type, so it should be set to
   MOD_CINPUT.

6. conf_size (*):
   The size of the configuration data structure of this module.

7. int (*init) (void *conf, char *objname, xcin_rc_t *xc) (*):
   The initialization function of this module. It is called when the
   module is loaded into the xcin system. It should perform all the
   necessary initialization and read all the needed configuration from
   the rcfile. Its arguments are:

   conf:    pointer to the configuration data structure of this module.
   objname: the English name of the input method that uses this module.
   xc:      pointer to xcin_rc_t (the xcin global data structure), which
            the module can use to obtain some internal information of
            xcin. It is defined as:

       typedef struct {
           char *lc_ctype;          /* LC_CTYPE locale category name */
           char *lc_messages;       /* LC_MESSAGES locale category name */
           char *encoding;          /* encoding name */
       } locale_t;

       typedef struct {
           char *rcfile;            /* rcfile name. */
           char *default_dir;       /* Default module directory. */
           char *user_dir;          /* User data directory. */
           locale_t locale;         /* Locale name. */
       } xcin_rc_t;

   The function returns True when it succeeds and False when it fails.

8. int (*xim_init) (void *conf, inpinfo_t *inpinfo) (*):
   This function is called in two cases: when a new IC is created (a new
   XIM client window is ready to accept input from xcin), or when an IC
   switches its current input method to another one. It should initialize
   this IC and its input method status (i.e., the inpinfo).

   Please note that inpinfo is part of the IC (IMC) data structure, not
   part of any module. All the modules use it to communicate with xcin, so
   it is shared by all of them. Therefore, whenever an IC starts using an
   input method, xcin calls this function. The function should set initial
   values for all the fields of inpinfo, since their current values were
   set by the previously used input method and may not suit this one. It
   should also allocate memory for inpinfo->iccf, inpinfo->s_keystroke,
   inpinfo->lcch, and inpinfo->cch if necessary. On success this function
   should return True, otherwise False.

9. unsigned int (*xim_end) (void *conf, inpinfo_t *inpinfo) (*):
   This function is called when an IC is about to terminate, or when this
   input method is about to be switched out. In the latter case it is
   called before the xim_init() call of the next input method. We can do
   some final jobs in this function, e.g., free the buffers inpinfo->iccf,
   inpinfo->s_keystroke, inpinfo->lcch, and inpinfo->cch if necessary ....
   etc. The return value of this function has the same meaning as that of
   the keystroke() function, so it can be used to commit a string. See
   below.

10. unsigned int (*keystroke) (void *conf, inpinfo_t *inpinfo,
                               keyinfo_t *keyinfo) (*):
   This function defines the algorithm for processing keystrokes. When the
   user types a key on the keyboard, xcin calls this function to handle
   it.
   The function should interpret the key, change the window status of
   xcin if necessary, and return the result to xcin for further
   processing. conf and inpinfo are as already described, and keyinfo_t is
   defined as:

       typedef struct {
           KeySym keysym;           /* X11 key code. */
           unsigned int keystate;   /* X11 key state/modifiers */
           char keystr[16];         /* X11 key name (in ascii) */
           int keystr_len;          /* key name length */
       } keyinfo_t;

   The return value of this function can be any bitwise OR combination of
   the following:

   IMKEY_ABSORB:
       The input method absorbs this key quietly, so xcin does no further
       processing. This often happens during character preediting.
   IMKEY_COMMIT:
       The input method has completed the preediting. xcin then commits
       the characters in "inpinfo->cch" to the XIM client.
   IMKEY_IGNORE:
       The key is useless for this input method, so it is not processed
       here. xcin then passes this key to other parts for processing.
   IMKEY_BELL:
       The input method requests xcin to beep.
   IMKEY_BELL2:
       The input method requests xcin to beep at another frequency. xcin
       currently supports 2 kinds of "beep". Usually one signals
       multi-character selection and the other a user input error. An
       input method may use this facility or not, depending on its
       requirements.
   IMKEY_SHIFTESC:
       The user pressed a normal key together with a shift key, and the
       input method does not process this combination. xcin is told to
       pass the key to the wide ASCII/single-byte ASCII sub-system for
       handling.
   IMKEY_SHIFTPHR:
       The user pressed a normal key together with a shift key. In this
       case xcin is told to pass the key combination to the qphrase
       sub-system for handling. This facility is optional.
   IMKEY_CTRLPHR:
       The user pressed a normal key together with a ctrl key. Its effect
       is the same as that of IMKEY_SHIFTPHR.
   IMKEY_ALTPHR:
       The user pressed a normal key together with an alt key.
       Its effect is the same as that of IMKEY_SHIFTPHR.
   IMKEY_FALLBACKPHR:
       The user pressed a normal key, but the input method does not
       process it and wants it to be handled by the qphrase sub-system.
       Its effect is the same as that of IMKEY_SHIFTPHR.

11. int (*show_keystroke) (void *conf, simdinfo_t *simdinfo):
   This function is used to recall the keystrokes of characters, and xcin
   calls it when needed. simdinfo_t is defined as:

       typedef struct {
           int imid;                    /* ID of current Input Context */
           unsigned short xcin_wlen;    /* xcin window length */
           unsigned short guimode;      /* GUI mode flag */
           wch_t cch_publish;           /* A published cch. */
           wch_t *s_keystroke;          /* keystroke of cch_publish returned */
       } simdinfo_t;

   cch_publish is the character whose keystroke xcin is asking for, and
   the keystroke is returned to xcin via the s_keystroke field. Please
   note that xcin does not maintain the s_keystroke buffer, so this
   function must maintain it itself. A simple method is to declare a
   static buffer for it, for example:

       static wch_t my_keystroke[BUF_SIZE];

       simdinfo->s_keystroke = my_keystroke;
       .........

12. int (*terminate) (void *conf):
   When xcin is about to unload this input method (e.g., because xcin is
   terminating), it calls this function, if defined, to do some final
   jobs, e.g., close the data files .... etc.

----------------------------------------
D. Initialization of the IM modules:

Module initialization happens in 2 stages:

  - when the module is loaded and about to be used: all the data
    structures of the IM module should be initialized at this stage;
  - whenever the user switches to this input method: the inpinfo_t
    structure should be initialized at this stage.

These 2 stages are handled in the "int (*init)()" and "int (*xim_init)()"
functions of the module, respectively. As described before, inpinfo_t is
the communication medium between xcin and the IM modules, and it belongs
to the IMC structure.
Therefore, whenever the IMC changes its input method, the new input method
takes over the inpinfo_t structure. An input method must always
initialize the contents of inpinfo_t whenever it is switched in.

For the first stage, the initialization work includes:

1. Determining the input method name:

   Beginning with xcin-2.5.2, xcinrc supports the <IM_name>@<encoding>
   format as the input method name. For example, "cj@big5" means the "cj"
   input method for the Big5 encoding. The "objname" argument of

       int (*init) (void *conf, char *objname, xcin_rc_t *xc);

   may therefore differ from the actual input method name, so we have to
   call the function

       int get_objenc(char *objname, objenc_t *objenc);

   where "objname" is the "objname" argument of the (*init)() function.
   This function fills in an objenc_t structure with the following
   definition:

       typedef struct {
           char objname[50];        /* input method name */
           char encoding[50];       /* encoding name */
           char objloadname[100];   /* input method name appeared in
                                       xcinrc (i.e., <IM_name>@<encoding>) */
       } objenc_t;

2. Reading the configurations of xcinrc:

   We can use the function

       int get_resource(char **cmd_list, char *value, int v_size,
                        int n_cmd_list);

   to read the configurations of xcinrc. "cmd_list" represents the section
   labels in xcinrc. Please note that the configurations of a specific
   input method usually appear in the section of that input method, so
   "cmd_list[0]" should be set to the name of the input method.
   For example, suppose we want to read the AUTO_COMPOSE option of the
   "cj" input method, which appears in xcinrc in the format:

       (define cj
           '((AUTO_COMPOSE 1)
             .................))

   Then we should call get_resource() in the following way:

       char *cmd[2], value[256];

       cmd[0] = "cj";
       cmd[1] = "AUTO_COMPOSE";
       if (get_resource(cmd, value, sizeof(value), 2)) {
           /* we have read the value of AUTO_COMPOSE */
       }

   However, an IM module can be used by many input methods, so cmd[0]
   above cannot be hard-coded to "cj". Nor can we use the "objname"
   argument mentioned above, because the configuration in xcinrc may be:

       (define cj@big5
           '((AUTO_COMPOSE 1)
             .................))

   The correct way is to call get_objenc() and use its result:

       objenc_t objenc;
       ........
       if (get_objenc(objname, &objenc) == -1)
           return False;
       cmd[0] = objenc.objloadname;
       ............................

   This way, the configuration of xcinrc is read correctly.

T.H.Hsieh