emacs/22.1/src/coding.c

/* Coding system handler (conversion, detection, etc.).
   Copyright (C) 2001, 2002, 2003, 2004, 2005,
                 2006, 2007 Free Software Foundation, Inc.
   Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
     2005, 2006, 2007
     National Institute of Advanced Industrial Science and Technology (AIST)
     Registration Number H14PRO021

This file is part of GNU Emacs.

GNU Emacs is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs; see the file COPYING.  If not, write to
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA.  */

/*** TABLE OF CONTENTS ***

  0. General comments
  1. Preamble
  2. Emacs' internal format (emacs-mule) handlers
  3. ISO2022 handlers
  4. Shift-JIS and BIG5 handlers
  5. CCL handlers
  6. End-of-line handlers
  7. C library functions
  8. Emacs Lisp library functions
  9. Post-amble

*/

/*** 0. General comments ***/


/*** GENERAL NOTE on CODING SYSTEMS ***

  A coding system is an encoding mechanism for one or more character
  sets.  Here is a list of the coding systems that Emacs can handle.
  When we say "decode", we mean converting text in some other coding
  system to Emacs' internal format (emacs-mule), and when we say
  "encode", we mean converting text in emacs-mule to some other coding
  system.

  0. Emacs' internal format (emacs-mule)

  Emacs itself holds multilingual characters in buffers and strings
  in a special format.  Details are described in section 2.

  1. ISO2022

  The most famous coding system for multiple character sets.  X's
  Compound Text, various EUCs (Extended Unix Code), and coding
  systems used in Internet communication such as ISO-2022-JP are
  all variants of ISO2022.  Details are described in section 3.

  2. SJIS (or Shift-JIS or MS-Kanji-Code)

  A coding system to encode the character sets ASCII, JISX0201, and
  JISX0208.  Widely used on PCs in Japan.  Details are described in
  section 4.

  3. BIG5

  A coding system to encode the character sets ASCII and Big5.  Widely
  used for Chinese (mainly in Taiwan and Hong Kong).  Details are
  described in section 4.  In this file, when we write "BIG5"
  (all uppercase), we mean the coding system, and when we write
  "Big5" (capitalized), we mean the character set.

  4. Raw text

  A coding system for text containing arbitrary 8-bit codes.  Emacs
  does no code conversion on such text except for the end-of-line
  format.

  5. Other

  If a user wants to read/write text encoded in a coding system not
  listed above, they can supply a decoder and an encoder for it as CCL
  (Code Conversion Language) programs.  Emacs executes the CCL program
  while reading/writing.

  Emacs represents a coding system by a Lisp symbol that has a property
  `coding-system'.  Before actually using the coding system, however,
  Emacs copies the information about it into a structure of type
  `struct coding_system' for rapid processing.  See section 6 for more
  details.

*/

/*** GENERAL NOTES on END-OF-LINE FORMAT ***

  How the end of a line is encoded depends on the operating system.
  For instance, Unix's format is a single `line-feed' byte, whereas
  DOS's format is the two-byte sequence `carriage-return' `line-feed'.
  MacOS's format is usually a single `carriage-return' byte.

  Since character encoding and end-of-line encoding are independent,
  any coding system described above can have any end-of-line format.
  So Emacs keeps end-of-line format information in each coding system.
  See section 6 for more details.

*/
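/* Illustrative sketch (hypothetical; not part of Emacs): one way to
   classify the end-of-line format of a byte buffer from its first line
   ending.  The enum and function names are invented for this example
   and are not the CODING_EOL_* values used elsewhere in Emacs.  A CR
   that is the last byte of the buffer is reported as undecided,
   because the next chunk may begin with LF.  For example,
   sketch_detect_eol ((const unsigned char *) "a\r\nb", 4) returns
   SKETCH_EOL_CRLF.  */
#if 0
```c
#include <stddef.h>

/* Hypothetical end-of-line classes for this sketch only.  */
enum sketch_eol
{
  SKETCH_EOL_UNDECIDED,
  SKETCH_EOL_LF,                /* Unix: LF */
  SKETCH_EOL_CRLF,              /* DOS/Windows: CR LF */
  SKETCH_EOL_CR                 /* classic MacOS: CR */
};

/* Return the class of the first line ending found in SRC[0..LEN).  */
enum sketch_eol
sketch_detect_eol (const unsigned char *src, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      if (src[i] == '\n')
        return SKETCH_EOL_LF;
      if (src[i] == '\r')
        {
          if (i + 1 >= len)
            return SKETCH_EOL_UNDECIDED;   /* CR is the last byte seen */
          return src[i + 1] == '\n' ? SKETCH_EOL_CRLF : SKETCH_EOL_CR;
        }
    }
  return SKETCH_EOL_UNDECIDED;             /* no line ending at all */
}
```
#endif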

/*** GENERAL NOTES on `detect_coding_XXX ()' functions ***

  These functions check whether the text between SRC and SRC_END is
  encoded in the coding system category XXX.  Each returns an integer
  value in which the appropriate flag bits for the category XXX are
  set.  The flag bits are defined by the macros
  CODING_CATEGORY_MASK_XXX.  Below is the template for these
  functions.  If MULTIBYTEP is nonzero, 8-bit codes in the range
  0x80..0x9F are in multibyte form.  */
#if 0
int
detect_coding_XXX (src, src_end, multibytep)
     unsigned char *src, *src_end;
     int multibytep;
{
  ...
}
#endif
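/* Illustrative sketch (hypothetical; not part of Emacs) of the return
   convention described above: the detector starts with every candidate
   flag bit set and clears the bits that the bytes it sees rule out.
   The two SKETCH_MASK_* "categories" are invented for this example and
   are not real CODING_CATEGORY_MASK_* values, and MULTIBYTEP handling
   is omitted.  */
#if 0
```c
/* Hypothetical flag bits for this sketch only.  */
#define SKETCH_MASK_7BIT 0x1    /* every byte is below 0x80 */
#define SKETCH_MASK_8BIT 0x2    /* no C1 byte in 0x80..0x9F */

/* Return the mask of sketch categories consistent with SRC..SRC_END.  */
int
sketch_detect_mask (const unsigned char *src, const unsigned char *src_end)
{
  int mask = SKETCH_MASK_7BIT | SKETCH_MASK_8BIT;

  while (src < src_end && mask)
    {
      unsigned char c = *src++;

      if (c >= 0x80)
        mask &= ~SKETCH_MASK_7BIT;      /* any 8-bit byte rules out 7-bit */
      if (c >= 0x80 && c < 0xA0)
        mask &= ~SKETCH_MASK_8BIT;      /* a C1 byte rules out 8-bit too */
    }
  return mask;
}
```
#endif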

/*** GENERAL NOTES on `decode_coding_XXX ()' functions ***

  These functions decode SRC_BYTES bytes of unibyte text at SOURCE,
  encoded in CODING, into Emacs' internal format.  The resulting
  multibyte text goes to the area pointed to by DESTINATION, whose
  length should not exceed DST_BYTES.

  These functions record information about the original and decoded
  texts in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX, indicating how the decoding
  finished.

  A DST_BYTES of zero means that the source and destination areas
  overlap.  In that case, decoded text can be produced only until it
  reaches the head of the not-yet-decoded source text.

  Below is a template for these functions.  */
#if 0
static void
decode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     const unsigned char *source;
     unsigned char *destination;
     int src_bytes, dst_bytes;
{
  ...
}
#endif
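/* Illustrative sketch (hypothetical; not part of Emacs) of the
   DST_BYTES convention described above: decoding DOS end-of-lines
   (CR LF to LF) with an output limit of DESTINATION + DST_BYTES, or of
   the current read position when DST_BYTES is zero and the areas
   overlap.  That is the same limit the EMIT_* macros below use.  The
   function, its result codes, and the PRODUCED out-parameter are
   invented for this example; the real decoders report through *CODING
   instead.  Decoding "a\r\nb\r\n" in place leaves "a\nb\n" in the
   first four bytes.  */
#if 0
```c
#include <stddef.h>

/* Hypothetical result codes for this sketch only.  */
enum sketch_finish
{
  SKETCH_FINISH_NORMAL,
  SKETCH_FINISH_INSUFFICIENT_DST
};

/* Decode CR LF pairs in SOURCE to LF, writing to DESTINATION.  A
   DST_BYTES of zero means SOURCE and DESTINATION overlap (decode in
   place).  Store the number of bytes written in *PRODUCED.  */
enum sketch_finish
sketch_decode_crlf (const unsigned char *source, unsigned char *destination,
                    size_t src_bytes, size_t dst_bytes, size_t *produced)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  unsigned char *dst = destination, *dst_end = destination + dst_bytes;
  enum sketch_finish result = SKETCH_FINISH_NORMAL;

  while (src < src_end)
    {
      unsigned char c = *src++;
      const unsigned char *limit;

      if (c == '\r' && src < src_end && *src == '\n')
        c = *src++;                     /* drop the CR of a CR LF pair */
      /* Output limit: DST_END, or SRC itself when the areas overlap.  */
      limit = dst_bytes ? dst_end : src;
      if (dst >= limit)
        {
          result = SKETCH_FINISH_INSUFFICIENT_DST;
          break;
        }
      *dst++ = c;
    }
  *produced = (size_t) (dst - destination);
  return result;
}
```
#endif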

/*** GENERAL NOTES on `encode_coding_XXX ()' functions ***

  These functions encode SRC_BYTES bytes of text at SOURCE from Emacs'
  internal multibyte format to CODING.  The resulting unibyte text
  goes to the area pointed to by DESTINATION, whose length should not
  exceed DST_BYTES.

  These functions record information about the original and encoded
  texts in the members `produced', `produced_char', `consumed', and
  `consumed_char' of the structure *CODING.  They also set the member
  `result' to one of CODING_FINISH_XXX, indicating how the encoding
  finished.

  A DST_BYTES of zero means that the source and destination areas
  overlap.  In that case, encoded text can be produced only until it
  reaches the head of the not-yet-encoded source text.

  Below is a template for these functions.  */
#if 0
static void
encode_coding_XXX (coding, source, destination, src_bytes, dst_bytes)
     struct coding_system *coding;
     unsigned char *source, *destination;
     int src_bytes, dst_bytes;
{
  ...
}
#endif

/*** COMMONLY USED MACROS ***/

/* The following two macros, ONE_MORE_BYTE and TWO_MORE_BYTES, safely
   get one and two bytes from the source text, respectively.  If there
   are not enough bytes in the source, they set `coding->result' to
   CODING_FINISH_INSUFFICIENT_SRC and jump to `label_end_of_loop'.  The
   caller should set the variables `coding', `src', and `src_end' to
   appropriate pointers in advance.  These macros are called from the
   decoding routines `decode_coding_XXX', so they assume that the
   source text is unibyte.  */

#define ONE_MORE_BYTE(c1)                                       \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
  } while (0)

#define TWO_MORE_BYTES(c1, c2)                                  \
  do {                                                          \
    if (src + 1 >= src_end)                                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    c1 = *src++;                                                \
    c2 = *src++;                                                \
  } while (0)

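/* Illustrative sketch (hypothetical; not part of Emacs) of a decoding
   loop built on the TWO_MORE_BYTES pattern.  The loop records where
   each two-byte unit starts in SRC_BASE; when the macro runs out of
   input it sets the result and jumps to `label_end_of_loop', leaving
   SRC_BASE at the incomplete unit so that it can be retried once more
   bytes arrive.  The SKETCH_* names, `struct sketch_coding', and the
   big-endian 16-bit payload are stand-ins invented for this example;
   the macro body mirrors TWO_MORE_BYTES above.  */
#if 0
```c
/* Hypothetical stand-ins for the real Emacs definitions.  */
enum { SKETCH_RESULT_NORMAL, SKETCH_RESULT_INSUFFICIENT_SRC };
struct sketch_coding { int result; };

/* Same shape as TWO_MORE_BYTES, using the stand-in names.  */
#define SKETCH_TWO_MORE_BYTES(c1, c2)                         \
  do {                                                        \
    if (src + 1 >= src_end)                                   \
      {                                                       \
        coding->result = SKETCH_RESULT_INSUFFICIENT_SRC;      \
        goto label_end_of_loop;                               \
      }                                                       \
    c1 = *src++;                                              \
    c2 = *src++;                                              \
  } while (0)

/* Sum the big-endian 16-bit units in SOURCE into *SUM and return the
   number of bytes consumed.  A trailing odd byte is left unconsumed.  */
int
sketch_sum_units (struct sketch_coding *coding, const unsigned char *source,
                  int src_bytes, long *sum)
{
  const unsigned char *src = source, *src_end = source + src_bytes;
  const unsigned char *src_base = src;
  int c1, c2;

  coding->result = SKETCH_RESULT_NORMAL;
  *sum = 0;
  while (src < src_end)
    {
      src_base = src;                 /* start of the unit being read */
      SKETCH_TWO_MORE_BYTES (c1, c2);
      *sum += (c1 << 8) | c2;
    }
  src_base = src;                     /* every byte was consumed */
 label_end_of_loop:
  return (int) (src_base - source);
}
```
#endif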

/* Like ONE_MORE_BYTE, but 8-bit bytes of data at SRC are in multibyte
   form if MULTIBYTEP is nonzero.  Also, if SRC is not less than
   SRC_END, return RET instead of jumping to `label_end_of_loop'.  */

#define ONE_MORE_BYTE_CHECK_MULTIBYTE(c1, multibytep, ret)      \
  do {                                                          \
    if (src >= src_end)                                         \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        return ret;                                             \
      }                                                         \
    c1 = *src++;                                                \
    if (multibytep && c1 == LEADING_CODE_8_BIT_CONTROL)         \
      c1 = *src++ - 0x20;                                       \
  } while (0)
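/* Illustrative sketch (hypothetical; not part of Emacs) of the
   unfolding that ONE_MORE_BYTE_CHECK_MULTIBYTE performs: in multibyte
   form a raw byte 0x80..0x9F is stored as LEADING_CODE_8_BIT_CONTROL
   followed by (byte + 0x20), so reading it back subtracts 0x20.
   SKETCH_LEADING_CODE_8_BIT_CONTROL is assumed to be 0x9E, which we
   believe is the value in Emacs 22's charset.h; check the header before
   relying on it.  Unlike the macro, this helper also refuses to read
   past SRC_END when the two-byte form is truncated.  */
#if 0
```c
/* Assumed value of LEADING_CODE_8_BIT_CONTROL (see charset.h).  */
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

/* Read one raw byte from *PSRC (below SRC_END), unfolding the two-byte
   multibyte form when MULTIBYTEP is nonzero.  Return the byte, or -1
   if no complete byte is available.  *PSRC is advanced past what was
   read only on success.  */
int
sketch_next_raw_byte (const unsigned char **psrc,
                      const unsigned char *src_end, int multibytep)
{
  const unsigned char *src = *psrc;
  int c1;

  if (src >= src_end)
    return -1;                          /* source exhausted */
  c1 = *src++;
  if (multibytep && c1 == SKETCH_LEADING_CODE_8_BIT_CONTROL)
    {
      if (src >= src_end)
        return -1;                      /* truncated two-byte form */
      c1 = *src++ - 0x20;               /* 0xA0..0xBF -> 0x80..0x9F */
    }
  *psrc = src;
  return c1;
}
```
#endif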

/* Set C to the next character in the source text pointed to by `src'.
   If there are not enough characters in the source, jump to
   `label_end_of_loop'.  The caller should set the variables `coding',
   `src', `src_end', and `translation_table' appropriately in advance.
   This macro is used in the encoding routines `encode_coding_XXX', so
   it assumes that the source text is in multibyte form except for
   8-bit characters.  8-bit characters are in multibyte form if
   coding->src_multibyte is nonzero; otherwise they are represented by
   a single byte.  */

#define ONE_MORE_CHAR(c)                                        \
  do {                                                          \
    int len = src_end - src;                                    \
    int bytes;                                                  \
    if (len <= 0)                                               \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_SRC;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    if (coding->src_multibyte                                   \
        || UNIBYTE_STR_AS_MULTIBYTE_P (src, len, bytes))        \
      c = STRING_CHAR_AND_LENGTH (src, len, bytes);             \
    else                                                        \
      c = *src, bytes = 1;                                      \
    if (!NILP (translation_table))                              \
      c = translate_char (translation_table, c, -1, 0, 0);      \
    src += bytes;                                               \
  } while (0)


/* Write the multibyte form of character C to `dst'.  Jump to
   `label_end_of_loop' if there is not enough space at `dst'.

   If we are in the middle of a composition sequence, the decoded
   character may be ALTCHAR (for the current composition).  In that
   case, the character goes to coding->cmp_data->data instead of
   `dst'.

   This macro is used in decoding routines.  */

#define EMIT_CHAR(c)                                                    \
  do {                                                                  \
    if (! COMPOSING_P (coding)                                          \
        || coding->composing == COMPOSITION_RELATIVE                    \
        || coding->composing == COMPOSITION_WITH_RULE)                  \
      {                                                                 \
        int bytes = CHAR_BYTES (c);                                     \
        if ((dst + bytes) > (dst_bytes ? dst_end : src))                \
          {                                                             \
            coding->result = CODING_FINISH_INSUFFICIENT_DST;            \
            goto label_end_of_loop;                                     \
          }                                                             \
        dst += CHAR_STRING (c, dst);                                    \
        coding->produced_char++;                                        \
      }                                                                 \
                                                                        \
    if (COMPOSING_P (coding)                                            \
        && coding->composing != COMPOSITION_RELATIVE)                   \
      {                                                                 \
        CODING_ADD_COMPOSITION_COMPONENT (coding, c);                   \
        coding->composition_rule_follows                                \
          = coding->composing != COMPOSITION_WITH_ALTCHARS;             \
      }                                                                 \
  } while (0)


/* Write the byte C to `dst'.  Jump to `label_end_of_loop' if there is
   no room left: the limit is `dst_end', or `src' itself when
   `dst_bytes' is zero (the source and destination areas overlap).  */

#define EMIT_ONE_BYTE(c)                                        \
  do {                                                          \
    if (dst >= (dst_bytes ? dst_end : src))                     \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c;                                                 \
  } while (0)

/* Like EMIT_ONE_BYTE, but write the two bytes C1 and C2.  */

#define EMIT_TWO_BYTES(c1, c2)                                  \
  do {                                                          \
    if (dst + 2 > (dst_bytes ? dst_end : src))                  \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    *dst++ = c1, *dst++ = c2;                                   \
  } while (0)

/* Copy the bytes from FROM up to (but not including) TO to `dst',
   advancing FROM.  Jump to `label_end_of_loop' if there is not enough
   room.  */

#define EMIT_BYTES(from, to)                                    \
  do {                                                          \
    if (dst + (to - from) > (dst_bytes ? dst_end : src))        \
      {                                                         \
        coding->result = CODING_FINISH_INSUFFICIENT_DST;        \
        goto label_end_of_loop;                                 \
      }                                                         \
    while (from < to)                                           \
      *dst++ = *from++;                                         \
  } while (0)

^L
/*** 1. Preamble ***/

#ifdef emacs
#include <config.h>
#endif

#include <stdio.h>

#ifdef emacs

#include "lisp.h"
#include "buffer.h"
#include "charset.h"
#include "composite.h"
#include "ccl.h"
#include "coding.h"
#include "window.h"
#include "intervals.h"

#else  /* not emacs */

#include "mulelib.h"

#endif /* not emacs */

Lisp_Object Qcoding_system, Qeol_type;
Lisp_Object Qbuffer_file_coding_system;
Lisp_Object Qpost_read_conversion, Qpre_write_conversion;
Lisp_Object Qno_conversion, Qundecided;
Lisp_Object Qcoding_system_history;
Lisp_Object Qsafe_chars;
Lisp_Object Qvalid_codes;
Lisp_Object Qascii_incompatible;

extern Lisp_Object Qinsert_file_contents, Qwrite_region;
Lisp_Object Qcall_process, Qcall_process_region;
Lisp_Object Qstart_process, Qopen_network_stream;
Lisp_Object Qtarget_idx;

/* If a symbol has this property, evaluate the value to define the
   symbol as a coding system.  */
Lisp_Object Qcoding_system_define_form;

Lisp_Object Vselect_safe_coding_system_function;

int coding_system_require_warning;

/* Mnemonic string for each format of end-of-line.  */
Lisp_Object eol_mnemonic_unix, eol_mnemonic_dos, eol_mnemonic_mac;
/* Mnemonic string indicating that the end-of-line format is not yet
   decided.  */
Lisp_Object eol_mnemonic_undecided;

/* Format of end-of-line decided by system.  This is CODING_EOL_LF on
   Unix, CODING_EOL_CRLF on DOS/Windows, and CODING_EOL_CR on Mac.
   This has an effect only for external encoding (i.e. for output to
   file and process), not for in-buffer or Lisp string encoding.  */
int system_eol_type;

#ifdef emacs

/* Information about which coding system is safe for which chars.
   The value has the form (GENERIC-LIST . NON-GENERIC-ALIST).

   GENERIC-LIST is a list of generic coding systems which can encode
   any characters.

   NON-GENERIC-ALIST is an alist of non-generic coding systems and the
   corresponding char tables that contain their safe chars.  */
Lisp_Object Vcoding_system_safe_chars;

Lisp_Object Vcoding_system_list, Vcoding_system_alist;

Lisp_Object Qcoding_system_p, Qcoding_system_error;

/* The coding systems emacs-mule and raw-text convert only the
   end-of-line format.  */
Lisp_Object Qemacs_mule, Qraw_text;

Lisp_Object Qutf_8;

/* Coding systems are passed between Emacs Lisp programs and C internal
   routines through the following three variables.  */
/* Coding system for reading files and receiving data from processes.  */
Lisp_Object Vcoding_system_for_read;
/* Coding system for writing files and sending data to processes.  */
Lisp_Object Vcoding_system_for_write;
/* Coding system actually used in the latest I/O.  */
Lisp_Object Vlast_coding_system_used;

/* A vector of length 256 which contains information about special
   Latin codes (especially for dealing with Microsoft codes).  */
Lisp_Object Vlatin_extra_code_table;

/* Flag to inhibit code conversion of end-of-line format.  */
int inhibit_eol_conversion;

/* Flag to inhibit ISO2022 escape sequence detection.  */
int inhibit_iso_escape_detection;

/* Flag to make buffer-file-coding-system inherit from process-coding.  */
int inherit_process_coding_system;

/* Coding system to be used to encode text for terminal display.  */
struct coding_system terminal_coding;

/* Coding system to be used to encode text for terminal display when
   terminal coding system is nil.  */
struct coding_system safe_terminal_coding;

/* Coding system of the input sent from the terminal keyboard.  */
struct coding_system keyboard_coding;

/* Default coding system to be used to write a file.  */
struct coding_system default_buffer_file_coding;

Lisp_Object Vfile_coding_system_alist;
Lisp_Object Vprocess_coding_system_alist;
Lisp_Object Vnetwork_coding_system_alist;

Lisp_Object Vlocale_coding_system;

#endif /* emacs */

Lisp_Object Qcoding_category, Qcoding_category_index;

/* List of symbols `coding-category-xxx' ordered by priority.  */
Lisp_Object Vcoding_category_list;

/* Table of coding categories (Lisp symbols).  */
Lisp_Object Vcoding_category_table;

/* Table of symbol names for each coding category.  */
char *coding_category_name[CODING_CATEGORY_IDX_MAX] = {
  "coding-category-emacs-mule",
  "coding-category-sjis",
  "coding-category-iso-7",
  "coding-category-iso-7-tight",
  "coding-category-iso-8-1",
  "coding-category-iso-8-2",
  "coding-category-iso-7-else",
  "coding-category-iso-8-else",
  "coding-category-ccl",
  "coding-category-big5",
  "coding-category-utf-8",
  "coding-category-utf-16-be",
  "coding-category-utf-16-le",
  "coding-category-raw-text",
  "coding-category-binary"
};

/* Table of pointers to the coding systems corresponding to each
   coding category.  */
struct coding_system *coding_system_table[CODING_CATEGORY_IDX_MAX];

/* Table of coding category masks.  The Nth element is the mask for
   the coding category whose priority is N.  */
static
int coding_priorities[CODING_CATEGORY_IDX_MAX];

/* Flag telling whether to look up translation tables during character
   code conversion.  */
Lisp_Object Venable_character_translation;
/* Standard translation table to look up on decoding (reading).  */
Lisp_Object Vstandard_translation_table_for_decode;
/* Standard translation table to look up on encoding (writing).  */
Lisp_Object Vstandard_translation_table_for_encode;

Lisp_Object Qtranslation_table;
Lisp_Object Qtranslation_table_id;
Lisp_Object Qtranslation_table_for_decode;
Lisp_Object Qtranslation_table_for_encode;

/* Alist of charsets and their revision numbers.  */
Lisp_Object Vcharset_revision_alist;

/* Default coding systems used for process I/O.  */
Lisp_Object Vdefault_process_coding_system;

/* Char table for translating Quail and self-inserting input.  */
Lisp_Object Vtranslation_table_for_input;

/* Global flag telling that we can't call post-read-conversion and
   pre-write-conversion functions.  Usually the value is zero, but it
   is set to 1 temporarily while such functions are running, to avoid
   infinite recursion.  */
static int inhibit_pre_post_conversion;

Lisp_Object Qchar_coding_system;

/* Return the `safe-chars' property of CODING_SYSTEM (a symbol), or t
   if that property is not a char table.  Don't check its validity.  */

Lisp_Object
coding_safe_chars (coding_system)
     Lisp_Object coding_system;
{
  Lisp_Object coding_spec, plist, safe_chars;

  coding_spec = Fget (coding_system, Qcoding_system);
  plist = XVECTOR (coding_spec)->contents[3];
  safe_chars = Fplist_get (plist, Qsafe_chars);
  return (CHAR_TABLE_P (safe_chars) ? safe_chars : Qt);
}

/* Nonzero if character C is safe according to SAFE_CHARS, which is
   either t (every character is safe) or a char table whose non-nil
   entries mark the safe characters.  */

#define CODING_SAFE_CHAR_P(safe_chars, c) \
  (EQ (safe_chars, Qt) || !NILP (CHAR_TABLE_REF (safe_chars, c)))
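/* Illustrative sketch (hypothetical; not part of Emacs) of the test
   that CODING_SAFE_CHAR_P performs, written without Lisp objects: a
   null TABLE stands in for t (every character is safe), and a plain
   byte array stands in for the char table, so only codes 0..255 are
   covered.  The names are invented for this example.  */
#if 0
```c
#include <stddef.h>

/* Return nonzero if C is safe.  TABLE == NULL plays the role of t;
   otherwise TABLE[C] != 0 marks C as safe.  */
int
sketch_safe_char_p (const unsigned char *table, int c)
{
  if (table == NULL)
    return 1;                   /* t: every character is safe */
  if (c < 0 || c > 0xFF)
    return 0;                   /* outside this toy table */
  return table[c] != 0;
}
```
#endif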

^L
/*** 2. Emacs internal format (emacs-mule) handlers ***/

/* Emacs' internal format for representation of multiple character
   sets is a kind of multi-byte encoding, i.e. characters are
   represented by variable-length sequences of one-byte codes.

   ASCII characters and control characters (e.g. `tab', `newline') are
   represented by one-byte sequences which are their ASCII codes, in
   the range 0x00 through 0x7F.

   8-bit characters in the range 0x80..0x9F are represented by
   two-byte sequences of LEADING_CODE_8_BIT_CONTROL and (their 8-bit
   code + 0x20).

   8-bit characters in the range 0xA0..0xFF are represented by
   one-byte sequences which are their 8-bit codes.

   The other characters are represented by a sequence of a `base
   leading-code', an optional `extended leading-code', and one or two
   `position-code's.  The length of the sequence is determined by the
   base leading-code.  Leading-codes take the range 0x81 through 0x9D,
   whereas extended leading-codes and position-codes take the range
   0xA0 through 0xFF.  See `charset.h' for more details about
   leading-codes and position-codes.

   --- CODE RANGE of Emacs' internal format ---
   character set        range
   -------------        -----
   ascii                0x00..0x7F
   eight-bit-control    LEADING_CODE_8_BIT_CONTROL + 0xA0..0xBF
   eight-bit-graphic    0xA0..0xFF
   ELSE                 0x81..0x9D + [0xA0..0xFF]+
   ---------------------------------------------

   As this is the internal character representation, the format is
   usually not used externally (i.e. in a file or in data sent to a
   process).  But it is possible to have text externally in this
   format (e.g. by encoding it with the coding system `emacs-mule').

   In that case, a sequence of one-byte codes has a slightly different
   form.

   Firstly, all characters in eight-bit-control are represented by
   one-byte sequences which are their 8-bit codes.

   Next, character composition data are represented by a byte
   sequence of the form: 0x80 METHOD BYTES CHARS COMPONENT ...,
   where,
        METHOD is 0xF0 plus one of the composition methods (enum
        composition_method),

        BYTES is 0xA0 plus the byte length of these composition data,

        CHARS is 0xA0 plus the number of characters composed by these
        data,

        COMPONENTs are characters in multibyte form or composition
        rules encoded by two bytes of ASCII codes.

   In addition, for backward compatibility, the following formats are
   also recognized as composition data on decoding.

   0x80 MSEQ ...
   0x80 0xFF MSEQ RULE MSEQ RULE ... MSEQ

   Here,
        MSEQ is a multibyte form but in this special format:
          ASCII: 0xA0 ASCII_CODE+0x80,
          other: LEADING_CODE+0x20 FOLLOWING-BYTE ...,
        RULE is a one-byte code in the range 0xA0..0xF0 that
        represents a composition rule.
  */
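/* Illustrative sketch (hypothetical; not part of Emacs) of two rules
   from the comment above: how a single-byte code 0x00..0xFF is laid
   out internally versus in external emacs-mule text, where
   eight-bit-control needs no leading code, and how the composition
   header 0x80 METHOD BYTES CHARS is assembled.
   SKETCH_LEADING_CODE_8_BIT_CONTROL is assumed to be 0x9E, which we
   believe is the value in Emacs 22's charset.h.  DATA_BYTES is taken
   as given by the caller, because this sketch does not decide whether
   BYTES counts the header itself.  The names are invented for this
   example.  */
#if 0
```c
/* Assumed value of LEADING_CODE_8_BIT_CONTROL (see charset.h).  */
#define SKETCH_LEADING_CODE_8_BIT_CONTROL 0x9E

/* Write the representation of the single-byte code C (0x00..0xFF) to
   OUT and return its length.  EXTERNAL nonzero selects the external
   emacs-mule form, in which eight-bit-control is a single byte.  */
int
sketch_emit_single_byte_char (int c, unsigned char *out, int external)
{
  if (c >= 0x80 && c <= 0x9F && !external)
    {
      out[0] = SKETCH_LEADING_CODE_8_BIT_CONTROL;  /* eight-bit-control */
      out[1] = (unsigned char) (c + 0x20);         /* 0xA0..0xBF */
      return 2;
    }
  out[0] = (unsigned char) c;          /* ascii or eight-bit-graphic */
  return 1;
}

/* Write the header 0x80 METHOD BYTES CHARS of composition data to OUT.
   Return 4, or -1 if a field would not fit in one byte.  */
int
sketch_composition_header (unsigned char *out, int method, int data_bytes,
                           int nchars)
{
  if (method < 0 || method > 0x0F
      || data_bytes < 0 || data_bytes > 0x5F
      || nchars < 0 || nchars > 0x5F)
    return -1;
  out[0] = 0x80;
  out[1] = (unsigned char) (0xF0 + method);
  out[2] = (unsigned char) (0xA0 + data_bytes);
  out[3] = (unsigned char) (0xA0 + nchars);
  return 4;
}
```
#endif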

enum emacs_code_class_type emacs_code_class[256];

/* See the above "GENERAL NOTES on `detect_coding_XXX ()' functions".
   Check if a text is encoded in Emacs' internal format.  If it is,
