Character Encoding and Manipulation
The Prolog library includes a set of built-in predicates that support manipulating sequences of text, represented as lists, atoms or strings.
The char_type family of predicates supports manipulation of individual characters, represented either as numbers (codes) or as atoms (chars). Earlier versions of YAP, like other Prolog systems, were designed to operate on ASCII characters (in the range -1..127). ISO-Latin-1 support was YAP's first step towards non-American text, and Unicode support has since been added incrementally. The code uses the operating system's wide-character support when available; otherwise, YAP relies on the utf8proc library, maintained by the Julia community.
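A minimal sketch of what delegating to the operating system's wide-character support can look like is shown below. It is not YAP's implementation: the helper names (os_type_alpha, os_type_digit, os_type_upper) are hypothetical, and they only wrap the standard C <wctype.h> classifiers, whose answers for non-ASCII code points depend on the current locale. The utf8proc fallback is not shown.

```c
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical helpers in the spirit of type_alpha()/type_digit()/type_upper():
   each one delegates to the C library's locale-aware wide-character classifier. */
static bool os_type_alpha(wint_t ch) { return iswalpha(ch) != 0; }
static bool os_type_digit(wint_t ch) { return iswdigit(ch) != 0; } /* only '0'..'9' */
static bool os_type_upper(wint_t ch) { return iswupper(ch) != 0; }
```

A program using such helpers would normally call setlocale(LC_ALL, "") first, since results for non-ASCII code points may differ in the default "C" locale.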
As usual, the YAP API tries to follow SWI-Prolog as closely as possible:
- char_type/2 and code_type/2 support the types documented by SWI-Prolog, but are strict in argument checking.
- Letters with no case are treated as lowercase. Hence, a variable can only start with an underscore or a Unicode code point in category Lu.
- Number letters (Unicode category Nl) are treated as numbers.
- Connectors and dashes are treated as solo characters.
- YAP currently distinguishes opening and closing quotes.
- Symbols are processed as Prolog symbols; the exception is modifiers, which are handled as lowercase letters.
- The code for end of file is -1; the char for end of file is end_of_file.
Two key predicates are:
- char_type/2
- code_type/2
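The end-of-file and variable-start conventions listed above can be sketched in C as follows. The function can_start_variable and the EOF_CODE macro are hypothetical names, not part of YAP, and iswupper() is used as a locale-dependent stand-in for Unicode category Lu, which it only approximates.

```c
#include <stdbool.h>
#include <wctype.h>

/* Code used for end of file, as stated in the list above. */
#define EOF_CODE (-1)

/* Hypothetical sketch of the variable-start rule: an underscore, or a
   letter the C library classifies as uppercase. Caseless letters count
   as lowercase, so they never start a variable. */
static bool can_start_variable(int code) {
    if (code == EOF_CODE)
        return false;                    /* end of file is not a character */
    if (code == '_')
        return true;
    return iswupper((wint_t)code) != 0;  /* approximates category Lu */
}
```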
The types recognized for text elements are listed next. A character may have more than one type.
- alnum: Char is a letter (upper- or lowercase) or digit.
- alpha: Char is a letter (upper- or lowercase).
- csym: Char is a letter (upper- or lowercase), digit or the underscore (_). These are valid C and Prolog symbol characters.
- csymf: Char is a letter (upper- or lowercase) or the underscore (_). These are valid first characters for C and Prolog symbols.
- ascii: Char is a 7-bit ASCII character (0..127).
- white: Char is a space or tab, i.e. white space inside a line.
- cntrl: Char is an ASCII control character (0..31).
- digit: Char is a digit.
- digit(Weight): Char is a digit with value Weight. For example, char_type(X, digit(6)) yields X = '6'. Useful for parsing numbers.
- xdigit(Weight): Char is a hexadecimal digit with value Weight. For example, char_type(a, xdigit(X)) yields X = 10. Useful for parsing numbers.
- graph: Char produces a visible mark on a page when printed. Note that the space is not included.
- lower: Char is a lowercase letter.
- lower(Upper): Char is a lowercase version of Upper. Only true if Char is lowercase and Upper uppercase.
- to_lower(Upper): Char is a lowercase version of Upper. For non-letters, or letters without case, Char and Upper are the same. See also upcase_atom/2 and downcase_atom/2.
- upper: Char is an uppercase letter.
- upper(Lower): Char is an uppercase version of Lower. Only true if Char is uppercase and Lower lowercase.
- to_upper(Lower): Char is an uppercase version of Lower. For non-letters, or letters without case, Char and Lower are the same. See also upcase_atom/2 and downcase_atom/2.
- punct: Char is a punctuation character, i.e. a graph character that is not a letter or digit.
- space: Char is some form of layout character (tab, vertical tab, newline, etc.).
- end_of_file: Char is -1.
- end_of_line: Char ends a line (ASCII: 10..13).
- newline: Char is a newline character (10).
- period: Char counts as the end of a sentence (., !, ?).
- quote: Char is a quote character (", ', `).
- paren(Close): Char is an open parenthesis and Close is the corresponding close parenthesis.
- prolog_var_start: Char can start a Prolog variable name.
- prolog_atom_start: Char can start an unquoted Prolog atom that is not a symbol.
- prolog_identifier_continue: Char can continue a Prolog variable name or atom.
- prolog_prolog_symbol: Char is a Prolog symbol character. Sequences of Prolog symbol characters glue together to form an unquoted atom. Examples are =.., =, etc.
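The parameterised types digit(Weight), xdigit(Weight) and paren(Close) relate a character to a value. The sketch below computes those values for ASCII input only; the function names are hypothetical, YAP itself may accept further Unicode digits, and the set of brackets handled by paren_close is an assumption.

```c
#include <stdbool.h>

/* digit(Weight): true, storing the weight, if ch is an ASCII decimal digit. */
static bool digit_weight(int ch, int *weight) {
    if (ch >= '0' && ch <= '9') { *weight = ch - '0'; return true; }
    return false;
}

/* xdigit(Weight): hexadecimal digits 0-9, a-f and A-F; 'a' has weight 10. */
static bool xdigit_weight(int ch, int *weight) {
    if (digit_weight(ch, weight)) return true;
    if (ch >= 'a' && ch <= 'f') { *weight = ch - 'a' + 10; return true; }
    if (ch >= 'A' && ch <= 'F') { *weight = ch - 'A' + 10; return true; }
    return false;
}

/* paren(Close): map an opening bracket to its closing counterpart,
   or return 0 if ch is not an opening bracket (assumed set: ( [ {). */
static int paren_close(int ch) {
    switch (ch) {
    case '(': return ')';
    case '[': return ']';
    case '{': return '}';
    default:  return 0;
    }
}
```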
- char_type/2
- all_char_conversions/1
- current_char_conversion/2
- char_conversion/2
- force_char_conversion
- code_type_prolog_identifier_continue/1
- code_type_prolog_atom_start/1
- code_type_var_start/1
- code_type_paren/1
- code_type_quote/1
- code_type_period/1
- code_type_newline/1
- code_type_end_of_line/1
- code_type_end_of_file/1
- char_type_prolog_prolog_symbol/1
- char_type_prolog_identifier_continue/1
- char_type_prolog_atom_start/1
- char_type_var_start/1
- char_type_paren/1
- char_type_quote/1
- char_type_period/1
- char_type_newline/1
- char_type_end_of_line/1
- char_type_end_of_file/1
- code_type_white/1
- char_type_white/1
- code_type_upper/1
- char_type_upper/1
- code_type_space/1
- char_type_space/1
- code_type_punct/1
- char_type_punct/1
- char_type_lower/1
- code_type_lower/1
- code_type_graph/1
- char_type_graph/1
- code_type_xdigit/1
- char_type_xdigit/1
- code_type_digit/1
- char_type_digit/1
- code_type_csymf/1
- char_type_csymf/1
- char_type_csym/1
- code_type_csym/1
- code_type_cntrl/1
- char_type_cntrl/1
- code_type_ascii/1
- char_type_ascii/1
- char_type_alnum/1
- code_type_alnum/1
- char_type_alpha/1
- code_type_alpha/1
- change_type_of_char/2
- to_lower/2
- to_upper/2
- encoding/2
- enc_map
Functions:
1. static Int p_change_type_of_char(USES_REGS1):
1. static encoding_t enc_os_default(encoding_t rc):
1. int Yap_encoding_error(int ch, seq_type_t code, struct stream_desc *st):
1. encoding_t Yap_SystemEncoding(void):
1. static encoding_t DefaultEncoding(void):
1. encoding_t Yap_DefaultEncoding(void):
1. void Yap_SetDefaultEncoding(encoding_t new_encoding):
1. static Int get_default_encoding(USES_REGS1):
1. static Int p_encoding(USES_REGS1):
1. static int get_char(Term t): get_char(+Code, -Char): if the number Code represents a valid Unicode code point, the atom Char represents the same code point.
1. static int get_code(Term t): get_code(+Char, -Code): if the atom Char represents a valid Unicode code point, the number Code represents the same code point.
1. static int get_char_or_code(Term t, bool *is_char): get_char_or_code(+CharOrCode, -CodeOrChar): converts from char to code or from code to char.
1. static bool to_upper(Term t, Term t2 USES_REGS):
1. static bool to_lower(Term t, Term t2 USES_REGS):
1. static Int touppe/2(USES_REGS1):
1. static Int tolowe/2(USES_REGS1):
1. static bool type_alpha(int ch):
1. static Int code_type_alpha(USES_REGS1):
1. static Int char_type_alpha(USES_REGS1):
1. static bool type_alnum(int ch):
1. static Int code_type_alnum(USES_REGS1):
1. static Int char_type_alnum(USES_REGS1):
1. static bool type_ascii(int ch):
1. static Int char_type_ascii(USES_REGS1):
1. static Int code_type_ascii(USES_REGS1):
1. static bool type_cntrl(int ch):
1. static Int char_type_cntrl(USES_REGS1):
1. static Int code_type_cntrl(USES_REGS1):
1. static bool type_csym(int ch):
1. static Int code_type_csym(USES_REGS1):
1. static Int char_type_csym(USES_REGS1):
1. static bool type_csymf(int ch):
1. static Int char_type_csymf(USES_REGS1):
1. static Int code_type_csymf(USES_REGS1):
1. static bool type_digit(int ch):
1. static Int char_type_digit(USES_REGS1):
1. static Int code_type_digit(USES_REGS1):
1. static bool type_xdigit(int ch):
1. static Int char_type_xdigit(USES_REGS1):
1. static Int code_type_xdigit(USES_REGS1):
1. static bool type_graph(int ch):
1. static Int char_type_graph(USES_REGS1):
1. static Int code_type_graph(USES_REGS1):
1. static bool type_lower(int ch):
1. static Int code_type_lower(USES_REGS1):
1. static Int char_type_lower(USES_REGS1):
1. static bool type_punct(int ch):
1. static Int char_type_punct(USES_REGS1):
1. static Int code_type_punct(USES_REGS1):
1. static bool type_space(int ch):
1. static Int char_type_space(USES_REGS1):
1. static Int code_type_space(USES_REGS1):
1. static bool type_upper(int ch):
1. static Int char_type_upper(USES_REGS1):
1. static Int code_type_upper(USES_REGS1):
1. static bool type_white(int ch):
1. static Int char_type_white(USES_REGS1):
1. static Int code_type_white(USES_REGS1):
1. static Int char_type_end_of_file(USES_REGS1):
1. static Int char_type_end_of_line(USES_REGS1):
1. static Int char_type_newline(USES_REGS1):
1. static Int char_type_period(USES_REGS1):
1. static Int char_type_quote(USES_REGS1):
1. static Int char_type_paren(USES_REGS1):
1. static Int char_type_prolog_var_start(USES_REGS1):
1. static Int char_type_prolog_atom_start(USES_REGS1):
1. static Int char_type_prolog_identifier_continue(USES_REGS1):
1. static Int char_type_prolog_prolog_symbol(USES_REGS1):
1. static Int code_type_end_of_file(USES_REGS1):
1. static Int code_type_end_of_line(USES_REGS1):
1. static Int code_type_newline(USES_REGS1):
1. static Int code_type_period(USES_REGS1):
1. static Int code_type_quote(USES_REGS1):
1. static Int code_type_paren(USES_REGS1):
1. static Int code_type_prolog_var_start(USES_REGS1):
1. static Int code_type_prolog_atom_start(USES_REGS1):
1. static Int code_type_prolog_identifier_continue(USES_REGS1):
1. static Int code_type_prolog_prolog_symbol(USES_REGS1):
1. static Int code_char(USES_REGS1):
1. static Int char_code(USES_REGS1):
1. static Int p_force_char_conversion(USES_REGS1): enable the ISO char conversion mechanism.
1. static Int p_disable_char_conversion(USES_REGS1):
1. static Int char_conversion(USES_REGS1):
1. static Int p_current_char_conversion(USES_REGS1):
1. static Int p_all_char_conversions(USES_REGS1):
1. void Yap_InitChtypes(void):
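The entries for get_char, get_code and get_char_or_code refer to a "valid Unicode code point". A minimal sketch of such a check, together with a UTF-8 encoder as one possible way to turn a valid code into text, follows. Neither function is taken from YAP; both names are hypothetical, and the exact range YAP accepts (for instance whether surrogates are rejected) is an assumption.

```c
#include <stdbool.h>

/* Hypothetical validity test: Unicode scalar values are 0..0x10FFFF,
   excluding the UTF-16 surrogate range 0xD800..0xDFFF. */
static bool valid_code_point(long code) {
    if (code < 0 || code > 0x10FFFF) return false;
    if (code >= 0xD800 && code <= 0xDFFF) return false;
    return true;
}

/* Encode a valid code point as UTF-8 into out; returns the number of
   bytes written, or 0 if the code is not a valid code point. */
static int encode_utf8(long code, unsigned char out[4]) {
    if (!valid_code_point(code)) return 0;
    if (code < 0x80) { out[0] = (unsigned char)code; return 1; }
    if (code < 0x800) {
        out[0] = (unsigned char)(0xC0 | (code >> 6));
        out[1] = (unsigned char)(0x80 | (code & 0x3F));
        return 2;
    }
    if (code < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (code >> 12));
        out[1] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (code & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (code >> 18));
    out[1] = (unsigned char)(0x80 | ((code >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (code & 0x3F));
    return 4;
}
```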