Trivial UTF-8 Manual
[in package TRIVIAL-UTF-8]
1 The trivial-utf-8 ASDF System
- Description: A small library for doing UTF-8-based input and output.
- Licence: ZLIB
- Author: Marijn Haverbeke <marijnh@gmail.com>
- Maintainer: Gábor Melis <mega@retes.hu>
- Homepage: https://common-lisp.net/project/trivial-utf-8/
- Bug tracker: https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues
- Source control: GIT
2 Introduction
Trivial UTF-8 is a small library for doing UTF-8-based input and output on a Lisp implementation that already supports Unicode, meaning that char-code and code-char deal with Unicode character codes.
This library exists because, while Unicode-enabled implementations usually do provide some kind of interface for dealing with character encodings, these interfaces are typically neither very flexible nor uniform across implementations.
The Babel library solves a similar problem while understanding more encodings. Trivial UTF-8 was written before Babel existed, but for new projects you might be better off going with Babel. The one plus that Trivial UTF-8 has is that it doesn't depend on any other libraries.
3 Links
Here is the official repository and the HTML documentation for the latest version.
4 Reference
[function] utf-8-byte-length string
Calculate the number of bytes needed to encode string.
[function] string-to-utf-8-bytes string &key null-terminate
Convert string into an array of unsigned bytes containing its UTF-8 representation. If null-terminate is true, add an extra 0 byte at the end.
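The two encoding functions above can be tried out together. The following is a minimal sketch that assumes Quicklisp is available for loading the system; the variable names are made up for illustration.

```lisp
;; Assumption: Quicklisp is installed. Any other way of loading the
;; trivial-utf-8 system works just as well.
(ql:quickload :trivial-utf-8)

;; "café", built with CODE-CHAR so the example does not depend on the
;; encoding of this source file. U+00E9 takes two bytes in UTF-8.
(defparameter *demo-text*
  (concatenate 'string "caf" (string (code-char #xE9))))

;; Four characters, but five bytes once encoded.
(defparameter *demo-byte-count*
  (trivial-utf-8:utf-8-byte-length *demo-text*))

;; The encoded bytes, without and with a trailing 0 byte.
(defparameter *demo-bytes*
  (trivial-utf-8:string-to-utf-8-bytes *demo-text*))
(defparameter *demo-c-bytes*
  (trivial-utf-8:string-to-utf-8-bytes *demo-text* :null-terminate t))

(format t "~D characters, ~D bytes~%" (length *demo-text*) *demo-byte-count*)
(format t "~S~%~S~%" *demo-bytes* *demo-c-bytes*)
```

The null-terminated form is mostly useful when handing the bytes to foreign code that expects C-style strings.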
[function] utf-8-group-size byte
Determine the number of bytes that are part of the character whose encoding starts with byte. May signal utf-8-decoding-error.
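As an illustration, the sketch below passes the lead bytes of a few characters to utf-8-group-size and then tries a byte that cannot start a character. It assumes Quicklisp for loading, and it catches the general error class on the assumption that utf-8-decoding-error is a subtype of it.

```lisp
;; Assumption: Quicklisp is installed.
(ql:quickload :trivial-utf-8)

;; Lead bytes of "A" (U+0041), U+00E9, U+20AC and U+1F600, which are
;; encoded in one, two, three and four bytes respectively.
(defparameter *demo-group-sizes*
  (mapcar #'trivial-utf-8:utf-8-group-size '(#x41 #xC3 #xE2 #xF0)))

;; #x80 is a continuation byte, so it cannot start a character and the
;; call is expected to signal a decoding error. ERROR is caught here
;; rather than the library's own condition (an assumption, see above).
(defparameter *demo-bad-lead*
  (handler-case (trivial-utf-8:utf-8-group-size #x80)
    (error () :decoding-error)))

(format t "~S ~S~%" *demo-group-sizes* *demo-bad-lead*)
```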
[function] utf-8-bytes-to-string bytes &key (start 0) (end (length bytes))
Convert the subsequence of bytes between start and end, where bytes is an array containing UTF-8 encoded characters, to a string. The element type of bytes may be anything as long as it can be coerced into an (unsigned-byte 8) array. May signal utf-8-decoding-error.
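A short sketch of decoding, both of a whole array and of subsequences, might look as follows. It assumes Quicklisp for loading; the variable names are illustrative only.

```lisp
;; Assumption: Quicklisp is installed.
(ql:quickload :trivial-utf-8)

;; The UTF-8 encoding of "café": three ASCII bytes followed by the
;; two-byte sequence for U+00E9.
(defparameter *demo-encoded*
  (make-array 5 :element-type '(unsigned-byte 8)
                :initial-contents '(99 97 102 195 169)))

;; Decode everything, then only the first three bytes, then the rest.
(defparameter *demo-whole*
  (trivial-utf-8:utf-8-bytes-to-string *demo-encoded*))
(defparameter *demo-prefix*
  (trivial-utf-8:utf-8-bytes-to-string *demo-encoded* :start 0 :end 3))
(defparameter *demo-suffix*
  (trivial-utf-8:utf-8-bytes-to-string *demo-encoded* :start 3))

(format t "~S ~S ~S~%" *demo-whole* *demo-prefix* *demo-suffix*)
```

Note that start and end are byte offsets, not character offsets, so they should fall on character boundaries.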
[function] read-utf-8-string input &key null-terminated stop-at-eof (char-length -1) (byte-length -1)
Read UTF-8 encoded data from input, a byte stream, and construct a string from the characters found. When null-terminated is given, stop reading at a null character. If stop-at-eof is true, stop at end-of-file without signalling an error. The char-length and byte-length parameters specify the maximum number of characters or bytes to read, where -1 means no limit. May signal utf-8-decoding-error.
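To see the stream interface in action, the sketch below writes a few bytes to a scratch file and reads them back in two steps. It assumes Quicklisp for loading and UIOP (which ships with ASDF) for locating a temporary directory; the file name is hypothetical.

```lisp
;; Assumptions: Quicklisp is installed and UIOP is available.
(ql:quickload :trivial-utf-8)

;; Hypothetical scratch file in the system's temporary directory.
(defparameter *demo-path*
  (merge-pathnames "trivial-utf-8-demo.bin" (uiop:temporary-directory)))

;; Write the five bytes of "café" as raw octets.
(with-open-file (out *demo-path* :direction :output
                                 :element-type '(unsigned-byte 8)
                                 :if-exists :supersede)
  (write-sequence #(99 97 102 195 169) out))

;; Read at most three characters first, then everything up to
;; end-of-file from the same stream.
(defparameter *demo-parts*
  (with-open-file (in *demo-path* :element-type '(unsigned-byte 8))
    (let* ((head (trivial-utf-8:read-utf-8-string in :char-length 3))
           (tail (trivial-utf-8:read-utf-8-string in :stop-at-eof t)))
      (list head tail))))

(format t "~S~%" *demo-parts*)
```

Without stop-at-eof, the second call would be expected to signal an end-of-file error once the stream runs out, so that keyword is the natural choice when the length of the data is not known in advance.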