Strings
Our string manipulation module is split into several submodules, each with a particular use case.
String buffers (sb)
A string buffer is a dynamically resizeable string container. It is to be used when you want to compose a new string by chunks. String buffers content is always properly NULL-terminated.
String buffers are typically declared and initialized a single line of code:
-
SB_1k(name)
andSB_8k(name)
declare and initialize a stack-allocated string buffer. If the buffer grows beyond its initial size, it will be heap-reallocated. It will be automatically wiped after at the end of the scope. -
t_SB_1k(name)
andt_SB_8k(name)
declare and initialize a t-stack-allocated string buffer. t-stack-allocated buffers remain in the t-stack even if reallocated, thus they don’t need to be wiped, but they can only be reallocated in the current t-frame.
A heap-allocated string buffers can also be initialized with sb_init()
. In
that case, it must always be wiped after use with sb_wipe()
.
Once you have a string buffer, writing in it will be done using the sb_add*
functions. There are a lot of variants, some are generic and can be found in
lib-common/str-buf.h
, others are specific to a protocol implementation or to
a product. The most commonly used functions are:
-
sb_addc(sb, c)
: append a single character -
sb_adds(sb, s)
: append the content of a C-string (const char *) -
sb_add(sb, s, len)
: append len bytes from the pointer s -
sb_addf(sb, fmt, …)
: printf-like formatting -
…
String literals
For string literals we avoid const char *
as much as possible because in most
use cases, const char *
ends up being traversed several times. To avoid that
issue, we use a structure named lstr_t
that contains the string, its length,
and ownership information (that way we know who is responsible of the
de-allocation of the data).
Beyond the traversing issue, using lstr_t
enables several optimizations and
avoids string copies in several places. Since we pass a pair (pointer, length)
with no guarantee that the string is NULL-terminated, we can easily extract
substrings without copying the content of the old string in a newly allocated
buffer…
Strings in lstr_t
are accessible either as constant strings or as
non-constant strings. The non-constant case must be used carefully. In
particular, it is not appropriate to assume a lstr_t
is writable in a library
call.
We have several initializers for lstr_t
(note that the _V
variants can be
used anywhere in the code while "normal" variant can only be used as an
initializer):
-
LSTR
: this builds alstr_t
from aconst char *
by callingstrlen
on the string. Thelstr_t
does not own the associated memory. When the string is known at compile time, thestrlen
is also performed at compile time, making it as efficient asLSTR_IMMED
. This is the recommended macro that should be used in most cases. -
LSTR_OPT
: idem, but the input string can be null. -
LSTR_IMMED
/LSTR_IMMED_V
: initialize alstr_t
from a literal string expressionname = LSTR_IMMED_V("foo")
. This allows resolution of the string at compile time. -
LSTR_INIT
/LSTR_INIT_V
: this takes a pointer and a length and builds a newlstr_t
that does not own the associated memory. -
lstr_init_
: this builds alstr_t
from a memory chunk, a length and an indication about the type of memory (ownership flag). -
LSTR_DATA
/LSTR_DATA_V
: initialize alstr_t
from an anonymous data chunk.
We also have constants:
-
LSTR_NULL
/LSTR_NULL_V
: alstr_t
pointing to the NULL pointer with a length of 0 -
LSTR_EMPTY
/LSTR_EMPTY_V
: alstr_t
pointing to the empty string""
, with a length of 0
lstr_t
must be wiped. In particular when storing a lstr_t
in a structure,
you must take care at calling lstr_wipe
when destructing the structure. This
implies that you always ensure the ownership flag of the lstr_t
is correctly
set: a lstr_t
should not be owned more than once. In case you need a copy
"without ownership" you can use lstr_dupc()
, or in case you want to transfer
ownership you can use lstr_transfer
.
struct foo_t {
lstr_t name;
};
void foo_wipe(struct foo_t *foo)
{
lstr_wipe(&bar->name);
}
GENERIC_DELETE(struct foo_t, foo)
struct bar_t {
lstr_t name;
}
void bar_wipe(struct bar_t *bar)
{
lstr_wipe(&bar->name);
}
GENERIC_DELETE(struct bar_t, bar)
/* Example of foo taking ownership of the name of bar. This works without
* recopy and without risk of double ownership (and thus of double free), since
* we properly transfer the ownership of the lstr_t to foo.
*/
struct foo_t *bar_to_foo(struct bar_t **bar)
{
struct foo_t *foo = p_new(struct foo_t, 1);
lstr_transfer(&foo->name, &(*bar)->name);
bar_delete(bar);
return foo;
}
/* Example of taking a constant copy of the name of bar. This works as long as
* we're sure foo's lifetime is shorter than bar's one.
*/
struct foo_t *foo_from_bar(const struct bar_t *bar)
{
struct foo_t *foo = p_new(struct foo_t, 1);
foo->name = lstr_dupc(bar->name);
return foo;
}
/* Example of taking a full copy of the name of bar. This always work but
* implies we have a new allocated string for each structure.
*/
struct foo_t *copy_foo_from_bar(const struct bar_t *bar)
{
struct foo_t *foo = p_new(struct foo_t, 1);
foo->name = lstr_dup(bar->name);
return foo;
}
If you want to print a lstr_t
using a printf-like function, you should use
the %*pM
format in conjonction with the LSTR_FMT_ARG
macro:
printf("name=%*pM", LSTR_FMT_ARG(foo->name));
The lstr_fmt
and t_lstr_fmt
macros can be used to create new lstr_t
that
are the result of the expansion of a printf-like format string. t_lstr_fmt
is
broadly used to generate error messages.
Parsing streams (pstreams)
The pstream_t
structure defines a chunk of data that needs to be parsed. It
is optimized for sequential parsing that involves moving either the beginning
of the chunk or its end without moving the other bound. As a consequence it
uses a dual-pointer representation (beginning and end of the chunk).
In conjonction with the ctype
module that allows the definition of sets of
characters (like isalpha, is alnum, isspace… but also supports custom sets),
this provides an efficient and elegant solution for reading data, extracting
chunks, skipping others and ensure separators are correct.
The pstream_t
is not supposed to own the memory, it just refers to a chunk we
want to parse. It can be initialized either using a (pointer, length) pair
(ps_init
), using a C-string (ps_initstr
) or a lstr_t
(ps_initlstr
).
The pstream_t
API provides the PS_WANT
macro that can be used to return an
error in case a condition is not met.
struct user_t {
lstr_t firstname;
lstr_t lastname;
int age;
bool gender;
};
/* Read a record with the following format: firstname;lastname;age;gender
* whitespaces are ignored, firstname and lastname should contain only alpha
* characters, gender can be either M or F
*/
int parse_record(pstream_t *ps, struct user_t *user)
{
#define READ_SEMICOLON() \
ps_ltrim(ps); \
PS_WANT(ps_getc(ps) == ';'); \
ps_ltrim(ps);
pstream_t n;
ps_ltrim(ps);
/* Read the first name */
n = ps_get_span(ps, &ctype_isalpha);
PS_WANT(!ps_done(&n)); /* ensure the name is not empty */
user->firstname = LSTR_PS_V(n);
READ_SEMICOLON();
/* Read the last name */
n = ps_get_span(ps, &ctype_isalpha);
PS_WANT(!ps_done(&n)); /* ensure the name is not empty */
user->lastname = LSTR_PS_V(n);
READ_SEMICOLON();
/* Read the age */
user->age = ps_geti(ps);
PS_WANT(user->age > 0);
READ_SEMICOLON();
/* Read the gender */
switch (RETHROW(ps_getc(ps))) {
case 'M':
user->gender = true;
break;
case 'F':
user->gender = false;
break;
default:
return -1;
}
/* Ensure we properly finished the stream */
ps_ltrim(ps);
return ps_done(ps) ? 0 : -1;
}
/* Read a record with the same format, but no constraint on firstname and
* lastname content except that they cannot contain a semi-colon. Whitespaces
* are not ignored.
*/
int parse_record2(pstream_t *ps, struct user_t *user)
{
pstream_t n;
/* Read the first name */
RETHROW(ps_get_ps_chr_and_skip(ps, ';', &n));
PS_WANT(!ps_done(&n)); /* ensure the name is not empty */
user->firstname = LSTR_PS_V(n);
/* Read the last name */
RETHROW(ps_get_ps_chr_and_skip(ps, ';', &n));
PS_WANT(!ps_done(&n)); /* ensure the name is not empty */
user->lastname = LSTR_PS_V(n);
/* Read the age */
user->age = ps_geti(ps);
PS_WANT(user->age > 0);
RETHROW(ps_skipc(ps, ';'));
/* Read the gender */
switch (RETHROW(ps_getc(ps))) {
case 'M':
user->gender = true;
break;
case 'F':
user->gender = false;
break;
default:
return -1;
}
/* Ensure we properly finished the stream */
return ps_done(ps) ? 0 : -1;
}