Libparserutils

codec_utf16.c File Reference

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <parserutils/charset/mibenum.h>
#include <parserutils/charset/utf16.h>
#include "charset/codecs/codec_impl.h"
#include "utils/endian.h"
#include "utils/utils.h"


Data Structures

struct  charset_utf16_codec
 UTF-16 charset codec.

Defines

#define INVAL_BUFSIZE   (32)
#define READ_BUFSIZE   (8)
#define WRITE_BUFSIZE   (8)

Typedefs

typedef struct charset_utf16_codec charset_utf16_codec
 UTF-16 charset codec.

Functions

static bool charset_utf16_codec_handles_charset (const char *charset)
 Determine whether this codec handles a specific charset.
static parserutils_error charset_utf16_codec_create (const char *charset, parserutils_charset_codec **codec)
 Create a UTF-16 codec.
static parserutils_error charset_utf16_codec_destroy (parserutils_charset_codec *codec)
 Destroy a UTF-16 codec.
static parserutils_error charset_utf16_codec_encode (parserutils_charset_codec *codec, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
 Encode a chunk of UCS-4 (big endian) data into UTF-16.
static parserutils_error charset_utf16_codec_decode (parserutils_charset_codec *codec, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
 Decode a chunk of UTF-16 data into UCS-4 (big endian).
static parserutils_error charset_utf16_codec_reset (parserutils_charset_codec *codec)
 Clear a UTF-16 codec's encoding state.
static parserutils_error charset_utf16_codec_read_char (charset_utf16_codec *c, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
 Read one character of UTF-16 input and output it as UCS-4 (big endian).
static parserutils_error charset_utf16_codec_output_decoded_char (charset_utf16_codec *c, uint32_t ucs4, uint8_t **dest, size_t *destlen)
 Output a UCS-4 character (big endian).

Variables

const parserutils_charset_handler charset_utf16_codec_handler

Define Documentation

#define INVAL_BUFSIZE   (32)

Definition at line 25 of file codec_utf16.c.

Referenced by charset_utf16_codec_decode(), and charset_utf16_codec_read_char().

#define READ_BUFSIZE   (8)

Definition at line 31 of file codec_utf16.c.

#define WRITE_BUFSIZE   (8)

Definition at line 37 of file codec_utf16.c.

Referenced by charset_utf16_codec_encode().


Typedef Documentation

typedef struct charset_utf16_codec charset_utf16_codec

UTF-16 charset codec.


Function Documentation

parserutils_error charset_utf16_codec_create ( const char *  charset,
parserutils_charset_codec **  codec 
) [static]

Create a UTF-16 codec.

parserutils_error charset_utf16_codec_decode ( parserutils_charset_codec *  codec,
const uint8_t **  source,
size_t *  sourcelen,
uint8_t **  dest,
size_t *  destlen 
) [static]

Decode a chunk of UTF-16 data into UCS-4 (big endian)

Parameters:
codec  The codec to use
source  Pointer to pointer to source data
sourcelen  Pointer to length (in bytes) of source data
dest  Pointer to pointer to output buffer
destlen  Pointer to length (in bytes) of output buffer
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if the output buffer is too small, PARSERUTILS_INVALID if a character cannot be represented and the codec's error-handling mode is set to STRICT.

On exit, ::source will point immediately _after_ the last input character read, if the result is _OK or _NOMEM. Any remaining output for the character will be buffered by the codec for writing on the next call.

In the case of the result being _INVALID, ::source will point _at_ the last input character read; nothing will be written or buffered for the failed character. It is up to the client to fix the cause of the failure and retry the decoding process.

Note that, if failure occurs whilst attempting to write any output buffered by the last call, then ::source and ::sourcelen will remain unchanged (as nothing more has been read).

If STRICT error handling is configured and an illegal sequence is split over two calls, then _INVALID will be returned from the second call, but ::source will point mid-way through the invalid sequence (i.e. it will be unmodified over the second call). In addition, the internal incomplete-sequence buffer will be emptied, such that subsequent calls will progress, rather than re-evaluating the same invalid sequence.

::sourcelen will be reduced appropriately on exit.

::dest will point immediately _after_ the last character written.

::destlen will be reduced appropriately on exit.

Call this with a source length of 0 to flush the output buffer.

Definition at line 293 of file codec_utf16.c.

References charset_utf16_codec_read_char(), endian_host_to_big(), charset_utf16_codec::inval_buf, INVAL_BUFSIZE, charset_utf16_codec::inval_len, max, min, PARSERUTILS_NOMEM, PARSERUTILS_OK, charset_utf16_codec::read_buf, and charset_utf16_codec::read_len.

Referenced by charset_utf16_codec_create().
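The buffering rules above are easiest to follow in code. The following is a minimal sketch, not the library's implementation: it assumes big-endian (UTF-16BE) input so the output is deterministic, substitutes U+FFFD for unpaired surrogates instead of honouring STRICT mode, and keeps incomplete or not-yet-written input in a small carry buffer in the spirit of inval_buf. The names sketch_utf16_decoder, sketch_utf16_decode and sketch_error are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes standing in for PARSERUTILS_OK / _NOMEM. */
typedef enum { SKETCH_OK, SKETCH_NOMEM } sketch_error;

/* Hypothetical decoder state: bytes of a sequence that is incomplete, or
 * complete but not yet written (similar in spirit to inval_buf). */
typedef struct {
    uint8_t inval_buf[4];
    size_t inval_len;
} sketch_utf16_decoder;

/* Decode UTF-16BE bytes into UCS-4 (big endian). Buffered input is decoded
 * first, so a call with *sourcelen == 0 flushes pending output. */
static sketch_error sketch_utf16_decode(sketch_utf16_decoder *d,
        const uint8_t **source, size_t *sourcelen,
        uint8_t **dest, size_t *destlen)
{
    for (;;) {
        if (d->inval_len >= 2) {
            uint16_t hi = (uint16_t) ((d->inval_buf[0] << 8) | d->inval_buf[1]);
            uint32_t ucs4 = 0;
            size_t used = 0;

            if (hi < 0xD800 || hi > 0xDFFF) {
                ucs4 = hi;                     /* BMP character */
                used = 2;
            } else if (hi >= 0xDC00) {
                ucs4 = 0xFFFD;                 /* lone low surrogate */
                used = 2;
            } else if (d->inval_len == 4) {
                uint16_t lo = (uint16_t) ((d->inval_buf[2] << 8) | d->inval_buf[3]);
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    ucs4 = 0x10000 + ((uint32_t) (hi - 0xD800) << 10)
                         + (uint32_t) (lo - 0xDC00);
                    used = 4;
                } else {
                    ucs4 = 0xFFFD;             /* unpaired high surrogate; */
                    used = 2;                  /* lo is re-examined next pass */
                }
            }

            if (used > 0) {
                if (*destlen < 4)
                    return SKETCH_NOMEM;       /* character stays buffered */
                (*dest)[0] = (uint8_t) (ucs4 >> 24);
                (*dest)[1] = (uint8_t) (ucs4 >> 16);
                (*dest)[2] = (uint8_t) (ucs4 >> 8);
                (*dest)[3] = (uint8_t) ucs4;
                *dest += 4;
                *destlen -= 4;
                memmove(d->inval_buf, d->inval_buf + used, d->inval_len - used);
                d->inval_len -= used;
                continue;
            }
        }

        if (*sourcelen == 0)
            return SKETCH_OK;

        /* Move one more input byte into the carry buffer. */
        assert(d->inval_len < sizeof d->inval_buf);
        d->inval_buf[d->inval_len++] = **source;
        *source += 1;
        *sourcelen -= 1;
    }
}
```

Splitting U+1F600 (D8 3D DE 00) after its third byte leaves three bytes in the carry buffer and returns _OK; the next call completes the pair. If the output buffer is full, the decoded character stays buffered and a later call with a source length of 0 writes it out. A high surrogate at the very end of the input simply remains buffered in this sketch.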

parserutils_error charset_utf16_codec_destroy ( parserutils_charset_codec *  codec) [static]

Destroy a UTF-16 codec.

Parameters:
codec  The codec to destroy
Returns:
PARSERUTILS_OK on success, appropriate error otherwise

Definition at line 127 of file codec_utf16.c.

References PARSERUTILS_OK, and UNUSED.

Referenced by charset_utf16_codec_create().
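The References lists in this section are consistent with create allocating the codec and installing destroy, encode, decode and reset as its operations, while destroy (which references only PARSERUTILS_OK and UNUSED) appears to release no memory itself. A minimal sketch of that pattern under those assumptions; sketch_codec, sketch_create, sketch_error and the two-slot operations table are hypothetical, and the caller frees the object after calling destroy:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result codes. */
typedef enum { SKETCH_OK, SKETCH_NOMEM, SKETCH_BADPARM } sketch_error;

typedef struct sketch_codec sketch_codec;

/* Hypothetical codec object: an operations table plus some state. */
struct sketch_codec {
    sketch_error (*destroy)(sketch_codec *codec);
    sketch_error (*reset)(sketch_codec *codec);
    size_t pending;            /* stand-in for the buffered codec state */
};

/* Nothing to release here; the caller frees the object afterwards. */
static sketch_error sketch_destroy(sketch_codec *codec)
{
    (void) codec;
    return SKETCH_OK;
}

static sketch_error sketch_reset(sketch_codec *codec)
{
    assert(codec != NULL);
    codec->pending = 0;
    return SKETCH_OK;
}

/* Allocate a codec and install its operations. */
static sketch_error sketch_create(const char *charset, sketch_codec **codec)
{
    sketch_codec *c;

    if (charset == NULL || codec == NULL)
        return SKETCH_BADPARM;

    c = malloc(sizeof *c);
    if (c == NULL)
        return SKETCH_NOMEM;

    c->destroy = sketch_destroy;
    c->reset = sketch_reset;
    c->pending = 0;

    *codec = c;
    return SKETCH_OK;
}
```

Keeping the free() outside destroy lets a generic wrapper own the allocation while each codec only tears down its private state.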

parserutils_error charset_utf16_codec_encode ( parserutils_charset_codec *  codec,
const uint8_t **  source,
size_t *  sourcelen,
uint8_t **  dest,
size_t *  destlen 
) [static]

Encode a chunk of UCS-4 (big endian) data into UTF-16.

Parameters:
codec  The codec to use
source  Pointer to pointer to source data
sourcelen  Pointer to length (in bytes) of source data
dest  Pointer to pointer to output buffer
destlen  Pointer to length (in bytes) of output buffer
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if the output buffer is too small, PARSERUTILS_INVALID if a character cannot be represented and the codec's error-handling mode is set to STRICT.

On exit, ::source will point immediately _after_ the last input character read. Any remaining output for the character will be buffered by the codec for writing on the next call.

Note that, if failure occurs whilst attempting to write any output buffered by the last call, then ::source and ::sourcelen will remain unchanged (as nothing more has been read).

::sourcelen will be reduced appropriately on exit.

::dest will point immediately _after_ the last character written.

::destlen will be reduced appropriately on exit.

Definition at line 161 of file codec_utf16.c.

References endian_big_to_host(), len, parserutils_charset_utf16_from_ucs4(), PARSERUTILS_NOMEM, PARSERUTILS_OK, charset_utf16_codec::write_buf, WRITE_BUFSIZE, and charset_utf16_codec::write_len.

Referenced by charset_utf16_codec_create().
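A minimal sketch of the encode contract above. It assumes UTF-16 code units are written in host byte order (the output byte order is not stated here) and that surrogates or values above U+10FFFF become U+FFFD rather than producing _INVALID. The carry buffer mirrors write_buf and WRITE_BUFSIZE; sketch_utf16_encoder, sketch_utf16_encode and the other sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes standing in for PARSERUTILS_OK / _NOMEM. */
typedef enum { SKETCH_OK, SKETCH_NOMEM } sketch_error;

/* Hypothetical encoder state: UTF-16 code units still waiting to be written. */
typedef struct {
    uint16_t write_buf[8];   /* mirrors WRITE_BUFSIZE (8) */
    size_t write_len;
} sketch_utf16_encoder;

/* Convert one scalar value to 1 or 2 UTF-16 code units; returns the count.
 * Surrogates and values above U+10FFFF become U+FFFD (an assumption). */
static size_t sketch_to_utf16(uint32_t ucs4, uint16_t units[2])
{
    if (ucs4 > 0x10FFFF || (ucs4 >= 0xD800 && ucs4 <= 0xDFFF))
        ucs4 = 0xFFFD;
    if (ucs4 < 0x10000) {
        units[0] = (uint16_t) ucs4;
        return 1;
    }
    ucs4 -= 0x10000;
    units[0] = (uint16_t) (0xD800 | (ucs4 >> 10));
    units[1] = (uint16_t) (0xDC00 | (ucs4 & 0x3FF));
    return 2;
}

/* Write as many buffered units as fit; returns nonzero once all are out. */
static int sketch_flush(sketch_utf16_encoder *e, uint8_t **dest, size_t *destlen)
{
    size_t i = 0;

    assert(e->write_len <= sizeof e->write_buf / sizeof e->write_buf[0]);
    while (i < e->write_len && *destlen >= 2) {
        memcpy(*dest, &e->write_buf[i], 2);    /* host byte order */
        *dest += 2;
        *destlen -= 2;
        i++;
    }
    /* Shift unwritten units to the front of the buffer. */
    memmove(e->write_buf, e->write_buf + i, (e->write_len - i) * sizeof(uint16_t));
    e->write_len -= i;
    return e->write_len == 0;
}

static sketch_error sketch_utf16_encode(sketch_utf16_encoder *e,
        const uint8_t **source, size_t *sourcelen,
        uint8_t **dest, size_t *destlen)
{
    /* Output left over from the previous call goes first; if it still does
     * not fit, nothing new is read, so source and sourcelen stay unchanged. */
    if (!sketch_flush(e, dest, destlen))
        return SKETCH_NOMEM;

    while (*sourcelen >= 4) {
        /* Input is UCS-4, big endian. */
        const uint8_t *s = *source;
        uint32_t ucs4 = ((uint32_t) s[0] << 24) | ((uint32_t) s[1] << 16) |
                        ((uint32_t) s[2] << 8) | (uint32_t) s[3];
        uint16_t units[2];
        size_t n = sketch_to_utf16(ucs4, units);

        /* The character is consumed even if its output does not all fit. */
        *source += 4;
        *sourcelen -= 4;
        memcpy(e->write_buf, units, n * sizeof(uint16_t));
        e->write_len = n;
        if (!sketch_flush(e, dest, destlen))
            return SKETCH_NOMEM;
    }
    return SKETCH_OK;
}
```

With a 2-byte output buffer, U+1F600 writes its high surrogate, buffers the low one and returns _NOMEM with the source already past the character; the next call (even with a source length of 0) writes the buffered unit first. A trailing fragment shorter than 4 bytes is left unread.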

bool charset_utf16_codec_handles_charset ( const char *  charset) [static]

Determine whether this codec handles a specific charset.

Parameters:
charset  Charset to test
Returns:
true if this codec handles the charset, false otherwise

Definition at line 74 of file codec_utf16.c.

References parserutils_charset_mibenum_from_name(), and SLEN.
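Since the function relies on parserutils_charset_mibenum_from_name(), the test is presumably a comparison of MIBenum values rather than of raw strings. A minimal sketch under that assumption: sketch_mibenum_from_name is a hypothetical, case-insensitive stand-in with a four-entry table of IANA MIBenum values, and accepting only "UTF-16" (MIBenum 1015) is also an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for parserutils_charset_mibenum_from_name():
 * a tiny case-insensitive table of IANA names and MIBenum values. */
static uint16_t sketch_mibenum_from_name(const char *name)
{
    static const struct { const char *name; uint16_t mib; } table[] = {
        { "UTF-8", 106 },
        { "UTF-16BE", 1013 },
        { "UTF-16LE", 1014 },
        { "UTF-16", 1015 },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const char *a = name;
        const char *b = table[i].name;
        while (*a != '\0' &&
               tolower((unsigned char) *a) == tolower((unsigned char) *b)) {
            a++;
            b++;
        }
        if (*a == '\0' && *b == '\0')
            return table[i].mib;
    }
    return 0;   /* unknown charset */
}

/* True if the name resolves to the MIBenum of "UTF-16". */
static bool sketch_handles_charset(const char *charset)
{
    assert(charset != NULL);
    return sketch_mibenum_from_name(charset) == 1015;
}
```

Comparing MIBenums means any alias that resolves to the same registry entry is accepted, while distinct entries such as UTF-16LE are not.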

parserutils_error charset_utf16_codec_output_decoded_char ( charset_utf16_codec *  c,
uint32_t  ucs4,
uint8_t **  dest,
size_t *  destlen 
) [inline, static]

Output a UCS-4 character (big endian)

Parameters:
c  Codec to use
ucs4  UCS-4 character (host endian)
dest  Pointer to pointer to output buffer
destlen  Pointer to output buffer length
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if the output buffer is too small.

Definition at line 523 of file codec_utf16.c.

References endian_host_to_big(), PARSERUTILS_NOMEM, PARSERUTILS_OK, charset_utf16_codec::read_buf, and charset_utf16_codec::read_len.

Referenced by charset_utf16_codec_read_char().
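Given its parameters and References (endian_host_to_big, read_buf, read_len), this function plausibly writes the host-endian character as four big-endian bytes, or parks it in the read buffer and returns _NOMEM. A minimal sketch of that behaviour: byte shifts replace endian_host_to_big() so the code is self-contained, and sketch_state, sketch_output_decoded_char and the single pending slot are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes standing in for PARSERUTILS_OK / _NOMEM. */
typedef enum { SKETCH_OK, SKETCH_NOMEM } sketch_error;

/* Hypothetical state: UCS-4 characters (host endian) awaiting output. */
typedef struct {
    uint32_t read_buf[8];      /* mirrors READ_BUFSIZE (8) */
    size_t read_len;
} sketch_state;

static sketch_error sketch_output_decoded_char(sketch_state *c, uint32_t ucs4,
        uint8_t **dest, size_t *destlen)
{
    assert(c != NULL);
    if (*destlen < 4) {
        /* Out of room: keep the character for the next call. */
        c->read_buf[0] = ucs4;
        c->read_len = 1;
        return SKETCH_NOMEM;
    }
    /* Write the host-endian value as big-endian bytes. */
    (*dest)[0] = (uint8_t) (ucs4 >> 24);
    (*dest)[1] = (uint8_t) (ucs4 >> 16);
    (*dest)[2] = (uint8_t) (ucs4 >> 8);
    (*dest)[3] = (uint8_t) ucs4;
    *dest += 4;
    *destlen -= 4;
    return SKETCH_OK;
}
```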

parserutils_error charset_utf16_codec_read_char ( charset_utf16_codec *  c,
const uint8_t **  source,
size_t *  sourcelen,
uint8_t **  dest,
size_t *  destlen 
) [inline, static]

Read one character of UTF-16 input and output it as UCS-4 (big endian).

Parameters:
c  The codec
source  Pointer to pointer to source buffer (updated on exit)
sourcelen  Pointer to length of source buffer (updated on exit)
dest  Pointer to pointer to output buffer (updated on exit)
destlen  Pointer to length of output buffer (updated on exit)
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if the output buffer is too small, PARSERUTILS_INVALID if a character cannot be represented and the codec's error-handling mode is set to STRICT.

On exit, ::source will point immediately _after_ the last input character read, if the result is _OK or _NOMEM. Any remaining output for the character will be buffered by the codec for writing on the next call.

In the case of the result being _INVALID, ::source will point _at_ the last input character read; nothing will be written or buffered for the failed character. It is up to the client to fix the cause of the failure and retry the decoding process.

::sourcelen will be reduced appropriately on exit.

::dest will point immediately _after_ the last character written.

::destlen will be reduced appropriately on exit.

Definition at line 423 of file codec_utf16.c.

References charset_utf16_codec::base, charset_utf16_codec_output_decoded_char(), parserutils_charset_codec::errormode, charset_utf16_codec::inval_buf, INVAL_BUFSIZE, charset_utf16_codec::inval_len, PARSERUTILS_CHARSET_CODEC_ERROR_STRICT, parserutils_charset_utf16_next_paranoid(), parserutils_charset_utf16_to_ucs4(), PARSERUTILS_INVALID, PARSERUTILS_NEEDDATA, PARSERUTILS_NOMEM, and PARSERUTILS_OK.

Referenced by charset_utf16_codec_decode().
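The _INVALID rule above (the source keeps pointing at the offending character and nothing is written) can be sketched for a single character. This sketch works on already-assembled 16-bit code units rather than bytes, to sidestep byte order, and returns _NEEDDATA when a high surrogate ends the input. sketch_read_char and sketch_error are hypothetical; this is not the signature of parserutils_charset_utf16_to_ucs4().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes. */
typedef enum { SKETCH_OK, SKETCH_NEEDDATA, SKETCH_INVALID } sketch_error;

/* Read one character from UTF-16 code units; advance only on success. */
static sketch_error sketch_read_char(const uint16_t **source, size_t *sourcelen,
        uint32_t *ucs4)
{
    const uint16_t *s = *source;
    size_t used;

    assert(ucs4 != NULL);
    if (*sourcelen == 0)
        return SKETCH_NEEDDATA;

    if (s[0] < 0xD800 || s[0] > 0xDFFF) {
        *ucs4 = s[0];                       /* BMP character */
        used = 1;
    } else if (s[0] > 0xDBFF) {
        return SKETCH_INVALID;              /* lone low surrogate */
    } else if (*sourcelen < 2) {
        return SKETCH_NEEDDATA;             /* pair split across calls */
    } else if (s[1] < 0xDC00 || s[1] > 0xDFFF) {
        return SKETCH_INVALID;              /* high surrogate without a low one */
    } else {
        *ucs4 = 0x10000 + ((uint32_t) (s[0] - 0xD800) << 10)
              + (uint32_t) (s[1] - 0xDC00);
        used = 2;
    }

    *source += used;
    *sourcelen -= used;
    return SKETCH_OK;
}
```

Because the source is only advanced on success, a caller that receives _INVALID or _NEEDDATA still sees the problem character at *source.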

parserutils_error charset_utf16_codec_reset ( parserutils_charset_codec *  codec) [static]

Clear a UTF-16 codec's encoding state.

Parameters:
codec  The codec to reset
Returns:
PARSERUTILS_OK on success, appropriate error otherwise

Definition at line 378 of file codec_utf16.c.

References charset_utf16_codec::inval_buf, charset_utf16_codec::inval_len, PARSERUTILS_OK, charset_utf16_codec::read_buf, charset_utf16_codec::read_len, charset_utf16_codec::write_buf, and charset_utf16_codec::write_len.

Referenced by charset_utf16_codec_create().
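A minimal sketch of the state that reset clears. The field names follow the References list above and the buffer sizes follow the Defines; the element types, sketch_utf16_state, sketch_utf16_reset and sketch_error are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVAL_BUFSIZE (32)
#define READ_BUFSIZE (8)
#define WRITE_BUFSIZE (8)

/* Hypothetical result code standing in for PARSERUTILS_OK. */
typedef enum { SKETCH_OK } sketch_error;

/* Hypothetical codec state; element types are assumptions. */
typedef struct {
    uint8_t inval_buf[INVAL_BUFSIZE];   /* incomplete input sequence */
    size_t inval_len;
    uint32_t read_buf[READ_BUFSIZE];    /* decoded UCS-4 awaiting output */
    size_t read_len;
    uint16_t write_buf[WRITE_BUFSIZE];  /* encoded UTF-16 awaiting output */
    size_t write_len;
} sketch_utf16_state;

/* Discard all buffered state; the lengths are what matter. */
static sketch_error sketch_utf16_reset(sketch_utf16_state *c)
{
    assert(c != NULL);
    c->inval_buf[0] = '\0';
    c->inval_len = 0;
    c->read_buf[0] = 0;
    c->read_len = 0;
    c->write_buf[0] = 0;
    c->write_len = 0;
    return SKETCH_OK;
}
```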


Variable Documentation

const parserutils_charset_handler charset_utf16_codec_handler

Initial value:

Definition at line 542 of file codec_utf16.c.