codecs.infrastructure

Underpinning infrastructure for the codecs module.

Classes

_class AsciiIncrementalDecoder(IncrementalDecoder)

Decoder for ISO/IEC 4873-DV, and base class for simple sensu lato extended ASCII decoders. Decoders for more complex cases, such as ISO-2022-JP, do not inherit from this class.

ISO/IEC 4873-DV is, as of the current (third) edition of ISO/IEC 4873, the same as what people usually mean by "ASCII": an eighth bit exists but is never used, and characters are never composed by overstriking with backspace.

_AsciiIncrementalDecoder.decode(data_in,final)

Implements IncrementalDecoder.decode

_class AsciiIncrementalEncoder(IncrementalEncoder)

Encoder for ISO/IEC 4873-DV, and base class for simple sensu lato extended ASCII encoders. Encoders for more complex cases, such as ISO-2022-JP, do not inherit from this class.

ISO/IEC 4873-DV is, as of the current (third) edition of ISO/IEC 4873, the same as what people usually mean by "ASCII": an eighth bit exists but is never used, and characters are never composed by overstriking with backspace.

_let x = AsciiIncrementalEncoder(errors)

AsciiIncrementalEncoder.__init__(errors)

_AsciiIncrementalEncoder.encode(string_in,final)

Implements IncrementalEncoder.encode

_AsciiIncrementalEncoder.getstate()

Implements IncrementalEncoder.getstate

_AsciiIncrementalEncoder.reset()

Implements IncrementalEncoder.reset

_AsciiIncrementalEncoder.setstate(state)

Implements IncrementalEncoder.setstate

_class BaseEbcdicIncrementalDecoder(IncrementalDecoder)

Base class for EBCDIC decoders.

On its own, it is only capable of decoding U+3000 (from x'0E', x'40', x'40', x'0F'); hence, it should not, generally speaking, be used directly.

_BaseEbcdicIncrementalDecoder.decode(data_in,final)

Implements IncrementalDecoder.decode

_BaseEbcdicIncrementalDecoder.getstate()

Implements IncrementalDecoder.getstate

_BaseEbcdicIncrementalDecoder.reset()

Implements IncrementalDecoder.reset

_BaseEbcdicIncrementalDecoder.setstate(state)

Implements IncrementalDecoder.setstate

_class BaseEbcdicIncrementalEncoder(IncrementalEncoder)

Base class for EBCDIC encoders.

On its own, it is only capable of encoding U+3000 (as x'0E', x'40', x'40', x'0F'); hence, it should not, generally speaking, be used directly.

_BaseEbcdicIncrementalEncoder.encode(string,final)

Implements IncrementalEncoder.encode

_BaseEbcdicIncrementalEncoder.getstate()

Implements IncrementalEncoder.getstate

_BaseEbcdicIncrementalEncoder.reset()

Implements IncrementalEncoder.reset

_BaseEbcdicIncrementalEncoder.setstate(state)

Implements IncrementalEncoder.setstate

_class ByteCatenator(object)

Helper class for maintaining a stream to which bytes objects will be repeatedly catenated in place.

_let x = ByteCatenator()

ByteCatenator.__init__()

_ByteCatenator.add(data)

_ByteCatenator.getvalue()
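
The following is a minimal Python sketch of what such a helper could look like. It assumes that add appends to an internal buffer and that getvalue returns everything added so far; the class name ByteCatenatorSketch and the bytearray-based storage are illustrative assumptions, not the module's actual implementation.

```python
class ByteCatenatorSketch:
    """Hypothetical stand-in for ByteCatenator (storage strategy is assumed)."""

    def __init__(self):
        # A mutable buffer avoids building a new bytes object on every add.
        self._buffer = bytearray()

    def add(self, data):
        # Append the given bytes to the end of the stream, in place.
        self._buffer += data

    def getvalue(self):
        # Return an immutable snapshot of everything added so far.
        return bytes(self._buffer)


catenator = ByteCatenatorSketch()
catenator.add(b"\x1b$B")
catenator.add(b'$"')
catenator.add(b"\x1b(B")
joined = catenator.getvalue()
```

Repeatedly writing `out = out + chunk` on an immutable bytes object copies the whole stream each time, which is the cost a helper of this kind is usually meant to avoid.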

_class IncrementalDecoder(object)

Incremental decoder, allowing more Unicode data to be generated as more encoded data is obtained. Note that the return values from decode are not guaranteed to encompass all data which has been passed in, until it is called with final=True.

This is the base class and should not be instantiated directly.

_let x = IncrementalDecoder(errors)

IncrementalDecoder.__init__(errors)

_IncrementalDecoder._handle_truncation(out,unused,final,data,offset,leader)

Helper function used by subclasses to handle any pending data when returning from decode.

_IncrementalDecoder.decode(data_in,final)

Passes the given bytes in to the decoder, and returns a Unicode string. When final=False, the return value might not represent the entire input (some of which may become represented at the start of the value returned by the next call). When final=True, all of the input will be represented, and an error will be generated if it is truncated.
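
The final= behaviour can be illustrated with Python's standard UTF-8 incremental decoder, which follows the same calling convention. This is a sketch using Python's codecs module as a stand-in; it is an assumption that the classes documented here behave the same way in detail.

```python
import codecs

# Stand-in: Python's UTF-8 incremental decoder, not the class documented here.
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

encoded = "\u00e9".encode("utf-8")  # two bytes: C3 A9

# The first byte alone is an incomplete sequence, so nothing is returned yet.
first = decoder.decode(encoded[:1], final=False)

# The pending byte is completed by the next call and appears in its output.
second = decoder.decode(encoded[1:], final=True)

# With final=True, an incomplete sequence is an error rather than pending data.
truncated = codecs.getincrementaldecoder("utf-8")(errors="strict")
try:
    truncated.decode(encoded[:1], final=True)
    truncation_detected = False
except UnicodeDecodeError:
    truncation_detected = True
```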

_IncrementalDecoder.getstate()

Returns an arbitrary object encapsulating decoder state.

_IncrementalDecoder.reset()

Resets the decoder to its initial state, discarding any pending data without outputting it.

_IncrementalDecoder.setstate(state)

Sets decoder state to one previously returned by getstate().
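
As a sketch of how getstate, reset and setstate fit together, the following uses Python's standard UTF-8 incremental decoder as a stand-in. The state object's contents are treated as opaque here, which matches the "arbitrary object" wording above; that the documented classes round-trip state identically is an assumption.

```python
import codecs

# Stand-in: Python's UTF-8 incremental decoder, not the class documented here.
decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

encoded = "\u3042".encode("utf-8")  # three bytes: E3 81 82

# Feed two of the three bytes; they are held as pending data.
partial_output = decoder.decode(encoded[:2], final=False)

# Take an opaque checkpoint of the decoder, including its pending bytes.
checkpoint = decoder.getstate()

# reset() discards the pending bytes without producing any output.
decoder.reset()
after_reset = decoder.decode(b"", final=True)

# Restoring the checkpoint brings the pending bytes back.
decoder.setstate(checkpoint)
resumed_output = decoder.decode(encoded[2:], final=True)
```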

_class IncrementalEncoder(object)

Incremental encoder, allowing more encoded data to be generated as more Unicode data is obtained. Note that the return values from encode are not guaranteed to encompass all data which has been passed in, until it is called with final=True.

This is the base class and should not be instantiated directly.

_let x = IncrementalEncoder(errors)

IncrementalEncoder.__init__(errors)

_IncrementalEncoder.encode(string,final)

Passes the given string in to the encoder, and returns a sequence of bytes. When final=False, the return value might not represent the entire input (some of which may become represented at the start of the value returned by the next call). When final=True, all of the input will be represented, and any final state change sequence required by the encoding will be outputted.
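
The "final state change sequence" is easiest to see with a stateful encoding. The sketch below uses Python's standard ISO-2022-JP incremental encoder as a stand-in; it is an assumption that the encoders documented here emit their closing escape sequence at the same point.

```python
import codecs

# Stand-in: Python's ISO-2022-JP incremental encoder, not the class documented here.
encoder = codecs.getincrementalencoder("iso2022_jp")(errors="strict")

# Encoding a kana switches into JIS X 0208, and the encoder stays in that state.
body = encoder.encode("\u3042", final=False)

# An empty final call emits only the escape sequence that returns to ASCII.
tail = encoder.encode("", final=True)

# Together the pieces match a one-shot encode of the same text.
whole = "\u3042".encode("iso2022_jp")
```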

_IncrementalEncoder.getstate()

Returns an arbitrary object encapsulating encoder state.

_IncrementalEncoder.reset()

Resets the encoder to its initial state, discarding any pending data without outputting it.

_IncrementalEncoder.setstate(state)

Sets encoder state to one previously returned by getstate().

_class KurokoCodecInfo(object)

Descriptor for the registered encoder and decoder for a given label. Has five members:

  • name: the label covered by this descriptor.
  • encode: encode a complete Unicode sequence.
  • decode: decode a complete byte sequence.
  • incrementalencoder: IncrementalEncoder subclass.
  • incrementaldecoder: IncrementalDecoder subclass.

_let x = KurokoCodecInfo(label,encoder,decoder)

KurokoCodecInfo.__init__(label,encoder,decoder)

_KurokoCodecInfo.decode(data,errors)

Decode a complete byte sequence to a complete Unicode stream. The semantics of the name passed to errors= are as documented for lookup_error().

_KurokoCodecInfo.encode(string,errors)

Encode a complete Unicode sequence to a complete byte string. The semantics of the name passed to errors= are as documented for lookup_error().

_class StringCatenator(object)

Helper class for maintaining a stream to which str objects will be repeatedly catenated in place.

_let x = StringCatenator()

StringCatenator.__init__()

_StringCatenator.add(string)

_StringCatenator.getvalue()

_class UndefinedIncrementalDecoder(IncrementalDecoder)

Decoder which errors out on all input. For use on input for which decoding should not be attempted. Error handler is honoured, and called once per non-empty decode method call.

_UndefinedIncrementalDecoder.decode(data,final)

_class UndefinedIncrementalEncoder(IncrementalEncoder)

Encoder which errors out on all input. For use on input for which encoding should not be attempted. Error handler is ignored.

_let x = UndefinedIncrementalEncoder(errors)

UndefinedIncrementalEncoder.__init__(errors)

_UndefinedIncrementalEncoder.encode(string,final)

_class decodesto7bit(object)

Decoding map for a 7-bit set, wrapping a decoding map for an 8-bit EUC or EUC-superset encoding.

_let x = decodesto7bit(base)

decodesto7bit.__init__(base)

_key in decodesto7bit

decodesto7bit.__contains__(key)

_decodesto7bit[key]

decodesto7bit.__getitem__(key)

_for x in decodesto7bit:

decodesto7bit.__iter__()

_decodesto7bit.keys()
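
One way such a wrapper could work is sketched below in Python. It assumes, without confirmation from the source, that the base map is keyed by tuples of byte values with the high bit set, and that the 7-bit view is obtained by setting or clearing bit 7 of each byte; the class name DecodesTo7BitSketch and the sample map are hypothetical.

```python
class DecodesTo7BitSketch:
    """Hypothetical 7-bit view over an 8-bit EUC-style decoding map."""

    def __init__(self, base):
        self.base = base

    @staticmethod
    def _to_8bit(key):
        # Assumed key shape: a tuple of byte values; set bit 7 on each byte.
        return tuple(byte | 0x80 for byte in key)

    def __contains__(self, key):
        return self._to_8bit(key) in self.base

    def __getitem__(self, key):
        return self.base[self._to_8bit(key)]

    def __iter__(self):
        # Only keys made entirely of high-bit bytes have a 7-bit counterpart.
        for key in self.base:
            if all(byte & 0x80 for byte in key):
                yield tuple(byte & 0x7F for byte in key)

    def keys(self):
        return list(self)


# Two EUC-JP double-byte codes: ideographic space and hiragana A.
euc_map = {(0xA1, 0xA1): 0x3000, (0xA4, 0xA2): 0x3042}
seven_bit = DecodesTo7BitSketch(euc_map)
kana = seven_bit[(0x24, 0x22)]
listed = sorted(seven_bit.keys())
```

A view of this kind lets a 7-bit decoder (for example one driven by ISO 2022 escape sequences) reuse a table built for the 8-bit form without duplicating it.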

_class encodesto7bit(object)

Encoding map for a 7-bit set, wrapping an encoding map for an 8-bit EUC or EUC-superset encoding.

_let x = encodesto7bit(base)

encodesto7bit.__init__(base)

_key in encodesto7bit

encodesto7bit.__contains__(key)

_encodesto7bit[key]

encodesto7bit.__getitem__(key)

_for x in encodesto7bit:

encodesto7bit.__iter__()

_encodesto7bit.keys()

Functions

_backslashreplace_errors(exc)

Handler for backslashreplace errors: replace unencodable character with Python/Kuroko style escape sequence. For Basic Multilingual Plane characters, this also matches JavaScript; beyond that, they differ.
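
The difference from JavaScript can be seen with Python's built-in backslashreplace handler, used here as a stand-in on the assumption that this module's handler produces the same escapes.

```python
# Stand-in: Python's built-in "backslashreplace" handler.

# A Basic Multilingual Plane character becomes \uXXXX, which JavaScript also accepts.
bmp_escape = "\u3042".encode("ascii", "backslashreplace")

# A character beyond the BMP becomes \UXXXXXXXX, which JavaScript does not accept;
# JavaScript would write \u{1F600} or a surrogate pair instead.
astral_escape = "\U0001F600".encode("ascii", "backslashreplace")
```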

_decode(data,label,web,errors)

Decode a complete byte sequence in the given encoding to a complete Unicode stream. The semantics of the web= argument are the same as with lookup(). The semantics of the name passed to errors= are as documented for lookup_error().

Can be simply accessed as codecs.decode.

_encode(string,label,web,errors)

Encode a complete Unicode sequence to a complete byte string in the given encoding. The semantics of the web= argument are the same as with lookup(). The semantics of the name passed to errors= are as documented for lookup_error().

Can be simply accessed as codecs.encode.

_ignore_errors(exc)

Handler for ignore errors: skip invalid sequences.

_lazy_property(method)

Like property(…), but memoises the value returned. The return value is assumed to be constant at the class level, i.e. the same for all instances.
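
A memoising property of this kind could be sketched in Python as a descriptor that replaces itself on the class after the first access. The name lazy_property_sketch and the replace-on-class strategy are assumptions made for illustration; the module's own implementation may cache differently.

```python
class lazy_property_sketch:
    """Hypothetical property-like descriptor that computes its value once per class."""

    def __init__(self, method):
        self.method = method
        self.name = method.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.method(instance)
        # Overwrite the descriptor on the class, so every instance sees the
        # cached value from now on and the method is never called again.
        setattr(type(instance), self.name, value)
        return value


calls = []


class Table:
    @lazy_property_sketch
    def mapping(self):
        calls.append("computed")
        return {0x41: "A"}


first = Table().mapping
second = Table().mapping
```

Because the cached value is stored on the class, this only makes sense when the result does not depend on the particular instance, which matches the class-level constancy described above.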

_lookup(label,web)

Obtain a KurokoCodecInfo for a given label. If web=False (the default), will always succeed, but the resulting KurokoCodecInfo might be unable to encode and/or unable to decode if the label is not recognised in that direction. If web=True, will raise KeyError if the label is not a WHATWG-permitted label, and will map certain labels to undefined per the WHATWG spec.

Can be simply accessed as codecs.lookup.

_lookup_error(name)

Look up an error handler function registered under a given name. Note that nothing obligates a codec to actually use the error handler if doing so is not deemed possible or appropriate, so specifying a non-strict error handler does not guarantee that no exception will be raised, especially when working with a codec which is not a "normal" text encoding (e.g. undefined or inverse-base64). By default, the following are registered:

  • strict: raise an exception.
  • ignore: skip invalid substrings. Not always recommended: can facilitate masked injection.
  • replace: insert a replacement character (decoding) or question mark (encoding).
  • warnreplace: like replace but prints a message to stderr; good for debugging.
  • backslashreplace: replace with Python/Kuroko style Unicode escapes. Note that this only matches JavaScript escape syntax for Basic Multilingual Plane characters. Encoding only.
  • xmlcharrefreplace: replace with HTML/XML numerical entities. Note that this will, per WHATWG, never generate entities for Shift Out, Shift In and Escape (i.e. when encoding to a stateful encoding which uses them, e.g. ISO-2022-JP), instead generating an entity for the replacement character. Encoding only.
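
The effect of several of these handler names can be sketched with Python's built-in handlers of the same names. This is a stand-in only: warnreplace has no Python equivalent, and the special treatment of Shift Out, Shift In and Escape by xmlcharrefreplace described above is not reproduced by Python's handler.

```python
# Stand-in: Python's built-in error handlers with the same names.
text = "caf\u00e9"

# strict: the unencodable character raises an exception.
try:
    text.encode("ascii", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

# ignore: the unencodable character is silently dropped.
ignored = text.encode("ascii", "ignore")

# replace: a question mark when encoding, U+FFFD when decoding.
replaced_encode = text.encode("ascii", "replace")
replaced_decode = b"caf\xe9".decode("ascii", "replace")

# xmlcharrefreplace: a decimal numeric character reference (encoding only).
xml_escaped = text.encode("ascii", "xmlcharrefreplace")
```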

_register_error(name,handler)

Register a new error handler. The handler should be a function taking a UnicodeError and either raising an exception or returning a tuple of (substitute, resume_index). The substitute should be bytes (usually expected to be in ASCII) for a UnicodeEncodeError, str otherwise.
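
A handler following this contract might look like the sketch below. It is written against Python's codecs.register_error, which accepts the same (substitute, resume_index) shape, as a stand-in for this module's function; the handler name underscore_sketch is hypothetical.

```python
import codecs


def underscore_errors(exc):
    """Hypothetical handler: substitute one underscore per offending unit."""
    width = exc.end - exc.start
    if isinstance(exc, UnicodeEncodeError):
        # Encoding: the substitute is bytes; resume after the bad characters.
        return (b"_" * width, exc.end)
    if isinstance(exc, UnicodeDecodeError):
        # Decoding: the substitute is str; resume after the bad bytes.
        return ("_" * width, exc.end)
    # Anything else is not handled here, so re-raise it.
    raise exc


codecs.register_error("underscore_sketch", underscore_errors)

encoded = "na\u00efve".encode("ascii", "underscore_sketch")
decoded = b"na\xefve".decode("ascii", "underscore_sketch")
```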

_register_kuroko_codec(labels,incremental_encoder_class,incremental_decoder_class)

Register a given IncrementalEncoder subclass and a given IncrementalDecoder subclass with a given list of labels. Usually, this list is expected to include the encoding name, along with labels for aliases and/or subsets of the encoding. Either coder class may be None, if the encoder/decoder labels are being registered asymmetrically.

_replace_errors(exc)

Handler for replace errors: insert replacement character (if decoding) or question mark (if encoding).

_strict_errors(exc)

Handler for strict errors: raise the exception.

_warnreplace_errors(exc)

Handler for warnreplace errors: insert replacement character (if decoding) or question mark (if encoding) and print a warning to stderr.

_xmlcharrefreplace_errors(exc)

Handler for xmlcharrefreplace errors: replace unencodable character with XML numeric entity for the character unless it is Shift Out, Shift In or Escape, in which case insert the XML numeric entity for the replacement character (as stipulated by WHATWG for ISO-2022-JP).

Exceptions

_class UnicodeDecodeError(UnicodeError)

UnicodeError subclass raised when an error is encountered in the process of decoding.

_class UnicodeEncodeError(UnicodeError)

UnicodeError subclass raised when an error is encountered in the process of encoding.

_class UnicodeError(ValueError)

Exception raised when an error is encountered or detected in the process of encoding or decoding. May instead be passed to a handler when not in strict mode. Contains machine-readable information about the error encountered, allowing a handler to decide how to respond to it.

_let x = UnicodeError(encoding,object,start,end,reason)

UnicodeError.__init__(encoding,object,start,end,reason)
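
The machine-readable fields can be inspected as in the following sketch, which uses Python's built-in UnicodeDecodeError as a stand-in. It is an assumption that the attribute names on this module's exceptions match the constructor argument names shown above.

```python
# Stand-in: Python's built-in exception, which carries the same five fields.
try:
    b"ab\xff".decode("ascii")
    details = None
except UnicodeDecodeError as exc:
    details = {
        "encoding": exc.encoding,  # label of the codec that failed
        "object": exc.object,      # the complete input being decoded
        "start": exc.start,        # index of the first offending byte
        "end": exc.end,            # index just past the last offending byte
        "reason": exc.reason,      # human-readable description
    }

# The offending span can be recovered by slicing the input.
offending = details["object"][details["start"]:details["end"]]
```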