codecs.dbextra


This module includes some additional variable-width or wide encodings not specified by WHATWG.

As such, none of the codecs in this module should be used in HTML.

Classes

_class AsciiJohabIncrementalDecoder(AsciiIncrementalDecoder)

IncrementalDecoder implementation for the PC Johab encoding (code page 1361).

_property AsciiJohabIncrementalDecoder.decoding_map

_class AsciiJohabIncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for the PC Johab encoding (code page 1361).

_property AsciiJohabIncrementalEncoder.encoding_map

_class Big5NonEtenKanaIncrementalDecoder(AsciiIncrementalDecoder)

IncrementalDecoder implementation for Big5 with non-ETEN layout of kana, Cyrillic, list markers.

The other ETEN extension section (the one retained by Microsoft's version) is still included.

Although this is the kana/Cyrillic/list marker layout included in the UTC's BIG5.TXT, it is the less common of the two, since most extension schemes for Big5 use the ETEN layout. It also has several problems which the ETEN layout does not have: its katakana lack the vowel extender, and its Cyrillic lacks several capitals. However, this codec corresponds roughly to Python's big5, and more closely to Python's built-in cp950 (as opposed to the result if/when Python aliases cp950 to mbcs).

_property Big5NonEtenKanaIncrementalDecoder.decoding_map

_class Big5NonEtenKanaIncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for Big5 with non-ETEN layout of kana, Cyrillic, list markers.

The other ETEN extension section (the one retained by Microsoft's version) is still included.

Although this is the kana/Cyrillic/list marker layout included in the UTC's BIG5.TXT, it is the less common of the two, since most extension schemes for Big5 use the ETEN layout. It also has several problems which the ETEN layout does not have: its katakana lack the vowel extender, and its Cyrillic lacks several capitals. However, this codec corresponds roughly to Python's big5, and more closely to Python's built-in cp950 (as opposed to the result if/when Python aliases cp950 to mbcs).

_property Big5NonEtenKanaIncrementalEncoder.encoding_map

_class Cesu8IncrementalDecoder(Utf8IncrementalDecoder)

IncrementalDecoder implementation for CESU-8, a deprecated UTF-8-like encoding still used by some systems, such as Tcl, and still mislabelled "utf8" in some places for legacy reasons.
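To illustrate how CESU-8 differs from UTF-8, here is a minimal sketch in Python rather than against this module's API. The helper `cesu8_encode` is hypothetical and exists only for illustration; it relies on Python's standard `surrogatepass` error handler to encode each UTF-16 surrogate as its own three-byte sequence, which is the defining property of CESU-8.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 (UTF-8 applied per UTF-16 code unit)."""
    utf16 = text.encode("utf-16-be")
    # Split into 16-bit code units; supplementary characters become surrogate pairs.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    # "surrogatepass" lets Python emit the three-byte form for each surrogate.
    return "".join(map(chr, units)).encode("utf-8", "surrogatepass")


sample = "\U00010400"                # a supplementary-plane character
utf8_bytes = sample.encode("utf-8")  # one four-byte sequence in standard UTF-8
cesu8_bytes = cesu8_encode(sample)   # two three-byte sequences in CESU-8
print(utf8_bytes.hex(), cesu8_bytes.hex())
```

Characters in the Basic Multilingual Plane encode identically in both schemes, which is why CESU-8 data is so often mistaken for UTF-8.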

_Cesu8IncrementalDecoder._error_handler(error)

_class Cesu8IncrementalEncoder(IncrementalEncoder)

IncrementalEncoder implementation for CESU-8, a deprecated UTF-8-like encoding still used by some systems, such as Tcl, and still mislabelled "utf8" in some places for legacy reasons.

_Cesu8IncrementalEncoder.encode(string,final)

Implements IncrementalEncoder.encode

_Cesu8IncrementalEncoder.getstate()

Implements IncrementalEncoder.getstate

_Cesu8IncrementalEncoder.reset()

Implements IncrementalEncoder.reset

_Cesu8IncrementalEncoder.setstate(state)

Implements IncrementalEncoder.setstate

_class EbcdicJohabIncrementalDecoder(BaseEbcdicIncrementalDecoder)

IncrementalDecoder implementation for code page 1364, a stateful EBCDIC variant of Johab.

_property EbcdicJohabIncrementalDecoder.dbcshost_decode

_property EbcdicJohabIncrementalDecoder.sbcs_decode

_class EbcdicJohabIncrementalEncoder(BaseEbcdicIncrementalEncoder)

IncrementalEncoder implementation for code page 1364, a stateful EBCDIC variant of Johab.

_property EbcdicJohabIncrementalEncoder.dbcshost_encode

_property EbcdicJohabIncrementalEncoder.sbcs_encode

_class EucJis2004IncrementalDecoder(AsciiIncrementalDecoder)

IncrementalDecoder implementation for the JIS X 0213 version of EUC-JP.

_property EucJis2004IncrementalDecoder.decoding_map

_class EucJis2004IncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for the JIS X 0213 version of EUC-JP.

_property EucJis2004IncrementalEncoder.encoding_map

_class EucJpFullIncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for EUC-JP, including JIS X 0212.

_property EucJpFullIncrementalEncoder.encoding_map

_class HzIncrementalDecoder(IncrementalDecoder)

IncrementalDecoder implementation for HZ-GB-2312 (Usenet simplified Chinese).

This is an old scheme for embedding GB 2312 data into a pure ASCII stream.
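As a rough illustration of the scheme, the sketch below uses the `hz` codec from Python's standard library, not this module's classes; this module's output may differ in detail. It shows GB 2312 text being wrapped in `~{` and `~}` so that the whole stream stays within 7-bit ASCII.

```python
text = "ASCII 中文 text"

# Python's standard library codec for HZ-GB-2312 (assumed comparable, not identical).
encoded = text.encode("hz")
decoded = encoded.decode("hz")

# Every byte stays below 0x80; GB 2312 runs are bracketed by ~{ and ~}.
print(encoded)
print(decoded == text)
```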

_HzIncrementalDecoder.decode(data_in,final)

Implements IncrementalDecoder.decode

_HzIncrementalDecoder.getstate()

Implements IncrementalDecoder.getstate

_HzIncrementalDecoder.reset()

Implements IncrementalDecoder.reset

_HzIncrementalDecoder.setstate(state)

Implements IncrementalDecoder.setstate

_class HzIncrementalEncoder(IncrementalEncoder)

IncrementalEncoder implementation for HZ-GB-2312 (Usenet simplified Chinese).

This is an old scheme for embedding GB 2312 data into a pure ASCII stream.

_HzIncrementalEncoder.encode(string,final)

Implements IncrementalEncoder.encode

_HzIncrementalEncoder.ensure_state_number(state,out)

_HzIncrementalEncoder.getstate()

Implements IncrementalEncoder.getstate

_HzIncrementalEncoder.reset()

Implements IncrementalEncoder.reset

_HzIncrementalEncoder.setstate(state)

Implements IncrementalEncoder.setstate

_class Iso2022CnIncrementalDecoder(Iso2022NonJpIncrementalDecoder)

IncrementalDecoder implementation for ISO-2022-CN (7-bit stateful Chinese).

ISO-2022-CN-Ext is not included (it requires a much larger set of tables and is very rare).

_property Iso2022CnIncrementalDecoder.decodes

_class Iso2022CnIncrementalEncoder(Iso2022NonJpIncrementalEncoder)

IncrementalEncoder implementation for ISO-2022-CN (7-bit stateful Chinese).

ISO-2022-CN-Ext is not included (it requires a much larger set of tables and is very rare).

_property Iso2022CnIncrementalEncoder.encodes

_class Iso2022Jp1IncrementalEncoder(Iso2022JpIncrementalEncoder)

IncrementalEncoder implementation for 7-bit stateful Japanese with JIS X 0212.

This differs from the ISO-2022-JP encoder in that it will encode to JIS X 0212, and does so whenever possible (i.e. it will favour it over any web extensions to JIS X 0208).
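The difference can be sketched with Python's standard `iso2022_jp` and `iso2022_jp_1` codecs, used here as stand-ins rather than as this module's API. The sample character U+4E02 is assumed to be a JIS X 0212 kanji that is absent from JIS X 0208, so only the -1 variant can encode it, using the `ESC $ ( D` designation for JIS X 0212.

```python
char = "\u4e02"  # assumed: present in JIS X 0212, absent from JIS X 0208

# Plain ISO-2022-JP has no JIS X 0212 designation, so encoding should fail.
try:
    char.encode("iso2022_jp")
    plain_ok = True
except UnicodeEncodeError:
    plain_ok = False

# ISO-2022-JP-1 can designate JIS X 0212 with ESC $ ( D.
encoded = char.encode("iso2022_jp_1")
print(plain_ok, encoded)
```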

_property Iso2022Jp1IncrementalEncoder.encodes_dbcs

_property Iso2022Jp1IncrementalEncoder.encodes_sbcs

_class Iso2022Jp2004IncrementalEncoder(Iso2022JpIncrementalEncoder)

IncrementalEncoder implementation for 7-bit stateful Japanese with JIS X 0213-2004.

_property Iso2022Jp2004IncrementalEncoder.encodes_dbcs

_property Iso2022Jp2004IncrementalEncoder.encodes_sbcs

_class Iso2022Jp2IncrementalEncoder(Iso2022JpIncrementalEncoder)

IncrementalEncoder implementation for 7-bit stateful Japanese with multilingual extensions.

_property Iso2022Jp2IncrementalEncoder.encode_supershift_greek

_property Iso2022Jp2IncrementalEncoder.encode_supershift_latin

_property Iso2022Jp2IncrementalEncoder.encodes_dbcs

_property Iso2022Jp2IncrementalEncoder.encodes_sbcs

_class Iso2022Jp3IncrementalEncoder(Iso2022JpIncrementalEncoder)

IncrementalEncoder implementation for 7-bit stateful Japanese with JIS X 0213-2000.

_property Iso2022Jp3IncrementalEncoder.encodes_dbcs

_property Iso2022Jp3IncrementalEncoder.encodes_sbcs

_class Iso2022JpExtIncrementalEncoder(Iso2022JpIncrementalEncoder)

IncrementalEncoder implementation for 7-bit stateful Japanese.

This differs from the ISO-2022-JP-1 encoder in that it preserves katakana width.

_property Iso2022JpExtIncrementalEncoder.encodes_dbcs

_property Iso2022JpExtIncrementalEncoder.encodes_sbcs

_class Iso2022KrIncrementalDecoder(Iso2022NonJpIncrementalDecoder)

IncrementalDecoder implementation for ISO-2022-KR (7-bit stateful South Korean).

_property Iso2022KrIncrementalDecoder.decodes

_class Iso2022KrIncrementalEncoder(Iso2022NonJpIncrementalEncoder)

IncrementalEncoder implementation for ISO-2022-KR (7-bit stateful South Korean).
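For orientation, the following sketch uses Python's standard `iso2022_kr` codec, not this module's encoder. It shows the general shape of an ISO-2022-KR stream: a designation sequence (`ESC $ ) C`) appears once, after which Shift Out (0x0E) and Shift In (0x0F) switch between the Korean set and ASCII.

```python
text = "한글 text"

# Python's standard library codec, assumed comparable to this encoder's output.
encoded = text.encode("iso2022_kr")
decoded = encoded.decode("iso2022_kr")

has_designation = b"\x1b$)C" in encoded  # ESC $ ) C designates KS C 5601 to G1
has_shift_out = b"\x0e" in encoded       # SO switches to the Korean set
print(encoded, has_designation, has_shift_out)
```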

_Iso2022KrIncrementalEncoder.run_prelude(out)

_property Iso2022KrIncrementalEncoder.encodes

_class Iso2022NonJpIncrementalDecoder(IncrementalDecoder)

IncrementalDecoder subclass, base class for ISO-2022-KR and ISO-2022-CN. Not used directly.

_Iso2022NonJpIncrementalDecoder.decode(data_in,final)

Implements IncrementalDecoder.decode

_Iso2022NonJpIncrementalDecoder.getstate()

Implements IncrementalDecoder.getstate

_Iso2022NonJpIncrementalDecoder.reset()

Implements IncrementalDecoder.reset

_Iso2022NonJpIncrementalDecoder.setstate(state)

Implements IncrementalDecoder.setstate

_class Iso2022NonJpIncrementalEncoder(IncrementalEncoder)

IncrementalEncoder subclass, base class for ISO-2022-KR and ISO-2022-CN. Not used directly.

_Iso2022NonJpIncrementalEncoder.encode(string,final)

Implements IncrementalEncoder.encode

_Iso2022NonJpIncrementalEncoder.ensure_shift_designation(state,out)

_Iso2022NonJpIncrementalEncoder.ensure_shift_state(state,out)

_Iso2022NonJpIncrementalEncoder.ensure_super3_designation(state,out)

_Iso2022NonJpIncrementalEncoder.ensure_super_designation(state,out)

_Iso2022NonJpIncrementalEncoder.getstate()

Implements IncrementalEncoder.getstate

_Iso2022NonJpIncrementalEncoder.reset()

Implements IncrementalEncoder.reset

_Iso2022NonJpIncrementalEncoder.run_prelude(out)

_Iso2022NonJpIncrementalEncoder.setstate(state)

Implements IncrementalEncoder.setstate

_class JapaneseAutodetectIncrementalDecoder(IncrementalDecoder)

IncrementalDecoder implementation for the automatic "Japanese" character encoding option.

This will attempt to interpret the stream as the web versions of ISO-2022-JP, Shift_JIS and EUC-JP, as well as UTF-8, at once, and start returning the data once it has narrowed it down to one. If it fails to narrow it down conclusively, it will wait until the final call before making an educated guess. If it doesn't seem to be any of them, it will raise ValueError.
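The narrowing-down idea can be sketched as candidate elimination. The code below is a simplified illustration built on Python's standard incremental decoders, not this class's actual algorithm: the function name `surviving_candidates` is hypothetical, Python's codecs are not the web versions of these encodings, and no educated guess is made when several candidates survive.

```python
import codecs

CANDIDATES = ("iso2022_jp", "shift_jis", "euc_jp", "utf-8")


def surviving_candidates(chunks):
    """Feed every chunk to one strict decoder per candidate; return those that never failed."""
    decoders = {name: codecs.getincrementaldecoder(name)("strict") for name in CANDIDATES}
    chunks = list(chunks)
    for index, chunk in enumerate(chunks):
        final = index == len(chunks) - 1
        for name in list(decoders):
            try:
                decoders[name].decode(chunk, final)
            except UnicodeDecodeError:
                # This candidate cannot be the encoding of the stream; drop it.
                del decoders[name]
    if not decoders:
        raise ValueError("input matches none of the candidate encodings")
    return sorted(decoders)


euc_result = surviving_candidates(["日本語".encode("euc_jp")])
ascii_result = surviving_candidates([b"abc"])
print(euc_result, ascii_result)
```

Pure ASCII input is valid in every candidate, which is the kind of case where a real detector has to fall back on a guess at the final call.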

_let x = JapaneseAutodetectIncrementalDecoder(errors)

JapaneseAutodetectIncrementalDecoder.__init__(errors)

_JapaneseAutodetectIncrementalDecoder.decode(data,final)

Implements IncrementalDecoder.decode

_JapaneseAutodetectIncrementalDecoder.getstate()

Implements IncrementalDecoder.getstate

_JapaneseAutodetectIncrementalDecoder.reset()

Implements IncrementalDecoder.reset

_JapaneseAutodetectIncrementalDecoder.setstate(state)

Implements IncrementalDecoder.setstate

_class JisEncodingIncrementalDecoder(Iso2022JpIncrementalDecoder)

IncrementalDecoder implementation for 7-bit stateful Japanese.

This differs from the ISO-2022-JP decoder in that it will:

  • Decode 1978 JIS with a separate table, including 1978 JIS, NEC extensions and IBM backports.
  • Accept and decode extensions from ISO-2022-JP-2 (and -1), ISO-2022-JP-3 and ISO-2022-JP-2004.
  • Not generate an error for immediately concatenated JIS-Kanji→ASCII→JIS-Kanji designations.
  • Accept katakana via Shift Out / Shift In.

This is used as the decoder for all other ISO-2022-JP variants besides plain ISO-2022-JP.

_property JisEncodingIncrementalDecoder.decode_shiftout

_property JisEncodingIncrementalDecoder.decode_supershift_greek

_property JisEncodingIncrementalDecoder.decode_supershift_latin

_property JisEncodingIncrementalDecoder.decodes_dbcs

_property JisEncodingIncrementalDecoder.decodes_sbcs

_class JisEncodingIncrementalEncoder(Iso2022JpIncrementalEncoder)

IncrementalEncoder implementation for 7-bit stateful Japanese with all features.

This differs from the ISO-2022-JP encoder in that it will:

  • Encode forms present in 1978 JIS but simplified by (and absent in) 1983 JIS to 1978 JIS.
  • For characters not present in either table, try JIS X 0212, 2000 JIS and 2004 JIS in that order.
  • For characters not present in any JIS set, try GB 2312 and Wansung.
  • Preserve width of katakana.

_property JisEncodingIncrementalEncoder.encode_supershift_greek

_property JisEncodingIncrementalEncoder.encode_supershift_latin

_property JisEncodingIncrementalEncoder.encodes_dbcs

_property JisEncodingIncrementalEncoder.encodes_sbcs

_class ShiftJis2004IncrementalDecoder(AsciiIncrementalDecoder)

IncrementalDecoder implementation for the JIS X 0213 version of Shift_JIS.

_property ShiftJis2004IncrementalDecoder.decoding_map

_class ShiftJis2004IncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for the JIS X 0213 version of Shift_JIS.

_property ShiftJis2004IncrementalEncoder.encoding_map

_class Utf32BeIncrementalDecoder(Utf32IncrementalDecoder)

IncrementalDecoder implementation for UTF-32, big endian, without a byte order mark.

_class Utf32BeIncrementalEncoder(Utf32IncrementalEncoder)

IncrementalEncoder implementation for UTF-32, big endian, without a byte order mark.

_class Utf32IncrementalDecoder(IncrementalDecoder)

IncrementalDecoder implementation for UTF-32, detected byte order, removing any byte order mark.
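Byte order detection can be illustrated with Python's standard `utf-32` codecs, which are used here as an analogy and are not this module's classes. The BOM-aware decoder accepts either byte order and strips the mark, while the `-be` and `-le` variants emit no mark at all.

```python
import codecs

text = "A\U0001F600"

# Build streams with an explicit byte order mark in each byte order.
be_with_bom = codecs.BOM_UTF32_BE + text.encode("utf-32-be")
le_with_bom = codecs.BOM_UTF32_LE + text.encode("utf-32-le")

# The generic decoder detects the order from the mark and removes it.
decoded_be = be_with_bom.decode("utf-32")
decoded_le = le_with_bom.decode("utf-32")

# The fixed-order encoders write four bytes per code point and no mark.
no_bom = "A".encode("utf-32-be")
print(decoded_be == decoded_le == text, no_bom)
```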

_Utf32IncrementalDecoder.decode(data_in,final)

Implements IncrementalDecoder.decode

_Utf32IncrementalDecoder.getstate()

Implements IncrementalDecoder.getstate

_Utf32IncrementalDecoder.reset()

Implements IncrementalDecoder.reset

_Utf32IncrementalDecoder.setstate(state)

Implements IncrementalDecoder.setstate

_class Utf32IncrementalEncoder(IncrementalEncoder)

IncrementalEncoder implementation for UTF-32 with byte order mark.

_Utf32IncrementalEncoder.encode(string,final)

Implements IncrementalEncoder.encode

_Utf32IncrementalEncoder.getstate()

Implements IncrementalEncoder.getstate

_Utf32IncrementalEncoder.push_word(word,out)

_Utf32IncrementalEncoder.reset()

Implements IncrementalEncoder.reset

_Utf32IncrementalEncoder.setstate(state)

Implements IncrementalEncoder.setstate

_property Utf32IncrementalEncoder.encoding_map

_class Utf32LeIncrementalDecoder(Utf32IncrementalDecoder)

IncrementalDecoder implementation for UTF-32, little endian, without a byte order mark.

_class Utf32LeIncrementalEncoder(Utf32IncrementalEncoder)

IncrementalEncoder implementation for UTF-32, little endian, without a byte order mark.

_class Utf7IncrementalDecoder(IncrementalDecoder)

IncrementalDecoder implementation for UTF-7, a largely obsolete (and forbidden in HTML5) scheme for mixing ASCII with Base64'd UTF-16BE in e-mail.
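The "Base64'd UTF-16BE" description can be checked by hand. This sketch uses Python's standard `base64` module and `utf-7` codec as stand-ins, not this module's API: a non-ASCII run is the unpadded Base64 of its UTF-16BE bytes, introduced by `+` and closed by `-`.

```python
import base64

text = "é"

# Unpadded Base64 of the UTF-16BE bytes forms the body of the shifted run.
b64 = base64.b64encode(text.encode("utf-16-be")).rstrip(b"=")
by_hand = b"+" + b64 + b"-"

# Python's own UTF-7 codec, for comparison.
encoded = text.encode("utf-7")
decoded = by_hand.decode("utf-7")
print(by_hand, encoded, decoded)
```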

_let x = Utf7IncrementalDecoder(errors)

Utf7IncrementalDecoder.__init__(errors)

_Utf7IncrementalDecoder.decode(data_in,final)

Implements IncrementalDecoder.decode

_Utf7IncrementalDecoder.getstate()

Implements IncrementalDecoder.getstate

_Utf7IncrementalDecoder.reset()

Implements IncrementalDecoder.reset

_Utf7IncrementalDecoder.setstate(state)

Implements IncrementalDecoder.setstate

_class Utf7IncrementalEncoder(IncrementalEncoder)

IncrementalEncoder implementation for UTF-7, a largely obsolete (and forbidden in HTML5) scheme for mixing ASCII with Base64'd UTF-16BE in e-mail.

_let x = Utf7IncrementalEncoder(errors)

Utf7IncrementalEncoder.__init__(errors)

_Utf7IncrementalEncoder.encode(data,final)

Implements IncrementalEncoder.encode

_Utf7IncrementalEncoder.getstate()

Implements IncrementalEncoder.getstate

_Utf7IncrementalEncoder.reset()

Implements IncrementalEncoder.reset

_Utf7IncrementalEncoder.setstate(state)

Implements IncrementalEncoder.setstate

_class XMacChineseSimpIncrementalDecoder(AsciiIncrementalDecoder)

IncrementalDecoder implementation for EUC-CN, Apple version (hence slightly reduced lead byte range).

Mappings to more-recently added characters are used for the vertical forms, rather than Apple transcoding hints (or GB18030 private use codes).

_property XMacChineseSimpIncrementalDecoder.decoding_map

_class XMacChineseSimpIncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for EUC-CN, Apple version (hence slightly reduced lead byte range).

Mappings to more-recently added characters are used for the vertical forms, rather than Apple transcoding hints (or GB18030 private use codes).

_property XMacChineseSimpIncrementalEncoder.encoding_map

_class XMacChineseTradIncrementalDecoder(AsciiIncrementalDecoder)

IncrementalDecoder implementation for Big5 with Apple's additions and reduced lead byte range.

The Unicode mappings are partly changed to be closer to Apple's (as opposed to Microsoft's) correspondences; however, Microsoft's are retained where following Apple's would have required PUA transcoding hints to round-trip.

_property XMacChineseTradIncrementalDecoder.decoding_map

_class XMacChineseTradIncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for Big5 with Apple's additions and reduced lead byte range.

The Unicode mappings are partly changed to be closer to Apple's (as opposed to Microsoft's) correspondences; however, Microsoft's are retained where following Apple's would have required PUA transcoding hints to round-trip.

_property XMacChineseTradIncrementalEncoder.encoding_map

_class XMacKoreanIncrementalDecoder(AsciiIncrementalDecoder)

IncrementalDecoder implementation for the HangulTalk (MacKorean) encoding.

HangulTalk is notorious for frequently not corresponding one-to-one to Unicode; the mappings used here are somewhat updated and improved compared to all versions of Apple's mappings and especially the Adobe CID mappings. However, bear in mind that content will not necessarily be decoded to the same Unicode sequences as by other implementations. In particular, decoding to Apple's Corporate Private Use Area has been avoided for the most part, even where this results in poorly matched and/or convergent decoded forms, since legibility has been given greater priority than round-tripping.

_property XMacKoreanIncrementalDecoder.decoding_map

_class XMacKoreanIncrementalEncoder(AsciiIncrementalEncoder)

IncrementalEncoder implementation for the HangulTalk (MacKorean) encoding.

HangulTalk is notorious for frequently not corresponding one-to-one to Unicode. In places, multiple Unicode representations will be accepted for a given HangulTalk representation.

_property XMacKoreanIncrementalEncoder.encoding_map

Other Members

let data_7bit = _DBExtraData7Bit

let data_8bit = _DBExtraData8Bit

let more_dbdata = _MoreDBData