This module includes some additional variable-width or wide encodings not specified by WHATWG.
As such, none of the codecs in this module should be used in HTML.
Classes
_class AsciiJohabIncrementalDecoder(AsciiIncrementalDecoder)
IncrementalDecoder implementation for the PC Johab encoding (code page 1361).
_property AsciiJohabIncrementalDecoder.decoding_map
_class AsciiJohabIncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for the PC Johab encoding (code page 1361).
_property AsciiJohabIncrementalEncoder.encoding_map
_class Big5NonEtenKanaIncrementalDecoder(AsciiIncrementalDecoder)
IncrementalDecoder implementation for Big5 with non-ETEN layout of kana, Cyrillic, list markers.
The other ETEN extension section (the one retained by Microsoft's version) is still included.
Although this is the kana/Cyrillic/list marker layout included in the UTC's BIG5.TXT, it is the less common of the two (most extension schemes for Big5 use the ETEN layout), and has several problems (katakana lacks the vowel extender, and Cyrillic lacks several capitals) which the ETEN layout does not have. However, this codec corresponds roughly to Python's big5, and more closely to its (built-in, as opposed to if/when Python aliases it to mbcs) cp950.
_property Big5NonEtenKanaIncrementalDecoder.decoding_map
_class Big5NonEtenKanaIncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for Big5 with non-ETEN layout of kana, Cyrillic, list markers.
The other ETEN extension section (the one retained by Microsoft's version) is still included.
Although this is the kana/Cyrillic/list marker layout included in the UTC's BIG5.TXT, it is the less common of the two (most extension schemes for Big5 use the ETEN layout), and has several problems (katakana lacks the vowel extender, and Cyrillic lacks several capitals) which the ETEN layout does not have. However, this codec corresponds roughly to Python's big5, and more closely to its (built-in, as opposed to if/when Python aliases it to mbcs) cp950.
_property Big5NonEtenKanaIncrementalEncoder.encoding_map
_class Cesu8IncrementalDecoder(Utf8IncrementalDecoder)
IncrementalDecoder implementation for CESU-8, a deprecated UTF-8-like encoding still used by some systems, such as Tcl, and still miscalled "utf8" in some places for legacy reasons.
_Cesu8IncrementalDecoder._error_handler(error)
_class Cesu8IncrementalEncoder(IncrementalEncoder)
IncrementalEncoder implementation for CESU-8, a deprecated UTF-8-like encoding still used by some systems, such as Tcl, and still miscalled "utf8" in some places for legacy reasons.
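To illustrate how CESU-8 differs from UTF-8, here is a minimal Python sketch. It is not this module's implementation; the function name `cesu8_encode` is hypothetical. It relies only on the general definition of CESU-8: supplementary code points are first split into a UTF-16 surrogate pair, and each surrogate is then written in the three-byte UTF-8 form.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 (UTF-8 applied to UTF-16 code units)."""
    out = bytearray()
    for char in text:
        code = ord(char)
        if code > 0xFFFF:
            # Split a supplementary code point into a UTF-16 surrogate pair.
            code -= 0x10000
            high = 0xD800 | (code >> 10)
            low = 0xDC00 | (code & 0x3FF)
            for unit in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form of a surrogate.
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            # Code points in the Basic Multilingual Plane match plain UTF-8.
            out += char.encode("utf-8", "surrogatepass")
    return bytes(out)


# U+1F600 takes four bytes in UTF-8 but six bytes (two surrogates) in CESU-8.
emoji = "\U0001F600"
utf8_length = len(emoji.encode("utf-8"))
cesu8_length = len(cesu8_encode(emoji))
```

For text that stays within the Basic Multilingual Plane the two encodings produce identical bytes, which is one reason CESU-8 output is so often mislabelled as UTF-8.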
_Cesu8IncrementalEncoder.encode(string,final)
Implements IncrementalEncoder.encode
_Cesu8IncrementalEncoder.getstate()
Implements IncrementalEncoder.getstate
_Cesu8IncrementalEncoder.reset()
Implements IncrementalEncoder.reset
_Cesu8IncrementalEncoder.setstate(state)
Implements IncrementalEncoder.setstate
_class EbcdicJohabIncrementalDecoder(BaseEbcdicIncrementalDecoder)
_class EbcdicJohabIncrementalEncoder(BaseEbcdicIncrementalEncoder)
_class EucJis2004IncrementalDecoder(AsciiIncrementalDecoder)
IncrementalDecoder implementation for the JIS X 0213 version of EUC-JP.
_property EucJis2004IncrementalDecoder.decoding_map
_class EucJis2004IncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for the JIS X 0213 version of EUC-JP.
_property EucJis2004IncrementalEncoder.encoding_map
_class EucJpFullIncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for EUC-JP, including JIS X 0212.
_property EucJpFullIncrementalEncoder.encoding_map
_class HzIncrementalDecoder(IncrementalDecoder)
IncrementalDecoder implementation for HZ-GB-2312 (Usenet simplified Chinese).
This is an old scheme for embedding GB 2312 data into a pure ASCII stream.
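As a rough illustration of the embedding scheme, the sketch below uses the `hz` codec from Python's standard library rather than this module's classes, so details of the output may differ from what this codec produces. In HZ, GB 2312 characters are written as pairs of 7-bit bytes between the `~{` and `~}` markers, so the whole stream stays within ASCII.

```python
text = "Hello 中文"

# Python's standard library "hz" codec, used here only as a stand-in.
encoded = text.encode("hz")

# Every byte is below 0x80, so the data survives 7-bit transports.
is_pure_ascii = all(byte < 0x80 for byte in encoded)

# The two-byte GB 2312 run is introduced by the "~{" marker.
has_shift_marker = b"~{" in encoded

# Decoding reverses the process.
round_tripped = encoded.decode("hz")
```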
_HzIncrementalDecoder.decode(data_in,final)
Implements IncrementalDecoder.decode
_HzIncrementalDecoder.getstate()
Implements IncrementalDecoder.getstate
_HzIncrementalDecoder.reset()
Implements IncrementalDecoder.reset
_HzIncrementalDecoder.setstate(state)
Implements IncrementalDecoder.setstate
_class HzIncrementalEncoder(IncrementalEncoder)
IncrementalEncoder implementation for HZ-GB-2312 (Usenet simplified Chinese).
This is an old scheme for embedding GB 2312 data into a pure ASCII stream.
_HzIncrementalEncoder.encode(string,final)
Implements IncrementalEncoder.encode
_HzIncrementalEncoder.ensure_state_number(state,out)
_HzIncrementalEncoder.getstate()
Implements IncrementalEncoder.getstate
_HzIncrementalEncoder.reset()
Implements IncrementalEncoder.reset
_HzIncrementalEncoder.setstate(state)
Implements IncrementalEncoder.setstate
_class Iso2022CnIncrementalDecoder(Iso2022NonJpIncrementalDecoder)
IncrementalDecoder implementation for ISO-2022-CN (7-bit stateful Chinese).
ISO-2022-CN-Ext is not included (it requires a much larger set of tables and is very rare).
_property Iso2022CnIncrementalDecoder.decodes
_class Iso2022CnIncrementalEncoder(Iso2022NonJpIncrementalEncoder)
IncrementalEncoder implementation for ISO-2022-CN (7-bit stateful Chinese).
ISO-2022-CN-Ext is not included (it requires a much larger set of tables and is very rare).
_property Iso2022CnIncrementalEncoder.encodes
_class Iso2022Jp1IncrementalEncoder(Iso2022JpIncrementalEncoder)
IncrementalEncoder implementation for 7-bit stateful Japanese with JIS X 0212.
This differs from the ISO-2022-JP encoder in that it will encode to JIS X 0212, and does so whenever possible (i.e. it will favour it over any web extensions to JIS X 0208).
_property Iso2022Jp1IncrementalEncoder.encodes_dbcs
_property Iso2022Jp1IncrementalEncoder.encodes_sbcs
_class Iso2022Jp2004IncrementalEncoder(Iso2022JpIncrementalEncoder)
_class Iso2022Jp2IncrementalEncoder(Iso2022JpIncrementalEncoder)
IncrementalEncoder implementation for 7-bit stateful Japanese with multilingual extensions.
_property Iso2022Jp2IncrementalEncoder.encode_supershift_greek
_property Iso2022Jp2IncrementalEncoder.encode_supershift_latin
_property Iso2022Jp2IncrementalEncoder.encodes_dbcs
_property Iso2022Jp2IncrementalEncoder.encodes_sbcs
_class Iso2022Jp3IncrementalEncoder(Iso2022JpIncrementalEncoder)
_class Iso2022JpExtIncrementalEncoder(Iso2022JpIncrementalEncoder)
_class Iso2022KrIncrementalDecoder(Iso2022NonJpIncrementalDecoder)
IncrementalDecoder implementation for ISO-2022-KR (7-bit stateful South Korean).
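The following sketch shows why a stateful encoding such as ISO-2022-KR needs an incremental decoder. It uses the `iso2022_kr` codec from Python's standard library as a stand-in, not this module's class, so treat it as an illustration of the concept only. The designation and shift state must be carried from one call to the next, which lets the input be split at arbitrary points.

```python
import codecs

text = "한국어 text"
encoded = text.encode("iso2022_kr")

# Feed the stream one byte at a time; the decoder object remembers the
# current shift state and any partially received sequence between calls.
decoder = codecs.getincrementaldecoder("iso2022_kr")()
pieces = [decoder.decode(bytes([byte])) for byte in encoded]
pieces.append(decoder.decode(b"", final=True))
decoded = "".join(pieces)
```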
_property Iso2022KrIncrementalDecoder.decodes
_class Iso2022KrIncrementalEncoder(Iso2022NonJpIncrementalEncoder)
_class Iso2022NonJpIncrementalDecoder(IncrementalDecoder)
IncrementalDecoder subclass, base class for ISO-2022-KR and ISO-2022-CN. Not used directly.
_Iso2022NonJpIncrementalDecoder.decode(data_in,final)
Implements IncrementalDecoder.decode
_Iso2022NonJpIncrementalDecoder.getstate()
Implements IncrementalDecoder.getstate
_Iso2022NonJpIncrementalDecoder.reset()
Implements IncrementalDecoder.reset
_Iso2022NonJpIncrementalDecoder.setstate(state)
Implements IncrementalDecoder.setstate
_class Iso2022NonJpIncrementalEncoder(IncrementalEncoder)
IncrementalEncoder subclass, base class for ISO-2022-KR and ISO-2022-CN. Not used directly.
_Iso2022NonJpIncrementalEncoder.encode(string,final)
Implements IncrementalEncoder.encode
_Iso2022NonJpIncrementalEncoder.ensure_shift_designation(state,out)
_Iso2022NonJpIncrementalEncoder.ensure_shift_state(state,out)
_Iso2022NonJpIncrementalEncoder.ensure_super3_designation(state,out)
_Iso2022NonJpIncrementalEncoder.ensure_super_designation(state,out)
_Iso2022NonJpIncrementalEncoder.getstate()
Implements IncrementalEncoder.getstate
_Iso2022NonJpIncrementalEncoder.reset()
Implements IncrementalEncoder.reset
_Iso2022NonJpIncrementalEncoder.run_prelude(out)
_Iso2022NonJpIncrementalEncoder.setstate(state)
Implements IncrementalEncoder.setstate
_class JapaneseAutodetectIncrementalDecoder(IncrementalDecoder)
IncrementalDecoder implementation for the automatic "Japanese" character encoding option.
This will attempt to interpret the stream as the web versions of ISO-2022-JP, Shift_JIS and EUC-JP, as well as UTF-8, at once, and start returning the data once it has narrowed it down to one. If it fails to narrow it down conclusively, it will wait until the final call before making an educated guess. If it doesn't seem to be any of them, it will raise ValueError.
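The narrowing-down idea can be sketched in a much simplified, non-incremental form. The example below is an assumption-laden illustration, not this decoder's algorithm: the name `plausible_encodings` is hypothetical, it uses Python's standard library codecs (which are not identical to the web versions), and it tests the whole input at once instead of tracking candidates as data arrives.

```python
from typing import List

# Candidate encodings, named as in Python's standard library.
CANDIDATES = ("iso2022_jp", "shift_jis", "euc_jp", "utf-8")


def plausible_encodings(data: bytes) -> List[str]:
    """Hypothetical helper: list the candidates that decode data without error."""
    survivors = []
    for name in CANDIDATES:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # This candidate is ruled out.
        survivors.append(name)
    if not survivors:
        # Mirrors the documented behaviour of raising ValueError when
        # the data does not seem to be in any of the candidates.
        raise ValueError("data does not match any candidate encoding")
    return survivors


sample = "日本語".encode("euc_jp")
remaining = plausible_encodings(sample)
```

Pure ASCII input is valid in every candidate, which is why a detector of this kind may have to defer its decision until the final call.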
_let x = JapaneseAutodetectIncrementalDecoder(errors)
_JapaneseAutodetectIncrementalDecoder.decode(data,final)
Implements IncrementalDecoder.decode
_JapaneseAutodetectIncrementalDecoder.getstate()
Implements IncrementalDecoder.getstate
_JapaneseAutodetectIncrementalDecoder.reset()
Implements IncrementalDecoder.reset
_JapaneseAutodetectIncrementalDecoder.setstate(state)
Implements IncrementalDecoder.setstate
_class JisEncodingIncrementalDecoder(Iso2022JpIncrementalDecoder)
IncrementalDecoder implementation for 7-bit stateful Japanese.
This differs from the ISO-2022-JP decoder in that it will:
- Decode 1978 JIS with a separate table, including 1978 JIS, NEC extensions and IBM backports.
- Accept and decode extensions from ISO-2022-JP-2 (and -1), ISO-2022-JP-3 and ISO-2022-JP-2004.
- Not generate an error for immediately concatenated JIS-Kanji→ASCII→JIS-Kanji designations.
- Accept katakana via Shift Out / Shift In.
This is used as the decoder for all other ISO-2022-JP variants besides plain ISO-2022-JP.
_property JisEncodingIncrementalDecoder.decode_shiftout
_property JisEncodingIncrementalDecoder.decode_supershift_greek
_property JisEncodingIncrementalDecoder.decode_supershift_latin
_property JisEncodingIncrementalDecoder.decodes_dbcs
_property JisEncodingIncrementalDecoder.decodes_sbcs
_class JisEncodingIncrementalEncoder(Iso2022JpIncrementalEncoder)
IncrementalEncoder implementation for 7-bit stateful Japanese with all features.
This differs from the ISO-2022-JP encoder in that it will:
- Encode forms present in 1978 JIS but simplified by (and absent in) 1983 JIS to 1978 JIS.
- For characters not present in either table, try JIS X 0212, 2000 JIS and 2004 JIS in that order.
- For characters not present in any JIS set, try GB 2312 and Wansung.
- Preserve width of katakana.
_property JisEncodingIncrementalEncoder.encode_supershift_greek
_property JisEncodingIncrementalEncoder.encode_supershift_latin
_property JisEncodingIncrementalEncoder.encodes_dbcs
_property JisEncodingIncrementalEncoder.encodes_sbcs
_class ShiftJis2004IncrementalDecoder(AsciiIncrementalDecoder)
IncrementalDecoder implementation for the JIS X 0213 version of Shift_JIS.
_property ShiftJis2004IncrementalDecoder.decoding_map
_class ShiftJis2004IncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for the JIS X 0213 version of Shift_JIS.
_property ShiftJis2004IncrementalEncoder.encoding_map
_class Utf32BeIncrementalDecoder(Utf32IncrementalDecoder)
IncrementalDecoder implementation for UTF-32, big endian, without a byte order mark.
_class Utf32BeIncrementalEncoder(Utf32IncrementalEncoder)
IncrementalEncoder implementation for UTF-32, big endian, without a byte order mark.
_class Utf32IncrementalDecoder(IncrementalDecoder)
IncrementalDecoder implementation for UTF-32, detected byte order, removing any byte order mark.
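Byte order detection for UTF-32 can be sketched as below. This is a one-shot illustration under stated assumptions, not this class's code: the function name `decode_utf32` and the `default_big_endian` parameter are hypothetical, and the fallback used when no byte order mark is present is an assumption, since the text above does not say what this decoder does in that case.

```python
BOM_BE = b"\x00\x00\xfe\xff"  # U+FEFF serialised big endian
BOM_LE = b"\xff\xfe\x00\x00"  # U+FEFF serialised little endian


def decode_utf32(data: bytes, default_big_endian: bool = True) -> str:
    """Hypothetical helper: decode UTF-32, using and removing any byte order mark."""
    if data[:4] == BOM_BE:
        return data[4:].decode("utf-32-be")
    if data[:4] == BOM_LE:
        return data[4:].decode("utf-32-le")
    # No byte order mark: fall back to an assumed default order.
    return data.decode("utf-32-be" if default_big_endian else "utf-32-le")


with_bom = decode_utf32(BOM_LE + "hi".encode("utf-32-le"))
without_bom = decode_utf32("hi".encode("utf-32-be"))
```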
_Utf32IncrementalDecoder.decode(data_in,final)
Implements IncrementalDecoder.decode
_Utf32IncrementalDecoder.getstate()
Implements IncrementalDecoder.getstate
_Utf32IncrementalDecoder.reset()
Implements IncrementalDecoder.reset
_Utf32IncrementalDecoder.setstate(state)
Implements IncrementalDecoder.setstate
_class Utf32IncrementalEncoder(IncrementalEncoder)
IncrementalEncoder implementation for UTF-32 with byte order mark.
_Utf32IncrementalEncoder.encode(string,final)
Implements IncrementalEncoder.encode
_Utf32IncrementalEncoder.getstate()
Implements IncrementalEncoder.getstate
_Utf32IncrementalEncoder.push_word(word,out)
_Utf32IncrementalEncoder.reset()
Implements IncrementalEncoder.reset
_Utf32IncrementalEncoder.setstate(state)
Implements IncrementalEncoder.setstate
_property Utf32IncrementalEncoder.encoding_map
_class Utf32LeIncrementalDecoder(Utf32IncrementalDecoder)
IncrementalDecoder implementation for UTF-32, little endian, without a byte order mark.
_class Utf32LeIncrementalEncoder(Utf32IncrementalEncoder)
IncrementalEncoder implementation for UTF-32, little endian, without a byte order mark.
_class Utf7IncrementalDecoder(IncrementalDecoder)
IncrementalDecoder implementation for UTF-7, a largely obsolete (and forbidden in HTML5) scheme for mixing ASCII with Base64'd UTF-16BE in e-mail.
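To make the "Base64'd UTF-16BE" description concrete, here is a small sketch that builds a single shifted UTF-7 sequence by hand. The helper name `utf7_shifted_sequence` is hypothetical and the code is not taken from this module; the result is checked against the `utf-7` codec in Python's standard library.

```python
import base64


def utf7_shifted_sequence(text: str) -> bytes:
    """Hypothetical helper: wrap text in a single UTF-7 shifted sequence."""
    units = text.encode("utf-16-be")
    # UTF-7 uses Base64 without the trailing "=" padding.
    body = base64.b64encode(units).rstrip(b"=")
    # "+" shifts into Base64 and "-" shifts back to ASCII.
    return b"+" + body + b"-"


shifted = utf7_shifted_sequence("£")

# The standard library decoder reads the sequence back as the original text.
decoded = shifted.decode("utf-7")
```

A real encoder leaves most ASCII characters as they are and only shifts for characters that cannot be written directly; this sketch always shifts, which keeps it short.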
_let x = Utf7IncrementalDecoder(errors)
_Utf7IncrementalDecoder.decode(data_in,final)
Implements IncrementalDecoder.decode
_Utf7IncrementalDecoder.getstate()
Implements IncrementalDecoder.getstate
_Utf7IncrementalDecoder.reset()
Implements IncrementalDecoder.reset
_Utf7IncrementalDecoder.setstate(state)
Implements IncrementalDecoder.setstate
_class Utf7IncrementalEncoder(IncrementalEncoder)
IncrementalEncoder implementation for UTF-7, a largely obsolete (and forbidden in HTML5) scheme for mixing ASCII with Base64'd UTF-16BE in e-mail.
_let x = Utf7IncrementalEncoder(errors)
_Utf7IncrementalEncoder.encode(data,final)
Implements IncrementalEncoder.encode
_Utf7IncrementalEncoder.getstate()
Implements IncrementalEncoder.getstate
_Utf7IncrementalEncoder.reset()
Implements IncrementalEncoder.reset
_Utf7IncrementalEncoder.setstate(state)
Implements IncrementalEncoder.setstate
_class XMacChineseSimpIncrementalDecoder(AsciiIncrementalDecoder)
IncrementalDecoder implementation for EUC-CN, Apple version (hence slightly reduced lead byte range).
Mappings to more-recently added characters are used for the vertical forms, rather than Apple transcoding hints (or GB18030 private use codes).
_property XMacChineseSimpIncrementalDecoder.decoding_map
_class XMacChineseSimpIncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for EUC-CN, Apple version (hence slightly reduced lead byte range).
Mappings to more-recently added characters are used for the vertical forms, rather than Apple transcoding hints (or GB18030 private use codes).
_property XMacChineseSimpIncrementalEncoder.encoding_map
_class XMacChineseTradIncrementalDecoder(AsciiIncrementalDecoder)
IncrementalDecoder implementation for Big5 with Apple's additions and reduced lead byte range.
The Unicode mappings are partly changed to be closer to Apple's (as opposed to Microsoft's) correspondences; however, Microsoft's are retained where following Apple's would have required PUA transcoding hints to round-trip.
_property XMacChineseTradIncrementalDecoder.decoding_map
_class XMacChineseTradIncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for Big5 with Apple's additions and reduced lead byte range.
The Unicode mappings are partly changed to be closer to Apple's (as opposed to Microsoft's) correspondences; however, Microsoft's are retained where following Apple's would have required PUA transcoding hints to round-trip.
_property XMacChineseTradIncrementalEncoder.encoding_map
_class XMacKoreanIncrementalDecoder(AsciiIncrementalDecoder)
IncrementalDecoder implementation for the HangulTalk (MacKorean) encoding.
HangulTalk is notorious for frequently not corresponding one-to-one to Unicode; the mappings used here are somewhat updated and improved compared to all versions of Apple's mappings and especially the Adobe CID mappings. However, bear in mind that content will not necessarily be decoded to the same Unicode sequences as by other implementations. In particular, decoding to Apple's Corporate Private Use Area has been avoided for the most part, even where this results in poorly matched and/or convergent decoded forms, since preserving legibility has been given greater priority than round-tripping.
_property XMacKoreanIncrementalDecoder.decoding_map
_class XMacKoreanIncrementalEncoder(AsciiIncrementalEncoder)
IncrementalEncoder implementation for the HangulTalk (MacKorean) encoding.
HangulTalk is notorious for frequently not corresponding one-to-one to Unicode. In places, multiple Unicode representations will be accepted for a given HangulTalk representation.
_property XMacKoreanIncrementalEncoder.encoding_map