Author Topic: CompressString: Character encoding conversion failed?

Chilkat

CompressString: Character encoding conversion failed?
« on: December 06, 2017, 09:49:24 AM »
I get the following in the LastErrorText for a call to CompressString:
Code:
ChilkatLog:
  CompressString:
    DllDate: Nov 27 2017
    ChilkatVersion: 9.5.0.70
    UnlockPrefix: xxx
    Architecture: Little Endian; 64-bit
    Language: .NET 4.6 VS2017 / x64
    VerboseLogging: 0
    Character encoding conversion failed.
    Charset: windows-1252
    ConvertedToNumBytes: 4573
    Set the Charset property equal to an appropriate charset (see http://www.chilkatsoft.com/p/p_463.asp)
    Failed.
  --CompressString
--ChilkatLog

Chilkat

Re: CompressString: Character encoding conversion failed?
« Reply #1 on: December 06, 2017, 09:52:04 AM »
For many methods that accept a string input (hashing, encryption, compression, etc.), it's
imperative to specify the exact bytes that are hashed/encrypted/compressed, because these operations work on bytes, not on characters.

A string is a sequence of characters (i.e. symbols), for example 'A'.  It has no bytes until a charset is chosen.
In us-ascii, 'A' is the single byte 0x41.  In utf-16 it's two bytes: 0x41 0x00 in little-endian order (0x00 0x41 in big-endian).

The charset specifies the byte representation of the string.  See https://www.example-code.com/charset101.asp
In other words, you cannot hash/compress/encrypt a "string", you can only do so with a particular byte
representation of the string.
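
To make this concrete, here is a small sketch in plain Python (standard library only, not Chilkat; the variable names are just for illustration). It encodes the same one-character string in three charsets and hashes each result, to show that the digest depends on the byte representation:

```python
import hashlib

text = "A"

# One string, three byte representations.
ascii_bytes = text.encode("ascii")        # 1 byte:  0x41
utf16le_bytes = text.encode("utf-16-le")  # 2 bytes: 0x41 0x00
utf16be_bytes = text.encode("utf-16-be")  # 2 bytes: 0x00 0x41

# Hashing works on bytes, so each representation yields its own digest.
digests = {
    "us-ascii": hashlib.sha256(ascii_bytes).hexdigest(),
    "utf-16le": hashlib.sha256(utf16le_bytes).hexdigest(),
    "utf-16be": hashlib.sha256(utf16be_bytes).hexdigest(),
}

for name, digest in digests.items():
    print(name, digest)
```

The three digests all differ, which is why the charset has to be pinned down before hashing, compressing, or encrypting.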

In this case, the default byte representation is ANSI (which is windows-1252, i.e. the single-byte-per-char Western European charset).
This has always been the default.  If, for example, you pass a Korean char in the string, then the conversion
to the windows-1252 byte representation will fail: because it's a 1-byte-per-char representation, it can only
represent us-ascii and Western European language chars.  You would need to set the Charset property to something
that can represent every char in the string, such as utf-8 or utf-16.

Chilkat fails the method call if the string contains chars that cannot be represented in the Charset.
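
The same kind of failure can be reproduced outside Chilkat. The sketch below uses Python's built-in codecs (an assumption for illustration; it is not Chilkat's converter, and `encode_or_none` is a hypothetical helper) to show a Korean char failing under windows-1252 and succeeding under utf-8:

```python
def encode_or_none(text, charset):
    """Return the bytes of text in charset, or None if some char cannot be represented."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

western = "caf\u00e9"  # "café": every char exists in windows-1252
korean = "\ud55c"      # Hangul syllable U+D55C: not in windows-1252

western_1252 = encode_or_none(western, "windows-1252")  # 4 bytes, one per char
korean_1252 = encode_or_none(korean, "windows-1252")    # conversion fails -> None
korean_utf8 = encode_or_none(korean, "utf-8")           # 3 bytes for the one char

print(western_1252, korean_1252, korean_utf8)
```

In Chilkat terms, the second case corresponds to the "Character encoding conversion failed" entry in LastErrorText, and the third to what you get after setting the Charset to utf-8.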