[Neo] REST API character encoding + unicode

Alastair James al.james at gmail.com
Tue Apr 13 01:01:50 CEST 2010


Hi!


> The example you mentioned: "\u2018Hello world\u2018", is probably properly
> escaped since 2018 is LEFT SINGLE QUOTATION MARK [2], and quotation marks
> should be escaped, but is this the sequence you are experiencing problems
> with? If that gets returned unescaped, that could be a problem, but if it's
> another char, like \u00f6, then it should be allowed to be represented
> unescaped as a proper unicode char.
>

Yes, that is an example of a string that fails. However, \u2018 is a fancy
quote (http://www.fileformat.info/info/unicode/char/2018/index.htm), not one
of the quotes that needs to be escaped, so it should not matter whether it is
returned escaped or not (the \u escape is just part of the JSON encoding).


> It might still be a bug though, so if you could provide some more details
> that would be great. Also could you check if the truncation of messages
> occurs in your client due to decoding issues, on the server or on the wire.
>

The web service returns an HTTP Content-Length header that matches the
(truncated) length my client receives, so I think it's a bug.

Here is what I think the issue is: the enclosed example contains two
Unicode characters, each of which is a three-byte sequence in UTF-8.

The response is exactly 4 characters too short (missing ut"}), so it's almost
as if the output length is the number of characters, not the number of bytes:
each three-byte character is counted as one, which loses two bytes apiece.
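To illustrate the suspected mismatch, here is a minimal standalone sketch. The JSON body is a made-up stand-in that merely ends in ut"} like the truncated response (the real payload is not reproduced here), the class and method names are hypothetical, and it assumes the body goes over the wire as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8LengthDemo {
    // Hypothetical body (an assumption): two U+2018 characters, ending in ut"}
    static final String BODY =
            "{\"text\":\"\u2018Hello world\u2018\",\"key\":\"about\"}";

    // String.length() counts UTF-16 code units, not encoded bytes.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // What a client keeps if it reads only declaredLength bytes of the body.
    static String readDeclared(String s, int declaredLength) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        byte[] kept = Arrays.copyOf(bytes, Math.min(declaredLength, bytes.length));
        return new String(kept, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int chars = charCount(BODY);
        int bytes = utf8ByteCount(BODY);
        System.out.println("characters: " + chars);
        System.out.println("UTF-8 bytes: " + bytes);
        System.out.println("difference: " + (bytes - chars));
        // Using the character count as Content-Length cuts off the tail.
        System.out.println("client sees: " + readDeclared(BODY, chars));
    }
}
```

With two three-byte characters the byte count exceeds the character count by four, so a client that trusts a character-based Content-Length stops four bytes early.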

Looking at GenericWebService.java, line 86, method addHeaders:

    builder = builder.header( HttpHeaders.CONTENT_LENGTH,
            String.valueOf(
                builder.clone().build().getEntity().toString().length() ) );

Won't that set the content length to the number of characters NOT bytes?
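If that is the cause, one possible fix is to measure the encoded bytes rather than the characters. The following is only a sketch: contentLengthValue is a hypothetical helper, not something taken from GenericWebService.java, and it assumes the entity's toString() output is exactly what gets written, in the given charset (UTF-8 here).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentLengthSketch {
    // Hypothetical helper: header value based on the encoded byte length
    // of the entity, not on its character count.
    static String contentLengthValue(Object entity, Charset charset) {
        byte[] encoded = entity.toString().getBytes(charset);
        return String.valueOf(encoded.length);
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String fancy = "\u2018abc\u2018";
        // Pure ASCII: byte length equals character count.
        System.out.println(contentLengthValue(ascii, StandardCharsets.UTF_8));
        // Each U+2018 takes three bytes in UTF-8, so this is larger than
        // fancy.length().
        System.out.println(contentLengthValue(fancy, StandardCharsets.UTF_8));
    }
}
```

In addHeaders, a value computed this way would stand in for the String.valueOf( ... .length() ) expression, provided the charset matches the one the server actually uses for the response.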

Al

