[Neo] REST API character encoding + Unicode
Alastair James
al.james at gmail.com
Tue Apr 13 01:01:50 CEST 2010
Hi!
> The example you mentioned: "\u2018Hello world\u2018", is probably properly
> escaped since 2018 is LEFT SINGLE QUOTATION MARK [2], and quotation marks
> should be escaped, but is this the sequence you are experiencing problems
> with? If that gets returned unescaped, that could be a problem, but if it's
> another char, like \u00f6, then it should be allowed to be represented
> unescaped as a proper unicode char.
>
Yes, that is an example string that is failing. However, \u2018 is a fancy
quote (http://www.fileformat.info/info/unicode/char/2018/index.htm), not one
of the characters JSON requires to be escaped, so it should not matter
whether it's returned escaped or unescaped (either form is valid JSON).
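To make the escaping point concrete, here is a minimal Java sketch of the rule in RFC 8259 (section 7): inside a JSON string only the quotation mark, the reverse solidus and the control characters U+0000 to U+001F must be escaped, and any other character may be sent either literally or as an escape. The class and method names are made up for illustration; this is not code from Neo4j.

```java
public class JsonEscapeCheck {
    // Per RFC 8259, section 7: inside a JSON string, only the quotation mark,
    // the reverse solidus and the control characters U+0000..U+001F MUST be
    // escaped. Every other character may appear literally or escaped.
    static boolean mustEscape(char c) {
        return c == '"' || c == '\\' || c < 0x20;
    }

    public static void main(String[] args) {
        System.out.println(mustEscape('\u2018')); // false: LEFT SINGLE QUOTATION MARK
        System.out.println(mustEscape('"'));      // true: plain double quote
        System.out.println(mustEscape('\u00f6')); // false: o with diaeresis
    }
}
```

So a client that decodes JSON correctly should end up with the same string whether or not the server escapes \u2018.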
> It might still be a bug though, so if you could provide some more details
> that would be great. Also could you check if the truncation of messages
> occurs in your client due to decoding issues, on the server or on the wire.
>
The web service returns an HTTP Content-Length header that matches the length
of the truncated body my client receives, so I think it's a server-side bug.
I'll tell you what I think the issue is. I noticed the enclosed example
contains two Unicode characters, each of which is a three-byte sequence in
UTF-8.
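The response is exactly 4 characters too short (missing ut"}), which matches
that: each of the two characters counts as 1 character but takes 3 bytes, so
the body is 2 x (3 - 1) = 4 bytes longer than the header says. It's almost
as if the content length is the number of characters, not the number of bytes.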
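The arithmetic can be checked directly in Java. This sketch (hypothetical class name, using only the example string from this thread rather than the full response body) compares String.length(), which counts UTF-16 code units, with the length of the UTF-8 encoding, which is what an HTTP Content-Length has to count.

```java
import java.nio.charset.StandardCharsets;

public class CharsVsBytes {
    // What String.length() reports: UTF-16 code units ("characters" here,
    // since U+2018 fits in a single code unit).
    static int charCount(String s) {
        return s.length();
    }

    // What Content-Length must report: octets in the encoded body.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "\u2018Hello world\u2018"; // example string from this thread
        System.out.println(charCount(s));                    // 13
        System.out.println(utf8ByteCount(s));                // 17
        System.out.println(utf8ByteCount(s) - charCount(s)); // 4
    }
}
```

The difference of 4 is the same as the number of bytes missing from the end of the response.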
Looking at GenericWebService.java, method addHeaders (line 86):

    builder = builder.header( HttpHeaders.CONTENT_LENGTH,
            String.valueOf(
                    builder.clone().build().getEntity().toString().length() ) );

Won't that set the content length to the number of characters, NOT bytes?
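If that is the cause, one possible fix is to measure the encoded body instead of the string. A minimal sketch, assuming the entity is written out as UTF-8 (I haven't checked which charset the server actually uses); contentLengthFor and the JSON body in main are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class ContentLengthFix {
    // Hypothetical helper: Content-Length counts octets, so measure the body
    // after encoding it. ASSUMPTION: the entity is serialized as UTF-8; if the
    // server writes another charset, encode with that charset instead.
    static String contentLengthFor(Object entity) {
        byte[] body = entity.toString().getBytes(StandardCharsets.UTF_8);
        return String.valueOf(body.length);
    }

    public static void main(String[] args) {
        // Hypothetical JSON body: 23 characters, 27 bytes in UTF-8.
        String json = "{\"msg\":\"\u2018Hello world\u2018\"}";
        System.out.println(contentLengthFor(json)); // 27
    }
}
```

In addHeaders that would amount to replacing .toString().length() with .toString().getBytes(StandardCharsets.UTF_8).length. Another option worth considering is not setting Content-Length by hand at all and letting the container frame the response after serialization.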
Al