- JA:
tw: failures: [["x ", "x"], ["X ", "X"], ["xゞx", "xヽ"]]
icu: failures: [["x ", "x"], ["X ", "X"]]
Character 'ゞ', code point 0x309E, is not in NFD (its normalized form is 0x309D 0x3099), but the FCE table has an entry for the denormalized version of this string: 309E; [0E 25, 05, 05][, DA 95, 05]. Since all strings are normalized first, we never use this entry and instead build the collation elements for this character from the CEs for 0x309D and 0x3099, which are [0E 25, 05, 05] and [, DA 95, 05]. That doesn't cause any issue in the default locale, because the results are identical. But when 'ゝ' (code point 0x309D) is tailored from [0E 25, 05, 05] to [0E 29, 05, 05] in the JA locale, we get the wrong collation elements [0E 29, 05, 05][, DA 95, 05] for 'ゞ'.
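To make the mismatch concrete, here is a minimal Python sketch of the lookup described above. It is not the library's actual code: the three-entry table, the `JA_TAILORING` override, and the `collation_elements` helper are all hypothetical stand-ins, and collation elements are kept as plain strings for readability.

```python
import unicodedata

# Toy excerpt of the FCE table, keyed by code point sequence (assumed layout).
FCE_TABLE = {
    (0x309D,): ["[0E 25, 05, 05]"],
    (0x3099,): ["[, DA 95, 05]"],
    (0x309E,): ["[0E 25, 05, 05]", "[, DA 95, 05]"],  # denormalized entry
}

# Hypothetical JA tailoring: only 0x309D is overridden.
JA_TAILORING = {(0x309D,): ["[0E 29, 05, 05]"]}


def collation_elements(text, tailoring=None):
    """Normalize to NFD, then look up each code point on its own."""
    table = {**FCE_TABLE, **(tailoring or {})}
    elements = []
    for char in unicodedata.normalize("NFD", text):
        elements.extend(table[(ord(char),)])
    return elements


# 0x309E decomposes to 0x309D 0x3099, so its own entry is never consulted.
default_ces = collation_elements("\u309E")
tailored_ces = collation_elements("\u309E", JA_TAILORING)

print(default_ces)   # same as the 309E entry in the table
print(tailored_ces)  # picks up the tailored CE for 0x309D
```

In the default case the composed result matches the 309E entry, so nothing looks wrong; with the tailoring applied, the result diverges from that entry, which is the behaviour the failing test exposes.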
That is only one test failure, but in practice there might be more cases like this one. The problem is that the FCE table contains denormalized code points, and since we normalize all strings before collation, we never look up the collation elements stored for them. It's a bit unexpected and I'm not sure how we can fix it.
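One way to gauge how many other entries could be affected is to scan the table for keys that change under NFD. The sketch below assumes the table can be loaded as a mapping from code point tuples to collation elements; `denormalized_keys` and `sample_table` are hypothetical names, and the sample holds only the three entries discussed here.

```python
import unicodedata


def denormalized_keys(table):
    """Return the keys (tuples of code points) that are not in NFD."""
    affected = []
    for code_points in table:
        text = "".join(chr(cp) for cp in code_points)
        # A key is unreachable after normalization if NFD rewrites it.
        if unicodedata.normalize("NFD", text) != text:
            affected.append(code_points)
    return affected


# Toy excerpt; a real run would load the full FCE table instead.
sample_table = {
    (0x309D,): "[0E 25, 05, 05]",
    (0x3099,): "[, DA 95, 05]",
    (0x309E,): "[0E 25, 05, 05][, DA 95, 05]",
}

affected = denormalized_keys(sample_table)
print([tuple(hex(cp) for cp in key) for key in affected])
```

This only lists candidates; whether a given entry actually produces a wrong result still depends on whether a locale tailors one of the code points in its decomposition.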
Test failures for all other locales are identical to ICU's, which might be considered a good result if we treat ICU as a reference implementation.
Hey @KL-7, I've got a few small corrections for this (awesome) writeup:
Otherwise, this rocks. Thanks!