SID: locale qualifier: Add Extension U example to BCP 47 #857
bact wants to merge 1 commit into package-url:main
Should locale support BCP 47 extensions (Extension T and Extension U) as well?
BCP 47 Extension U (RFC 6067) is needed if we want to support information beyond language, such as digit and calendar systems.
For example, Sun/Oracle Java SE 8 has a ja_JP_JP locale variant that uses the Japanese imperial calendar as its default calendar.
https://docs.oracle.com/javase/8/docs/technotes/guides/intl/calendar.doc.html
This could be expressed in BCP 47 with Extension U as ja-JP-u-ca-japanese.
https://datatracker.ietf.org/doc/rfc6067/
Signed-off-by: Arthit Suriyawongkul <arthit@gmail.com>
This is kind of complicated, and maybe there is some elaboration that needs to happen for the locale field.
It makes sense to me for software that is distributed in multiple locales where locale selection results in different files. For Windows, for example, you would want a locale qualifier to specify that you used Win10_22H2_Japanese_x64v1.iso and not Win10_22H2_German_x64v1.iso. However, I'm pretty sure there is only one Java distribution, and ja_JP_JP is a configuration option that you apply when using that distribution. In that case, specifying ja-JP-u-ca-japanese probably isn't necessary. I wouldn't consider it to be a distinct software package, at least. Maybe there's something I'm missing. I didn't know there was a locale qualifier for this.
However, if it were a distinct software package, using BCP 47 would mean that somebody naming Java SE 8 ja_JP_JP would need to know that ja_JP_JP should be written ja-JP-u-ca-japanese in the PURL, and I would assume very few people know this. Likewise, somebody naming Windows would need to know that Japanese should be written ja, English should be written en-US, and EnglishInternational should be written en, and that's likely to cause confusion. Or should it? IIRC, because Microsoft is an American company, it uses American English for the en locale (or the invariant locale) on the English disk and then puts several other locales, like en-GB and en-IN, on the EnglishInternational disk *.
I don't know how you would specify the case where the distribution you're using is affected by locale, but the way the vendor divides the supported locales into distributions does not align with the way ISO/IETF structured the locale codes. Maybe the locale field should not mention ISO 639-1 and BCP 47, and should instead be the locale as the vendor names it.
Another potential issue: if a vendor has one distribution for the invariant locale and another distribution for specific locales, how should the invariant locale be named? The locale qualifier can't be omitted or empty, because then the reader can't tell the difference between invariant and just unspecified. But unless there is a designated value for invariant, the choice is left to whoever is assembling the SBOM, and different values will be filled in for the same distribution (e.g. zh, zh-Hans, or zh-Hans-CN are all potentially valid if the invariant locale is based on zh-Hans-CN).
* This may be an oversimplification. The one EnglishInternational disk can include multiple installation images, and I don't remember if that's the case for all the different locales on the disk. It may not be a good example, but I'm sure there is software out there that has one distribution for the primary locale and another "international" distribution for all other locales, or software that has one specific distribution for all CJK languages.
Thank you @matt-phylum - your observation is correct. I have rechecked, the
But here we go, the Wayback Machine saves us (thanks to the Internet Archive). It is a Thai localization of Java 2 (1.2.2), and it is considered a separate package, not just a configuration.
Back then (2000), the Sun Microsystems Thailand office felt local market pressure to have Thai support in Java, but the cycles at the US headquarters were too slow (Java 2 SE 1.3 was about to be released and new features could not be accepted). The two offices somehow agreed to distribute Java 2 SE v.1.2.2 Thai Localization as separate packages (for Solaris and for Windows). These packages could only be downloaded from http://www.sun.co.th/developers/thailand (Wayback Machine) and not from the global Sun or Java websites.
In this case, their locale qualifier could be
These bug reports confirm the separate status of this special-arrangement edition, as the reporters found that some of the features implemented in the localized v1.2.2 version are missing from the mainline v1.4:
It could be implied from the existing text, which mentions BCP 47, but we can make it more explicit that BCP 47 here also includes its extensions (currently there are two: Extension T and Extension U).
Extension U (RFC 6067) is required if we want to support locale information beyond language, such as digit and calendar systems.
For example, Sun/Oracle Java SE 8 has a ja_JP_JP locale variant that uses the Japanese imperial calendar as its default calendar; this could be expressed in BCP 47 with Extension U as ja-JP-u-ca-japanese.
This PR specifies BCP 47 Extension U for the locale qualifier and adds an Extension U example for illustration.
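To illustrate what the Extension U part of a tag such as ja-JP-u-ca-japanese carries, here is a minimal Python sketch that pulls the key/type pairs out of a locale value. The function name unicode_extension_keywords is hypothetical and not part of any PURL tooling, and the parsing is deliberately simplified: it assumes a well-formed tag, ignores Extension U attributes, and does not validate subtags against RFC 6067 or the CLDR registry.

```python
def unicode_extension_keywords(tag):
    """Return the Extension U keywords of a BCP 47 tag as {key: type}.

    Hypothetical helper for illustration only; it assumes a well-formed tag.
    """
    subtags = tag.lower().split("-")

    # Find the "u" singleton; stop looking once private use ("x") begins.
    start = None
    for index, subtag in enumerate(subtags):
        if subtag == "x":
            break
        if subtag == "u":
            start = index + 1
            break
    if start is None:
        return {}

    keywords = {}
    key = None
    for subtag in subtags[start:]:
        if len(subtag) == 1:
            break  # next singleton ends the U extension (e.g. "t" or "x")
        if len(subtag) == 2:
            key = subtag  # two-character subtags are keys, e.g. "ca", "nu"
            keywords[key] = []
        elif key is not None:
            keywords[key].append(subtag)  # longer subtags form the key's type
        # subtags before the first key are attributes; this sketch skips them

    return {k: "-".join(parts) for k, parts in keywords.items()}


print(unicode_extension_keywords("ja-JP-u-ca-japanese"))  # {'ca': 'japanese'}
print(unicode_extension_keywords("en-US"))                # {}
```

A consumer of the locale qualifier could use something along these lines to tell that ja-JP-u-ca-japanese and ja-JP share the same language and region and differ only in the calendar keyword.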