Source code for the Java Development Kit (JDK) would be converted to UTF-8 (Unicode Transformation Format, 8-bit) to give it a well-defined encoding, under a plan afoot in the OpenJDK community.
The proposal, created in early January and updated on February 28, can be found at bugs.openjdk.org. It describes the JDK source code as currently being in an “ill-defined encoding,” with no official declaration of the encoding used. The code is mostly ASCII, the proposal adds, but contains a few non-ASCII characters that are not well-defined.
The current situation creates unnecessary problems when working with the JDK codebase, for no other reason than historical baggage, the proposal states.
UTF-8, the byte-oriented encoding form of Unicode that is considered the web’s standard for character encoding, was designated the default charset of standard Java APIs with the release of JDK 18 in March 2022.
The new proposal would convert the JDK codebase to UTF-8 by taking the following steps:
- Tell Git that text files are encoded in UTF-8.
- Examine the codebase for text files containing non-ASCII characters and convert them to UTF-8 if they are not already UTF-8.
- Update the tools used in building Java, by way of compiler flags, to recognize that files are now in UTF-8 and to treat them accordingly.
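The second step, finding files with non-ASCII content and converting those that are not already UTF-8, could be sketched roughly as follows in Java. This is an illustration only, not the tooling the proposal uses: the class name `EncodingCheck` and its methods are hypothetical, and the choice of ISO-8859-1 as the legacy encoding is an assumption, since the proposal as reported here does not say which encodings the affected files use.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; not part of the JDK build.
final class EncodingCheck {

    /** Returns true if any byte lies outside the 7-bit ASCII range. */
    static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            // Java bytes are signed, so values 0x80..0xFF appear negative.
            if (b < 0) {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the bytes decode as UTF-8 without any malformed sequences. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Re-encodes bytes to UTF-8, ASSUMING the input is ISO-8859-1. */
    static byte[] latin1ToUtf8(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    /** Reports, for each file path given, whether it would need conversion. */
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            if (!hasNonAscii(data)) {
                System.out.println(name + ": pure ASCII, nothing to do");
            } else if (isValidUtf8(data)) {
                System.out.println(name + ": already valid UTF-8");
            } else {
                System.out.println(name + ": non-ASCII and not UTF-8, needs conversion");
            }
        }
    }
}
```

Pure ASCII files need no conversion because ASCII is a subset of UTF-8. For the third step, `javac` already accepts an `-encoding` option (for example `-encoding UTF-8`) that tells the compiler how to read source files.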