Tim Mousaw wrote: So, I'm confused why this doesn't just work assuming javac pays attention to the encoding of my source file.
Your mistake is in thinking that javac auto-detects the encoding of a source file. It doesn't, and for a good reason: you can't reliably detect the encoding of a source file.
Let's pretend I saved my source code file using an encoding that looks almost exactly like UTF-8, but the byte sequence that encodes '€' in this special encoding actually translates to '¿' in UTF-8. If javac tries to auto-detect the encoding of my source code file, it might mistakenly detect UTF-8 and think my source code file contains '¿', when I really intended it to interpret the character as '€'.
Text files do not store any metadata about the encoding that was used to encode the text they contain, so there is no reliable way for applications to know how to read a text file. When you look at the encoding of a text file in the bottom right corner of Notepad++, that is not actually a property of the text file itself. It's just an educated guess that Notepad++ made.
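To make the ambiguity concrete, here is a minimal sketch. The class and method names are made up for illustration, and ISO-8859-1 stands in for the imaginary "almost UTF-8" encoding above. It decodes the same two bytes with two charsets: both decodings succeed without error, so the bytes alone cannot tell a reader which one was intended.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class AmbiguousBytes {
    // The two bytes that UTF-8 uses to encode U+00BF (inverted question mark).
    static final byte[] BYTES = { (byte) 0xC2, (byte) 0xBF };

    // Decode the raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Render a string as a list of code points, so the output does not
    // depend on the console's own encoding.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Same bytes, two equally valid readings.
        System.out.println("UTF-8:      " + codePoints(decode(BYTES, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1: " + codePoints(decode(BYTES, StandardCharsets.ISO_8859_1)));
    }
}
```

Read as UTF-8, the bytes are one character (U+00BF). Read as ISO-8859-1, they are two (U+00C2 U+00BF). Neither reading is "wrong" as far as the bytes are concerned.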
So javac has two ways to deal with this:
1. Assume that the source file uses the same encoding as your system locale's default.
2. Have you tell it the encoding to use through the -encoding option.
On Windows, the system locale's default encoding is almost never UTF-8. Instead, it is probably some Microsoft variant of an ANSI codepage.
So, you saved your source file as UTF-8, with the Euro sign encoded as [0xE2, 0x82, 0xAC], but without the -encoding option, javac might interpret that byte sequence as "â‚¬" instead of "€", if your system default codepage is Windows-1252.
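You can reproduce that misreading outside of javac. The sketch below uses a made-up class name and assumes your JDK ships the windows-1252 charset, which mainstream JDKs do. It decodes the three Euro bytes once as UTF-8 and once as Windows-1252, then prints the resulting code points.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class EuroMojibake {
    // The Euro sign (U+20AC) as encoded by UTF-8.
    static final byte[] EURO_UTF8 = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };

    // Render a string as a list of code points, so the output does not
    // depend on the console's own encoding.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Decoded with the encoding the file was actually saved in.
        String intended = new String(EURO_UTF8, StandardCharsets.UTF_8);
        // Decoded the way a Windows-1252 default would read the same bytes.
        // Assumption: the "windows-1252" charset is available in this JDK.
        String misread = new String(EURO_UTF8, Charset.forName("windows-1252"));

        System.out.println("UTF-8:        " + codePoints(intended)); // U+20AC
        System.out.println("Windows-1252: " + codePoints(misread));  // U+00E2 U+201A U+00AC
    }
}
```

The three code points in the second line are 'â', '‚' and '¬', which is exactly the garbage that ends up in your string literal when the compiler assumes the wrong encoding.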
Tim Mousaw wrote: Unless I'm mistaken, there's no way to render € with 7-bit ASCII. While I understand what you are saying, limiting myself to 7-bit ASCII seems to make the point of my post impossible.
Which means you are forced to supply the -encoding option explicitly. Build tools like Maven pass it for you once the source encoding is set in the build configuration (in Maven, the project.build.sourceEncoding property).