Forum:

Programmer Certification (OCPJP)

Sybex CSG17 - Chapter 1 Page 35 - The € symbol

Greenhorn

Posts: 16

posted 3 weeks ago

After reading page 35 about valid identifiers, I got curious about whether $, ¥, and € were valid variable names on their own. I wrote a small Java program:

To my surprise, when I tried to compile this from Git Bash with javac VariablePlay.java, I got the following error:

I then commented out lines 5 and 6 and was able to run the program and get "foo" and "baz" to print out. I believe my default encoding is what keeps the euro symbol from compiling. If I instead compile with javac -encoding UTF-8 VariablePlay.java, it compiles successfully and I get "foo", "bar", and "baz".

I did see in the Errata OCP 17 Developer Study Guide topic that there is a comment that the only currency symbol we should expect to see on the OCP 17 exam is the $. So, this is more a comment to the curious observer on this fact. Perhaps something to note in a future version.

Tim Mousaw

Greenhorn

Posts: 16

posted 3 weeks ago

I meant to mention I am running Windows 11 Home.

Stephan van Hulst

Saloon Keeper

Posts: 15619

posted 3 weeks ago

What tool did you use to create and save the source file?

Tim Mousaw

Greenhorn

Posts: 16

posted 3 weeks ago

I used Notepad++ and the file itself is UTF-8 encoded. At least, according to Notepad++ it is...

Stephan van Hulst

Saloon Keeper

Posts: 15619

posted 3 weeks ago

javac (up to and including Java 17) uses your system locale's default encoding when you don't specify one on the command line.

This is appropriate if your source code files are also saved using the system locale's default encoding.

In reality, many different text editors use many different encoding settings. I think in Notepad++ you can configure the encoding to use for new files, so if you set it to the system locale's default encoding, you won't have to specify the -encoding option when calling javac.

But the easiest way to deal with all of this is to simply limit the characters you use in source files to those that can be encoded in 7-bit ASCII.
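
To see which default a given machine reports, something like the following minimal sketch may help. It assumes that javac on Java 17, when run without -encoding, falls back to the same platform default charset that the JVM reports; the class name DefaultEncodingCheck is made up for illustration. The output depends on the machine, so none is shown.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    // Name of the charset the running JVM treats as its default.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + defaultCharsetName());
        // The file.encoding system property can be overridden with -Dfile.encoding=...
        System.out.println("file.encoding:       " + System.getProperty("file.encoding"));
    }
}
```

If the first line prints something other than UTF-8 while the editor saves files as UTF-8, that mismatch is the likely reason the -encoding option is needed.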

Tim Mousaw

Greenhorn

Posts: 16

posted 3 weeks ago

Stephan van Hulst wrote: In reality, many different text editors use many different encoding settings. I think in Notepad++ you can configure the encoding to use for new files, so if you set it to the system locale's default encoding, you won't have to specify the -encoding option when calling javac.

I checked my preferences in Notepad++ using Settings -> Preferences -> New Document and the default is already set to UTF-8. So, I'm confused why this doesn't just work assuming javac pays attention to the encoding of my source file.

Stephan van Hulst wrote: But the easiest way to deal with all of this is to simply limit the characters you use in source files to those that can be encoded in 7-bit ASCII.

Unless I'm mistaken, there's no way to render € with 7-bit ASCII. While I understand what you are saying, limiting myself to 7-bit ASCII seems to make the point of my post impossible.

Stephan van Hulst

Saloon Keeper

Posts: 15619

posted 3 weeks ago

Tim Mousaw wrote: So, I'm confused why this doesn't just work assuming javac pays attention to the encoding of my source file.

Your mistake is in thinking that javac auto-detects the encoding of a source file. It doesn't, and for a good reason: you can't reliably detect the encoding of a source file.

Let's pretend I saved my source code file using an encoding that looks almost exactly like UTF-8, but the byte sequence that encodes '€' in this special encoding actually translates to '¿' in UTF-8.

If javac tries to auto-detect the encoding of my source code file, it might mistakenly detect UTF-8 and think my source code file contains '¿', when I really intended it to interpret the character as '€'.

Text files do not store any metadata about the encoding that was used to encode the text they contain, so there is no reliable way for applications to know how to read a text file. When you look at the encoding of a text file in the bottom right corner of Notepad++, that is not actually a property of the text file itself. It's just an educated guess that Notepad++ made.

So javac has two ways to deal with this:

Assume that the source file uses the same encoding as your system locale's default.

Have you tell it the encoding to use through the -encoding option.

On Windows, the system locale's default encoding is almost never UTF-8. Instead, it is probably some Microsoft variant of an ANSI codepage.

So, you saved your source file as UTF-8, with the Euro sign encoded as [0xE2, 0x82, 0xAC], but without using the -encoding option, javac might interpret that byte sequence as "â‚¬" instead of "€", if your system default codepage is Windows-1252.
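
The byte-level effect can be sketched in a few lines of Java. This is only an illustration of decoding the same three bytes under two charsets, not a reproduction of what javac does internally; the class and method names are made up, and the result is printed as code points so the console encoding does not get in the way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EuroDecodingDemo {
    // The three bytes that UTF-8 uses for the euro sign (U+20AC).
    static final byte[] EURO_UTF8 = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};

    // Decode the same bytes under a caller-chosen charset.
    static String decode(Charset charset) {
        return new String(EURO_UTF8, charset);
    }

    // Render each char as U+XXXX so the result is readable on any console.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:        " + codePoints(decode(StandardCharsets.UTF_8)));
        System.out.println("windows-1252: " + codePoints(decode(Charset.forName("windows-1252"))));
    }
}
```

Decoded as UTF-8 the bytes give the single character U+20AC (€). Decoded as windows-1252 they give three characters, U+00E2 U+201A U+00AC, which is the "â‚¬" sequence, and that sequence is not a valid identifier.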

Tim Mousaw wrote: Unless I'm mistaken, there's no way to render € with 7-bit ASCII. While I understand what you are saying, limiting myself to 7-bit ASCII seems to make the point of my post impossible.

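Which means you are forced to supply the -encoding option explicitly. Build tools like Maven pass it for you once the source encoding is set in the project configuration (for example, the project.build.sourceEncoding property).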

Campbell Ritchie

Marshal

Posts: 79422

posted 3 weeks ago

What an interesting problem! I wish I had seen this thread earlier. Have a cow for bringing it up. I have seen similar error messages when using a word processor and getting " changed to “ or ”.

Tim Mousaw wrote: . . . whether $, ¥, and € were valid variable names . . . .

Let's look in the JLS (Java® Language Specification). It refers to the method Character.isJavaIdentifierStart(int), whose documentation says it includes currency symbols, as you have already found out. I tried copying'n'pasting your class into JShell, and it compiled first try.
I agree with Stephan; it is an encoding problem. I can't reproduce it because I hardly ever use Windows®, and my computer usually defaults to UTF‑8. Try changing the encoding to platform default and to UTF‑16 and see what happens. You can encode neither € nor ¥ in 7‑bit ASCII, and their encoding in UTF‑8 differs from their encoding in UTF‑16. Stephan has told you what the actual encoding is.
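
For the curious, the identifier rule itself can be checked without any source-file encoding getting involved. The sketch below is only an illustration: the class and helper names are made up, and the characters are written as code points so the file stays pure ASCII.

```java
public class CurrencyIdentifierCheck {
    // True if the code point may be the first character of a Java identifier.
    static boolean canStartIdentifier(int codePoint) {
        return Character.isJavaIdentifierStart(codePoint);
    }

    public static void main(String[] args) {
        // U+0024 is $, U+00A5 is the yen sign, U+20AC is the euro sign.
        int[] codePoints = {0x0024, 0x00A5, 0x20AC};
        for (int cp : codePoints) {
            System.out.printf("U+%04X -> %b%n", cp, canStartIdentifier(cp));
        }
    }
}
```

All three report true, so the language accepts each symbol as a variable name on its own; the compile failure comes from how the bytes in the file were decoded.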

Tim Mousaw

Greenhorn

Posts: 16

posted 3 weeks ago

Campbell Ritchie wrote: Try changing the encoding to platform default and to UTF‑16 and see what happens.

Notepad++ supports both UTF-16 big-endian and little-endian. I changed the default to each of these in turn, compiled the same program, and got many errors. I'd include the errors below, but that would make the post too long. If I compile either of these files with -encoding UTF-16, they compile and run just fine in Git Bash.

I also tried the same file in Ubuntu 22.04.3 LTS on WSL 2, creating it with the vi editor. It compiled and ran on my Linux installation with no problem, without specifying the encoding.

Campbell Ritchie

Marshal

Posts: 79422

posted 3 weeks ago

Tim Mousaw wrote: . . . Created it using the vi editor. . . .

Careful. You might start a vi versus emacs argument.
On a Linux box the default encoding is usually UTF‑8, and Java® assumed that when compiling your file.