Wednesday, February 16, 2011

unmappable character for encoding UTF8

Our Maven daily build started failing on many Java source files with the error "unmappable character for encoding UTF8". The message means that javac found bytes in the source that are not valid in UTF-8, which is the default encoding on our Linux build server (the same as passing -encoding UTF-8 to javac).

Here is one example of these characters:
 * <p/>
 * <!¡ª0 indicate success ,1- indicate failure¨¤
 * <p/>
 * <status> 0 or 1<status>
 * <p/>
 * <!¡ªif status is 1 ,then have the following node¨¤
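
To find which files contain such bytes before the build does, a strict UTF-8 decode can be run over each source file. The following is only a minimal sketch, assuming Java 7 or later; the class name Utf8Check and its method are made up for illustration and are not part of our build.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether files decode cleanly as UTF-8.
class Utf8Check {

    /** Returns true if the given bytes are a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of a source file to check.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

Files reported as not valid are the ones javac would reject when it reads them as UTF-8.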


Here are a few options to resolve or work around this issue:
  1. Remove these characters from the source code.
  2. Replace them with Unicode escapes, such as '\u00a9' for the copyright sign.
  3. Add the -encoding ISO-8859-1 parameter to the javac command (Latin-1 is another name for ISO-8859-1; cp1252 is a close relative that differs only in the 0x80-0x9F range).
  4. Maven: add the property <project.build.sourceEncoding>ISO-8859-1</project.build.sourceEncoding> to the POM.
  5. Ant: add the encoding attribute to the javac task in the build script:
     <javac srcdir="src" destdir="classes" encoding="ISO-8859-1" debug="true" />
  6. Save the Java source files as UTF-8 (developers can configure this in the Eclipse IDE under Window > Preferences > General > Workspace, "Text file encoding").
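
The Unicode-escape option works because the source file then contains only ASCII, so it compiles the same way under any of these encodings. The sketch below also shows why a Latin-1 file breaks under UTF-8: the copyright sign is a single byte in ISO-8859-1, and that lone byte is not a valid UTF-8 sequence. The class name UnicodeEscapeDemo is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the byte form of one character in two encodings.
class UnicodeEscapeDemo {

    // Written as a Unicode escape, so the source file itself stays pure ASCII.
    static final String COPYRIGHT = "\u00a9";

    /** Formats bytes as space-separated upper-case hex, e.g. "C2 A9". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // prints "ISO-8859-1: A9"
        System.out.println("ISO-8859-1: " + hex(COPYRIGHT.getBytes(StandardCharsets.ISO_8859_1)));
        // prints "UTF-8: C2 A9"
        System.out.println("UTF-8: " + hex(COPYRIGHT.getBytes(StandardCharsets.UTF_8)));
    }
}
```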

P.S.
I found the change that caused the issue. We use maven-compiler-plugin; previously it was set to 1.5, which worked fine (with cp1252), but after the change to 1.6 the build fails with the compilation error above. Adding <encoding> to the plugin configuration fixes that, because different JDK versions might have different default encodings.
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                    <encoding>ISO-8859-1</encoding>
                </configuration>
            </plugin>
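
To check which default a given machine and JDK would fall back to when no encoding is configured, the JVM's default charset can be printed. This is a small sketch that assumes javac uses the platform default charset when -encoding is absent, as the JDK versions discussed here do; the class name DefaultEncoding is made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper: shows the encoding the JVM uses when none is specified.
class DefaultEncoding {

    /** Returns a one-line summary of the JVM's default encoding settings. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it on both the developer machine and the build server shows whether the two disagree, which is the situation an explicit <encoding> setting avoids.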
