Problem: non-European characters aren't handled correctly in form posts.
Solution: make sure everything is UTF-8 encoded.
- If you're using JSP, add a charset parameter to your page directives:
<%@ page contentType="text/html; charset=UTF-8" %>
This sets the encoding that the server will use to encode the page, and adds a Content-Type header line to tell the browser how to decode the bytes it gets from the server. It's probably not strictly required if all you're concerned about is posting. But you want the results of your posts to display correctly, right?
- For POST requests:
- In your form tags, add an accept-charset attribute:
<form action="/foo" method="post" accept-charset="utf-8">
This tells the browser to encode the user's form input as UTF-8. Works with the Struts html:form tag, too.
- Add a request filter that sets the request character encoding to UTF-8 before any parameters are read. This tells the server how to decode the form parameters correctly. Otherwise, it will try to decode them as Latin-1 (ISO-8859-1).
There is a sample class called SetCharacterEncodingFilter in the Apache Tomcat distribution that will work fine.
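To see why the server-side decoding step matters, here is a minimal sketch that uses only the JDK, not the servlet API. The class and method names (FormDecodingDemo, browserEncode, decodeAsLatin1, decodeAsUtf8) are made up for illustration; they just simulate what happens to the posted bytes with and without the filter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or the servlet API.
class FormDecodingDemo {

    // Simulates the browser: the form value is sent as UTF-8 bytes.
    static byte[] browserEncode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Simulates a server with no encoding filter, assuming Latin-1.
    static String decodeAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Simulates a server that has been told the request is UTF-8.
    static String decodeAsUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute, which UTF-8 encodes as the two bytes C3 A9.
        byte[] body = browserEncode("\u00e9");

        // Decoded as Latin-1, each byte becomes its own character (mojibake).
        System.out.println(decodeAsLatin1(body).equals("\u00c3\u00a9")); // true

        // Decoded as UTF-8, the original character comes back.
        System.out.println(decodeAsUtf8(body).equals("\u00e9")); // true
    }
}
```

The two-character result in the Latin-1 case is the garbage you typically see in place of accented letters when the filter is missing.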
- For GET requests: non-ASCII characters in request parameters should be percent-encoded by your browser as UTF-8 bytes. In Tomcat, you can set URIEncoding="UTF-8" on the Connector element in conf/server.xml to make sure it decodes these bytes correctly.
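The GET case can be sketched the same way with the JDK's URLEncoder and URLDecoder. This is only an illustration of the encoding mismatch, not what Tomcat runs internally; the class QueryEncodingDemo and its methods are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class showing percent-encoding of a query parameter.
class QueryEncodingDemo {

    // Percent-encodes a value as UTF-8 bytes, as a browser would for a UTF-8 page.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decodes a percent-encoded value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("\u00e9"); // e-acute
        System.out.println(encoded); // %C3%A9

        // Matching charset on the server side: the character survives.
        System.out.println(decode(encoded, "UTF-8").equals("\u00e9")); // true

        // Latin-1 on the server side: two wrong characters instead of one.
        System.out.println(decode(encoded, "ISO-8859-1").equals("\u00c3\u00a9")); // true
    }
}
```

The Latin-1 branch is roughly what you get when the URI encoding on the server does not match what the browser sent.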
I just found this page, which goes into more detail.