Sunday, November 19, 2006

hCard Search Engine

What?

  • hCards are a microformat version of vCard (contact info)
  • Microformats are "semantic web"
  • Microformats have geek chic

Why?

  • Familiarize me with AWSP / get a fresh pair of eyes on it
  • Java sample app for Developer's Corner
  • Showcase uniqueness of AWSP: searching on tag contents

Demo
中国


Sunday, November 12, 2006

Saturday, November 11, 2006

Handling UTF-8 Form Posts

Problem: characters outside Latin-1 (Chinese, for example) get garbled in form posts.

Solution: make sure everything is UTF-8 encoded.

  1. If you're using JSP, add a charset parameter to your page directives:

    <%@ page contentType="text/html; charset=UTF-8" %>

    This sets the encoding the server uses to encode the page, and adds a Content-Type header telling the browser how to decode the bytes it receives. It's probably not strictly required if posting is all you're concerned about. But you want the results of your posts to display correctly, right?

  2. For POST requests:

    • In your form tags, add an accept-charset attribute:

      <form action="/foo" method="post" accept-charset="utf-8">

      This tells the browser to encode the user's form input as utf-8. Works with the Struts html:form tag, too.

    • Add a servlet filter that calls request.setCharacterEncoding("UTF-8") before anything reads the request parameters. This tells the server how to decode the form parameters. Otherwise, it will decode them as Latin-1 (ISO-8859-1), the servlet default.

      There is a sample class called SetCharacterEncodingFilter in the Apache Tomcat distribution that will work fine.

  3. For GET requests: your browser should URL-encode non-ASCII characters in request parameters as percent-escaped UTF-8 bytes. In Tomcat, set URIEncoding="UTF-8" on the Connector element in conf/server.xml so those bytes are decoded as UTF-8 rather than Latin-1.
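
To see what actually goes wrong, here's a quick stand-alone sketch using only the JDK (Java 10+ for the Charset overloads, no servlet container). The class and method names are made up for illustration, and java.net.URLEncoder/URLDecoder only approximate what the browser and the container do. The string "\u4E2D\u56FD" is 中国, written with Unicode escapes so the source compiles no matter what javac's default encoding is.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates the browser-to-server round trip
// of a form value. Not servlet code, just the encoding steps.
public class Utf8FormDemo {

    // Browser side (approximation): with accept-charset="utf-8" the text
    // becomes UTF-8 bytes, and each non-ASCII byte is sent as %XX.
    static String encodeAsBrowserWould(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Server side (approximation): the %XX bytes are turned back into
    // characters using whatever charset the container was told to use.
    static String decodeOnServer(String wire, Charset charset) {
        return URLDecoder.decode(wire, charset);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u56FD"; // two Chinese characters
        String wire = encodeAsBrowserWould(original);
        System.out.println("On the wire:        " + wire);

        // With the filter: the request encoding is UTF-8, so the round trip works.
        String withFilter = decodeOnServer(wire, StandardCharsets.UTF_8);
        System.out.println("Decoded as UTF-8:   " + withFilter);

        // Without the filter: Latin-1 maps each of the six UTF-8 bytes
        // to its own (wrong) character.
        String withoutFilter = decodeOnServer(wire, StandardCharsets.ISO_8859_1);
        System.out.println("Decoded as Latin-1: " + withoutFilter);

        System.out.println("UTF-8 round trip OK:   " + original.equals(withFilter));
        System.out.println("Latin-1 round trip OK: " + original.equals(withoutFilter));
    }
}
```

The wire form is %E4%B8%AD%E5%9B%BD: three bytes per character. Decoded as Latin-1, those six bytes come back as six unrelated characters instead of two Chinese ones, which is exactly the garbling described in the problem statement.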


I just found this page, which goes into more detail.