
Tomcat, Weak ETags, and JavaScript/CSS Caching

Being known as the HTTP/REST guy in the company, I was pulled into an interesting conversation about CSS/JavaScript caching issues this week and was fortunate enough to learn a couple of things on the way. It seems to be a common (anti)pattern in the Java world to consistently fight JavaScript browser caching issues by simply adding a query parameter to your script tags:
<script type="text/javascript"
        src="myscript.js?version=<%= Application.VERSION %>">
</script>
I've seen (and used) a number of variations on the same theme, including going so far as to create a JSP custom tag for more advanced schemes. The obvious solution is to use the Last-Modified header with the timestamp of the file. According to the spec, the client can utilize this value to create a conditional GET request by adding the If-Modified-Since header. If we look at all of the major browsers, they dutifully follow this pattern by sending the If-Modified-Since header the next time the resource is requested. The "workflow" goes something like this:
GET /some-resource.html

HTTP/1.1 200 OK
Last-Modified: Wed, 26 Sep 2007 04:58:08 GMT

<html>
   <head><title>Some Resource</title></head>
   <body></body>
</html>
The next time the browser asks for the file and the file remains unchanged:
GET /some-resource.html
If-Modified-Since: Wed, 26 Sep 2007 04:58:08 GMT

HTTP/1.1 304 Not Modified
Last-Modified: Wed, 26 Sep 2007 04:58:08 GMT
Note the use of the 304 status code and no message body. This is an indication to the client that it is free to use the cached version of the resource. What if the file has changed? This is as simple as returning the content with an updated Last-Modified date as seen below:
GET /some-resource.html
If-Modified-Since: Wed, 26 Sep 2007 04:58:08 GMT

HTTP/1.1 200 OK
Last-Modified: Thu, 27 Sep 2007 05:00:00 GMT

<html>
   <head><title>Some Updated Resource</title></head>
   <body></body>
</html>
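The server side of the two exchanges above comes down to a single date comparison. Here is a minimal Java sketch of that decision; the class and method names (`LastModifiedCheck`, `statusFor`) are made up for illustration, and a real server such as Tomcat already does this for you.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

public class LastModifiedCheck {

    /** Returns 304 if the client's copy is still current, otherwise 200. */
    public static int statusFor(String ifModifiedSince, Instant fileLastModified) {
        if (ifModifiedSince == null) {
            return 200; // unconditional GET: always send the body
        }
        try {
            Instant clientCopy = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            // HTTP dates have one-second resolution, so drop the milliseconds.
            Instant onDisk = fileLastModified.truncatedTo(ChronoUnit.SECONDS);
            return onDisk.isAfter(clientCopy) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200; // unparseable header: ignore it and send the body
        }
    }

    public static void main(String[] args) {
        String header = "Wed, 26 Sep 2007 04:58:08 GMT";
        // File untouched since the client cached it.
        System.out.println(statusFor(header, Instant.parse("2007-09-26T04:58:08Z")));
        // File modified the following day.
        System.out.println(statusFor(header, Instant.parse("2007-09-27T05:00:00Z")));
    }
}
```

Running `main` prints 304 for the unchanged file and 200 for the one modified a day later.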
Now that the server has returned the updated resource, the client should update its cache with the latest version and Last-Modified information. Easy, huh? Well, you'd think so... This works fine in most browsers, but unfortunately it doesn't work quite as you would expect in IE. Using Fiddler you can track what's actually going on and see that IE ignores the 200 + content returned via the conditional GET and takes the version of the resource from cache anyway! This behavior in IE is, in my experience, the cause of many of our caching woes.

Fortunately, there is a lesser-known cousin to Last-Modified that IE supports pretty well. It's an HTTP header known as ETag (entity tag). ETags are also used to identify whether a resource has changed, and can be created a number of ways, including taking a hash of the response body or serializing the Last-Modified timestamp. The same workflow is used for ETag processing, but with a couple of different headers:
GET /some-resource.html

HTTP/1.1 200 OK
ETag: "1234567890"

<html>
   <head><title>Some Resource</title></head>
   <body></body>
</html>
GET /some-resource.html
If-None-Match: "1234567890"

HTTP/1.1 304 Not Modified
ETag: "1234567890"
GET /some-resource.html
If-None-Match: "1234567890"

HTTP/1.1 200 OK
ETag: "0987654321"

<html>
   <head><title>Some Updated Resource</title></head>
   <body></body>
</html>
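The three exchanges above can be reduced to one check on the server: does any tag the client sent match the resource's current tag? The Java sketch below is only an illustration. `ETagCheck` and `statusFor` are hypothetical names, tags are compared as exact strings, and the comma split assumes no tag contains a comma.

```java
public class ETagCheck {

    /** Returns 304 when a tag in If-None-Match equals the current ETag, otherwise 200. */
    public static int statusFor(String ifNoneMatch, String currentETag) {
        if (ifNoneMatch == null) {
            return 200; // unconditional GET: always send the body
        }
        // The header may carry several tags separated by commas.
        for (String candidate : ifNoneMatch.split(",")) {
            if (candidate.trim().equals(currentETag)) {
                return 304; // the client already holds this version
            }
        }
        return 200; // no match: send the new body and the new ETag
    }

    public static void main(String[] args) {
        // Resource unchanged since the client cached it.
        System.out.println(statusFor("\"1234567890\"", "\"1234567890\""));
        // Resource changed on the server in the meantime.
        System.out.println(statusFor("\"1234567890\"", "\"0987654321\""));
    }
}
```

Running `main` prints 304 for the unchanged resource and 200 for the updated one.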
Notice, same workflow, different headers. The difference in this case is that IE handles the 200 as expected, replacing the cached version with the new content and updating the ETag metadata in the cache for this resource. So to properly handle caching in IE all we have to do is set the ETag for JavaScript files. But how do we do that?

Well, the title of the post mentioned Tomcat, and this is where we actually talk about it. As it turns out, there's a "dark side" to ETag processing: something called a Weak ETag. Weak ETags are prefixed with "W/" and would look like this for our example above:
ETag: W/"1234567890"
The spec describes the notion of a "Weak" ETag this way:
a weak value changes whenever the meaning of an entity changes
As I interpret it, let's say that you're downloading a Java source file via HTTP. You could take a hash of the program, excluding comments and whitespace, and return this as a Weak ETag. Subsequent updates to the comments or formatting of the document would not change the actual "meaning" of the returned result. If the code itself changed, however, a new Weak ETag would be generated and returned. Weak ETags are not, as far as I can tell, very well supported by browsers.

The problem is that Tomcat shows loyalties to the dark side when it comes to static content. Tomcat's FileDirContext class does not populate the ETag for static content, leaving the decision about an ETag to DefaultServlet. DefaultServlet simply generates a Weak ETag (by concatenating the content length and the last modified time in milliseconds) and sends it back to the browser, where it is basically ignored. In my quest to figure out how to prevent these cache problems I turned to Google and the Tomcat source for help. I was hoping to find a configuration setting to prevent the Weak ETag behavior I was seeing for static content but turned up nothing. Instead I found a little gem hiding in the context.xml configuration file.

The Resources Element

As it turns out, you can configure your own context for serving static content using the Resources element in context.xml. It looks like this:
<Context>
     <Resources className="org.example.StrongETagDirContext" />
     ...
</Context>
I extended the FileDirContext class and overrode the getAttributes() method:
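// Signature as found in the Tomcat 5.5/6 FileDirContext; other versions may differ.
public Attributes getAttributes(String name, String[] attrIds)
        throws NamingException {
   ResourceAttributes r = (ResourceAttributes) super.getAttributes(name, attrIds);

   long cl = r.getContentLength();
   long lmt = r.getLastModified();

   // Same length-and-timestamp value as the weak tag, without the W/ prefix.
   String strongETag = String.format("\"%d-%d\"", cl, lmt);
   r.setETag( strongETag );
   return r;
}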
This associates a strong ETag (instead of Tomcat's default Weak ETags) with each static resource served up. Now we have a conditional GET request that behaves well in all browsers and we can get rid of those hacks we've been using forever. I'm not sure Tomcat is doing the right thing with using Weak ETags by default, and I'll probably post some of these comments to the Tomcat mailing list for consideration, but for now I've got caching behaving as I would expect.
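To make the difference concrete, here is a small stand-alone sketch of the two tag shapes discussed in this post. The class and method names are hypothetical, and the `length-timestamp` layout follows the description above rather than any guaranteed Tomcat format.

```java
public class ETagFormats {

    /** Strong form: a quoted "<length>-<lastModifiedMillis>" value. */
    public static String strong(long contentLength, long lastModifiedMillis) {
        // Plain concatenation keeps the digits independent of the default locale.
        return "\"" + contentLength + "-" + lastModifiedMillis + "\"";
    }

    /** Weak form: the same quoted value with the W/ prefix in front. */
    public static String weak(long contentLength, long lastModifiedMillis) {
        return "W/" + strong(contentLength, lastModifiedMillis);
    }

    public static void main(String[] args) {
        // Illustrative values only: a 2048-byte file and an arbitrary timestamp.
        System.out.println(weak(2048L, 1190782688000L));
        System.out.println(strong(2048L, 1190782688000L));
    }
}
```

Running `main` prints `W/"2048-1190782688000"` followed by `"2048-1190782688000"`: the two differ only in the prefix, which is exactly what the custom context strips away.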