ETags in Akka HTTP

I was recently involved in implementing ETag and Last-Modified header support in a service based on Akka-http.

I have prepared a fairly comprehensive example project that shows how to implement those capabilities in Akka-http based projects.

In this post I’ll describe in a practical manner what ETags are and how to support them in your own projects.

Side note: I’ll focus on ETags and have a section on Last-Modified header at the end.

Quick introduction to ETags

An ETag is an additional HTTP header, returned by the server, that can be treated like a checksum of the response.
The client can later send this value with subsequent requests to the same endpoint to indicate which version of the resource it has already seen.

Based on the value of ETag provided by the client, the server can decide not to return the HTTP body, and indicate this by returning HTTP code 304 – Not Modified.

When the client receives a 304 response, it means that the resource it received previously is still up to date and the server doesn’t need to send it again.

Note that this approach requires the HTTP client (or library) to keep cached responses on its side, and in case of a 304 response, the data should be read from that cache.

Wikipedia has a very good article on ETags

Note also that the server sets the value of the ETag header, while clients send that value back in the If-None-Match request header.
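The decision the server makes can be sketched in a few lines of plain Scala. This is a simplified model rather than what Akka-http does internally: the names ETagDecision, Decision and decide are made up for illustration, and the comparison is a plain string equality check, whereas real If-None-Match handling also has to deal with lists of tags and the "*" wildcard.

```scala
object ETagDecision {
  sealed trait Decision
  // Respond with 304 Not Modified and no body.
  case object NotModified extends Decision
  // Respond with 200 OK, the full body and the current ETag.
  final case class Full(etag: String) extends Decision

  // Hypothetical helper: compares the value the client sent in If-None-Match
  // with the ETag the server currently computes for the resource.
  def decide(ifNoneMatch: Option[String], currentETag: String): Decision =
    ifNoneMatch match {
      case Some(seen) if seen == currentETag => NotModified
      case _                                 => Full(currentETag)
    }
}
```

The three curl sessions that follow correspond to the three possible inputs: no header, a matching header, and an outdated header.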

First request
curl -v http://localhost:8080/books-etags/1
...
> GET /books-etags/1 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
> 
< HTTP/1.1 200 OK
< ETag: W/"1e8ec132952ddfe628c6e2ff6a66d843"
< Last-Modified: Tue, 25 Oct 2016 12:45:00 GMT
* Server akka-http/10.0.0 is not blacklisted
< Server: akka-http/10.0.0
< Date: Fri, 25 Nov 2016 10:43:43 GMT
< Content-Type: application/json
< Content-Length: 197
...

And the body follows

Second Request
curl -v -H "If-None-Match: W/\"1e8ec132952ddfe628c6e2ff6a66d843\""  http://localhost:8080/books-etags/1
...
> GET /books-etags/1 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
> If-None-Match: W/"1e8ec132952ddfe628c6e2ff6a66d843"
> 
< HTTP/1.1 304 Not Modified
< ETag: W/"1e8ec132952ddfe628c6e2ff6a66d843"
< Last-Modified: Tue, 25 Oct 2016 12:45:00 GMT
* Server akka-http/10.0.0 is not blacklisted
< Server: akka-http/10.0.0
< Date: Fri, 25 Nov 2016 10:47:26 GMT
< 

The second response doesn’t have a body

Third request

After the resource on the server was updated:

curl -v -H "If-None-Match: W/\"1e8ec132952ddfe628c6e2ff6a66d843\""  http://localhost:8080/books-etags/1
...
> GET /books-etags/1 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
> If-None-Match: W/"1e8ec132952ddfe628c6e2ff6a66d843"
> 
< HTTP/1.1 200 OK
< ETag: W/"3049afc59dbff41160acfcbb0f3273ec"
< Last-Modified: Tue, 25 Oct 2016 12:45:00 GMT
* Server akka-http/10.0.0 is not blacklisted
< Server: akka-http/10.0.0
< Date: Fri, 25 Nov 2016 10:55:29 GMT
< Content-Type: application/json
< Content-Length: 197
< 
...

And the new body follows

I encourage you to download the sample project I have prepared: https://github.com/wlk/akka-http-etag-example. It will allow you to try out those commands yourself. I have also added more debug logging so you can see how your request flows through the server side.

Implementing ETags support

Akka-http already provides a "conditional" directive that allows us to use ETags quite effectively.

So what’s left for us is to include it properly in our routing and pass the correct arguments.

In my sample project this is done in the BooksApi class.

This is the most important snippet; I have written comments inline:

// Imports this snippet relies on (the JSON marshaller for the book and the ETag helpers are not shown here)
import akka.http.scaladsl.model.HttpResponse
import akka.http.scaladsl.model.StatusCodes.NotFound
import akka.http.scaladsl.server.Directives._

// There are 3 cases to consider
path("books-etags" / IntNumber) { id =>
  optionalHeaderValueByName("If-None-Match") {
    case Some(_) =>
      // The request carries some value in the "If-None-Match" header; at this point we can't tell yet whether it's a valid ETag
      booksService.getBookLastUpdatedById(id) match {
        case Some(lastUpdated) =>
          // "conditional" directive reads the ETag from the request and compares it to "lightweightBookETag(lastUpdated)"
          // which was calculated only based on the date of the book - we didn't have to fetch full object from the DB
          conditional(lightweightBookETag(lastUpdated), lastUpdated) {
            // First case: "conditional" directive returns 304 if ETag or "If-Modified-Since" were valid
            // in this case we don't need to fetch anything more from DB
            complete {
              // Second case: ETag was invalid (for example outdated), so we continue, this time fetching full object from DB
              // Fetching the full object is more time consuming than fetching only its date
              booksService.findById(id) match {
                case Some(book) => book
                case None       => throw new RuntimeException("This shouldn't happen")
              }
            }
          }
        case None => complete {
          // If resource doesn't exist we don't set any headers
          HttpResponse(NotFound, entity = "Not found")
        }
      }
    case None =>
      // Third case: request doesn't contain "If-None-Match" header
      // we know that we have to return 200 with full body, so we do that (alternatively we return 404 if resource wasn't found)
      booksService.findById(id) match {
        case Some(book) =>
          conditional(bookETag(book), book.lastUpdated) {
            complete {
              book
            }
          }
        case None =>
          complete {
            HttpResponse(NotFound, entity = "Not found")
          }
      }
  }
}
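The snippet calls lightweightBookETag and bookETag without showing them. As a rough sketch only, here is one way such a value could be derived from the modification timestamp with the JDK's MD5 implementation. The object name ETagValues, the function names and the choice of epoch milliseconds as input are assumptions made for illustration; the sample project may compute its tags differently.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ETagValues {
  // Hex-encoded MD5 digest of the input string (always 32 characters).
  def md5Hex(input: String): String =
    MessageDigest
      .getInstance("MD5")
      .digest(input.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Hypothetical stand-in for lightweightBookETag: the tag depends only on
  // the modification timestamp, so the full object is never loaded.
  def lightweightTag(lastUpdatedMillis: Long): String =
    md5Hex(lastUpdatedMillis.toString)
}
```

In an Akka-http route the resulting string would still need to be wrapped in an EntityTag (for example EntityTag(value, weak = true), which matches the W/"..." tags in the curl output above) before it is passed to the "conditional" directive.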

A word about Last-Modified header

In some cases the ETag and Last-Modified values can serve the same purpose. Even in my project I calculate the ETag from the lastUpdated date, because I know that each time a resource changes, its lastUpdated date is updated as well.

Last-Modified is sometimes simpler to understand, but it’s not as universal as ETags. Here are some cases where it won’t work but ETags would:

  • Collections
    When a collection contains multiple resources, we can calculate a combined ETag by concatenating the ETags of the individual resources and hashing the result with MD5 (or any other hashing algorithm).
    With Last-Modified we have no way to do this: if we look at Max(Last-Modified) over all elements, we won’t notice, for example, the removal of an element from the collection.

  • Modification date is not available
    Some resources carry no information about their modification date, yet they can still change at any time. In those cases ETags are the only option.
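The combined ETag for a collection can be sketched as follows. The names CollectionETag and combined are hypothetical, and joining the tags with a comma before hashing is an assumption added here so that different splits of the same characters (for example "ab", "c" versus "a", "bc") don't produce the same input.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object CollectionETag {
  // Hex-encoded MD5 digest of the input string.
  def md5Hex(input: String): String =
    MessageDigest
      .getInstance("MD5")
      .digest(input.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Concatenates the ETags of the individual resources, in collection order,
  // and hashes the result. Adding, removing or reordering elements changes
  // the input string and therefore the combined tag.
  def combined(itemTags: Seq[String]): String =
    md5Hex(itemTags.mkString(","))
}
```

Unlike Max(Last-Modified), this value changes when an element is removed, because the removed element's tag no longer contributes to the hashed string.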

Summary

Support for ETags and Last-Modified header is quite easy to add for Akka-http projects.

I have shown how to add this to a single endpoint. The drawback is that it makes the code much more verbose, as there are 3 cases that require handling, and each one takes a different path through the route.

This could probably be eliminated by defining a more generic function that encapsulates all the logic, but I decided to leave this out for now.

Notes on creating microservices-based applications

This post is a collection of tips and notes I gathered while working on microservices-based applications over the last couple of months.

The notes are divided into a couple of sections, each focusing on a different area of developing and running your services.

I have decided to write lower-level notes and tips that focus on specific problems. For a more high-level overview see: The Twelve-Factor App

Project Setup

  • Each service should be a self-contained project, hosted in a separate repository.
  • The microservices shouldn’t have any code level dependencies on each other
    • For example they shouldn’t depend on each other during build time
  • All dependencies should be factored into separate libraries
    • Also keep them as small as possible
  • Ideally the only dependencies you have should be the open source libraries that you use
    • As a workaround, you can also open source your own libraries
  • The README.md should have a basic description of what the project does and the steps needed to start developing it
  • Ideally you should have instructions on how to run the project inside the docker container
    • This will help other developers but also if you use something like Kubernetes it will help down the line
  • After adopting docker as the main tool to deploy the code, you should create an appropriate repository in ECR or Docker Hub to host your containers

Specification

  • Apply API-first principles
  • Use widely supported tools like RAML or Swagger to design your API endpoints and schemas first
  • Iteratively implement new endpoints, replacing static examples of the responses with live endpoints
  • Set up infrastructure to validate your schemas
    • Integration testing seems like a good step, your schemas can be validated as a “proxy” during testing

Implementation

  • Make sure that you handle error responses by other services or applications you depend on
  • Make sure that you set the correct response type in the Content-Type HTTP header
  • You should also handle API versioning, ideally this should be done on the higher level as well
  • Add support for X-Trace-Token, make sure to pass it around as you make further HTTP requests to other services
  • Also add X-Trace-Token to all log messages.
  • Ideally you could implement a Zipkin-like service to help with that
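The X-Trace-Token handling described above can be sketched without any HTTP library. Everything except the header name is an assumption made for illustration: the TraceToken object, its function names, the choice of a random UUID for new traces and the log line format.

```scala
import java.util.UUID

object TraceToken {
  val HeaderName = "X-Trace-Token"

  // Reuse the caller's token when the request carries one (header names are
  // case-insensitive), otherwise start a new trace with a random UUID.
  def resolve(incoming: Map[String, String]): String =
    incoming
      .collectFirst { case (name, value) if name.equalsIgnoreCase(HeaderName) => value }
      .getOrElse(UUID.randomUUID().toString)

  // Headers to attach to every outgoing call made while handling the request.
  def outgoingHeaders(token: String, base: Map[String, String] = Map.empty): Map[String, String] =
    base + (HeaderName -> token)

  // Prefix for log messages so that lines from different services can be correlated.
  def logLine(token: String, message: String): String =
    s"[$HeaderName=$token] $message"
}
```

A request handler would call resolve once at the edge, then pass the token explicitly (or through a logging context) to every downstream call and log statement.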

Monitoring

  • Your services should have a standard health check endpoint
    • You should standardize on what data is shown there
    • Format should be readable by the monitoring infrastructure
    • During health-checking, the service should send ping requests to all services it depends on and report status of those connections
  • You should also have tools to perform instrumentation / metric collection
    • Tools like Prometheus, NewRelic, Grafana or similar can be very helpful here
  • Logging should be written to standard output
  • Error logs should be written to standard error
  • Those logs should be captured by the tooling around docker containers (like Kubernetes) and redirected to Kibana or similar tool
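A health check that pings dependencies and reports their status could look roughly like this. The HealthCheck object, the Report shape and the "name=up" text format are hypothetical; in a real service each ping would be an HTTP call or a cheap database query with its own timeout.

```scala
import scala.util.control.NonFatal

object HealthCheck {
  // Overall status plus the status of every dependency that was pinged.
  final case class Report(healthy: Boolean, dependencies: Map[String, Boolean]) {
    // One "name=up" or "name=down" line per dependency, sorted by name,
    // so that monitoring tools can parse the output easily.
    def render: String =
      dependencies.toSeq.sortBy(_._1).map { case (name, up) =>
        val status = if (up) "up" else "down"
        s"$name=$status"
      }.mkString("\n")
  }

  // Runs every ping; a ping that throws is reported as "down" instead of
  // failing the whole health check.
  def check(pings: Map[String, () => Boolean]): Report = {
    val results = pings.map { case (name, ping) =>
      val up = try ping() catch { case NonFatal(_) => false }
      name -> up
    }
    Report(results.values.forall(identity), results)
  }
}
```

Reporting dependency status separately from the service's own status keeps the endpoint useful even when a dependency is down, which fits the resiliency notes further on.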

Configuration

  • Make sure that you set sensible defaults for all configurable parameters
    • For example the defaults should allow you to run service on localhost for development
  • Configuration that changes in each environment (for example testing and production) should be read through environment variables
    • These can in turn be set by Kubernetes or a similar tool
  • Configuration shouldn’t change while your service is running
    • It’s better to design for applications that can quickly restart and apply new configuration than have long running processes that can change their config
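Reading configuration from environment variables with localhost-friendly defaults can be as small as the sketch below. The variable names SERVICE_HOST and SERVICE_PORT, the default values and the ServiceConfig object are assumptions made for illustration.

```scala
import scala.util.Try

object ServiceConfig {
  final case class Config(host: String, port: Int)

  // Reads settings from the given environment (the process environment by
  // default). Missing variables fall back to defaults that let the service
  // run on localhost; in this sketch an unparsable port also falls back.
  def load(env: Map[String, String] = sys.env): Config =
    Config(
      host = env.getOrElse("SERVICE_HOST", "localhost"),
      port = env.get("SERVICE_PORT").flatMap(p => Try(p.toInt).toOption).getOrElse(8080)
    )
}
```

Because load is called once at startup and returns an immutable value, the configuration cannot change while the service is running; applying a new configuration means restarting the process.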

Resiliency

  • Set a reasonable timeout for all outgoing calls you make
    • Also consider implementing circuit breakers like Hystrix to improve resiliency even more by avoiding cascading failures
  • Make sure your application can continue running while services it depends on are down
    • Make sure your application doesn’t require any manual administration when dependencies are down and later start up
  • Your service should start up even when dependencies are not available
    • For example you shouldn’t make any pre-startup checks of whether the database is reachable
  • Make sure that increased rates or complexities of incoming requests won’t kill your application
    • Implement measures to protect your service from abuse
    • For example set a maximum page[limit] to avoid making heavy database calls or to limit response size
  • Set up an error reporting service
    • Services like Airbrake or Rollbar will notify you of any errors that your service generates
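Two of the points above, timeouts on outgoing calls and a maximum page limit, can be sketched with the Scala standard library alone. The Resilience object, the limit of 100 and the default page size of 20 are hypothetical values. The timeout helper blocks the calling thread, which keeps the example short; a service built on Akka-http would more likely use non-blocking timeouts or a circuit breaker.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException}
import scala.concurrent.duration._

object Resilience {
  // Hypothetical upper bound for page[limit].
  val MaxPageLimit = 100

  // Keeps the requested page size within 1..MaxPageLimit.
  def clampPageLimit(requested: Option[Int], default: Int = 20): Int =
    math.min(math.max(requested.getOrElse(default), 1), MaxPageLimit)

  // Runs the call on the execution context and gives up waiting after the
  // timeout. Note that the underlying call is not cancelled; the caller just
  // stops waiting for it.
  def callWithTimeout[A](timeoutMillis: Long)(call: => A)(
      implicit ec: ExecutionContext = ExecutionContext.global): Either[String, A] =
    try Right(Await.result(Future(call), timeoutMillis.millis))
    catch { case _: TimeoutException => Left(s"timed out after $timeoutMillis ms") }
}
```

Returning an Either instead of throwing makes the "dependency is slow or down" case an ordinary value that the caller has to handle.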

Scaling

  • Services should follow shared-nothing practices
    • You shouldn’t directly modify state of other services or databases that you don’t ‘own’
    • You also shouldn’t allow other services to modify your internal state
  • Service should be effectively stateless
    • All durable state should exist in the database
    • Caching is OK, but your service should function correctly without it
  • It should be possible to start more copies of your service without modifying existing ones
  • Prefer horizontal scalability over vertical scalability
  • Don’t use mechanisms like sticky sessions
    • These usually can prevent you from handling load evenly among instances of your service

Other

  • The gap between testing and production environments should be as small as possible
    • Ideally these environments should differ only by environment variables and scaling
  • Set up a traffic mirroring service
    • A portion of your live production traffic could be sent over to testing environments
    • This will allow you to spot bugs more easily
  • One-off admin processes that need to be run during deployment should ideally be automated
    • Or at least those scripts should be bundled with your application
    • For example: database schema migrations