Scala Patterns To Avoid: Implicit Arguments With Default Values

Scala projects tend to prefer a more explicit programming style. The biggest aspect of that is, in my opinion, the type system of the Scala language: programmers often start writing their functions by defining the types of the arguments and the type of the result, and only write the body of the function as the last step. That’s also because we have the Scala compiler to help us.

I recently stumbled on a snippet that contradicts this rule and can be a source of hard-to-spot bugs for a person unfamiliar with the code.

The problem happens when you mix two features of the Scala language that make sense when used on their own (although I’ll argue with that a little in a moment), but that create a very serious problem when used together.

The features (as you probably guessed) are:

  • Implicit arguments
  • Default function argument values

Implicit arguments

Implicit arguments, when used in isolation, make a lot of sense: they allow context-like arguments to be passed through the whole call stack. Furthermore, they are the major building block for more advanced features like typeclasses.
Many major libraries or projects would not exist without them, but because this is a very powerful feature, its use should be limited.
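Here’s a minimal sketch of that context-passing style (RequestContext and the functions are hypothetical, just for illustration):

case class RequestContext(traceId: String)

def fetchUser(id: Long)(implicit ctx: RequestContext): String =
  s"user-$id (trace: ${ctx.traceId})"

def handleRequest()(implicit ctx: RequestContext): String =
  fetchUser(42) // ctx travels down the call stack without being spelled out

implicit val ctx: RequestContext = RequestContext("abc-123")
handleRequest() // "user-42 (trace: abc-123)"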

Default function arguments

In my opinion this feature should be avoided almost everywhere – default argument values make it very hard to use functions as arguments and to pass functions around.
It’s better to use either currying (multiple parameter lists) or to simply define the function a few times, taking a different set of parameters in each case; a sketch of both alternatives follows below.
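Here is what the two alternatives could look like (the names are made up for illustration):

// Currying: a second parameter list instead of a default value
def greet(greeting: String)(name: String): String =
  s"$greeting, $name!"

val hello: String => String = greet("Hello") // partial application stays easy
hello("World") // "Hello, World!"

// Overloading: define the function a few times instead of using a default
def send(message: String, retries: Int): Unit =
  println(s"sending $message ($retries retries)")
def send(message: String): Unit = send(message, 3)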

I think it makes sense to use default arguments in 2 cases:

  • Backwards compatibility – you have an existing case class and want to add another attribute to it without changing your code or the underlying database structures
  • Complicated, builder-like constructors – you have a constructor that takes a ton of arguments, and users would like to configure only a limited set each time, using defaults for everything else

The problematic code

Here’s a snippet of the code I stumbled on:

case class User(name: String)

object User {
  val Default = User(name = "unknown")
}

def updateCampaign(c: Campaign)(implicit user: User = User.Default): Unit = {
  // save updated campaign
  // add audit log entry indicating `user` as the person performing the action
}

Here’s what I think is wrong here:

  • The caller is not aware that updateCampaign actually takes any extra argument unless they read the source or look up the documentation (if there is any).
    You can perfectly well write code that looks like this:
val user = User("test")
val campaign = Campaign(...)
updateCampaign(campaign)

And you will not see anything wrong – the compiler won’t complain. Unless the caller is aware that another parameter is expected, there’s no straightforward way to learn about it.

  • If someone overrides the updateCampaign function, this information will be lost forever – all calls to the overridden version will use the default argument.

The above snippet contradicts the explicitness rule, and the Scala compiler will not help you spot the bug.

Of course the code will still run – the only problem is that every update to the campaign will be attributed to User.Default, which is not what we expect – but there’s no way to express that intent when using default arguments.

Fixing the code

To fix it, simply remove the default value for the user argument:

def updateCampaign(c: Campaign)(implicit user: User): Unit = {
  // save updated campaign
  // add audit log entry indicating `user` as the person performing the action
}

After that change you will get a compilation error forcing you to fix the issue.

The Scala compiler is now able to detect this issue right away; you don’t have to remember which functions take which arguments because you can leverage the compiler.
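For example, the call site now has to bring a User into implicit scope (the values here are hypothetical):

val campaign = Campaign(...)
updateCampaign(campaign) // no longer compiles: no implicit User in scope

implicit val currentUser: User = User("alice") // e.g. the logged-in user
updateCampaign(campaign) // compiles, and the audit entry is attributed correctly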

Summary

The outlined scenario is one of many examples where a programmer can use the Scala compiler to their benefit. A simple change to the code results in the compiler pointing out the problem right away, forcing you to address it before the code is released.

It’s also one of the many logic-related mistakes that are very hard to spot in testing – the “audit logging” in this case is a side effect of the code’s regular responsibility.

Flyway Database Migrations – Best Practices

This post attempts to cover some of the best practices for using Flyway, the database migration tool, that I have learned or established after using it in a few projects. It’s aimed at users already familiar with Flyway or any other database migration tool, so I’ll skip over the very basics.

I’m focusing on Flyway here, but I think these best practices will apply to the majority of other database migration tools.

Transactional Migrations

Each individual database migration is wrapped by Flyway inside a transaction. This means that:

  • You shouldn’t use transactions explicitly in your plain SQL migrations
  • If you are running multiple migrations at once (for example, updating from version 1 to 4), Flyway will stop after the first failure – some migrations will be applied, some won’t.
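For example, a plain SQL migration can contain just the statements themselves (a hypothetical migration file, for illustration):

-- V2__add_users_table.sql
-- No explicit BEGIN/COMMIT: Flyway wraps the whole file in a transaction,
-- so a failure in any statement rolls back the entire migration.
CREATE TABLE users (
  id   BIGINT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);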

Dealing With Rollbacks

Flyway doesn’t support rollbacks explicitly; instead, you should always create a new migration that reverts the recently introduced changes. This is the same pattern used by git revert $COMMIT.

Rollbacks During Development

During development, these rules can be more relaxed. When you are working on getting the database models or the migrations right, you will often need to reset the database to a specific state and re-run a migration. I have always found this a little annoying, and here’s the process I’ve been following:

(Please note that it is a very bad idea to do this on a production database, but during development it is in fact a good idea.)

When I realize that I need to modify the last migration I have executed, I:

  • Manually revert the database changes
  • Delete the last entry from the schema_version table (see the snippet after this list)
  • Run flywayInfo to confirm that Flyway doesn’t see my last migration
  • Run flywayMigrate to ‘re-run’ the migration
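Deleting the last entry could look like this (the version number is hypothetical – pick the one you are re-working):

-- Remove the record of the last applied migration so Flyway forgets about it
DELETE FROM schema_version WHERE version = '42';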

I have found this method to work well in practice, especially when each individual migration is small and the iteration cycle is short. The only requirement is that the migration I’m editing hasn’t been applied anywhere outside of my local environment.

Use Baseline At The Start

The baseline command is used to introduce a database migration tool to an existing project. Here’s how you should use it:

  • Create a database dump as an SQL file
  • Add this file to Flyway by running flywayBaseline, introducing it as V1__Baseline.sql

What many people fail to do is manually inspect the database dump to eliminate unnecessary or unwanted data. You should clean up the dump so that it contains only the “database schemas” and “reference information”.

Performance Tips

When you are adding a large piece of reference data – tables with thousands or hundreds of thousands of rows – it’s a good idea, performance-wise, to follow this pattern:

CREATE TABLE x (...);

-- load the data here (bulk INSERTs, COPY, etc.)

CREATE INDEX a ON x (...);

Creating indexes after loading the data speeds up the process because the database doesn’t need to keep the index up to date as you add rows; it performs a single index creation operation instead. The bigger your table, the bigger the impact.

Missing Migrations Warning

It’s possible to programmatically inspect the schema_version table (you can also use the Flyway API, but this requires adding a dependency to your project) and issue a warning in the following cases – a sketch follows below this list:

  • You have migrations that were not executed
  • You have migrations that resulted in errors
  • The database has more applied migrations than the application knows about
  • Etc…
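Here’s a minimal sketch of such a check, using plain JDBC rather than the Flyway API (the connection details and the set of known versions are made up for illustration):

import java.sql.DriverManager

object MigrationCheck {
  // Hypothetical: the migration versions bundled with this build of the application
  val knownVersions = Set("1", "2", "3")

  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost/app", "app", "secret")
    try {
      val rs = conn.createStatement().executeQuery("SELECT version, success FROM schema_version")
      var applied = Set.empty[String]
      while (rs.next()) {
        val version = rs.getString("version")
        if (!rs.getBoolean("success")) println(s"WARN: migration $version resulted in an error")
        applied += version
      }
      (knownVersions -- applied).foreach(v => println(s"WARN: migration $v was not executed"))
      (applied -- knownVersions).foreach(v => println(s"WARN: database has migration $v the application doesn't know about"))
    } finally conn.close()
  }
}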

Missing Migrations Error

This idea builds on the previous one: instead of always issuing a warning, you can selectively issue either warnings or errors, or even exit your application, when it detects serious problems with the state of the database.

Setup Database Users

Flyway will display information about which user performed each migration; each user/operator should have their own account in the database so this information is persisted. Avoid using generic users or the same user that your application uses for accessing the database.

Don’t Create Users With Migrations

User and permission management shouldn’t be handled by database migrations. Here’s the reasoning behind it:

If you run your application in a new or different environment, you might need to create different users or set up different permissions (for example, IP addresses) – I have found it easier not to include users as part of the migration procedure.

Editing Previously Executed Migration

Let’s say you have found a bug in a migration that’s already been executed on the production database. You have 2 options here:

Create a new migration that fixes the bug

This is the ideal scenario, but sometimes you would like to take a shortcut.

Advanced users can take the shortcut

There is a limited set of cases when the shortcut can be taken; I don’t have the list, so use your own judgement 🙂

Here are the rough steps:

  • Edit the migration file in place
  • Manually perform the diff migration between what’s in the database and your new migration file (*)
  • Run flywayInfo – you should get an exception complaining about a checksum mismatch
  • Edit the row in the schema_version table, updating the checksum from the existing value to the new one

(*) For this method to work, it must be possible to create that diff, and it should be relatively simple to perform, to ensure you don’t introduce any issues
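Updating the checksum could look like this (the version and checksum values are hypothetical – flywayInfo will tell you the expected one):

-- Make the stored checksum match the edited migration file
UPDATE schema_version SET checksum = -1991324107 WHERE version = '7';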

Summary

Flyway (or SQL database migration tools in general) is, in my opinion, a must-have for any serious project.

I have shown some of the patterns and best practices that I have worked out after using it in multiple projects.

Designing APIs with JSON API Specification

The JSON API specification is a tool that greatly helps and simplifies the process of designing RESTful, JSON-based APIs, and it very frequently serves as a great anti-bikeshedding tool that limits unnecessary discussions in any team.

In my opinion it doesn’t get the attention it deserves, and this post aims to provide a good introduction to the topic.

My goal is not to repeat the official documentation, but rather to add something new to the topic. The specification is also quite readable, so I suggest reading it if you want to learn more.

What is JSON API

JSON API is a specification describing multiple aspects of how a proper RESTful API should look; it was released in 2015.

The following aspects of an API are covered by it:

  • URL Structure
  • How a request should look
  • How a response should look
  • How to update/delete/create resources
  • How errors are reported and how they should be handled
  • Advanced features like server side includes
  • And many more aspects

Motivation For Using JSON API

The team was tasked with designing and implementing a new API for general use. At the time it wasn’t clear who the client would be, so it wasn’t possible to create an API that met the requirements of a specific client; the API had to be generic enough to serve many different clients with various use cases.

After initial research, the team members divided the endpoints to implement among themselves, and everyone quickly went off to implement their own endpoints. Then problems started to appear; to put it simply, the API was inconsistent:

  • Different endpoints used different ways of passing arguments
  • There was no standard around reporting errors
  • Some endpoints lacked pagination, some supported it
  • There was no consistency among returned resources – same resources were named differently, different resources had the same names
  • No consistency among identifiers, dates, numbers, ordering, sorting, etc
  • Because of poor tooling it was almost impossible to detect problems with the documentation
  • The documentation was incomplete or just wrong

Does it sound familiar?

Why You Should Follow Specification

There are several traits of the JSON API specification that make it a very good fit for modern RESTful APIs.

Anti-bikeshedding

Frequently you might be faced with trivial decisions to make, but somehow, precisely because the matter is so trivial/simple, everyone on the team ends up having their own idea of how a particular feature should be built.

For example, ‘pagination’: should we call the pagination query parameters offset and limit, or page and size?

The JSON API specification already has all those decisions made for you, so instead of focusing on irrelevant issues, sometimes even nitpicking, you can just accept whatever the specification says and move on to working on the important issues.
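For pagination, for instance, the specification reserves the page family of query parameters (with a strategy such as page[number]/page[size] suggested), so a request could look like this (the endpoint is made up):

GET /v1/books?page[number]=2&page[size]=25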

It’s true that sometimes a specification might make a choice that you don’t agree with, but in my opinion it is usually more important to accept it than to keep focusing on irrelevant details.

Iterative Development

I’m in favor of an API-first approach; the process is more or less as follows:

  • Given the business requirements, you prepare a few examples of API responses
  • After verification (or at the same time) you prepare API request/response schemas and tweak the examples to match the schemas
  • At this point you can plug those endpoints into your CI process; it’s also now possible to start developing client-side API interactions
  • Once the schema for your endpoints is stable, you can start implementing the real server responses

Because at every step of the process there’s extensive testing in place (which requires setting up if you don’t have it), as soon as all your tests pass you should be able to transition to the next step pretty painlessly.

JSON API fits that model quite well; the specification comes with a JSON Schema (see the next point).

Additionally, if you don’t unnecessarily constrain yourself when designing your API specification, it should be quite easy to add to or change some aspects of the responses your server is returning.

Validation

JSON API comes with its own formal description specified as a JSON Schema. You can validate your API design against it:

  • during your development process
  • as part of CI process

You can also take it a step further and validate your API responses against it. There are tools that can serve as HTTP proxies, passing through all requests and responses and validating their schemas along the way; usually they can be plugged in as part of an integration test suite executed on a CI server.

JSON API Basics

When designing a JSON API-compatible API, I think the top priority is to establish what resources your system has and what the relationships between them are.

In this and the next section I’ll be using the following API as an example: https://github.com/endpoints/endpoints-example – I suggest cloning it locally and playing around; I’ll modify the examples in this post to improve readability. It’s probably not the perfect API design, but it’s useful as a learning example.

Tip: use a tool like Postman to browse the API. Postman makes all relative URLs returned by the server clickable, so you can very easily browse the API just by pointing and clicking at parts of the server response.

Hypermedia

HATEOAS (Hypermedia as the Engine of Application State) is the concept constraining REST in which all client interaction with the server happens entirely through dynamic links provided by the server. In the ideal scenario this means the client can be “dumb”, making no assumptions about the application it’s interacting with, with all of the logic driven by the server.

There are several features (covered in the next points) that make it possible to create Hypermedia-style applications based on JSON API.

Resources

When designing RESTful APIs, the most important concept is the resource. The example API has 3 types of resources:

Authors
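Here is roughly what a single author response looks like (reconstructed for illustration – the attribute names are my guesses; the structure follows the analysis below):

{
  "data": {
    "id": "1",
    "type": "authors",
    "attributes": {
      "name": "...",
      "date_of_birth": "...",
      "date_of_death": "...",
      "created_at": "...",
      "updated_at": "..."
    },
    "relationships": {
      "books":  { "links": { "self": "/v1/authors/1/relationships/books",  "related": "/v1/authors/1/books" } },
      "photos": { "links": { "self": "/v1/authors/1/relationships/photos", "related": "/v1/authors/1/photos" } }
    },
    "links": { "self": "/v1/authors/1" }
  }
}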


Let’s analyze it in some detail:

  • id – identifier of the resource
  • type – identifies the type of the object. A ‘type’ would have its own specification listing all required and optional attributes, what relationships an object can have, etc.
  • attributes – a list of all attributes this resource has, in our case there are 5 different ones
  • relationships – identifies how this resource is linked to other resources in our system, here we have links to books and photos
  • links – a link used to fetch that resource; if this had been a collection of resources, we would have pagination links here

Books


A book object also looks quite simple. Note that it has a lot more relationships, which allow you to traverse the API.

Chapters


Chapters is a simple resource type, included here for completeness.

Relationships

Relationships are another very important concept: they allow us to traverse the resources in our system, and because all links are always created dynamically, the client doesn’t need to hard-code any of them.

Each of the relationships.self links can be followed; for example, by going to /v1/authors/1/books a client will get the collection of books written by the author with id 1.

Now let’s take this one step further: what if I would like to get an author and all books written by them? Normally I would need to make 2 requests, to /v1/authors/1 and /v1/authors/1/books.

JSON API specifies a feature called “server side includes” which allows me to combine those 2 requests into one, as follows: /v1/authors/1?include=books (complete response).

Here’s a snippet of the response:
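(A reconstruction for illustration – attributes trimmed; the overall shape is what matters:)

{
  "data": {
    "id": "1",
    "type": "authors",
    "attributes": { "name": "..." },
    "relationships": {
      "books": {
        "data": [
          { "id": "10", "type": "books" },
          { "id": "11", "type": "books" }
        ],
        "links": { "self": "/v1/authors/1/relationships/books" }
      }
    }
  },
  "included": [
    { "id": "10", "type": "books", "attributes": { "title": "..." } },
    { "id": "11", "type": "books", "attributes": { "title": "..." } }
  ]
}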

Let’s analyze the included section:
This attribute is a collection of all the books written by that author. Each of the included resources has its type, which allows us to decode what type of resource it is – there’s nothing preventing us from including more types of relationships; for example, you could include both books and stores in the following request: /v1/authors/1?include=books,stores.

See also what happened to the data.relationships.books.data attribute: the data attribute wasn’t available before; it appeared only after explicitly including that relationship. This helps with matching which resource included which “included resources”, because the included section of the response is a flat collection.

Examples summary

I have shown 2 basic concepts of JSON API: resources and relationships.

The specification goes into much more detail on the following aspects:

  • Sorting
  • Pagination
  • Error handling
  • Filtering
  • Creating and updating resources
  • And many more

I didn’t cover them here because those concepts are very straightforward to understand once you learn about resources and relationships, but you can read more in the specification.

Tooling

Libraries

There are multiple client- and server-side libraries; the official list contains at least a few libraries per language/technology. There’s plenty to choose from depending on your preferences.

Some notable examples I have used or tried:

Take a look at the examples section, which shows a few example implementations offering interactive tools to browse and play with the API.

Here’s an example of how you could encode an Author class using scala-jsonapi, matching the example response shown before:
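(A rough sketch of zalando/scala-jsonapi’s writer API from memory – treat the exact names and signatures as assumptions and check the library’s README:)

import org.zalando.jsonapi.model._
import org.zalando.jsonapi.model.JsonApiObject.StringValue
import org.zalando.jsonapi.{Jsonapi, JsonapiRootObjectWriter}

case class Author(id: String, name: String)

// Typeclass instance telling the library how to turn an Author into a JSON API document
implicit val authorWriter: JsonapiRootObjectWriter[Author] = new JsonapiRootObjectWriter[Author] {
  override def toJsonapi(author: Author): RootObject =
    RootObject(data = Some(ResourceObject(
      `type` = "authors",
      id = Some(author.id),
      attributes = Some(List(Attribute("name", StringValue(author.name))))
    )))
}

val rootObject = Jsonapi.asJsonapi(Author("1", "..."))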

Editor support

I have had the most success using Visual Studio Code with the ‘plain’ JSON Tools plugin. In general you don’t need any special tools or editors to design in this format – it’s just JSON.

JSON Schema

JSON Schema allows you to validate your JSON documents against a schema (a formal specification). There’s an official JSON Schema provided as part of the JSON API specification, meaning that your documents can be validated against it.

As long as your API documentation passes schema validation, you can be quite sure that JSON API clients won’t have any problems accessing your resources.

The following snippet is an example of how you could validate UUIDs with JSON Schema:
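(A minimal sketch – older JSON Schema drafts have no built-in uuid format, so a regex pattern does the job; attach it to your id fields as needed:)

{
  "type": "string",
  "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
}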

Problems With JSON API

JSON API, as any significant tool, comes with some drawbacks. I have spent around 8 months with it (as an API designer and back-end developer), and here’s what I was able to observe.

Generic API

One of the biggest problems with JSON API is that the API you are designing will, in most cases, end up being quite generic. I think this comes from the fact that you are thinking in terms of resources while designing it, whereas the clients’ requests usually reflect more specific actions.

The good thing is that this design will cover a lot of use cases and satisfy a lot of clients, but as a result some clients will have to do additional work to get what they need.

Note that this isn’t necessarily a disadvantage, especially in the scenario where the client isn’t known or doesn’t exist yet; in those cases you need to stay generic and specialize later.

Weird Workarounds

JSON API is quite a restrictive specification; you are not allowed to do many things that would be easy to do with no specification constraining you. After a while you will develop a few patterns for working around those limitations.

Actions

There might be cases when you need to compensate for the fact that your API doesn’t expose the ability to perform actions and only operates in terms of resources.

The most common scenario is that when a client wants the server to perform some work, instead of sending a POST request with that action, the client adds a document to a “work requests” collection. The server in turn returns a resource location in the “work completed” collection, and the client sends a GET request to that collection to receive the result. So instead of a single POST request, you sometimes need to make 2 or more.
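A hypothetical exchange illustrating that pattern (the resource names are made up):

POST /v1/report-requests
{ "data": { "type": "report-requests", "attributes": { "report": "sales-2017" } } }

HTTP/1.1 202 Accepted
Location: /v1/report-requests/77

GET /v1/report-requests/77

The client then polls that second URL until the completed result appears.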

Image Upload

Another very common scenario is uploading images – something a lot of APIs let clients do and take for granted. This is problematic because both client and server must use the media type application/vnd.api+json, which prevents you from sending image/png; see this SO link for suggestions: https://stackoverflow.com/q/4083702/312026.

Slower Pace

As with every tool that requires more work, the pace is slower at the beginning. This factor is multiplied when the team hasn’t worked with JSON API or any similar specification before.

Comparison

GraphQL

GraphQL serves as an alternative to the RESTful approach in general (not just JSON API). It covers a very similar area to JSON API, but it does things very differently in terms of how queries and responses look.

I think one aspect to stress is that GraphQL works by embedding child resources as a recursive tree, whereas in JSON API the embedded (included) resources are always returned as a flattened list, making them slightly harder to interpret.
IMO this is a great advantage of GraphQL, and I’m very interested in trying it out in my next project.

Swagger / Open API / RAML

I decided to put these tools into a single category because they serve a similar purpose. In the majority of cases you can think of them as complementary to JSON API. They serve as a way to help you design an API and generate documentation (Swagger goes even further with an automatic API playground).

They (at least in their base form) tend to focus on generating documentation and don’t try to impose any formal requirements on how API endpoints, requests, or responses should look. Therefore they are very versatile and can be used on their own when designing an API without any specification.

My suggested approach is to:

  • Design your API according to the (JSON API) specification
  • Use Swagger or RAML at the same time to give your API documentation a structure
  • Generate pretty HTML documentation out of RAML, or use the automatic Swagger tooling to do this

Summary

My post is an introduction to the topic of JSON API, and I think this tool is a great way to solve many of the problems around API design.

I think it didn’t get enough attention in the past and still doesn’t; additionally, there seems to be a lack of support from commercial users, and newer tools like GraphQL or Swagger are picking up the market. According to the official website, the last release of the JSON API specification was in 2015, and there have been no updates since.

JSON API has many benefits, which I think I have been able to point out, and in my opinion it is still a good tool for designing large APIs that will live for a long time, with many different clients integrating. At the same time, it’s true that for smaller projects, or those with a short lifespan, JSON API would most likely block you from making progress, and just sticking with plain RAML or Swagger while applying some best practices would be better.

I’m very interested in that space and watch it closely.