Gary Holland's blog

Serverless websocket APIs in AWS


Building applications with real-time updates is fun and managing servers is not.

In this post I’m going to explain how we can build a websocket API to enable fun real-time interactions in an application while using serverless Lambda functions for all our logic and thereby avoiding any boring server maintenance.

I’m going to focus on the what and the why rather than the how — if I included all the code in this post then it would be way too long! Further posts will go into detail on the code. N.B. my explanations will involve references to the AWS CDK. If you’re managing your infrastructure some other way, you will have to figure out how to translate these concepts.

Serverless websockets?

A websocket is a persistent, bidirectional connection between a client and a server. A serverless function is a unit of code which is executed on-demand in an ephemeral environment which is provisioned on the fly and torn down again after the execution is complete. Given that “persistent” and “ephemeral” are literally antonyms, the idea of a serverless websocket API might seem somewhere between “counterintuitive” and “impossible to implement”.

In fact, it is surprisingly simple to implement on AWS. AWS’s API Gateway service can handle all the boring parts of managing websocket connections for us and forward the events it receives to different Lambda functions according to routing logic that we specify. Those Lambda functions can then execute any application logic we want. Abstracting the application logic away from the websocket layer neatly solves the persistent/serverless conflict.

There’s another benefit to this approach, too. It forces us to write our application logic as a series of stateless functions, which keeps the code for each one as simple as possible and therefore as maintainable as possible.

Persistent storage

To build a serverless websocket API that allows users to send messages to each other, we only need one more thing: a persistent store of connection IDs. Because all our application logic will be in Lambda functions, one Lambda will need to be able to instruct API Gateway to send messages to the connected clients. Each such instruction must specify both a message and the connection ID of the client who should receive it. Therefore we need to store each connection ID in a database table that the Lambda functions can access. We’ll use DynamoDB.

Architecture

This is all the information we need to figure out our basic architecture, which is pictured below.

  1. An API Gateway WebSocket API accepts connections and forwards every event it receives to a Lambda function according to our own routing logic

  2. A DynamoDB table stores connection IDs

  3. Three Lambda functions handle the events, one for each type of event that we want to support:

    i. Connect: when a client connects to the WebSocket API, we store its connection ID in the DynamoDB table

    ii. Message: when a client sends a message, we fetch every connection ID except the sender’s from DynamoDB and instruct the WebSocket API to forward the message to each of them

    iii. Disconnect: when a client disconnects from the WebSocket API, we remove its connection ID from the DynamoDB table

Serverless websocket API architecture
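
To make the flow concrete, here is a minimal in-memory sketch of the three handlers. It makes no AWS calls at all: a plain array stands in for the DynamoDB table and an outbox array stands in for "instruct API Gateway to send this message to that client". The names onConnect, onMessage, onDisconnect and Delivery are made up for this illustration.

```typescript
// In-memory model of the architecture: no AWS calls, just the data flow.
// Every name here is illustrative and not part of any AWS API.

interface Delivery {
  connectionId: string; // who should receive the message
  message: string; // what they should receive
}

// Stand-in for the DynamoDB table of connection IDs.
let connections: string[] = [];
// Stand-in for the instructions we would send to API Gateway.
const outbox: Delivery[] = [];

// Connect: remember the new connection ID.
function onConnect(connectionId: string): void {
  if (connections.indexOf(connectionId) === -1) {
    connections.push(connectionId);
  }
}

// Message: forward to every stored connection except the sender.
function onMessage(senderId: string, message: string): void {
  connections
    .filter((id) => id !== senderId)
    .forEach((id) => outbox.push({ connectionId: id, message: message }));
}

// Disconnect: forget the connection ID.
function onDisconnect(connectionId: string): void {
  connections = connections.filter((id) => id !== connectionId);
}

// Three clients connect, one speaks, one leaves, another speaks.
onConnect("abc123");
onConnect("def456");
onConnect("ghi789");
onMessage("abc123", "hello, world!");
onDisconnect("ghi789");
onMessage("def456", "anyone there?");
```

Notice that each handler only reads or writes the shared store and keeps nothing in memory between calls, which is what lets each one run as a separate, short-lived Lambda.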

Now that we know the shape of the solution, let’s zoom in on the individual pieces and explore exactly what they do.

Websocket API Endpoint

In API Gateway you can create a WebSocket API as a stateful frontend for an AWS service (such as Lambda or DynamoDB) or for an HTTP endpoint. The WebSocket API invokes your backend based on the content of the messages it receives from client apps.

API Gateway docs

The most important thing to understand here is how routing works. If we were building a REST API, our routing would be based on a series of distinct paths (URLs), each representing a different resource or action. But because a WebSocket API uses one persistent connection, we can’t use a different path for each action. Instead, we use something called a routeSelectionExpression to tell API Gateway which route, and therefore which Lambda, an incoming message should trigger. In the CDK, the default routeSelectionExpression is $request.body.action, which refers to a field called action on the JSON object that a client sends to the API. API Gateway evaluates the expression against each message and uses the resulting value as the route key. We can use a different field name if we set routeSelectionExpression in the WebSocketApi constructor in the CDK, but let’s stick with the default for now.
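
Here is a small sketch of that selection logic as a pure function. It is a model of the behaviour described above, not API Gateway's actual implementation: it only handles expressions of the form $request.body.<field> with a single top-level field, and it assumes a fallback route named $default exists. The function names evaluateSelection and selectRoute are hypothetical.

```typescript
// Model of route selection for an incoming WebSocket message.
// Illustrative only: this is not how API Gateway is implemented.

// Evaluate "$request.body.<field>" against a raw message body.
// Returns undefined if the body is not a JSON object, or if the
// field is missing or not a string.
function evaluateSelection(expression: string, rawBody: string): string | undefined {
  const prefix = "$request.body.";
  if (expression.indexOf(prefix) !== 0) return undefined;
  const field = expression.slice(prefix.length);
  try {
    const parsed: unknown = JSON.parse(rawBody);
    if (typeof parsed !== "object" || parsed === null) return undefined;
    const value = (parsed as Record<string, unknown>)[field];
    return typeof value === "string" ? value : undefined;
  } catch {
    return undefined; // the body was not valid JSON
  }
}

// Pick the route key: a matching custom route if there is one,
// otherwise the fallback route.
function selectRoute(customRoutes: string[], expression: string, rawBody: string): string {
  const key = evaluateSelection(expression, rawBody);
  return key !== undefined && customRoutes.indexOf(key) !== -1 ? key : "$default";
}
```

With one custom route called message, a body of {"action":"message","messageBody":"hi"} selects message, while an unknown action or a body that isn't JSON falls through to the fallback route.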

A WebSocket API comes with three predefined routes: $connect, $disconnect and $default. $connect and $disconnect are triggered when a client connects or disconnects, respectively. We can add as many custom routes as we like to support the different actions that a connected client can perform. The $default route is triggered by any message whose route key doesn’t match one of our custom routes. It isn’t required, but you can use it to return a generic “That’s not a valid action” error.

The @aws-cdk/aws-apigatewayv2-integrations-alpha npm package contains a WebSocketLambdaIntegration construct which we can use to link a WebSocketApi route with a specific Lambda function. This is a nice abstraction over some of the nitty-gritty of our infrastructure; once we’ve created a WebSocketApi, we just need to pass a route key (the value that the routeSelectionExpression will produce, such as message) and a WebSocketLambdaIntegration to its addRoute method and we have added a new route.
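
As a rough idea of what that wiring can look like, here is a hedged CDK sketch. The stack name, the lambdas asset directory, the handler file names and the dev stage name are all assumptions made for illustration. It imports from the alpha packages named above; as far as I know, newer releases of aws-cdk-lib expose the same constructs from aws-cdk-lib/aws-apigatewayv2 and aws-cdk-lib/aws-apigatewayv2-integrations, so adjust the imports to match your version.

```typescript
import { App, Stack } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { WebSocketApi, WebSocketStage } from "@aws-cdk/aws-apigatewayv2-alpha";
import { WebSocketLambdaIntegration } from "@aws-cdk/aws-apigatewayv2-integrations-alpha";

const app = new App();
const stack = new Stack(app, "ChatStack"); // hypothetical stack name

// Hypothetical helper: one Lambda per handler file in ./lambdas.
const makeFn = (id: string, handlerFile: string): lambda.Function =>
  new lambda.Function(stack, id, {
    runtime: lambda.Runtime.NODEJS_18_X,
    handler: `${handlerFile}.handler`,
    code: lambda.Code.fromAsset("lambdas"), // assumed directory
  });

// The predefined connect and disconnect routes are configured in the
// constructor. routeSelectionExpression is left at its default.
const api = new WebSocketApi(stack, "ChatApi", {
  connectRouteOptions: {
    integration: new WebSocketLambdaIntegration("ConnectIntegration", makeFn("ConnectFn", "connect")),
  },
  disconnectRouteOptions: {
    integration: new WebSocketLambdaIntegration("DisconnectIntegration", makeFn("DisconnectFn", "disconnect")),
  },
});

// Custom route: messages whose action field is "message" go to the message Lambda.
api.addRoute("message", {
  integration: new WebSocketLambdaIntegration("MessageIntegration", makeFn("MessageFn", "message")),
});

// A stage gives the API a URL that clients can connect to.
new WebSocketStage(stack, "DevStage", {
  webSocketApi: api,
  stageName: "dev",
  autoDeploy: true,
});
```

This sketch leaves out the DynamoDB table and the permissions the Lambdas need to read and write it and to post back to connections.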

Connections table

We’re using DynamoDB for our persistent storage. This quote sums up the reason nicely:

DynamoDB is aligned with the values of Serverless applications: automatic scaling according to your application load, pay-per-what-you-use pricing, easy to get started with, and no servers to manage. This makes DynamoDB a very popular choice for Serverless applications running in AWS.

Serverless Framework docs

The main downside to DynamoDB is that modelling data in it has a steeper learning curve than you might experience with other database options. Fortunately, our needs are so simple that we won’t need to worry about that. (If you do want to start using DynamoDB for more complex data, you could do a lot worse than to read this article by Alex DeBrie.)

Each item we store in the database only really needs one field: the connectionId. But we are also going to add a room field, which will be the partition key we use for querying the table and will also make the application easier to extend in the future. For now, all clients will connect to the same room, but it would be straightforward to introduce multiple rooms later thanks to this field.

With these two fields, our database table will look like this when there are three clients connected:

PK        connectionId
ROOM#1    abc123
ROOM#1    def456
ROOM#1    ghi789
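
In code, one row of that table might be modelled like this. One assumption worth calling out: all three rows share the same PK, and DynamoDB requires each item's primary key to be unique, so this sketch assumes connectionId is the table's sort key. The post doesn't state that explicitly. The helper names roomKey and toConnectionItem are made up for illustration.

```typescript
// Shape of one item in the connections table, matching the columns above.
interface ConnectionItem {
  PK: string; // partition key, e.g. "ROOM#1"
  connectionId: string; // assumed here to be the sort key
}

const ROOM_PREFIX = "ROOM#";

// Build the partition key for a room.
function roomKey(roomId: string): string {
  return ROOM_PREFIX + roomId;
}

// Build the item stored for one connected client.
function toConnectionItem(roomId: string, connectionId: string): ConnectionItem {
  return { PK: roomKey(roomId), connectionId: connectionId };
}

// The three-client table shown above, expressed as items.
const exampleItems: ConnectionItem[] = ["abc123", "def456", "ghi789"].map((id) =>
  toConnectionItem("1", id)
);
```

Prefixing the key with ROOM# rather than storing a bare 1 leaves space for other kinds of item to share the table later without their keys colliding.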

Lambdas

We have three Lambda functions, each performing one specific job. Because we’re only using one room for now, we can hard-code the room ID (1) somewhere.

Connect

The connection Lambda takes the connectionId from the incoming request and adds it to the database with DynamoDB’s put method.
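
A minimal sketch of the data this handler works with is below. API Gateway puts the connection ID on event.requestContext.connectionId, and a DynamoDB put takes a table name plus the item to store. To keep the example dependency-free it only builds the put input rather than calling the AWS SDK; the function names and the Connections table name in the usage line are hypothetical.

```typescript
// Minimal slice of the event API Gateway passes to a WebSocket Lambda.
interface WebSocketEvent {
  requestContext: { connectionId: string };
}

// Input for a DynamoDB put: a table name plus the item to store.
interface PutInput {
  TableName: string;
  Item: { PK: string; connectionId: string };
}

const ROOM_ID = "1"; // hard-coded while there is only one room

// Build the put input for a newly connected client.
function buildConnectPut(wsEvent: WebSocketEvent, tableName: string): PutInput {
  return {
    TableName: tableName,
    Item: {
      PK: "ROOM#" + ROOM_ID,
      connectionId: wsEvent.requestContext.connectionId,
    },
  };
}

// The connect handler should return a 2xx status to accept the connection.
function connectResponse(): { statusCode: number } {
  return { statusCode: 200 };
}

// Usage with a hypothetical table name:
const connectPut = buildConnectPut({ requestContext: { connectionId: "abc123" } }, "Connections");
```

In the real handler, this object would be passed to the DynamoDB client's put call before returning the response.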

Message

The message Lambda is the most interesting, although it’s still pretty simple! First of all we need to note that unlike our other Lambdas, this one needs more information than just the connectionId to work — it also needs the message. So when a client calls this route, the body of their message will look something like {"action":"message", "messageBody": "hello, world!"}. You’ll probably want some error handling in place in case the JSON has "action": "message" but not messageBody.
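
That error handling could be as small as the following sketch, which validates the raw body before we do anything else with it. The ParsedMessage type, the function names and the error strings are my own inventions, and treating an empty messageBody as invalid is an assumption you may not share.

```typescript
// Result of validating an incoming frame for the message route.
type ParsedMessage =
  | { ok: true; messageBody: string }
  | { ok: false; error: string };

// JSON.parse never returns undefined, so undefined signals a parse failure.
function safeJsonParse(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return undefined;
  }
}

// Check that the body is a JSON object with a non-empty string messageBody.
function parseMessageFrame(rawBody: string | null | undefined): ParsedMessage {
  if (!rawBody) {
    return { ok: false, error: "empty body" };
  }
  const parsed = safeJsonParse(rawBody);
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "body is not a JSON object" };
  }
  const messageBody = (parsed as Record<string, unknown>).messageBody;
  if (typeof messageBody !== "string" || messageBody.length === 0) {
    return { ok: false, error: "messageBody must be a non-empty string" };
  }
  return { ok: true, messageBody: messageBody };
}
```

Returning a result object instead of throwing keeps the handler's control flow simple: it can check ok once and either carry on or send an error back.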

Once we’ve got the connectionId and messageBody from the incoming request, we use DynamoDB’s query method to fetch all the connectionIds for our hard-coded room except the one that matches the incoming request, and then we call postToConnection on an API Gateway Management API Client to forward the messageBody to each connectionId.
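
The data side of those two steps can be sketched as below. This version queries for every item in the room and then drops the sender in code, which is one simple way to do the "except the one that matches" part; it is not necessarily how the finished Lambda does it. The function names are hypothetical, the management endpoint format assumes the API's default domain rather than a custom one, and depending on your SDK version you may need to encode Data as bytes rather than pass a string.

```typescript
interface ConnectionItem {
  PK: string;
  connectionId: string;
}

// Input for a DynamoDB query that fetches every item in one room.
function buildRoomQuery(tableName: string, roomId: string) {
  return {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk",
    ExpressionAttributeValues: { ":pk": "ROOM#" + roomId },
  };
}

// Endpoint for the API Gateway Management API client, built from the
// domainName and stage found on the incoming event's requestContext.
function managementEndpoint(domainName: string, stage: string): string {
  return "https://" + domainName + "/" + stage;
}

// One postToConnection input per recipient, skipping the sender.
function planBroadcast(items: ConnectionItem[], senderId: string, messageBody: string) {
  return items
    .filter((item) => item.connectionId !== senderId)
    .map((item) => ({ ConnectionId: item.connectionId, Data: messageBody }));
}

// Usage with the three-client table from earlier:
const roomItems: ConnectionItem[] = [
  { PK: "ROOM#1", connectionId: "abc123" },
  { PK: "ROOM#1", connectionId: "def456" },
  { PK: "ROOM#1", connectionId: "ghi789" },
];
const posts = planBroadcast(roomItems, "abc123", "hello, world!");
```

Each entry in posts is what we would hand to postToConnection, one call per recipient.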

Disconnect

The disconnection Lambda takes the connectionId from the incoming request and deletes the matching record from the table with DynamoDB’s delete method.
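
One detail to watch: a DynamoDB delete needs the item's full primary key, so if connectionId is the sort key (an assumption on my part, as the table's key schema isn't spelled out here) we have to supply the room as well as the connection ID. A minimal sketch, with hypothetical names, that builds the delete input without calling the SDK:

```typescript
// Input for a DynamoDB delete: a table name plus the item's full key.
interface DeleteInput {
  TableName: string;
  Key: { PK: string; connectionId: string };
}

const DISCONNECT_ROOM_ID = "1"; // same hard-coded room as the other Lambdas

// Build the delete input for a client that has just disconnected.
function buildDisconnectDelete(connectionId: string, tableName: string): DeleteInput {
  return {
    TableName: tableName,
    Key: { PK: "ROOM#" + DISCONNECT_ROOM_ID, connectionId: connectionId },
  };
}

// Usage with a hypothetical table name:
const disconnectDelete = buildDisconnectDelete("ghi789", "Connections");
```

Once we support multiple rooms, the handler will need some way to find out which room a connection belongs to before it can build this key.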

Conclusion

That covers the architecture of a serverless WebSocket API built in AWS with Lambdas and DynamoDB. In the next posts, we’ll start to look at the CDK code for building the infrastructure we’ve described here and the application code for the Lambdas.