Decomposing the Database - Database-as-a-Service Interface

Published on 07 Oct 2021

Sometimes, clients just need a database to query. It could be because they need to query or fetch large amounts of data, or perhaps because external parties are already using tool chains that require a SQL endpoint to work against (think about tools like Tableau, which are often used to gain insights into business metrics). In these situations, allowing clients to view data that your service manages in a database can make sense, but we should take care to separate the database we expose from the database we use inside our service boundary.

One approach I have seen work well is to create a dedicated database designed to be exposed as a read-only endpoint, and have this database populated when the data in the underlying database changes. In effect, in the same way that a service could expose a stream of events as one endpoint, and a synchronous API as another endpoint, it could also expose a database to external consumers. We see an example of the Orders service, which exposes a read/write endpoint via an API, and a database as a read-only interface. A mapping engine takes changes in the internal database, and works out what changes need to be made in the external database.

The mapping engine could ignore the changes entirely, expose the change directly, or something in between. The key thing is that the mapping engine acts as an abstraction layer between the internal and external databases. When our internal database changes structure, the mapping engine will need to change to ensure that the public-facing database remains consistent. In virtually all cases, the mapping engine will lag behind writes made to the internal database; typically, the choice of mapping engine implementation will determine this lag. Clients reading from the exposed database need to understand that they are therefore seeing potentially stale data, and you may find it appropriate to programmatically expose information regarding when the external database was last updated.

Implementing a Mapping Engine

The detail here is in working out how to update—namely, how you implement the mapping engine. We’ve already looked at a change data capture system, which would be an excellent choice here. In fact, that is likely to be the most robust solution while also providing the most up-to-date view. Another option would be to have a batch process just copy the data over, although this can be problematic as it is likely to lead to a longer lag between internal and external databases, and determining which data needs to be copied over can be difficult with some schemas. A third option could be to listen to events fired from the service in question, and use that to update the external database.

In the past, I would have used a batch job to handle this. Nowadays, though, I’d probably utilize a dedicated change data capture system, perhaps something like Debezium. I’ve been bitten too many times by batch processes not running or taking too long to run. With the world moving away from batch jobs, and wanting data faster, batch is giving way to real time. Getting a change data capture system in place to handle this makes sense, especially if you are considering using it to expose events outside your service boundary.

Compared to Views

This pattern is more sophisticated than a simple database view. Database views are typically tied to a particular technology stack: if I want to present a view of an Oracle database, both the underlying database and the schema hosting the views both run on Oracle. With this approach, the database we expose can be a totally different technology stack. We could use Cassandra inside our service, but present a traditional SQL database as a public-facing endpoint.

This pattern gives more flexibility than database views, but at an added cost. If the needs of your consumers can be satisfied with a simple database view, this is likely to be less work to implement in the first instance. Just be aware that this may place restrictions on how this interface can evolve. You could start with the use of a database view and consider a shift to a dedicated reporting database later on.

Where to Use It

Obviously, as the database that is exposed as an endpoint is read-only, this is useful only for clients who need read-only access. It fits reporting use cases very well—situations where your clients may need to join across large amounts of data that a given service holds. This idea could be extended to then import this database’s data into a larger data warehouse, allowing for data from multiple services to be queried.

Don’t underestimate the work required to ensure that this external database projection is kept properly up-to-date. Depending on how your current service is implemented, this might be a complex undertaking.

References