
Simplifying Compliant Multi-Tenancy

Isolating tenant data is a valuable feature in every SaaS platform. If a customer accesses the data of another, it can break the trust between the SaaS provider and the customer and create commercial and privacy risks.

In systems that store and process sensitive data, like health care systems, this is even more critical, as personal health information (PHI) accessed by the wrong user can violate privacy laws like HIPAA, expose sensitive information, and put a company at risk for legal issues.


In microservice-enabled systems, tenant isolation introduces additional challenges and complexities, as each microservice is responsible for its data structure, migrations, etc.

This post aims to explore how SaaS organizations in healthcare can approach isolating tenant data.

Single tenancy

Many organizations may first approach this by completely segregating each tenant. In other words, they will implement a “single tenancy” approach. Let’s break this down with an illustration:

In the drawing, we can identify 2 modules –

  • Orchestrator – routes the traffic to the relevant tenant for operational API calls.
  • Tenant Management – responsible for managing the tenants, including creating the resources needed.

In a model such as this, it’s easy to focus on the operational flow, but in doing this, we fail to take into account how a new tenant is created, which is a crucial part of the solution I am proposing.

While this approach provides the strongest segregation, it has several drawbacks:

  1. Cost – each tenant can have many dedicated microservices deployed, meaning lots of compute power to support all customers. In addition, each new onboarded tenant has a direct linear effect on the cost, which immediately affects your SaaS profit metrics.
  2. Complex maintenance and Ops – managing a lot of microservices, processes, and machines increases the complexity of your infrastructure and undermines its stability, forcing you to deal with problems you could have avoided.
  3. Deployment time – assuming you have the same code deployed multiple times, deployment time and complexity will be increased (along with cost) and will continue to be extended as more customers are onboarded.
  4. Complex tenant provisioning – Assuming you have an onboarding flow to your product, single tenancy architecture complicates this process, as infra-related resources need to be provisioned, services need to be deployed, etc. Those processes are usually internal to the dev team, but here it is part of the product itself, which requires a different SLA.
  5. Deployment vs. tenant creation* – following bullets 3 and 4, when a new tenant is created at the same time a new version of the code is deployed, a race condition can leave the environment inconsistent. For example, the tenant provisioning script deploys service version X, and a second later the dev team releases version X' to all existing services. The two processes have to be synchronized to ensure the latest version ends up everywhere.

*Depending on your deployment pipeline

In our research for the right solution, we aimed to keep the following microservices architecture concepts:

  1. Keeping our microservices independent as much as we can
  2. Ensuring a single database (DB) connection per microservice – same user, password, and DB
  3. Establishing each microservice to manage its database migrations – while aiming for simplicity
  4. Leveraging connection pools as this is crucial for performance and stability
  5. Following KISS – as good developers, we always aim for simplicity

Option 1: Database per tenant

This approach is similar to “single tenant architecture”, but in this case, we’ll have a single microservice that accesses multiple databases.

Our multi-tenant service will receive a request and will need to connect to the correct tenant database to perform the relevant action.

We will need to provision a new database for each customer that onboards and update the service accordingly.

This approach can reduce the compute power needed to support multiple tenants.

Let’s analyze the solution based on our criteria:

Independent microservice – Depends. If the service runs its own migrations, it is responsible for all of its logic, including its internal tables; but if migrations are delegated to the tenant management component, that responsibility is shared between them. It’s a tricky choice.

Single DB connection – No. The service needs to connect to a different DB depending on the request, meaning it will have to hold multiple connection points.

DB migrations – Complex. When a new tenant is created, migrations need to run on the new tenant’s database so its tables are created. Migrations must also run on every customer’s database each time a new version is deployed – highly complex.

Connection pools – Complex. The service will need to hold multiple connections and, therefore, multiple pools. This can cause a high number of I/O threads and operations in your process.

KISS – Not simple, as migrations are complex, and connection pooling might need to be solved in a different layer external to the service to keep performance acceptable. In addition, assuming we have more than one service, we’ll need to make sure all services are notified.

Option 2: Table/Schema per tenant

In this approach, we create a separate schema and a separate table for each tenant.

Each tenant will have its own table (or schema with its own table), and our microservice will access the correct table using a table prefix for a tenant.
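As a rough sketch of the schema-per-tenant variant (the schema and table names here are hypothetical), each request switches the search path to the right tenant’s schema:

```sql
-- Hypothetical schema-per-tenant layout: one schema per tenant,
-- identical tables inside each.
CREATE SCHEMA tenant_42;
CREATE TABLE tenant_42.appointments (
    id   bigint PRIMARY KEY,
    data jsonb
);

-- Per request, the service points the session at the right tenant:
SET search_path TO tenant_42;
SELECT * FROM appointments;  -- resolves to tenant_42.appointments
```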

Now let’s analyze:

Independent microservice – Pretty similar to the previous approach: to keep the service independent, it must own its migrations rather than share them with the tenant management component.

Single DB connection – Yes. Our microservice can have a single database connection. We can determine in the app code which table/schema should be accessed in each API transaction.

DB migrations – Pretty similar to the previous approach. Complex. When a new tenant is created, migrations need to run so the new tenant’s schema/tables are created. This also needs to happen for every tenant when a new version is deployed.

Connection pools – Yes. We can keep our beloved connection pool in our code.

KISS – Simpler than the previous approach, but still migrations are complex.

Option 3: Row Level Security (RLS)

This approach leverages the Postgres Row level security capability. RLS is a concept implemented in many database systems and BI tools.

Every row in a multi-tenant table is associated with a single tenant – a new column should be added to the table to identify the tenant.

The RLS policy is applied on the multi-tenant table and should enforce that this tenant exists using a complementary table called “tenants”, as well as ensuring that data is not mixed and that we cannot query or modify multiple tenants’ data at the same time.
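A minimal sketch of those supporting structures, assuming integer tenant IDs (the table and column names follow the policy example later in the post):

```sql
-- Complementary "tenants" table that the policy checks against
CREATE TABLE tenants (
    "id"   integer PRIMARY KEY,
    "name" text NOT NULL
);

-- Tenant column added to every multi-tenant table
ALTER TABLE my_multi_tenant_table
    ADD COLUMN "_tenant_id" integer REFERENCES tenants ("id");
```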

So how can we enforce access? There are 2 main ways to achieve that:

  1. Connected Postgres user – we can set up a Postgres user per tenant and have the policy check which user is connected.
  2. Current session variable – we can set up a variable per session, and enforce that the session is restricted to a single tenant.

We chose the second option, as the first one forces us to manage a user per tenant, and will force our code to maintain multiple connections to the DB for each tenant with its user – something we wish to avoid.
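For contrast, the first option would tie the policy to the connected database role, roughly like this (the role-per-tenant setup and the "_tenant_name" column are hypothetical):

```sql
-- One Postgres role per tenant; the policy checks who is connected.
-- This requires a separate DB connection (and pool) per tenant role.
CREATE POLICY per_user_policy ON my_multi_tenant_table
  USING ("_tenant_name" = current_user);
```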

Independent microservice – Very independent, as internal logic including migrations is encapsulated inside the service. It does have a dependency on the “tenants” table, so if we wish to use a dedicated database per service, we need to ensure that this table exists there.

Single DB connection – Yes. Our microservice can have a single database connection.

DB migrations – Straightforward. We can keep managing our migration files as part of the service; migrations run only as part of deployment and only on a single set of tables, regardless of the number of customers onboarded. This is a compelling advantage of this solution.

Connection pools – Yes. In this approach, we can also keep our beloved connection pool in our code.

KISS – Handling RLS policies is DB-specific (in our example, Postgres), but the solution is simple. Furthermore, if ETL/ELT systems also consume this data, it is much simpler to extract.

Compare it all

Let’s rank and color-code our potential solutions for easy comparison:

                                    Independent    Single DB    DB          Connection
                                    microservice   connection   migrations  pools        KISS
DB per tenant (option 1)            Medium         Low          Low         Low          Low
Schema/table per tenant (option 2)  Medium         High         Low         High        Medium
RLS (option 3)                      High           High         High        High        High

Clearly, I’m opinionated 🙂

Let’s implement!

Let’s see how we define the policy:

CREATE POLICY mtt_isolation_policy ON my_multi_tenant_table USING
  ("_tenant_id" =
    (
      SELECT "id" FROM tenants
      WHERE "id" = current_setting('postgres.current_tenant')
    )
  );

As you can see, we use the current_setting directive to compare the tenant ID with a session variable that we will set during the transaction in our app code.

The RLS policy wraps the table and, behind the scenes, adds a WHERE clause to every operation performed on it.

This code can be added as a migration script in the service code and keeps the responsibility for the table managed by the dedicated microservice.
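One prerequisite worth calling out: the policy has no effect until row level security is actually enabled on the table, so the migration should also include something like:

```sql
ALTER TABLE my_multi_tenant_table ENABLE ROW LEVEL SECURITY;
-- FORCE applies the policy even to the table owner (optional but safer):
ALTER TABLE my_multi_tenant_table FORCE ROW LEVEL SECURITY;
```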

Now, after the policy is set, if we query the table without defining the session variable, we’ll get 0 results.

In our app code, before every DML operation is performed, we should set the session variable.

SELECT set_config('postgres.current_tenant', '${tenant_id}', TRUE)
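Put together, a tenant-scoped transaction looks roughly like this (tenant ID 42 is an arbitrary example):

```sql
BEGIN;
-- TRUE = transaction-local: the setting disappears on COMMIT/ROLLBACK,
-- so it cannot leak to the next user of a pooled connection.
SELECT set_config('postgres.current_tenant', '42', TRUE);
SELECT * FROM my_multi_tenant_table;  -- the policy filters to tenant 42
COMMIT;
```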

Lessons learned after much trial and error:

  • Separate migration and app DB users – when the microservice itself runs the migrations, it is also the one creating the RLS policy, and the same DB user is usually used by the app at runtime. This means a bug in our API (SQL injection, for example) could remove or bypass the policy. To avoid this, create one user for migrations and a separate, less privileged user for the app.
  • Index the tenant column – once the policy is set on the table, all our queries run with an additional condition that is not declared explicitly in the query – the one on the new tenant column. If that column is not indexed, query performance can suffer.
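Both lessons can be sketched in a few statements (the role names and password placeholder are hypothetical):

```sql
-- The migration user owns the tables and the RLS policy...
CREATE ROLE svc_migrator LOGIN PASSWORD 'change-me';
-- ...while the app user can only touch the data, not the policy.
CREATE ROLE svc_app LOGIN PASSWORD 'change-me';
GRANT SELECT, INSERT, UPDATE, DELETE ON my_multi_tenant_table TO svc_app;

-- Index the tenant column the policy implicitly filters on.
CREATE INDEX idx_mtt_tenant_id ON my_multi_tenant_table ("_tenant_id");
```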

Compliance silo – DB per group of tenants + RLS

Many SaaS systems offer an “Enterprise plan” that may include a more secure, segregated environment for specific customers who are willing to pay for it.

In Vim’s case, we have customers that want to be segregated and to ensure a stronger separation between them and other customers, but they do manage multiple tenants.

We came up with the concept of a “Compliance Silo”, which combines single tenancy with RLS. This approach bridges the gap, giving a specific customer stronger segregation without changing the implementation completely.

This is also a very cost-effective approach:

  1. Compliance silo is created only when needed and upon request.
  2. No need to deploy the entire system for a customer – only sensitive data is segregated and relevant services are deployed.
  3. Your microservices ecosystem is available to all compliance silos (e.g., sensitive services can access the non-sensitive services that are available).

Chen Rozenes

Chief Architect at Vim
