If you’re responsible for a microservices app, you may be familiar with the idea of a “latency budget.” This is the maximum total request time your app can tolerate while still meeting your SLAs and keeping stakeholders happy. For a stock trading or financial services app, this budget might be a matter of microseconds. For one sales-related service app, which displays the business information of incoming callers, the budget was 250 milliseconds (any longer, and the person has already picked up the phone).
If you’re using Open Policy Agent (OPA) for authorization or policy decisions in your microservices app, OPA will figure as a “line item” in your latency budget. Almost always, the greatest latency is “spent” not on OPA itself, but on the network hops needed to reach OPA for a policy decision. The question then (and it’s one of the most common OPA questions) is where to place OPA, or a policy enforcement point, in your cluster to minimize latency while keeping your data safe.
As an aside, there are many ways to optimize policy performance within OPA itself by following best practices, like preferring objects over arrays and making sure statements are properly indexed. If you want to avoid network calls entirely, you can also evaluate OPA policy inside your application. In Go applications, this can be achieved by integrating OPA as a library. For other scenarios, you might opt to compile Rego policies into WebAssembly. In general, though, these cases are the exception rather than the rule.
When you need to deploy OPA outside of your application, here are some of the most popular OPA deployment models for microservices, along with some *rubs hands* experimental models that can get your creative-architectural juices flowing. There are no right or wrong answers; with the flexibility of OPA, it is only a matter of finding the right policy model for your environment and your latency needs. Time for the rubber to meet the road.
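Whichever model you pick, a service that reaches OPA over the network typically does so through OPA's REST Data API, posting an input document to /v1/data/<policy path>. The following is a minimal sketch of such a call, not a definitive client. The base URL (port 8181 is OPA's default listener), the policy path httpapi/authz/allow, the 50-millisecond timeout, and the function names are all assumptions chosen for illustration.

```python
import json
import urllib.request


def build_opa_request(base_url: str, policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a POST against OPA's Data API: <base_url>/v1/data/<policy_path>."""
    url = f"{base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(base_url: str, policy_path: str, input_doc: dict, timeout_s: float = 0.05) -> bool:
    """Ask OPA for a decision; deny if OPA is unreachable, slow, or undecided.

    The timeout is the latency "line item" this call may spend (an assumed value).
    """
    request = build_opa_request(base_url, policy_path, input_doc)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            decision = json.load(response)
    except (OSError, ValueError):
        # Network errors, timeouts, and malformed JSON all fail closed.
        return False
    # OPA omits "result" when the rule is undefined, so only an explicit true allows.
    return isinstance(decision, dict) and decision.get("result") is True


# Hypothetical policy path and input, for illustration only.
request = build_opa_request(
    "http://localhost:8181", "httpapi/authz/allow", {"user": "alice", "method": "GET"}
)
print(request.full_url)
```

Failing closed on a timeout is a design choice in this sketch: it keeps the request inside its latency budget at the cost of denying traffic whenever OPA is slow or unreachable.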
1. The Gateway Model (Centralized)
Among the most popular policy models for OPA is the traditional gateway or firewall model. Familiar to most developers, this architecture places OPA instances at, you guessed it, the gateway guarding your cluster. This means there is a single, centralized location for authorization and policy decisions — all downstream traffic is presumed authorized. As with all centralized models, all authorization decisions therefore have to take one or more network hops to reach the OPA instance.
The gateway model makes sense for many deployments, aided by OPA’s integration with popular tools like Kong and Apigee. That said, despite being one of the fastest models (often requiring only a single network hop), the gateway model is not necessarily the most secure. If a malicious entity somehow bypasses the gateway, for instance, it has a free highway to the entire cluster, because all traffic is “presumed authorized” at that stage. Another tradeoff: an OPA instance at the gateway cannot fulfill authorization for internal services, like cron jobs, which do not pass through the gateway.
Advantages: Fast, popular, familiar, simple (a decades-old strategy).
Challenges: Single point of failure, no internal protection, could be overly permissive for security.
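The tradeoff described above can be shown with a toy simulation. This is only a sketch: the service, the request shape, and the stand-in policy function are hypothetical, and a real gateway would ask OPA for the decision rather than evaluate a Python function.

```python
from typing import Callable


def billing_service(request: dict) -> str:
    # Downstream service: in the gateway model it performs no check of its own.
    return f"billing data for {request['user']}"


def make_gateway(
    authorize: Callable[[dict], bool], service: Callable[[dict], str]
) -> Callable[[dict], str]:
    """Wrap a service so every request entering through the gateway is checked once."""

    def handle(request: dict) -> str:
        if not authorize(request):
            return "403 Forbidden"
        return service(request)

    return handle


def toy_policy(request: dict) -> bool:
    # Stand-in for a policy decision from OPA (assumption: only admins are allowed).
    return request.get("role") == "admin"


gateway = make_gateway(toy_policy, billing_service)

print(gateway({"user": "alice", "role": "admin"}))    # checked at the edge, allowed
print(gateway({"user": "mallory", "role": "guest"}))  # checked at the edge, denied
# A caller that never passes through the gateway (an internal cron job, or an
# attacker who bypassed the edge) reaches the service with no check at all:
print(billing_service({"user": "mallory", "role": "guest"}))
```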
2. The “Close to the Data” Model (Centralized)
Then there is the inverse: moving OPA to the back of your stack, close to the data. For instance, if you have a data layer talking to databases, that is where you would place your OPA instance. This model has similar performance characteristics to the gateway model, because authorization or policy decisions are still being made by a single centralized service point — requiring network hops. The advantage, however, is that you can better guard your data; even if attackers breach the gateway, they still cannot reach your data without authorization. One drawback of this model is that there may be other items in your cluster that you want to protect — other services, APIs and so on — meaning you will need to create additional OPA instances, elsewhere. As a result, OPA policy enforcement points in your cluster can become somewhat arbitrary or haphazard.
Advantages: Fast and provides security for your data (not for your services).
Challenges: Arbitrary enforcement when you need to protect other services in your cluster.
3. The “Enforce OPA Everywhere” Model (Distributed)
Jeff Bezos was prescient in mandating that every internal Amazon service be built as if it were a public-facing API. It’s hard to imagine the success of AWS services like S3 without this. With OPA, you too can adopt this approach.
In this model, you have OPA instances running as a sidecar to each of your microservices, often via an Envoy proxy or a sidecar authorization API. The performance is fast as lightning for each microservice, because authorization requests stay local to the server and require no network hops. Meanwhile, even if certain services in the cluster become partitioned or “unavailable,” each individual microservice can continue running its own local authorization requests.
Still, there are challenges to this model. First, you have to be aware of the latency of serial OPA calls. For instance, even if a local microservice call to OPA costs only 5 milliseconds of “budget,” if you call 10 microservices in serial, then the total added latency will be 50 milliseconds. Even so, this setup is typically faster than centralized models that require network hops to a remote server. Second, you have to ensure that the local policy version of each OPA instance is consistent with the others. For example, Service A policy needs to match Service B policy, and so on. Despite these challenges, this model is an ideal fit for many dynamic microservices architectures.
Advantages: Secure, “zero trust” model, dynamic — fit for cloud native.
Challenges: Managing requests in serial, ensuring local policy consistency.
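The serial-call arithmetic above is easy to check against a budget. The sketch below reuses the article's own numbers (5 milliseconds per local OPA call, 10 services called in serial, and the 250-millisecond budget from the introduction); the function name is hypothetical.

```python
def serial_opa_overhead_ms(per_call_ms: float, calls_in_serial: int) -> float:
    """Total OPA latency when each service in a serial chain makes one local call."""
    return per_call_ms * calls_in_serial


budget_ms = 250.0  # the sales-app budget from the introduction
overhead_ms = serial_opa_overhead_ms(per_call_ms=5.0, calls_in_serial=10)

print(overhead_ms)              # 50.0
print(overhead_ms / budget_ms)  # 0.2, i.e. OPA consumes a fifth of the budget
print(budget_ms - overhead_ms)  # 200.0 ms left for everything else
```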
4. The JSON Token Model (Experimental)
Much of the value of OPA lies in its flexibility. This model takes OPA across new and exploratory horizons — one to get the gears turning, to be sure, and yet a model that could be entirely practical in your organization.
This is the JSON Web Token (JWT) model, which does for authorization what OAuth and OpenID Connect do for authentication. In this model, OPA responds to an authorization request with a signed JWT, which is then passed along to downstream services. Any subsequent service, rather than incur additional latency by querying OPA, can instead validate the JWT issued by the trusted OPA instance. This model is not necessarily centralized or distributed (it could be either), but in a distributed model it carries the benefit of high security and lightning-fast local response times, without the drawback of aggregated latency from serial OPA calls.
Advantages: Future-facing, fast, efficient, secure.
Challenges: Forging a path with a new model.
5. The “Zero Latency,” Parallel Query Model (Experimental)
Finally, we have the parallel request model, which is just as experimental and exciting as the JWT model. Here, when a microservice has a request for the backend that requires an authorization decision, it sends two parallel queries: one straight to the backend, and one to OPA. As long as the authorization decision from OPA arrives before the response from the database, OPA adds zero latency to the overall request. Even if the OPA response is slower than the database request, you are still getting the minimum possible latency (and making the most of your latency “budget”) by sending the requests in tandem. One potential problem with this approach: since requests to the backend are sent before OPA has made an authorization decision, even requests that are eventually denied are still processed, creating unnecessary load. This could be alleviated by using other networking patterns, such as circuit breakers, though this adds to the complexity of the solution. Still, if reducing latency is a higher priority than reducing backend load, this model is an option.
Advantages: As close to zero OPA latency as possible for backend requests.
Challenges: Even denied requests put load on backend services. Similarly, you are breaking ground with a new model.
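A rough sketch of the parallel pattern with Python's asyncio follows. Both remote calls are simulated with sleeps, and the function names, delays, and the toy "admin" rule are assumptions made for illustration. Note that cancelling the backend task only stops the caller from waiting; a real backend may already have done the work, which is exactly the load problem described above.

```python
import asyncio


async def query_backend(request: dict, delay_s: float) -> dict:
    await asyncio.sleep(delay_s)  # simulated database round trip
    return {"rows": [f"record for {request['user']}"]}


async def query_opa(request: dict, delay_s: float) -> bool:
    await asyncio.sleep(delay_s)  # simulated policy decision round trip
    return request.get("role") == "admin"  # stand-in for a real OPA decision


async def handle(request: dict, backend_delay_s: float = 0.05, opa_delay_s: float = 0.01):
    """Start the backend call and the OPA call together; release data only if allowed."""
    backend_task = asyncio.create_task(query_backend(request, backend_delay_s))
    allowed = await query_opa(request, opa_delay_s)
    if not allowed:
        backend_task.cancel()  # stop waiting; the backend result is never returned
        return None
    # Total wait is roughly max(backend, OPA) rather than their sum.
    return await backend_task


print(asyncio.run(handle({"user": "alice", "role": "admin"})))
print(asyncio.run(handle({"user": "mallory", "role": "guest"})))
```

The important property is that the backend result is never handed back to the caller until the decision has arrived, even though both requests were sent at the same time.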
Microservices developers are always in the business of minimizing latency. And, with the continued drive for digital transformation, developer teams are under increased pressure to deliver richer app services and process more data, faster, which heightens the need for speed. With the flexibility of OPA, you can find the right model to achieve the performance you need, while making the policy and authorization decisions you require.
This article first appeared in The New Stack on March 1, 2021.