Service mesh technology emerged with the popularization of microservice architectures. Because service mesh facilitates the separation of networking from the business logic, it enables you to focus on your application’s core competency.
Microservice applications are distributed over multiple servers, data centers, or continents, making them highly network dependent. Service mesh manages network traffic between services by applying routing rules and dynamically directing packets between services.
In this blog post, we will look at use cases, compare top mesh options, and go over best practices.
Let us start with the most common scenarios where service meshes are used.
Service mesh is an architectural approach to connecting microservices and managing the traffic between them. Service meshes are heavily used in production at many levels in an organization. As a result, there are some standardized and widely accepted use cases.
Let's assume you have an instance of a backend service responding slowly and creating a bottleneck in your complete stack. Requests from the frontend services will then time out and retry against the slow service instance. With the help of service mesh, you can use a circuit breaker that ensures frontend instances only connect to healthy backend instances. The metrics the mesh collects along the way also improve the visibility of your stack and help you troubleshoot problems.
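To make the circuit breaker idea concrete, here is a minimal sketch in Python. It is not how any particular mesh implements the feature; the class name, the thresholds, and the `healthy_backends` helper are assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Stops sending requests to a backend after repeated failures (illustrative only)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before the circuit opens
        self.reset_after = reset_after    # seconds to wait before allowing a trial request
        self.clock = clock                # injectable clock, which makes testing easy
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Once the cool-down has elapsed, let a trial request through.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def healthy_backends(breakers):
    """Return the names of backends whose circuit currently allows traffic."""
    return [name for name, breaker in breakers.items() if breaker.allow_request()]
```

A frontend would call `record_failure()` on every timeout and only route to the instances returned by `healthy_backends()`. In a real mesh, the sidecar proxy does this bookkeeping for you.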
Deployment strategies (blue/green deployment, canary, etc.) are becoming the norm for releasing upgrades to cloud-native applications. Service mesh supports these strategies since most of them are based on diverting traffic to specific instances. For example, you can create traffic rules in service mesh so that only a small group of users (say, 10%) will be exposed to the new version.
If everything goes as expected, you can divert all traffic to the latest version, completing your canary deployment. It is also recommended to check the internal deployment strategies of Kubernetes and match them to your application’s requirements.
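The 10% split described above can be sketched as a deterministic routing decision. This is an assumption-laden illustration, not a mesh API: the `pick_version` function and the `"canary"` and `"stable"` labels are hypothetical, and hashing the user ID is just one way to keep each user on the same version between requests.

```python
import hashlib


def pick_version(user_id, canary_percent=10):
    """Route a fixed share of users to the canary, based on a hash of their ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in the range 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` step by step to 100 corresponds to completing the canary deployment. Mesh products usually express the same idea declaratively as weights on a route rather than in application code.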
To keep your production stacks secure, it is best to harden them by testing delays, timeouts, and disaster recoveries.
Service mesh allows you to test the robustness of your system by creating chaos through delays and incorrect responses. For instance, by injecting delays in the service mesh traffic rules, you can test how the frontend and backend behave when your database responds slowly to their queries.
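As a rough sketch of delay injection, the wrapper below slows down a configurable share of calls to a handler. The function name and parameters are hypothetical; a service mesh applies the equivalent rule in its proxies, so the application code stays untouched.

```python
import random
import time


def with_injected_delay(handler, delay_seconds, percent,
                        rng=random.random, sleep=time.sleep):
    """Wrap a handler so that roughly `percent` of calls are delayed first."""
    def wrapped(request):
        # rng() returns a float in [0, 1); scale it to compare against a percentage.
        if rng() * 100 < percent:
            sleep(delay_seconds)
        return handler(request)
    return wrapped
```

Wrapping a stand-in database client this way, for example `slow_query = with_injected_delay(run_query, 2.0, 50)` with a hypothetical `run_query`, lets you observe whether callers time out gracefully or pile up retries.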
An API gateway is a design pattern on the end-user side of the services that makes it possible to manage APIs from a single entry point. With the help of service mesh, you can use the same approach for service-to-service communication and create complex API management schemes within your clusters. It’s recommended that you check the Gateway API, which aims to incorporate these ideas into native Kubernetes resources.
Service mesh acts as "smart" glue, dynamically connecting microservices with traffic policies, limits, and testing capabilities. As service mesh becomes increasingly popular, many new and widely accepted use cases will join those listed above.
Now let us look at the benefits and drawbacks of the top service mesh software available.
While there are always a few startups with fancy service mesh products at every conference, only three mesh options are widely used in the cloud-native world: Istio, Linkerd, and Consul Connect. They are all open-source products with active communities. They also each have their own pros and cons based on their vision and implementation.
Istio is a Kubernetes-native service mesh initially developed by Google, IBM, and Lyft and widely adopted in the industry. Leading cloud Kubernetes providers like Google, IBM, and Microsoft use Istio as the default service mesh in their services. Istio provides a robust set of features to create connectivity between services, including request routing, timeouts, circuit breaking, and fault injection. Additionally, Istio creates deep insights into applications with metrics such as latency, traffic, and errors.
From an architectural point of view, Linkerd is similar to Istio but comes with more flexibility. This flexibility comes from multiple dimensions of pluggable architecture. For instance, in terms of connectivity, Linkerd works with the most popular ingress controllers, like Nginx, Traefik, or Kong. Similarly, in addition to its own GUI, it works with Grafana, Prometheus, and Jaeger for observability.
Consul was the most popular service discovery and key/value store used in distributed applications until its parent company, HashiCorp, converted it into a service mesh under the name Consul Connect.
As a result, Consul Connect has a hybrid architecture with Envoy sidecars next to applications, and its control plane and key/value store were developed in Go. From the perspective of connectivity and security, Consul Connect does not provide outstanding features when compared to its alternatives. However, it has less configuration and complexity, making it easier to get started with—much like the other HashiCorp tools in the cloud-native world.
The chart below provides a broad overview of the critical differences between these top three solutions:
Figure 1: Key differences between Istio, Linkerd, and Consul Connect
Service mesh standardizes and automates inter-service communication within your clusters and applications. However, because the products are complex and infrastructures differ, service mesh products are not straightforward to operate. While working with service mesh, the following notes on challenges and best practices will provide you with some helpful guidelines:
Service mesh configurations consist of traffic rules, rate limits, and networking setup. The configuration helps you to install from scratch, upgrade versions, and migrate between clusters. Therefore, it is suggested to treat the configuration as code and follow the GitOps approach with a continuous deployment pipeline.
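Treating configuration as code means it can be checked automatically before it is deployed. The sketch below shows the kind of validation a continuous deployment pipeline might run. The dictionary layout (`destinations` with a `weight` each) is an assumed, simplified format and not the schema of any specific mesh.

```python
def validate_route(route):
    """Return a list of problems found in one traffic-splitting rule."""
    errors = []
    weights = [dest.get("weight", 0) for dest in route.get("destinations", [])]
    if not weights:
        errors.append("route has no destinations")
    elif sum(weights) != 100:
        errors.append(f"weights sum to {sum(weights)}, expected 100")
    if any(weight < 0 for weight in weights):
        errors.append("negative weight")
    return errors
```

Failing the pipeline whenever `validate_route` returns a non-empty list catches a mistyped weight in review instead of in production.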
Service mesh products work better with a few clusters that have a high number of servers, rather than many clusters with fewer instances. Therefore, it’s suggested to minimize redundant clusters as much as possible, allowing you to take advantage of easy operation and a centralized configuration of your service mesh approach.
Service mesh products are complex applications managing the traffic of even more complex distributed applications. Therefore, metric collection, visualization, and dashboards are critical for system observability. Utilize Prometheus or Grafana—or any other integration point your service mesh provides—to create alerts based on your requirements.
Most service mesh products, including the top three, implement a basic set of security features: mutual TLS, certificate management, authentication, and authorization. You can also define and enforce network policies to limit the communication between applications running in the cluster.
It should be noted, though, that defining network policies is not a straightforward task. You need to cover all scenarios for currently running applications and consider future scalability. Therefore, using network policies with service mesh is not user-friendly and is prone to errors and security breaches.
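A small sketch shows why covering all scenarios is hard. It models a default-deny policy as an explicit allow-list and reports observed traffic that the policy would block. The service names and both functions are hypothetical and only illustrate the bookkeeping involved.

```python
# Hypothetical allow-list: (source, destination) pairs that may communicate.
ALLOWED_FLOWS = {
    ("frontend", "backend"),
    ("backend", "database"),
}


def is_allowed(source, destination, allowed=ALLOWED_FLOWS):
    """Default deny: a flow is permitted only if it is listed explicitly."""
    return (source, destination) in allowed


def uncovered_flows(observed, allowed=ALLOWED_FLOWS):
    """Return observed flows that the policy would block, sorted for stable output."""
    return sorted(set(observed) - set(allowed))
```

Every new service or changed dependency adds pairs that someone has to remember to list, and each forgotten pair shows up as blocked traffic.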
Beyond that, utilizing a service mesh to create secure network policies has several drawbacks.
First, you must define exactly the policies that you believe your cluster requires. This is a nontrivial task in an environment where microservices proliferate and continuously change. Thus, the service mesh policies need to be changed frequently and might break production if a microservice changes its behavior.
Second, by design, the service mesh uses a sidecar proxy to control policies, so any connection coming out of a container is automatically treated as legitimate traffic. If an attacker breaks into a container, they automatically inherit that container’s network identity and can do anything that the original container can do.
Finally, since every connection goes through a proxy, users can see significant performance degradation when using the mesh to encrypt traffic in the cluster.
To summarize, service mesh solutions do not care who is sending or receiving the data. Any malicious or misconfigured application can retrieve your sensitive data if it is allowed by the network policies. Thus, it is vital to consider holistic approaches with less overhead and better operability, rather than blindly trusting the security measures of service mesh products alone.
Service mesh connects distributed microservices in a dynamic, secure, and scalable way. There are widely accepted use cases and top-tier products that implement them. However, because cloud infrastructures and application requirements are highly complex, service mesh is not a silver bullet.
When it comes to security, protecting applications and runtime environments is not in the scope of service mesh products, and it’s overkill to install a service mesh just for security, since it creates a high overhead in the cluster. Instead, there are leaner and more security-oriented tools such as ARMO that handle security in a cloud-native way.
ARMO is a future-proof solution to the drawbacks and challenges of providing security using service mesh products. ARMO Kubernetes Fabric™ creates a secure and seamlessly integrated runtime protection for microservices. ARMO provides a complementary approach to security, identity, and TLS management, without creating extreme network overhead like service mesh solutions. It lets you use a service mesh and get its benefits while hardening it in a lean way, so your applications are also secure without added performance or logistical overhead.
The security of your applications, data, and runtime is a top priority. Thanks to ARMO, you can have bullet-proof applications and easily and dynamically manage the network traffic between them using service mesh. ARMO provides an out-of-the-box zero trust policy with its innovative approach.