This is an updated version of an article written by our own Lee Atchison that originally appeared in The New Stack.

Bringing down an entire application is easy. All it takes is the failure of a single service and the entire set of services that make up the application can come crashing down like a house of cards. Just one minor error from a non-critical service can be disastrous to the entire application.

There are, of course, many ways to prevent dependent services from failing. However, adding extra resiliency in non-critical services also adds complexity and cost, and sometimes it is not needed.

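Take a look at the figure below. What happens if Service D is not critical to the operation of Service A? Why should Service A fail just because Service D failed? And why should Service D need to be highly resilient if the highly critical Service A can operate without it?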

How do you know when a service dependency link is critical and when it isn’t? Service tiers are one way to help manage this.

What are service tiers?

A service tier is simply a label associated with a service that indicates how critical a service is to the operation of your business. Service tiers let you distinguish between services that are mission-critical, and those that are useful and helpful but not essential.

By comparing the service tier levels of dependent services, you can determine which service dependencies are your most sensitive and which are less important.

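Assigning service tier labels to services

All services in your system, no matter how big or small, should be assigned a service tier. The following sections outline an example scale that I use. You can use it as is or adapt it to meet your specific business needs.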

Tier 1

Tier 1 services are the most critical services in your system. A service is considered Tier 1 if a failure of that service will result in a significant impact to customers or to the company’s bottom line.

The following are some examples of Tier-1 services:

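  • Login service. A service that lets users log in to your system.
  • Credit card processor. A service that handles customer payments.
  • Permission service. A service that tells you what features a given user may have access to.
  • Order accepting service. A service that lets customers purchase a product on your website.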

A Tier-1 service failure is a serious concern to your company.

Tier 2

A Tier-2 service is one that is important to your business but less critical than a Tier 1. A failure in a Tier-2 service can cause a degraded customer experience in a noticeable and meaningful way but does not completely prevent your customer from interacting with your system.

Tier-2 services are also services that affect your backend business processes in significant ways, but might not be directly noticeable to your customers. The following are some examples of Tier-2 services:

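  • Search service. A service that provides search functionality on your website.
  • Order fulfillment service. A service that makes it possible for your warehouse to process an order for shipment to a customer.

A failure of a Tier-2 service will have a negative customer impact but does not represent a complete system failure.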

Tier 3

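A Tier-3 service is one that can have a minor, unnoticeable, or difficult-to-notice customer impact, or a limited effect on your business and systems.

The following are some examples of Tier-3 services:

  • Customer-icon service. A service that displays a customer icon or avatar on a website page.
  • Recommendation service. A service that displays alternative products a customer might be interested in based on what they are currently viewing.
  • Message-of-the-day service. A service that displays alerts or messages to customers at the top of a web page.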

Customers may or may not even notice that a Tier-3 service is failing.

Tier 4

A Tier-4 service is a service that, when it fails, causes no significant effect on the customer experience and does not significantly affect the customer’s business or finances.

The following are some examples of Tier-4 services:

  • Sales report generator service. A service that generates a weekly sales report. While sales reports are important, a short-term failure of the generator service will not have a significant impact.
  • Marketing email sending service. A service that generates emails sent regularly to your customers. If this service is down for a period of time, email generation might be delayed, but that will typically not significantly affect you or your customers.

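How to use service tiers

Service tiers affect two aspects of your system: the responsiveness required when a service has a problem, and the dependencies between services.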

Responsiveness

The service tier level of a service determines how quickly or slowly a problem with that service should be addressed. Of course, the higher the significance of a problem, the faster it should be addressed. But, in general, the lower the service tier number, the more important the problem likely is and the faster it should be addressed. A low-to-medium severity Tier-1 problem is likely more important and impactful than a high severity Tier-4 problem.
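One way to picture this ordering is as a sort key with the tier first and the severity second. The sketch below is illustrative only: the `Incident` class, the numeric severity scale, and the strict tier-before-severity ordering are assumptions made for the example, and a real escalation policy would likely be more nuanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    service: str
    tier: int      # service tier: 1 = most critical, 4 = least critical
    severity: int  # assumed scale: 1 = high severity, 3 = low severity


def triage_order(incidents):
    """Order incidents by service tier first, then by severity within a tier."""
    return sorted(incidents, key=lambda i: (i.tier, i.severity))


# Hypothetical open incidents, named after the example services above.
incidents = [
    Incident("marketing-email", tier=4, severity=1),  # high severity, Tier 4
    Incident("login", tier=1, severity=2),            # medium severity, Tier 1
    Incident("search", tier=2, severity=1),
]

ordered = [i.service for i in triage_order(incidents)]
print(ordered)
```

With these sample values, the medium-severity login problem is handled before the high-severity marketing email problem, which matches the comparison in the paragraph above.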

Dependencies

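Because problems in more important services (those with a lower service tier number) get a faster response, service tiers also affect how you map the dependencies between your services and what assumptions you can make about those dependencies.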

If a Tier-4 (low priority) service makes a call to a Tier-1 (high priority) service, then it probably is safe for the Tier-4 service to assume that the Tier-1 service will always respond, and if for some reason it does not respond, it would typically be acceptable for the Tier-4 service to simply fail itself. After all, if a Tier-1 service for your application is down, significant efforts will be immediately in place to try and resolve that service problem. The fact that a Tier-4 service is also down will not be of consequence. Think of the case where your web application is down because users cannot log in (a Tier-1 service problem). How concerning will it be that the marketing emails for the day might be delayed a bit (a Tier-4 service problem)?

But the reverse is not true. If a Tier-1 service depends on a Tier-4 service, that Tier-1 service must have developed contingency plans and failover recovery plans for when that Tier-4 service might be down. After all, you don’t want a Tier-1 service to fail simply because a much lower priority Tier-4 service is not functioning. As an example, you do not want your web application to fall down and fail simply because you cannot display the customer’s avatar in the corner of every page. You will want to gracefully recover and simply not display the avatar, but continue having your application work otherwise normally.
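As a minimal sketch of that kind of graceful recovery, the Python below renders a page even when the avatar lookup fails. The function names (`fetch_avatar`, `render_page`), the exception types caught, and the dictionary returned are all hypothetical and chosen only for illustration. A production service would probably also add timeouts and possibly a circuit breaker.

```python
def fetch_avatar(customer_id):
    """Hypothetical call to the Tier-3 avatar service.

    It always raises here to simulate an outage of that service.
    """
    raise ConnectionError("avatar service unavailable")


def render_page(customer_id, avatar_client=fetch_avatar):
    """Tier-1 page rendering that keeps working without the avatar."""
    try:
        avatar = avatar_client(customer_id)
    except (ConnectionError, TimeoutError):
        # The lower-tier dependency failed: leave the avatar out and continue.
        avatar = None
    return {"customer_id": customer_id, "avatar": avatar, "status": "ok"}


page = render_page(42)
print(page)
```

The Tier-1 caller treats the avatar as optional data, so an outage of the lower-tier service only removes the avatar from the page.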

An example

Take a look at the figure below. In this figure, we assigned service tiers to each service. Given the rules described above, note that we need additional resiliency added between Service A and Service D because Service A is a higher priority service (Tier 1) than is Service D (Tier 3). Therefore, Service A needs to protect itself from Service D failures, given Service D is lower priority.

Now look at Service B. Service B also depends on Service D, but in this case, according to our rules above, Service B does not need the additional resiliency between it and Service D. This is because Service B is a lower priority service (Tier 4) than Service D (Tier 3). So, it’s more acceptable for Service B to suffer an outage at a time when Service D is unavailable. Service D, in this example, is more important.
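The same rule can be checked mechanically once every service has a tier. The sketch below uses only the three services and tiers named in the example (A is Tier 1, B is Tier 4, D is Tier 3). The data structures and the `needs_resiliency` helper are assumptions made for illustration. The article does not say how to treat two services in the same tier, so this sketch flags only callers that are strictly more critical than the service they call.

```python
# Tier assignments taken from the example: lower number = more critical.
tiers = {"A": 1, "B": 4, "D": 3}

# Each pair is (caller, callee): the caller depends on the callee.
dependencies = [("A", "D"), ("B", "D")]


def needs_resiliency(caller, callee, tiers):
    """True when the caller is more critical than the service it calls."""
    return tiers[caller] < tiers[callee]


flagged = [
    (caller, callee)
    for caller, callee in dependencies
    if needs_resiliency(caller, callee, tiers)
]
print(flagged)
```

Only the link from Service A to Service D is flagged, which is where the example says the additional resiliency belongs.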

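By carefully analyzing your services and assigning tiers appropriately, you can determine where to focus your development, testing, and resiliency efforts for inter-service dependencies. This lets you prioritize the most critical and most vulnerable interfaces first, without overinvesting in less critical ones.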

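Service tiers are labels

Service tiers simply provide a "labeling" system that gives you information about the importance of each service in your system. You can use that label to determine problem escalation policies, processes, and priorities.

But you can also use the label to determine the amount and type of fallback and recovery required when one service cannot call a service it depends on. What you do and how you respond depend on whether you are calling a higher-tier or a lower-tier service.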

Lee Atchison is the Senior Director, Cloud Architecture at New Relic. For the last eight years he has helped design and build a solid service-based product architecture that scaled from startup to high traffic public enterprise. Lee has 32 years of industry experience, including seven years as a Senior Manager at Amazon.com, and has consulted with leading organizations on how to modernize their application architectures and transform their organizations at scale. He is the author of the O’Reilly book Architecting for Scale and of the blog Lee@Scale.
