The money was deducted, but the order failed! A comprehensive guide to handling dropped payment orders






Little black

Did you all have a good time over the long holiday? Did you get stuck in traffic on your way out?

Haha, I ended up fairly tired over this long holiday (Ĭ ^ Ĭ ).

For the two days before the holiday, I worked overtime migrating the historical data of an old system. My part depended on a colleague's migration progress, so the first day was actually fine: I slacked off a little, optimized the program, and went out to watch "Jiang Ziya".

The second day went much the same way. By nightfall my colleague had finally finished his migration, and I could finally start mine.

But just then the client started adding all kinds of requirements. There was nothing to do but meet them, so I modified the program on the spot and then added monitoring, which kept me busy until 12 o'clock.


With that done, I figured nothing else could go wrong, and that I would only need to check the migration progress once a day during the holiday.

I did not expect that I had forgotten to consider performance in the design. Once the data grew to tens of millions of rows, queries became extremely slow. I had no choice but to optimize the program and find a way to speed up the queries, and by the time I finished it was four o'clock in the morning. ┭┮﹏┭┮

When the design is sloppy, the overtime comes with tears~ I will review this large data migration with you when I get the chance.

Preface

Well, back to today's topic: some ways of handling exceptions in a payment system.

These methods are not limited to payment systems. You can borrow them for your own systems to make them more robust.

Exceptions are problems that inevitably occur while a system is running. If everything always went normally, system design would be quite simple.

Unfortunately nobody can guarantee that, so to deal with the problems exceptions can cause, we have to add a lot of extra design.

It is fair to say that in system design, exception handling demands focused thought and takes up most of our energy.

Let's start with the most common exception in a payment system: the "dropped order".

Dropped orders

One of the most common payment platform architectures looks like this:

image.png

The picture above is drawn from the perspective of a third-party payment company. If it were our own company's internal payment system, the external merchant would be one of the company's internal systems, such as the order system, and the external payment channel would be the third-party payment company.

Let's take Ctrip as an example. Paying for an order on Ctrip goes through three systems (Ctrip, the third-party payment company, and the bank, here ICBC):

  1. Ctrip creates an order and sends a payment request to the third-party payment company

  2. The third-party payment company creates an order and sends a payment request to ICBC

  3. ICBC completes the deduction and returns the result to the third-party payment company

  4. The third-party payment company updates its order and returns the result to Ctrip

  5. Ctrip changes the order status

The flow above is sketched in the figure below:

image.png

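During this process, the user's ICBC card may already have been charged while the Ctrip order still shows as awaiting payment. We usually call this situation a "dropped order".

Most dropped orders of this kind happen because the response is lost at steps ③ and ⑤ in the figure above. We call these "external dropped orders".

In a very small number of cases, the response from steps ③ and ⑤ is received, but the internal system fails to update the order status at steps ④ and ⑥, so the payment-success information is lost. Because the cause is internal, we usually call these "internal dropped orders".

External dropped orders

An external dropped order happens because no response was received from the peer. The cause is most likely a network problem, but the peer's processing may also be so slow that our request times out and we drop the connection.

Increase the timeout

For this situation, the first and simplest fix is to increase the timeout appropriately.

Note, however, that after increasing the network timeout we may also need to adjust the timeouts along the whole call chain. Otherwise an upstream caller may time out first, which turns into an internal dropped order.

Aside: when integrating with an external channel, always set both a connection timeout and a read timeout.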
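
As a minimal sketch of the aside above, the Python snippet below sets the two timeouts separately on a raw socket and treats a read timeout as an unknown result rather than a failure. The function name `call_channel`, the `PayStatus` enum, and the toy `OK`/`FAIL` wire format are assumptions made for illustration; a real channel would normally be called over HTTPS with its own message format.

```python
import socket
from enum import Enum


class PayStatus(Enum):
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"  # no definite answer: the channel may still have charged the user


def call_channel(host, port, payload, connect_timeout=3.0, read_timeout=10.0):
    """Send one payment request over a toy protocol (hypothetical, for illustration)."""
    try:
        # Connection timeout: how long we wait for the TCP connection to be established.
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        # The request was never sent, so the channel cannot have processed it.
        return PayStatus.FAILED

    with sock:
        # From here on the timeout applies to send/receive calls,
        # so it acts as the read timeout for the reply.
        sock.settimeout(read_timeout)
        try:
            sock.sendall(payload)
            reply = sock.recv(1024)
        except OSError:  # includes socket.timeout
            # The request may have reached the channel: a dropped-order candidate.
            return PayStatus.UNKNOWN

    if reply == b"OK":
        return PayStatus.SUCCESS
    if reply == b"FAIL":
        return PayStatus.FAILED
    return PayStatus.UNKNOWN
```

The point of the third status is that a timed-out order must not be marked as failed; it stays unknown until one of the other methods in this article settles it.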

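Receive asynchronous notifications

The second fix is to receive the channel's asynchronous result notifications.

Most payment channel interfaces today let us submit an asynchronous callback address. Once the channel has processed the payment successfully, it sends the success information to that address.

In that case we only need to receive the notification, parse it, and update the internal order status.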

image.png

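Payment system exception handling: asynchronous payment notification

There are a few points to watch here:

  1. Always verify the signature of an asynchronous notification, and check that the order amount it carries matches the order amount on the merchant side. This guards against forged notifications built from leaked data, which would otherwise cause financial loss.

  2. An asynchronous notification may be sent more than once, so the handler must be idempotent.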
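
The two points above can be sketched roughly as follows. Everything here is an assumption for illustration: the HMAC-SHA256 signing scheme over sorted `key=value` pairs, the field names `order_no`, `amount`, and `sign`, and the in-memory `orders` dict standing in for the order table. Each real channel defines its own signing rules (often RSA based), so follow the channel's documentation in practice.

```python
import hashlib
import hmac
from decimal import Decimal

SECRET = b"demo-shared-secret"  # hypothetical key agreed with the channel

# Stand-in for the internal order table.
orders = {"P1001": {"amount": Decimal("99.00"), "status": "PENDING"}}


def sign(params, secret):
    """Sign all fields except 'sign' itself, in sorted key order (assumed scheme)."""
    message = "&".join(f"{k}={params[k]}" for k in sorted(params) if k != "sign")
    return hmac.new(secret, message.encode(), hashlib.sha256).hexdigest()


def handle_notification(params):
    # 1. Verify the signature with a constant-time comparison.
    if not hmac.compare_digest(sign(params, SECRET), params.get("sign", "")):
        return "REJECTED_BAD_SIGNATURE"

    order = orders.get(params["order_no"])
    if order is None:
        return "REJECTED_UNKNOWN_ORDER"

    # 2. Check the notified amount against our own record.
    if Decimal(params["amount"]) != order["amount"]:
        return "REJECTED_AMOUNT_MISMATCH"

    # 3. Idempotency: a repeated notification must not change anything.
    if order["status"] == "SUCCESS":
        return "ALREADY_PROCESSED"

    order["status"] = "SUCCESS"
    return "UPDATED"
```

With a real database, the idempotency check would more likely be a conditional update (change the status only where it is still pending), so that two notifications arriving at the same time cannot both apply.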

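Dropped-order query

Some channels do not provide asynchronous notifications and only offer an order query interface. In that case we have to fall back on the third fix: querying dropped orders on a schedule.

We can save orders whose outcome is unknown after a timeout in a separate dropped-order table, and then query the channel periodically for their status.

If the query returns success or a definite failure (for example, the order does not exist), we update the order status and delete the record from the dropped-order table.

If the result is still unknown, we wait for the next query.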

image.png

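Note that in some cases a query may never return the order's status, so we need to cap the number of queries per order to avoid wasting resources on endless polling.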
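
One round of the scheduled query, including the cap on attempts, might look like the sketch below. The names `DropRecord`, `run_query_round`, and the `NEEDS_RECONCILIATION` status are made up for this example, the dicts stand in for the dropped-order table and the order table, and `query_channel` is whatever function wraps the channel's query interface.

```python
from dataclasses import dataclass

MAX_QUERIES = 5  # assumed cap; tune it to the channel's behaviour


@dataclass
class DropRecord:
    order_no: str
    attempts: int = 0


def run_query_round(drop_table, orders, query_channel, max_queries=MAX_QUERIES):
    """Run one scheduled pass over the dropped-order table.

    query_channel(order_no) is expected to return "SUCCESS", "FAILED",
    or anything else when the result is still unknown.
    """
    for order_no in list(drop_table):  # copy keys: we delete while iterating
        record = drop_table[order_no]
        record.attempts += 1
        result = query_channel(order_no)

        if result in ("SUCCESS", "FAILED"):
            # Definite answer: update the order and drop the record.
            orders[order_no] = result
            del drop_table[order_no]
        elif record.attempts >= max_queries:
            # Give up polling and leave the order to the next safety net.
            orders[order_no] = "NEEDS_RECONCILIATION"
            del drop_table[order_no]
        # Otherwise keep the record and try again in the next round.
```

A scheduler (cron, a timer thread, or a job framework) would call `run_query_round` at a fixed interval.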

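Reconciliation

Finally, in a very small number of cases neither the order query nor the asynchronous notification yields the payment result. That leaves one last safety net: reconciliation.

If the reconciliation file the channel provides the next day contains this payment, we can update our internal payment record directly from that entry.

Xiao Hei Ge (小黑哥) wrote an article on reconciliation earlier; if you are interested, take a look: 聊聊对账系统的设计方案 (a discussion of reconciliation system design).

Aside: to be on the safe side, first issue a query and update the order record according to the query result.

In some extreme cases the query returns no result, and then updating the internal record directly is fine.

If the next day's file does not contain this payment either, we can treat the payment as failed. If the user was charged, the channel will initiate a refund internally and return the money to the user, so this case needs no handling on our side.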
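
The reconciliation fallback described above can be sketched like this. The function name `reconcile` and the shape of its arguments are assumptions: `recon_paid` stands for the set of order numbers listed as paid in the channel's file, and `orders` is a dict standing in for the internal payment records.

```python
def reconcile(unknown_order_nos, recon_paid, orders, query_channel):
    """Settle orders that are still unknown after the channel's file arrives."""
    for order_no in unknown_order_nos:
        if order_no in recon_paid:
            # Safer path: ask the channel first and trust a definite answer.
            result = query_channel(order_no)
            if result not in ("SUCCESS", "FAILED"):
                # The query gave nothing definite: fall back to the file entry.
                result = "SUCCESS"
            orders[order_no] = result
        else:
            # Not in the file: treat the payment as failed. Any deduction is
            # refunded by the channel, so nothing more is done here.
            orders[order_no] = "FAILED"
```

For example, with `recon_paid = {"A"}` and a query that still returns nothing useful, order `A` ends up successful and any other unknown order ends up failed.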

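Internal dropped orders

Order relationships inside the payment company

Next, internal dropped orders. Let's first look at why they happen, which has to do with our system architecture.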

image.png

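As the figure above shows, a third-party payment company's internal tables usually hold payment orders and channel orders in a 1-to-N relationship.

A payment order stores the external merchant system's order number and represents the relationship between the payment company's internal order and the external merchant's order.

A channel order represents the relationship between the payment company and the external channel. From the external channel's point of view, the third-party payment company is just another external merchant.

Why design it this way, rather than using the 1-to-1 relationship below?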

image.png

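With the 1-to-1 relationship in the figure above, if the first payment fails, the external merchant may send another payment request to the third-party payment company with the same order number.

If the payment company then sends the same internal order number to the external channel, the channel may not accept a repeated request for that order number.

There is another way: generate a new internal order number, overwrite the internal number on the existing payment order, and then call the external channel. But that loses the record of the previous failed payment, which makes later statistics harder.

The payment company could also refuse repeated requests for the same order number, but then the external merchant has to generate a new order number.

That keeps the payment company's system simple and pushes all the complexity onto the external merchant.

In practice, many external merchants cannot easily generate a new order number, so a third-party payment company generally has to allow the same merchant order number to be paid again as long as it has not yet succeeded.

That is where the 1:N order relationship above comes in.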
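
To make the 1:N relationship concrete, here is a small sketch of the two record types. The class and field names (`PaymentOrder`, `ChannelOrder`, `new_attempt`) and the `C000001` numbering format are invented for the example and are not taken from any real system.

```python
import itertools
from dataclasses import dataclass, field

_channel_seq = itertools.count(1)  # toy generator for channel order numbers


@dataclass
class ChannelOrder:
    channel_order_no: str  # the number sent to the external channel
    status: str = "PENDING"


@dataclass
class PaymentOrder:
    merchant_order_no: str  # the external merchant's order number
    status: str = "PENDING"
    channel_orders: list = field(default_factory=list)

    def new_attempt(self):
        """Create a fresh channel order for each payment attempt."""
        if self.status == "SUCCESS":
            raise ValueError("order already paid; repeated payment is not allowed")
        attempt = ChannelOrder(f"C{next(_channel_seq):06d}")
        self.channel_orders.append(attempt)
        return attempt
```

The merchant keeps using the same `merchant_order_no`, the channel always sees a new `channel_order_no`, and earlier failed attempts stay on record for later statistics.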

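Causes of internal dropped orders

Suppose we receive a success response from the external channel and update the channel order record successfully. The channel order table and the payment order table may live in different databases, or even in different applications, so the subsequent update to the payment order table can still fail.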

image.png

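The payment order table holds the mapping between the external merchant's order and the internal order. Since the payment order was not marked successful, the external merchant cannot query a successful payment result either.

At this point the channel order is already marked successful, so the methods for external dropped orders above do not apply.

Solutions for internal dropped orders

「The first solution: distributed transactions.」

Put plainly, an internal dropped order happens because a database transaction cannot guarantee that the payment order table and the channel order table are updated together or not at all.

What this calls for is a distributed transaction.

We did not take this route, though. First, when we built the system there was no mature open-source distributed transaction framework on the market; second, building one ourselves would have been very difficult.

So I have no real experience with distributed transactions. If you have used them to solve this kind of problem, feel free to leave a comment.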

「The second solution: asynchronous compensating updates.」

When an internal dropped order occurs, that is, when the payment order update fails, we can save the payment order to an internal dropped-order table.

There is a catch: we cannot guarantee that saving to the internal dropped-order table will succeed either.

So we also need a scheduled query that finds payment orders which have stayed unsuccessful for some time while their channel order is already marked as paid, and inserts them into the internal dropped-order table.

A separate application then only needs to scan the internal dropped-order table periodically, mark each payment order as successful, and delete the corresponding internal dropped-order record.

Note that when the payment order table is large, the scheduled query may be slow. To keep it from affecting the primary database, it can be run against a standby database.
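
Here is a rough sketch of the two scheduled jobs using SQLite from the Python standard library. It is heavily simplified: the table and column names are invented, all three tables sit in one database so that a single join can find the mismatches (in the situation described above they may be in different databases or applications), and the "for some time" window is left out.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical minimal tables, for illustration only.
    conn.executescript("""
        CREATE TABLE payment_order (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE channel_order (id INTEGER PRIMARY KEY,
                                    payment_id INTEGER NOT NULL,
                                    status TEXT NOT NULL);
        CREATE TABLE internal_drop (payment_id INTEGER PRIMARY KEY);
    """)


def find_internal_drops(conn):
    """Job 1: record payment orders whose channel order succeeded but which did not."""
    conn.execute("""
        INSERT OR IGNORE INTO internal_drop (payment_id)
        SELECT p.id
        FROM payment_order p
        JOIN channel_order c ON c.payment_id = p.id
        WHERE p.status <> 'SUCCESS' AND c.status = 'SUCCESS'
    """)
    conn.commit()


def compensate(conn):
    """Job 2: mark recorded payment orders as successful, then clear the records."""
    rows = conn.execute("SELECT payment_id FROM internal_drop").fetchall()
    for (payment_id,) in rows:
        conn.execute(
            "UPDATE payment_order SET status = 'SUCCESS' "
            "WHERE id = ? AND status <> 'SUCCESS'",
            (payment_id,),
        )
        conn.execute("DELETE FROM internal_drop WHERE payment_id = ?", (payment_id,))
    conn.commit()
    return len(rows)
```

Both jobs are safe to rerun: `INSERT OR IGNORE` skips orders that are already recorded, and the conditional `UPDATE` leaves already-successful orders untouched.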

Summary

Today I mainly covered dropped orders in the payment system. With this kind of exception, the user has actually been charged, but the merchant's order still shows as awaiting payment.

Handled badly, it makes for a very poor user experience and is likely to draw customer complaints.

A dropped order can originate in an external system or an internal one. Most are caused by external systems. Increasing the timeout, querying dropped orders, and receiving asynchronous notifications solve 99% of them; the remaining 1% can only be resolved through the next day's reconciliation.

Dropped orders caused by the internal system are a typical data consistency problem in a distributed environment. We do not need strong consistency here; eventual consistency is enough. We can solve the problem with distributed transactions, or by periodically scanning for orders whose statuses are inconsistent and updating them in batches.

Finally, this article covered only one kind of exception in the payment system, the dropped order. In the next article I will introduce other payment system exceptions, so stay tuned!



Origin blog.51cto.com/7592962/2543044