W05 - Risk Exposure of the Payment SDK
As part of the Quick Donkey experience optimization initiative for B2B payments, we redefined a set of session-level user metrics. The goal was to measure changes before and after optimization and to describe B2B user experience and service performance more objectively. One task was to extend the data reported by the current JS SDK to fit the new metric structure.
The JS SDK is a critical component of checkout. Most checkout traffic reaches the payment services through this SDK, with an estimated 500K calls per day. Because its impact is so broad, I hold a high bar for changing it. Its logic has also been very stable: in the more than a year I have worked on B2B payments, it has not been iterated on once.
Adding the tracking data was a small change in both effort and scope, yet I delayed the release by a week. It wasn’t procrastination but apprehension: the release process exposed enough risk that I needed time to prepare mentally.
The SDK release process feels uncontrollable: it lacks visibility and basic operational controls. The SDK is hosted on Burst, and after reviewing Burst’s dashboard and roadmap, the platform seems somewhat neglected. Once code is submitted, deployment relies solely on a command-line tool provided by Burst. Its execution is a black box; when it finishes, all we know is that Burst’s origin servers have been updated. We then have to refresh the CDN manually, which feels like blowing on a dandelion: you have no idea where the seeds will land. You have to guess how far the refresh has propagated across CDN nodes, whether it has finished, and, because of client-side caching, when end users will actually get the new SDK. The process offers no gradual rollout and no fast rollback, and, crucially, it is completely silent: there are no approvals and no notifications to stakeholders.
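Burst’s tooling offers no percentage rollout, but a loader under our own control could in principle approximate one. The sketch below assumes that a stable per-user key (for example a session ID) is available when the SDK is requested and that two SDK versions are deployed side by side. `RolloutConfig`, `pickVersion`, and the choice of the FNV-1a hash are illustrative assumptions, not part of Burst or the existing SDK. Hashing keeps each user on the same version across page loads, and setting `canaryPercent` back to 0 acts as a rollback.

```typescript
// Hypothetical rollout config; not a Burst feature.
interface RolloutConfig {
  stableVersion: string;
  canaryVersion: string;
  canaryPercent: number; // share of users (0-100) who receive the canary build
}

// 32-bit FNV-1a hash: deterministic, so a given key always maps to the same bucket.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32-bit
  }
  return hash >>> 0;
}

// Map the key to one of 100 buckets; the lowest canaryPercent buckets get the canary build.
function pickVersion(key: string, cfg: RolloutConfig): string {
  const bucket = fnv1a32(key) % 100;
  return bucket < cfg.canaryPercent ? cfg.canaryVersion : cfg.stableVersion;
}

// Example: send roughly 10% of sessions to the new build.
const rollout: RolloutConfig = { stableVersion: "1.0.0", canaryVersion: "1.1.0", canaryPercent: 10 };
console.log(pickVersion("session-123", rollout));
```

Deterministic hashing is preferred over `Math.random()` here because a random draw would flip users between versions on every page load, which makes incidents harder to attribute.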
After release, we can only monitor exceptions raised inside the SDK. For call volume, service stability, and other technical metrics we depend on data from Burst and the CDN vendors, and the completeness and accuracy of that data are questionable.
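One way to reduce dependence on vendor numbers is for the SDK to report its own load events, which yields a first-party count of calls per version. The following is a minimal sketch that assumes a hypothetical collection endpoint and event shape. In a browser the transport could be `navigator.sendBeacon`, which queues a small POST without blocking the page. It is injected here so the logic can run outside a browser.

```typescript
// Hypothetical event shape for first-party SDK telemetry.
interface SdkLoadEvent {
  event: "sdk_loaded";
  sdkVersion: string;
  loadMs: number;    // time from script request to SDK ready
  timestamp: number; // epoch milliseconds when the SDK became ready
}

// Anything that can POST a string body, e.g. navigator.sendBeacon in a browser.
type Transport = (url: string, body: string) => boolean;

const METRICS_ENDPOINT = "/sdk-metrics"; // hypothetical collection endpoint

function reportSdkLoad(sdkVersion: string, startMs: number, readyMs: number, transport: Transport): SdkLoadEvent {
  const payload: SdkLoadEvent = {
    event: "sdk_loaded",
    sdkVersion,
    loadMs: Math.max(0, readyMs - startMs), // never report a negative duration
    timestamp: readyMs,
  };
  transport(METRICS_ENDPOINT, JSON.stringify(payload));
  return payload;
}

// Browser usage (sketch):
// reportSdkLoad("1.1.0", scriptStart, Date.now(), (url, body) => navigator.sendBeacon(url, body));
```

Counting these events server-side gives a volume figure per SDK version that can be cross-checked against the Burst and CDN numbers rather than taken on trust.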
I consulted Owl and LX about these issues. Neither had a perfect solution, but both offered insights for future release governance. A major difference between the JS SDK and native SDKs is the versioning problem that comes with being fully dynamic: a native SDK is integrated as a versioned dependency, whereas the JS SDK is fetched at runtime, so every page load picks up whatever is currently deployed. If businesses integrate the SDK through a fixed URL, versions become invisible to them.
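A common pattern for restoring versions without changing the integration URL is to keep the fixed URL as a tiny loader that reads a version manifest and then fetches an immutable, versioned bundle. The following sketch rests on assumptions: the manifest format, `SDK_BASE`, and the path layout are all hypothetical. Because each versioned file never changes, it can be cached aggressively; only the loader and manifest need short cache lifetimes, and rolling back means pointing `current` at the previous version instead of waiting on a CDN purge.

```typescript
// Hypothetical manifest served next to the fixed loader URL.
interface SdkManifest {
  current: string;     // version served to integrations that do not pin one
  available: string[]; // versions that remain published
}

const SDK_BASE = "https://cdn.example.com/payment-sdk"; // hypothetical CDN path

// Resolve the bundle URL: honour a pinned version if it is still published, otherwise use current.
function resolveSdkUrl(manifest: SdkManifest, pinned?: string): string {
  const version =
    pinned !== undefined && manifest.available.includes(pinned) ? pinned : manifest.current;
  return `${SDK_BASE}/${version}/sdk.js`;
}

// Example: an unpinned integration follows `current`; a pinned one stays put.
const manifest: SdkManifest = { current: "2.0.0", available: ["1.9.0", "2.0.0"] };
console.log(resolveSdkUrl(manifest));
console.log(resolveSdkUrl(manifest, "1.9.0"));
```

In a browser, the loader would then append a `<script>` element whose `src` is the resolved URL. Falling back silently to `current` keeps checkout working if a pinned version is retired, at the cost of hiding that fact; reporting the fallback would be a reasonable addition.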
The SDK also has internal issues, such as polluting the global window object. This year we will add SDK governance to the B2B frontend roadmap. At the current volume of several hundred thousand requests per day the risk is manageable, but at millions per day it would become a serious hidden risk.
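A common remedy for global pollution is to expose exactly one namespaced global and keep everything else module-private. The following sketch assumes a hypothetical global name `PaymentSDK` and API shape. The guard prevents a second copy of the script (for example a stale cached one) from overwriting the instance already on the page.

```typescript
// Hypothetical public surface; internals stay inside the module instead of on window.
interface PaymentSdkApi {
  readonly version: string;
  init(options: { merchantId: string }): void;
}

// Install the API as a single read-only global; if one already exists, keep it.
function installSdk(
  root: Record<string, unknown>,
  api: PaymentSdkApi,
  name = "PaymentSDK", // hypothetical global name
): PaymentSdkApi {
  const existing = root[name];
  if (existing !== undefined) {
    return existing as PaymentSdkApi; // loaded twice: reuse the first instance
  }
  Object.defineProperty(root, name, {
    value: Object.freeze(api), // callers cannot patch the API object
    writable: false,
    enumerable: true,
    configurable: false,
  });
  return api;
}

// Browser usage (sketch):
// installSdk(window as unknown as Record<string, unknown>, { version: "1.1.0", init: (opts) => { /* ... */ } });
```

Taking the global object as a parameter rather than reading `window` directly keeps the function testable outside a browser.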