TiDB x Catalyst | Insight into the value of data within seconds: TiDB helps a customer success SaaS vendor improve user experience

Introduction

Catalyst, a New York-based SaaS startup, provides an intuitive and flexible Customer Success Platform that helps customer success teams aggregate customer data, gain insight into customer health, and drive customer retention and business growth. Catalyst has completed its Series B funding round.

This article is the third installment in the series "Best Practices of Using TiDB in Global Extreme Scenarios and Innovative Scenarios". It describes how TiDB reduces maintenance costs for Catalyst and helps it deliver a better customer experience.

Business characteristics

Catalyst integrates massive amounts of data from different sources, including Salesforce, Mixpanel, and PostgreSQL, and brings it into the Catalyst ecosystem, where it is processed and analyzed to generate actionable data insights.

Catalyst primarily handles three types of data: transactional, read-only, and time-series.

  • Transactional data primarily includes internally created notes and tasks, as well as external data collected from Salesforce, Zendesk, and other platforms.
  • Read-only data mainly refers to ticket data collected from platforms such as Jira and Zendesk.
  • Time-series data is one of the most important and trickiest data types for Catalyst. The ability to handle it was one of the team's key requirements when selecting a database.

Previous data architecture and its bottlenecks

Catalyst initially used PostgreSQL for all data collected externally. However, as the business grew and its data sources rapidly expanded, PostgreSQL could not keep up with demand. Catalyst first tried to remedy this by storing data as JSON documents, but query performance suffered severely.

The team then turned to a pre-caching solution, using Elasticsearch to store results so that customer queries could be answered faster. However, since Elasticsearch does not support SQL-style JOINs, Catalyst had to precompute everything before storing it in Elasticsearch. As the amount of stored data increased, the cost rose sharply.

To address these issues and support business growth, the Catalyst team decided to redesign the entire data processing and storage system. It was at this point that they discovered TiDB, a new-generation distributed relational database.

Data Layer Refactoring

Catalyst's new architecture is divided into five layers: the data ingestion layer, data lake layer, Spark layer, data service layer, and web application layer. Raw data enters through the ingestion layer and flows into the data lake layer. The Spark layer combines data objects and performs precomputation to make sense of the data. The data service layer stores all preprocessed data for client queries. Because it directly affects the user experience, the data service layer is the most important one for Catalyst, and it is where Catalyst most urgently needed a new data stack. The layers below the data service layer do not need to be real-time, but at the data service layer Catalyst requires sub-second latency so that clients get results quickly.

Required Capabilities for a New Technology Stack

To serve its growing clientele, Catalyst desperately needed a database with the following characteristics:

Support for mixed transactional and analytical workloads. Catalyst must handle transactional and read-only data, as well as time-series data. They need a solution, whether a single database or a combination of databases, that can handle both transactional and analytical workloads.

Quick response. The new database solution had to be more responsive than Catalyst's previous solution, especially in terms of query speed and user interface performance. It had to answer queries within seconds and have low update latency.

Handle complex and highly customized data. Catalyst customers can customize many settings, including queries, data transformations, and relationships, both within the Catalyst platform and on data source platforms such as Salesforce and Zendesk. Custom objects that combine many custom fields can be quite complex, and the new solution had to be able to handle them.

High availability. Catalyst needs to be agile and responsive to its customers, so keeping the system up and running is its top priority. When Catalyst goes down, customers often complain within tens of seconds. The new database solution therefore had to be highly available so that Catalyst could ride out system failures.

Horizontal scalability. Scalability is another must-have. The amount of data that Catalyst processes is very large and still growing, so the database solution must scale easily to enormous sizes.

Data consistency. Strong data consistency is desirable, but because so much data processing happens in streams, it is very difficult to maintain across the whole system. Catalyst can therefore accept eventual consistency.

TiDB stands out in performance tests

Catalyst was careful when choosing a new database. The team evaluated TiDB and two other options: Aurora combined with AWS Timestream, and Yugabyte combined with AWS Timestream. Each of the latter two options pairs an online transaction processing (OLTP) database with a time-series database.

To test the three candidate solutions, Catalyst ran groups of queries, both serially and in parallel, against large real-world datasets from internal Salesforce and Jira instances. Query response speed was one of the most important evaluation criteria.

TiDB's response time for both typical and aggregation queries was within seconds, much faster than the other candidate solutions. TiDB also handled time-series aggregation queries well, returning results within 7 seconds. The table below summarizes some key test results.

The types of queries are:

  • Typical queries: the queries that customers are most interested in.
  • Aggregation queries: queries based mainly on complex JOIN calculations.
  • Time-series aggregation queries: Catalyst did not test these on the Aurora and Yugabyte solutions, both because of time constraints and because TiDB's performance was already impressive enough.
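To make the last two query categories more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL database, and the `accounts` and `tickets` tables, their columns, and the sample rows are all hypothetical. They are not Catalyst's schema or the queries used in the benchmark.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tickets (
    id INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES accounts(id),
    opened_on TEXT
);
INSERT INTO accounts VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO tickets VALUES
    (1, 1, '2023-01-05'), (2, 1, '2023-01-20'),
    (3, 1, '2023-02-02'), (4, 2, '2023-02-10');
""")

# Aggregation query: JOIN two objects, then count tickets per account.
tickets_per_account = conn.execute("""
    SELECT a.name, COUNT(t.id)
    FROM accounts AS a
    JOIN tickets AS t ON t.account_id = a.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()

# Time-series aggregation query: bucket tickets by calendar month.
tickets_per_month = conn.execute("""
    SELECT substr(opened_on, 1, 7) AS month, COUNT(*)
    FROM tickets
    GROUP BY substr(opened_on, 1, 7)
    ORDER BY month
""").fetchall()

print(tickets_per_account)
print(tickets_per_month)
```

In a system that cannot JOIN at query time, results like `tickets_per_account` would have to be precomputed and stored ahead of time, which is the cost Catalyst ran into with its earlier pre-caching approach.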

[Table: Key test results]

Why choose TiDB?

Quick query response

Depending on the query type, TiDB's response time is 10 to 60 times faster than its competitors. This is the most important reason why Catalyst chose TiDB.

Perfect support for online DDL

TiDB supports online Data Definition Language (DDL) operations without affecting online business. It makes schema changes hassle-free and lets Catalyst add or remove indexes faster, especially on large tables. This is particularly useful when a query is slow and an index needs to be added quickly to improve performance. With online schema changes, Catalyst needs neither downtime nor long maintenance windows.
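The "slow query, then add an index" workflow can be sketched as follows. This is an illustration only: SQLite (via Python's sqlite3 module) stands in for the database, and the `tickets` table and `idx_tickets_account` index are hypothetical names. SQLite does not demonstrate the online aspect; the point from the text is that in TiDB an equivalent `CREATE INDEX` statement runs while the table continues to serve traffic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO tickets (account_id, status) VALUES (?, ?)",
    [(i % 100, "open") for i in range(1000)],
)

query = "SELECT COUNT(*) FROM tickets WHERE account_id = 42"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # full table scan, no index available
conn.execute("CREATE INDEX idx_tickets_account ON tickets (account_id)")
plan_after = plan(query)   # the planner can now use the new index

print(plan_before)
print(plan_after)
```

Comparing the two plans shows the query switching from a table scan to an index lookup without any change to the query itself.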

HTAP mixed workload database

TiDB is a hybrid transactional and analytical processing (HTAP) database. Among the three candidates evaluated by Catalyst, TiDB is the only database whose technology stack can handle both object data and time series data. Not only was this very efficient, but it also saved Catalyst a lot of time, effort and money.

Horizontal scalability

TiDB is highly scalable, which fits Catalyst's need to handle ever-expanding data volumes. TiDB also separates computing from storage, which allows Catalyst to scale the two resources independently and helps control costs.

Fast disaster recovery

TiDB uses the Raft consensus algorithm to ensure high availability and safe replication of data. TiKV is TiDB's storage layer. Data is replicated redundantly across TiKV nodes that are placed in different availability zones, which protects against machine or data center failures and keeps the Catalyst system running. In addition, TiDB provides a variety of disaster recovery options, each suited to different scenarios and cost levels.
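The arithmetic behind majority-based replication can be sketched in a few lines. This is a generic illustration of Raft-style quorums, not TiDB's or TiKV's implementation. The function names, the zone labels, and the choice of three replicas are assumptions made for the example.

```python
from collections import Counter
from typing import Sequence


def raft_quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can fail while a majority remains."""
    return replicas - raft_quorum(replicas)


def survives_zone_loss(replica_zones: Sequence[str]) -> bool:
    """True if losing any single zone still leaves a majority of replicas."""
    quorum = raft_quorum(len(replica_zones))
    per_zone = Counter(replica_zones)
    return all(len(replica_zones) - lost >= quorum for lost in per_zone.values())


# Hypothetical placements of three replicas (zone names are made up).
spread = ["az-a", "az-b", "az-c"]   # one replica per zone
stacked = ["az-a", "az-a", "az-b"]  # two replicas share a zone

print(raft_quorum(3), tolerated_failures(3))
print(survives_zone_loss(spread), survives_zone_loss(stacked))
```

With three replicas, a majority is two, so one replica can be lost. Spreading the replicas across three zones means a whole zone can fail without losing the majority, whereas stacking two replicas in one zone does not give that guarantee.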

Comprehensive Managed Services

Catalyst has a small DevOps team, so they needed a fully managed database solution to ease the burden on the team and control costs. TiDB's fully managed service, TiDB Cloud, meets this need.

Cloud neutral

Catalyst's services are deployed across clouds to ensure business agility: some workloads run on Google Cloud Platform (GCP) and some on Amazon Web Services (AWS). Catalyst therefore needs a cloud database solution that supports multi-cloud deployments, and TiDB Cloud is exactly such a solution.

Summary

Catalyst had primarily used PostgreSQL to process customer data, but the system quickly hit a bottleneck. The team redesigned the data architecture and introduced a new database to serve data to customers. By adopting TiDB, Catalyst can provide a better customer experience, including faster query responses, a more elastic system, and stronger data storage, processing, and analysis capabilities. Catalyst has also reduced its overall maintenance costs.

Origin blog.csdn.net/TiDB_PingCAP/article/details/131183537