The architecture of a high performance GraphQL to SQL engine

The Hasura GraphQL engine provides an HTTP API to query Postgres using GraphQL in a permission-safe manner.

You can use foreign key constraints in Postgres to query hierarchical data in a single request. For example, you can run this query to get the "album" and all its "tracks" (provided that the "tracks" table has a foreign key to the "album" table):
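The query itself did not survive in this copy; a minimal sketch of such a GraphQL query, assuming a Hasura-style `where` argument and field names like `albums`/`tracks`, might look like:

```graphql
query {
  albums(where: {year: {_eq: 2018}}) {
    id
    title
    tracks {
      id
      title
    }
  }
}
```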


As you might have guessed, queries can traverse tables to any depth. This query interface, combined with permissions, allows front-end applications to query Postgres without writing any back-end code.

The API is designed to be fast (response time) and to handle high throughput (requests per second) while staying light on resources (low CPU and memory usage). This post discusses the architectural decisions that enabled us to achieve this.

A GraphQL query to the data microservice goes through the following stages:

  1. Session resolution: The request arrives at the gateway, which resolves the authorization token (if any), adds the user id and role headers, and then proxies the request to the data service.
  2. Query parsing: The data service receives the request, parses the headers to obtain the user id and role, and parses the body as a GraphQL AST.
  3. Query validation: Check whether the query is semantically valid, then enforce the permissions defined for the role.
  4. Query execution: The validated query is converted into a SQL statement and executed on Postgres.
  5. Response generation: The results from Postgres are processed and sent to the client (the gateway adds gzip compression if needed).

The requirements are roughly as follows:

  1. The HTTP stack should add as little overhead as possible and should handle a large number of concurrent requests for high throughput.
  2. Query translation (GraphQL to SQL) should be fast.
  3. The compiled SQL query should be efficient on Postgres.
  4. The results from Postgres must be sent back efficiently.

The following are various ways to obtain the data required for GraphQL queries:

GraphQL query execution typically involves running a resolver for each field. In the example query, we would call a function to fetch the albums released in 2018, and then for each of these albums we would call a function to fetch its tracks: the classic N+1 query problem. The number of queries grows exponentially with the depth of the query.

The query executed on Postgres is as follows:
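The SQL block is missing from this copy; with resolver-per-field execution the queries would look roughly like this (table and column names assumed):

```sql
-- 1 query for the albums
SELECT id, title FROM albums WHERE year = 2018;

-- N more queries, one per album returned above
SELECT id, title FROM tracks WHERE album_id = 1;
SELECT id, title FROM tracks WHERE album_id = 2;
-- ... and so on for each album
```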

This adds up to N+1 queries to fetch all the required data.

Projects like DataLoader aim to solve the N+1 query problem through batching. The number of requests no longer depends on the size of the result set but on the number of nodes in the GraphQL query. In this case, the sample query needs only two queries to Postgres to fetch the required data.

The query executed on Postgres is as follows:
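The SQL block is missing here; the first batched query would be roughly (names assumed):

```sql
SELECT id, title FROM albums WHERE year = 2018;
```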

This gave us all the albums. To get all the tracks of the desired album:
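The second query did not survive; it would look roughly like this, where the ids in the `IN` clause are collected from the first query's result (names assumed):

```sql
SELECT id, title, album_id FROM tracks WHERE album_id IN (1, 2);
```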

That is two queries in total. We avoid issuing a query for each album's tracks and instead use a where clause to fetch all the tracks of the required albums in one query.

DataLoader is designed to work across different data sources and cannot take advantage of the capabilities of a single data source. In our case, the only data source is Postgres. Like all relational databases, Postgres provides a way to collect data from several tables in a single query: joins. We can determine the tables needed by a GraphQL query and generate a single query with joins to fetch all the data. So the data required for any GraphQL query can be fetched with a single query. Before being sent to the client, this data has to be transformed appropriately.

The query is as follows:
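The query block is missing from this copy; a sketch of such a join query (table and column names assumed):

```sql
SELECT
    albums.id    AS album_id,
    albums.title AS album_title,
    tracks.id    AS track_id,
    tracks.title AS track_title
FROM albums
LEFT OUTER JOIN tracks ON tracks.album_id = albums.id
WHERE albums.year = 2018;
```

A left outer join is used so that albums without any tracks still appear in the result, with NULL track columns.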


This will provide us with the following data:

| album_id | album_title | track_id | track_title |
|----------|-------------|----------|-------------|
| 1        | Album1      | 1        | track1      |
| 1        | Album1      | 2        | track2      |
| 2        | Album2      | NULL     | NULL        |

This data must be converted into a JSON response with the following structure:
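The structure is missing from this copy; based on the result table above, it would be along these lines (exact field names assumed):

```json
{
  "albums": [
    {
      "id": 1,
      "title": "Album1",
      "tracks": [
        { "id": 1, "title": "track1" },
        { "id": 2, "title": "track2" }
      ]
    },
    { "id": 2, "title": "Album2", "tracks": [] }
  ]
}
```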


We found that most of the time spent processing a request goes into the transformation function (which converts the SQL result into the JSON response). After trying several ways to optimize this function, we decided to remove it by pushing the transformation down into Postgres. Postgres 9.4 (released around the time of the first version of the data microservice) added JSON aggregation functions, which let us push the transformation into Postgres. The generated SQL becomes something like:
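The generated SQL did not survive in this copy; a sketch using Postgres's JSON aggregation functions (`json_agg`, `json_build_object`; table and column names assumed):

```sql
SELECT json_agg(
    json_build_object(
        'id', albums.id,
        'title', albums.title,
        'tracks', (
            SELECT coalesce(
                json_agg(json_build_object('id', tracks.id, 'title', tracks.title)),
                '[]'::json
            )
            FROM tracks
            WHERE tracks.album_id = albums.id
        )
    )
) AS root
FROM albums
WHERE albums.year = 2018;
```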


The result of this query has a single row and a single column, and that value is sent to the client without any further transformation. From our benchmarks, this approach is roughly 3-6x faster than the transformation function in Haskell.

Depending on the nesting level of the query and the conditions used, the generated SQL statement can be quite large and complex. Typically, any front-end application has a set of queries that are repeated with different parameters. For example, the query above might be executed for 2017 instead of 2018. Prepared statements are the best fit for these use cases: complex SQL statements that are repeated with only some parameters changing.

So the first time this GraphQL query is executed:
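The query is missing here; it is the 2018 albums query from earlier, e.g. (field names assumed):

```graphql
query {
  albums(where: {year: {_eq: 2018}}) {
    title
    tracks { id title }
  }
}
```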


we prepare the SQL statement instead of executing it directly, so the generated SQL becomes (note the $1):
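The statement itself is missing; a sketch of such a prepared statement (the statement name `albums_by_year` and the simplified select list are illustrative):

```sql
PREPARE albums_by_year(int) AS
    SELECT json_agg(json_build_object('title', title))
    FROM albums
    WHERE year = $1;
```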

We then execute this prepared statement:
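The execution statement is missing; for the 2018 query it would be, assuming a hypothetical prepared-statement name `albums_by_year`:

```sql
EXECUTE albums_by_year(2018);
```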

When the GraphQL query changes to 2017, we simply execute the prepared statement directly:
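Again assuming a hypothetical prepared-statement name `albums_by_year`, this would be:

```sql
EXECUTE albums_by_year(2017);
```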

Depending on the complexity of the GraphQL query, this gives us roughly a 10-20% improvement.

Haskell was a great fit for various reasons:

  • A compiled language with excellent performance (native code)
  • A high-performance HTTP stack (warp; see warp's architecture)
  • Our previous experience with the language

Here is a comparison of Hasura's architecture with Prisma and Postgraphile.



The benchmark setup:

  1. 8GB RAM, i7 laptop
  2. Postgres running on the same machine
  3. wrk was used as the benchmarking tool, and for different types of queries we tried to maximize requests per second
  4. A single instance of the Hasura GraphQL engine was queried
  5. Connection pool size: 50
  6. Dataset: chinook

Query 1: tracks_media_some


  • Requests per second: 1375 req/s
  • Latency: 5ms
  • CPU: 30%
  • RAM: 30MB (Hasura) + 90MB (Postgres)

Query 2: tracks_media_all


  • Requests per second: 410 req/s
  • Latency: 59ms
  • CPU: 100%

Query 3: album_tracks_genre_some


  • Requests per second: 1029 req/s
  • Latency: 24ms
  • CPU: 30%
  • RAM: 30MB (Hasura) + 90MB (Postgres)

Query 4: album_tracks_genre_all


  • Requests per second: 328 req/s
  • CPU: 100%
  • RAM: 30MB (Hasura) + 130MB (Postgres)


Origin blog.csdn.net/weixin_49470452/article/details/107506394