[GraphQL] GraphQL Study Notes (unfinished)

Original article: https://zhuanlan.zhihu.com/p/109424841

What is an API?

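If you had asked me back in school, I would have said API stands for Application Programming Interface, and the HR interviewer would nod as if they understood and send me on to the next round.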

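If you had asked me when I had just started working, I would have said an API is just an interface! The PM has a flash of inspiration for a new feature, I spend a few days writing an API that exposes it, and the front-end people building web pages and apps can use it.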

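Yahoo, for example, provides an API for today's weather, an API for reading news, and an API for sports scores. Modern daily life runs on APIs: browsing discounted items on Taobao, hailing a ride with Didi, sending email at work, watching TV shows after work. APIs are behind all of it.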

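APIs can also be classified by how they are packaged. There are two main schools: one based on REST and one based on RPC. REST is carried over HTTP, while RPC often uses a custom protocol. Today the REST school has given rise to GraphQL, and the RPC school has given rise to gRPC. Which of these four to use is a daily argument at internet companies, and the argument usually comes down to the most fundamental question: what is an API actually for?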

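So what is an API? This is how I would explain it now: the essence of an API is to help people read and write data. Schools of thought change, technologies change, and the job titles of the people who write and use APIs change, but the essence of an API does not. Whatever kind of API it is, its ultimate purpose is to make reading data easy and writing data pleasant. Once you understand this, you understand what problem GraphQL solves.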

Problems with REST APIs

Since this post is about GraphQL, we first need to understand its predecessor, REST.

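REST has a few operations: POST writes new data, GET reads data, PUT modifies data, and DELETE deletes data. There are a few less common ones, such as PATCH and HEAD, that are rarely used. All of these operations are HTTP methods, and they are easy to understand. To check today's weather, use GET. To buy a handbag, use POST. To change my QQ nickname, use PUT. To delete a blog post I wrote ten years ago, use DELETE.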

If only life were that simple.

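In practice, the boundaries between these operations are often blurry. Writing new data, modifying data, and deleting data in particular are often hard to tell apart.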

Here's a real example: likes. When I wrote the API for Yahoo's comment section, this gave me a headache. There are many ways to implement a like.

For example, I could use POST for everything. To like a comment, POST a new "like". To turn the like into a dislike, POST a "dislike". To cancel the like, POST a "cancel".

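Another approach is to use PUT for everything. Every user's default state on every comment is "neither like nor dislike", which is neutral. To like, change my state from neutral to "like". Disliking works the same way. To cancel, change it back to neutral.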

Of course, some people will argue that canceling a like should use DELETE, because it deletes data.

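To sum up, there are four ways to implement just the like feature:

Use POST for every operation
Use PUT for every operation
Use POST to like or dislike, and DELETE to cancel
Use PUT to like or dislike, and DELETE to cancel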

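When I wrote the API at Yahoo, I used the fourth approach. Front-end engineers sometimes got confused and assumed I was using the first. If an API as simple as liking has four possible implementations, more complex APIs are even harder to understand. Some APIs modify a lot of data: they need to write some new records, update some old ones, and finally delete duplicates. Designing those gets really messy.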
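A minimal sketch of the fourth option, in JavaScript (the language of the resolver example later in these notes). `handleVote` and the in-memory `store` are hypothetical stand-ins for a real HTTP handler and database. It shows one argument for PUT/DELETE: repeating the same request leaves the state unchanged (PUT and DELETE are idempotent), whereas POSTing a new "like" event each time would not be.

```javascript
// Hypothetical in-memory vote store: key is "<commentId>:<userId>", value is "up" or "down".
// A missing key means the neutral "neither like nor dislike" state.
const store = new Map();

// Sketch of option 4: PUT sets a like/dislike, DELETE cancels it.
function handleVote(method, commentId, userId, body) {
  const key = `${commentId}:${userId}`;
  switch (method) {
    case "PUT": {
      if (!body || (body.vote !== "up" && body.vote !== "down")) {
        return { status: 400, error: "vote must be 'up' or 'down'" };
      }
      store.set(key, body.vote); // overwrite: the same PUT twice gives the same state
      return { status: 200, vote: body.vote };
    }
    case "DELETE":
      store.delete(key); // deleting an absent vote is still a successful no-op
      return { status: 204 };
    default:
      return { status: 405, error: `method ${method} not allowed` };
  }
}

// Usage: like, like again (idempotent), switch to dislike, then cancel.
handleVote("PUT", "c1", "u1", { vote: "up" });
handleVote("PUT", "c1", "u1", { vote: "up" });
console.log(store.get("c1:u1")); // up
handleVote("PUT", "c1", "u1", { vote: "down" });
handleVote("DELETE", "c1", "u1");
console.log(store.has("c1:u1")); // false
```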

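Another problem with REST APIs is too much redundant information. For example, if I want to read a news article, I make a GET request, and what I get back includes:

Title
News agency (e.g. Xinhua)
Category (e.g. sports, finance)
Image
Summary (one or two sentences)
Body text
Video
Reporter
Publish time
News link
Original link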

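But many of these fields often go unused. Take one layout, for example:

It shows only the title, the image, and the news agency, so three of these response fields are enough.

Or take another:

This one needs five: title, category (finance), news agency, summary, and image.

Even if I only need three or five response fields, the API still makes me fetch all 11. Isn't that a waste of bandwidth?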
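To make the waste concrete, here is a small JavaScript sketch that projects a full article down to only the fields a layout asks for. The field names and placeholder values are made up for illustration. In GraphQL the client gets the same effect simply by listing only those fields in its query, and the server resolves just those.

```javascript
// Hypothetical news article with the 11 fields listed above (placeholder values).
const article = {
  title: "Example headline",
  agency: "Xinhua",
  category: "Finance",
  image: "https://example.com/img.jpg",
  summary: "One- or two-sentence summary.",
  text: "Full article text",
  video: "https://example.com/video.mp4",
  reporter: "Reporter Name",
  publishedAt: "2020-02-28T12:00:00Z",
  link: "https://example.com/news/1",
  sourceLink: "https://example.com/original/1",
};

// Keep only the requested fields; unknown fields are an error, as in GraphQL.
function selectFields(obj, fields) {
  const out = {};
  for (const f of fields) {
    if (!(f in obj)) throw new Error(`Unknown field: ${f}`);
    out[f] = obj[f];
  }
  return out;
}

// The first layout needs 3 fields, the second needs 5.
console.log(selectFields(article, ["title", "image", "agency"]));
console.log(selectFields(article, ["title", "category", "agency", "summary", "image"]));
```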

Another problem is that assembling the POST body is tedious. For example, to post a comment, the POST body might look roughly like this:

{
  "text": "I'm not saying a word. This is the best.",
  "authorId": "prc386",
  "contextId": "news_id_123456",
  "sendFrom": "Android",
  "created": 1582861688
}

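This JSON is really just one long string, and every time I want to post a comment I have to assemble it. It would be nice to use a template plus variables, that is, to store a fixed template like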

{
  "text": $1,
  "authorId": $2,
  "contextId": $3,
  "sendFrom": $4,
  "created": $5
}

and then just fill in the variables $1, $2, $3, and so on. REST has no built-in support for this; you have to rely on other libraries.
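The GraphQL equivalent of the "template + variables" idea is a fixed query document with typed variables, sent alongside a separate `variables` object in the POST body. A sketch in JavaScript; the `addComment` mutation and the `CommentInput` type are hypothetical, not part of any real schema.

```javascript
// The "template": a GraphQL document with a typed variable instead of $1..$5.
// `addComment` and `CommentInput` are hypothetical schema names.
const ADD_COMMENT = `
  mutation AddComment($input: CommentInput!) {
    addComment(input: $input) {
      id
    }
  }
`;

// Build the JSON body for a POST to a GraphQL endpoint: the query text never
// changes, only the variables do.
function buildAddCommentBody(text, authorId, contextId, sendFrom, created) {
  return JSON.stringify({
    query: ADD_COMMENT,
    operationName: "AddComment",
    variables: { input: { text, authorId, contextId, sendFrom, created } },
  });
}

const body = buildAddCommentBody("Nice article!", "prc386", "news_id_123456", "Android", 1582861688);
console.log(JSON.parse(body).variables.input.authorId); // prc386
```

Sending it would be a POST with `Content-Type: application/json` and this string as the body, for example `fetch(endpoint, { method: "POST", headers: { "Content-Type": "application/json" }, body })`, where `endpoint` is whatever URL your server uses.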

These are all headaches for front-end engineers. Back-end engineers say it's hard for them too, and they have even more trouble.

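Take validation: every request parameter that comes in has to be checked for validity. For example, the sendFrom field above must be a mobile operating system. If the front end accidentally claims a user posted the comment from a radio, I can't accept that. Every field has to be validated, so with twenty-odd fields I write twenty-odd checks. These checks can be factored into a library to reduce duplicated code, but it's still a hassle.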
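GraphQL's answer is to declare the types once in the schema (for example, `sendFrom` could be an enum) and let the server reject bad input before any resolver runs. The JavaScript sketch below imitates that idea by hand with one generic validator driven by a declarative description. `CommentInputSchema` and the allowed `sendFrom` values are assumptions, and this is not how graphql-js is implemented internally.

```javascript
// Hypothetical declarative description of the comment input, in the spirit of a
// GraphQL input type: each field lists its kind and whether it is required (non-null).
const CommentInputSchema = {
  text: { kind: "String", required: true },
  authorId: { kind: "ID", required: true },
  contextId: { kind: "ID", required: true },
  sendFrom: { kind: "enum", values: ["Android", "iOS"], required: true },
  created: { kind: "Int", required: false },
};

// One generic validator replaces twenty hand-written per-field checks.
function validate(schema, input) {
  const errors = [];
  for (const [name, spec] of Object.entries(schema)) {
    const v = input[name];
    if (v === undefined || v === null) {
      if (spec.required) errors.push(`${name} is required`);
      continue;
    }
    const ok =
      spec.kind === "String" || spec.kind === "ID" ? typeof v === "string"
      : spec.kind === "Int" ? Number.isInteger(v)
      : spec.kind === "enum" ? spec.values.includes(v)
      : false;
    if (!ok) errors.push(`${name} has an invalid value: ${JSON.stringify(v)}`);
  }
  // Reject fields the schema does not know about.
  for (const name of Object.keys(input)) {
    if (!(name in schema)) errors.push(`${name} is not a known field`);
  }
  return errors;
}

// A comment "sent from a radio" produces exactly one error, for sendFrom.
console.log(validate(CommentInputSchema, {
  text: "hi", authorId: "prc386", contextId: "news_id_123456", sendFrom: "Radio",
}));
```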

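When designing an API, a common approach is to map each endpoint to one kind of resource. Since an API is a tool for reading and writing data, I split it into several endpoints by data type.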

For example, a blog post is a resource, so I create /v1/news, which supports POST, PUT, GET, and DELETE. Blog comments get /v1/comments, with the same four operations. Comments can be liked, so that's /v1/vote.

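Posts, comments, and likes actually depend on each other. You can't comment without a post, and you can't like a comment that doesn't exist. So the dependency flow is news -> comments -> votes. But looking at the three endpoints alone, you can't see this relationship.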

GraphQL's Solution

GraphQL solves all of the problems above by setting a few rules:

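No need for GET, POST, PUT, DELETE and so on; everything is simplified to reads and writes
The response doesn't return all the data at once; the server returns exactly what the client asks for
The POST body can include variables
Write the schema before writing the API; every piece of data must have a defined type
Data dependencies must be spelled out, so the resource structure is clear at a glance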

I won't repeat how to write GraphQL in detail; the official tutorial is very good:

GraphQL: A query language for APIs (graphql.org)

Here are my study notes:

Query & Mutation

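Query and mutation are the two pillars of GraphQL
GraphQL responses are usually JSON
A query error is usually still returned with HTTP 200; the errors are reported in the response body
Fragments and variables greatly reduce the need to manipulate query strings by hand
Field selection improves performance (the client fetches only what it needs)
The type system helps the API server do type checking
Query fields may be executed in parallel, while top-level mutation fields run in series, one after the other (this prevents race conditions)
You can request the meta field "__typename" to get the name of the object type at that point in the query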
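The difference between query and mutation execution can be modeled in a few lines of JavaScript: query fields are started together and awaited with `Promise.all`, while mutation fields are awaited one at a time. This is a simplified model with made-up field names and delays, not a real GraphQL executor; the spec allows (but does not require) query fields to run in parallel, and requires top-level mutation fields to run serially.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// A fake async resolver that records when it starts and finishes.
function makeField(name, ms, log) {
  return async () => {
    log.push(`start ${name}`);
    await delay(ms);
    log.push(`end ${name}`);
    return name;
  };
}

// Query-style: start every field, then wait for all of them together.
async function executeQueryFields(fields) {
  const entries = Object.entries(fields);
  const values = await Promise.all(entries.map(([, resolve]) => resolve()));
  return Object.fromEntries(entries.map(([key], i) => [key, values[i]]));
}

// Mutation-style: each top-level field finishes before the next one starts.
async function executeMutationFields(fields) {
  const result = {};
  for (const [key, resolve] of Object.entries(fields)) {
    result[key] = await resolve();
  }
  return result;
}

// Run field "a" (slow) and field "b" (fast) both ways and report the event order.
async function demo(execute) {
  const log = [];
  await execute({ a: makeField("a", 20, log), b: makeField("b", 10, log) });
  return log.join(", ");
}

demo(executeQueryFields).then((order) => console.log(`query: ${order}`));
// query: start a, start b, end b, end a
demo(executeMutationFields).then((order) => console.log(`mutation: ${order}`));
// mutation: start a, end a, start b, end b
```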

Schema & Type

The GraphQL schema language defines types
`!` means non-nullable
Root schema

schema { 
  query: Query 
  mutation: Mutation 
}

Note that Query and Mutation are also object types. (Think of GraphQL as object-oriented programming.)

A GraphQL query is basically a tree, and scalar types are the leaf nodes
GraphQL comes with a set of default scalar types out of the box:
- Int: A signed 32‐bit integer.
- Float: A signed double-precision floating-point value.
- String: A UTF‐8 character sequence.
- Boolean: true or false.
- ID: The ID scalar type represents a unique identifier, often used to refetch an object or as the key for a cache. The ID type is serialized in the same way as a String; however, defining it as an ID signifies that it is not intended to be human‐readable.
There is also a way to specify custom scalar types. For example, we could define a Date type:

scalar Date

Define an enum like this:

enum Episode { 
  NEWHOPE 
  EMPIRE 
  JEDI 
}

An interface is an abstract type that includes a certain set of fields that a type must include to implement the interface (OOP again)
Union types are basically OR logic: a field of a union type returns one of several object types
Note that members of a union type need to be concrete object types; you can't create a union type out of interfaces or other unions.
Input types look exactly the same as regular object types, but with the keyword input instead of type:

input ReviewInput { 
  stars: Int! 
  commentary: String 
}

Fields of an input type can only be scalars, enums, or other input types; you can't mix input and output types in your schema
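For the union note above, here is a JavaScript sketch of how a server might decide which member type a value belongs to, which is also the name a client sees in `__typename`. The `SearchResult` members (`Human`, `Droid`, `Starship`) follow the Star Wars examples in the official docs; the field-sniffing rules are assumptions, and real servers often store the type explicitly instead of guessing from fields.

```javascript
// Hypothetical values that a field of type SearchResult (Human | Droid | Starship) might return.
const results = [
  { name: "Luke Skywalker", height: 1.72 },
  { name: "R2-D2", primaryFunction: "Astromech" },
  { name: "Millennium Falcon", length: 34.37 },
];

// A union means "exactly one of these object types". The server has to pick the
// concrete type of each value; that name is what the client sees as __typename.
function resolveSearchResultType(value) {
  if ("primaryFunction" in value) return "Droid";
  if ("length" in value) return "Starship";
  if ("height" in value) return "Human";
  throw new Error("Value does not match any member of SearchResult");
}

for (const r of results) {
  console.log(resolveSearchResultType(r), r.name);
}
// Human Luke Skywalker
// Droid R2-D2
// Starship Millennium Falcon
```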

Validation

My guess is that when they invented GraphQL, they wanted to call it TreeQL, but then realized that object relationships can point backward, like a graph. So TreeQL became GraphQL.
A fragment cannot refer to itself or create a cycle, as this could result in an unbounded result!

{ 
  hero { 
    ...NameAndAppearances 
    friends { 
      ...NameAndAppearances 
      friends { 
        ...NameAndAppearances 
      } 
    } 
  } 
} 

fragment NameAndAppearances on Character { 
  name 
  appearsIn 
}

This is good: each fragment is spread a fixed number of times, so the result is bounded

fragment NameAndAppearancesAndFriends on Character { 
  name 
  appearsIn 
  friends { 
    ...NameAndAppearancesAndFriends 
  } 
}

This is bad: the fragment spreads itself, which creates a cycle

Rules of querying
- Don't query fields that don't exist
- Don't select an object field without selecting scalar subfields (i.e. always end with leaf nodes)
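The "no fragment cycles" rule above is a cycle check on a small graph: each fragment points to the fragments it spreads. A JavaScript sketch of that check using depth-first search; the `goodSpreads` and `badSpreads` maps are written by hand here, whereas a real validator would extract them from the parsed query document.

```javascript
// Each fragment mapped to the fragments it spreads (hand-written for illustration).
const goodSpreads = { NameAndAppearances: [] };
const badSpreads = { NameAndAppearancesAndFriends: ["NameAndAppearancesAndFriends"] };

// Depth-first search that returns the first cycle found (as a path), or null.
function findFragmentCycle(graph) {
  const state = new Map(); // missing = unvisited, 1 = on current path, 2 = finished
  const path = [];

  function visit(name) {
    if (state.get(name) === 1) return [...path.slice(path.indexOf(name)), name];
    if (state.get(name) === 2) return null;
    state.set(name, 1);
    path.push(name);
    for (const next of graph[name] || []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(name, 2);
    return null;
  }

  for (const name of Object.keys(graph)) {
    const cycle = visit(name);
    if (cycle) return cycle;
  }
  return null;
}

console.log(findFragmentCycle(goodSpreads)); // null
console.log(findFragmentCycle(badSpreads).join(" -> "));
// NameAndAppearancesAndFriends -> NameAndAppearancesAndFriends
```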

Execution

Each field on a type is backed by a function that receives the parent object and returns the value for the next type (think OOP). This function is called a "resolver"
Here is an example of a resolver:

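// Hypothetical model class that wraps the raw database row.
class Human { constructor(data) { Object.assign(this, data); } }

const resolvers = {
  Query: {
    // obj: parent value, args: field arguments, context: per-request state (e.g. db), info: query details
    human(obj, args, context, info) {
      return context.db.loadHumanByID(args.id).then(
        (userData) => new Human(userData)
      );
    },
  },
};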

Resolvers can run asynchronously and return a promise. This improves latency! (Especially when returning a list.)
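To see why asynchronous resolvers help with lists, here is a self-contained JavaScript sketch: a hypothetical in-memory `db` whose `loadHumanByID` returns a promise, resolvers shaped like the one above, and a tiny hand-written executor for one fixed query. The friend lookups are started together and awaited with `Promise.all`, so they overlap instead of running back to back. Real executors (such as graphql-js) walk arbitrary queries; this one is hard-coded for illustration.

```javascript
// Hypothetical in-memory "database"; loadHumanByID returns a promise, like a real DB call.
const db = {
  humans: {
    1000: { id: "1000", name: "Luke Skywalker", friendIds: ["1002", "1003"] },
    1002: { id: "1002", name: "Han Solo", friendIds: ["1000"] },
    1003: { id: "1003", name: "Leia Organa", friendIds: ["1000"] },
  },
  loadHumanByID(id) {
    return new Promise((resolve) => setTimeout(() => resolve(this.humans[id] || null), 5));
  },
};

// Resolvers: each returns a value or a promise.
const resolvers = {
  Query: {
    human: (obj, args, context) => context.db.loadHumanByID(args.id),
  },
  Human: {
    name: (human) => human.name,
    // Returns an array of promises; all friend lookups are started at once.
    friends: (human, args, context) => human.friendIds.map((id) => context.db.loadHumanByID(id)),
  },
};

// Hand-written executor for { human(id: "1000") { name friends { name } } }.
async function run(context) {
  const human = await resolvers.Query.human(null, { id: "1000" }, context);
  const friends = await Promise.all(resolvers.Human.friends(human, {}, context));
  return {
    data: {
      human: {
        name: resolvers.Human.name(human),
        friends: friends.map((f) => ({ name: resolvers.Human.name(f) })),
      },
    },
  };
}

run({ db }).then((result) => console.log(JSON.stringify(result)));
```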

Best practice

API design
- Good: Use a single endpoint to serve all queries and mutations
- Bad: Use multiple endpoints to serve different resources. (You can still do it, but it will be difficult to use the GraphiQL reference tool https://github.com/graphql/graphiql)
Good: Use JSON with `Accept-Encoding: gzip`
Avoid versioning (personally I don't think it is avoidable)
Every field is nullable by default. Keep that in mind when designing applications
A GraphQL server usually resolves each field individually, but you can optimize performance with batching and caching
A visualization tool I found: https://github.com/APIs-guru/graphql-voyager
Only the business layer should contain business logic
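For the batching-and-caching note, a common technique (popularized by the DataLoader library) is to collect all `load(id)` calls made in the same tick into one batch request and cache the result per id. Below is a hand-written JavaScript sketch of that idea, not DataLoader's actual API; `fetchUsersByIds` is a hypothetical batch function that must return results in the same order as the ids.

```javascript
// Minimal batching loader: load() calls made in the same synchronous pass are
// collected and sent to batchFn as one list; each key's promise is cached.
function createLoader(batchFn) {
  const cache = new Map();
  let queue = [];

  function flush() {
    const batch = queue;
    queue = [];
    Promise.resolve()
      .then(() => batchFn(batch.map((item) => item.key)))
      .then(
        (values) => batch.forEach((item, i) => item.resolve(values[i])),
        (err) => batch.forEach((item) => item.reject(err)),
      );
  }

  return {
    load(key) {
      if (cache.has(key)) return cache.get(key);
      const promise = new Promise((resolve, reject) => {
        if (queue.length === 0) queueMicrotask(flush); // schedule one flush per batch
        queue.push({ key, resolve, reject });
      });
      cache.set(key, promise);
      return promise;
    },
  };
}

// Hypothetical batch fetch: one call for many ids, results aligned with the ids.
const calls = [];
async function fetchUsersByIds(ids) {
  calls.push(ids);
  return ids.map((id) => ({ id, name: `user${id}` }));
}

const userLoader = createLoader(fetchUsersByIds);
Promise.all([userLoader.load(1), userLoader.load(2), userLoader.load(1)]).then((users) => {
  console.log(users.map((u) => u.name).join(", ")); // user1, user2, user1
  console.log(calls); // [ [ 1, 2 ] ]  (one batched call; the repeated key came from the cache)
});
```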

The database has its own design logic, and clients have their own logic for using the API. It's better to design the API around how clients use it than around the database design.
In the API middleware pipeline, place GraphQL after authentication
GraphQL over HTTP usually uses only GET and POST. For GET, accept `?query=...&variables=...&operationName=...`
For POST, accept requests with the header `Content-Type: application/json`
The response should look like this:

{ 
  "data": { ... }, 
  "errors": [ ... ] 
}

If no errors occurred, the "errors" field should not be present in the response. If no data is returned, the GraphQL spec says the "data" field should be included only if the error occurred during execution; an error raised before execution starts (for example, a parse or validation error) produces a response with no "data" field.
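The HTTP conventions above fit in a few small JavaScript functions: building and parsing a GET URL with `query`, `variables`, and `operationName`, and shaping the response body so that `errors` appears only when there are errors and `data` is omitted when the request fails before execution. The endpoint URL and the `phase` flag are assumptions made for the sketch.

```javascript
// GET: query, variables (JSON-encoded), and operationName travel in the query string.
function buildGetUrl(endpoint, query, variables, operationName) {
  const params = new URLSearchParams({ query });
  if (variables) params.set("variables", JSON.stringify(variables));
  if (operationName) params.set("operationName", operationName);
  return `${endpoint}?${params.toString()}`;
}

// Server side: read the three parameters back out of a GET URL.
function parseGetUrl(url) {
  const p = new URL(url).searchParams;
  const variables = p.get("variables");
  return {
    query: p.get("query"),
    variables: variables ? JSON.parse(variables) : undefined,
    operationName: p.get("operationName") || undefined,
  };
}

// Response body: "errors" only when there are errors; "data" omitted when the request
// failed before execution (phase "request", e.g. a parse or validation error).
function formatResponse({ data, errors = [], phase = "execution" }) {
  const body = {};
  if (phase === "execution") body.data = data === undefined ? null : data;
  if (errors.length > 0) body.errors = errors.map((e) => ({ message: e.message }));
  return body;
}

const url = buildGetUrl(
  "https://example.com/graphql",
  "query HeroName($episode: Episode) { hero(episode: $episode) { name } }",
  { episode: "JEDI" },
  "HeroName",
);
console.log(parseGetUrl(url).variables.episode); // JEDI

console.log(JSON.stringify(formatResponse({ data: { hero: { name: "R2-D2" } } })));
// {"data":{"hero":{"name":"R2-D2"}}}
console.log(JSON.stringify(formatResponse({ errors: [new Error("Syntax Error")], phase: "request" })));
// {"errors":[{"message":"Syntax Error"}]}
console.log(JSON.stringify(formatResponse({ data: { hero: null }, errors: [new Error("Hero not found")] })));
// {"data":{"hero":null},"errors":[{"message":"Hero not found"}]}
```

GET is usually reserved for queries; mutations should go over POST so that a cached or prefetched GET never changes data.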

Please disable GraphiQL in production
Don't do authorization in the GraphQL (or any API) layer. Do it in the business layer.
For pagination, cursor-based pagination is the most powerful. It's best to base64-encode the cursor, so the format is opaque and no one will come to rely on it
This is a good example of pagination (GitHub's GraphQL API):

Request:

query myRepositories { 
  viewer { 
    login 
    repositories(first: 2, orderBy: {field: UPDATED_AT, direction: DESC}) { 
      edges { 
        cursor 
        node { 
          name 
        } 
      } 
      pageInfo { 
        endCursor 
        hasNextPage 
      } 
    } 
  } 
}

Response:

{ 
    "data": { 
        "viewer": { 
            "login": "quzhi1", 
            "repositories": { 
                "edges": [ 
                    { 
                        "cursor": "Y3Vyc29yOnYyOpK5MjAyMC0wMi0wMlQxNTozNzowMS0wODowMM4K7LIe", 
                        "node": { 
                            "name": "InterativeHistoryMapSinatra" 
                        } 
                    }, 
                    { 
                        "cursor": "Y3Vyc29yOnYyOpK5MjAyMC0wMS0yMFQxNDozOTo0Mi0wODowMM4N6ph9", 
                        "node": { 
                            "name": "rails_playground" 
                        } 
                    } 
                ], 
                "pageInfo": { 
                    "endCursor": "Y3Vyc29yOnYyOpK5MjAyMC0wMS0yMFQxNDozOTo0Mi0wODowMM4N6ph9", 
                    "hasNextPage": true 
                } 
            } 
        } 
    } 
}

Note 1: If you base64-decode the cursor, you get `cursor:v2:` followed by the timestamp `2020-02-02T15:37:01-08:00`, with a few non-printable binary bytes mixed in.

Note 2: The cursor is on the edge, not on the node. The "edge" and "node" concepts make it clear that the cursor is not a property of the object but of the connection.

Note 3: pageInfo tells you whether you have reached the end of the list.

Note 4: With pageInfo, you don't even need the cursor field on each edge: endCursor is enough to request the next page.
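A JavaScript sketch of connection-style pagination with opaque base64 cursors, returning `edges` and `pageInfo` in the same shape as the GitHub response above. The `cursor:v1:<offset>` format is an assumption: an offset is the simplest thing to encode, but it shifts when items are inserted, which is presumably why GitHub's cursor embeds a timestamp instead. `Buffer` makes this Node-specific.

```javascript
// Opaque cursors: base64 of an internal string (hypothetical "cursor:v1:<offset>" format).
const encodeCursor = (offset) => Buffer.from(`cursor:v1:${offset}`).toString("base64");
const decodeCursor = (cursor) => {
  const raw = Buffer.from(cursor, "base64").toString("utf8");
  const match = /^cursor:v1:(\d+)$/.exec(raw);
  if (!match) throw new Error("Invalid cursor");
  return Number(match[1]);
};

// Return `first` items after the `after` cursor from an already sorted list,
// as edges (cursor + node) plus pageInfo.
function paginate(items, { first, after }) {
  const start = after ? decodeCursor(after) + 1 : 0;
  const slice = items.slice(start, start + first);
  const edges = slice.map((node, i) => ({ cursor: encodeCursor(start + i), node }));
  return {
    edges,
    pageInfo: {
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
      hasNextPage: start + first < items.length,
    },
  };
}

const repos = ["repo-a", "repo-b", "repo-c"].map((name) => ({ name }));
const page1 = paginate(repos, { first: 2 });
console.log(page1.edges.map((e) => e.node.name), page1.pageInfo.hasNextPage);
// Only endCursor is needed to ask for the next page.
const page2 = paginate(repos, { first: 2, after: page1.pageInfo.endCursor });
console.log(page2.edges.map((e) => e.node.name), page2.pageInfo.hasNextPage);
```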

Global object identification

Every object should have an ID. That way you can fetch any object with a query like this:

query retrieveNodeById { 
  node(id: "MDQ6VXNlcjU1NzI1MzU=") { 
    id 
    ... on User { 
        login 
        bio 
        company 
        myEmail: email 
    } 
  } 
}

And the response is:

{ 
    "data": { 
        "node": { 
            "id": "MDQ6VXNlcjU1NzI1MzU=", 
            "login": "quzhi1", 
            "bio": "", 
            "company": "Stripe", 
            "myEmail": "[email protected]" 
        } 
    } 
}

Note 1

The object ID in GitHub is actually base64-encoded.

User id "MDQ6VXNlcjU1NzI1MzU=" => "04:User5572535"
Repository id "MDEwOlJlcG9zaXRvcnkzODE3NjgwMg==" => "010:Repository38176802"
Issue id "MDU6SXNzdWU1Njg2MDM4MTY=" => "05:Issue568603816"

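So my guess is that their global ID is <database_shard>:<object_type><database_key>. On closer look, though, the number before the colon equals the length of the type name in all three samples (User = 4, Repository = 10, Issue = 5), so it is more likely a length prefix than a shard number.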
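A JavaScript sketch of helpers for IDs in the format observed above, assuming the number before the colon is the length of the type name (which fits all three samples). `encodeGlobalId`, `decodeGlobalId`, and `resolveNode` are hypothetical names, and IDs like these are meant to be opaque, so this is only for understanding, not something to rely on. `resolveNode` shows how a single `node(id)` field can dispatch to the right loader.

```javascript
// Hypothetical helpers for the observed format: base64("0" + typeName.length + ":" + typeName + key).
function encodeGlobalId(type, key) {
  return Buffer.from(`0${type.length}:${type}${key}`, "utf8").toString("base64");
}

function decodeGlobalId(globalId) {
  const raw = Buffer.from(globalId, "base64").toString("utf8");
  const colon = raw.indexOf(":");
  const typeLength = parseInt(raw.slice(0, colon), 10);
  if (colon < 1 || Number.isNaN(typeLength)) throw new Error(`Unrecognized id: ${globalId}`);
  const type = raw.slice(colon + 1, colon + 1 + typeLength);
  return { type, key: raw.slice(colon + 1 + typeLength) };
}

// One node(id) field for every type: decode the id, then dispatch to a per-type loader.
function resolveNode(globalId, loaders) {
  const { type, key } = decodeGlobalId(globalId);
  const load = loaders[type];
  return load ? load(key) : null; // unknown type -> null, like a missing object
}

console.log(decodeGlobalId("MDQ6VXNlcjU1NzI1MzU="));            // { type: 'User', key: '5572535' }
console.log(decodeGlobalId("MDEwOlJlcG9zaXRvcnkzODE3NjgwMg==")); // { type: 'Repository', key: '38176802' }
console.log(encodeGlobalId("Issue", "568603816"));               // MDU6SXNzdWU1Njg2MDM4MTY=
```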

Note 2

If the id is the same, it must be the same object.

Note 3

This design is good because when you fetch N items, you get exactly N results. If one of the N items turns out to be null, you get null in its place. You will love this when doing batch requests! (Yahoo's Sherpa batch API is a lesson here.)

Note 4

A global object ID can be used for caching too. In the past, people keyed the cache by URL (`<url>:<response>`). Now they can key it by object ID (`<object_id>:<object>`).
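A JavaScript sketch of an ID-keyed cache: walk a response and store every object that has a string `id` under that id, merging fields so that two different queries fill in one cache entry. This is a simplification; client caches such as Apollo Client and Relay also replace nested objects with references, which this sketch does not do.

```javascript
// Normalize a GraphQL response into a cache keyed by global object id:
// every object with a string `id` is stored under that id.
function normalize(value, cache = new Map()) {
  if (Array.isArray(value)) {
    value.forEach((item) => normalize(item, cache));
  } else if (value !== null && typeof value === "object") {
    if (typeof value.id === "string") {
      // Merge fields, so different queries can fill in different parts of one object.
      cache.set(value.id, { ...cache.get(value.id), ...value });
    }
    Object.values(value).forEach((child) => normalize(child, cache));
  }
  return cache;
}

const cache = new Map();
normalize({ data: { node: { id: "MDQ6VXNlcjU1NzI1MzU=", login: "quzhi1" } } }, cache);
normalize({ data: { viewer: { id: "MDQ6VXNlcjU1NzI1MzU=", company: "Stripe" } } }, cache);
console.log(cache.get("MDQ6VXNlcjU1NzI1MzU="));
// { id: 'MDQ6VXNlcjU1NzI1MzU=', login: 'quzhi1', company: 'Stripe' }
```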