Thoroughly understand session, cookie, and token, and Session solutions in load-balancing clusters

I am acting as a faithful reposter here: I have tidied up other people's notes and added some material, because I found their summary quite accurate. I am saving my own copy in case the original is one day reported as 404!

In fact, I have looked at many solutions and found that most of the implementations that actually work share sessions across servers through a cache. The original author summarized session-sharing schemes based on PHP and Python

In addition, several Java implementations of distributed session sharing will be added.

 

What exactly are session, cookie and token

Brief description

I read a lot of articles about sessions and cookies before writing this. Some said cookies came first and sessions later; others said sessions came first and cookies later. I did not find any of them very clear, since they mostly speak in generalities. I hope this article is helpful to everyone.
Note: This article requires readers to have basic knowledge of cookie, session, and token.

HTTP is a stateless protocol

What does stateless mean? It means each request is independent of the previous one: the requests do not know about each other and are not related. The benefit of statelessness is speed. The downside is that if we want to associate www.zhihu.com/login.html with www.zhihu.com/index.html, we must use some additional mechanism.

Cookie and session

Because HTTP is stateless, sessions and cookies were introduced so that all pages under a given domain name can share certain data. The process by which a client accesses the server is as follows

  • First, the client sends an HTTP request to the server.
  • After accepting the request, the server creates a session and sends an HTTP response to the client. The response carries a Set-Cookie header, which contains the sessionId. The format of Set-Cookie is as follows (value is normally a name=value pair); please refer to Cookie Detailed Explanation for details
    Set-Cookie: value[; expires=date][; domain=domain][; path=path][; secure]
  • On the client's second request, if the server has previously sent Set-Cookie, the browser automatically adds the cookie to the request header
  • The server receives the request, parses the cookie, verifies the information, and returns a response to the client once verification succeeds
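The four steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the in-memory SESSIONS table and the function names are assumptions made for the example, not part of any framework.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session table: sessionId -> session data.
SESSIONS = {}

def handle_first_request(username):
    """Server side: create a session and build the Set-Cookie header value."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["sessionId"] = session_id
    cookie["sessionId"]["path"] = "/"
    # OutputString() yields something like "sessionId=ab12...; Path=/"
    return cookie["sessionId"].OutputString()

def handle_later_request(cookie_header):
    """Server side: parse the Cookie request header and look up the session."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionId")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)

set_cookie = handle_first_request("alice")
# The browser sends back only the name=value pair, not the attributes.
cookie_header = set_cookie.split(";")[0]
print(handle_later_request(cookie_header))  # {'user': 'alice'}
```

Note that only the sessionId travels between client and server; the session data itself never leaves the server.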

Request process

Notes

  • A cookie is just one way of implementing a session. It is the most commonly used one but not the only one. When cookies are disabled there are other ways to carry the session ID, such as putting it in the URL
  • Nowadays most sites use session + cookie. In theory, session state can be maintained using a session without a cookie, or a cookie without a session. In practice, for a variety of reasons, neither is generally used alone
  • With a session, the client only needs to keep an ID; the bulk of the data is kept on the server. If everything were kept in cookies, the client would not have enough space once the amount of data grows large
  • If you use only cookies and no session, all account information is stored on the client, and once the cookie is hijacked all of it is leaked. The client-side data also grows, and so does the amount of data sent over the network

summary

In short, a session is like a user record kept on file, holding the user's authentication information, login status, and other information, while the cookie is the user's pass.

token

A token is a kind of pass issued by the server.
Token authentication, built from uid+time+sign[+fixed parameters], resembles a signed temporary certificate. It is a stateless authentication method on the server side, which makes it well suited to REST API scenarios. Stateless means that the server does not save any data related to identity authentication.

composition

  • uid: the unique identity of the user
  • time: the timestamp of the current time
  • sign: the signature, produced by hashing or encrypting the other fields into a fixed-length hexadecimal string, to prevent a third party from forging a token by splicing fields together
  • Fixed parameters (optional): Some commonly used fixed parameters are added to the token to avoid repeated database searches
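A token of this shape might be built and checked roughly as follows. This is a sketch under stated assumptions: the source does not name a hash function, so HMAC-SHA256 with a server-side SECRET_KEY is assumed here, and the "." separator and function names are hypothetical. The optional fixed parameters are left out.

```python
import hashlib
import hmac
import time

# Assumption: a secret known only to the server; not part of the original text.
SECRET_KEY = b"change-me"

def _sign(payload):
    """Fixed-length hexadecimal signature over the other token fields."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()

def make_token(uid, now=None):
    """Build uid.time.sign (uid is assumed to contain no '.')."""
    ts = str(int(time.time() if now is None else now))
    payload = f"{uid}.{ts}"
    return f"{payload}.{_sign(payload)}"

def verify_token(token, max_age=3600, now=None):
    """Return the uid if the signature matches and the token is fresh."""
    try:
        uid, ts, sign = token.split(".")
    except ValueError:
        return None
    expected = _sign(f"{uid}.{ts}")
    if not hmac.compare_digest(sign.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    if current - int(ts) > max_age:
        return None
    return uid

token = make_token("42")
print(verify_token(token))  # 42
```

Because the signature depends on the secret, changing the uid or the timestamp by hand invalidates the token, and the server needs no stored state to detect that.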

Store

On the client, the token is generally stored in localStorage, a cookie, or sessionStorage. On the server, when a copy is kept, it is generally stored in the database

token authentication process

The token authentication process is very similar to the cookie process

  • The user logs in, and on success the server returns a token to the client.
  • The client receives the token and saves it locally
  • When the client accesses the server again, it puts the token in the request headers
  • The server verifies the token in a filter. If verification succeeds, the requested data is returned; if it fails, an error code is returned.
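The server-side filter in the last step could look something like the decorator below. It is a framework-free sketch: the request is modelled as a plain dict, the VALID_TOKENS table stands in for whatever verification the server really performs, and the "Authorization: Bearer" header is an assumed convention that the original text does not specify.

```python
import functools

# Hypothetical token table; a real server would verify a signature
# or look the token up in its database.
VALID_TOKENS = {"token-abc": "alice"}

def token_filter(handler):
    """Run the handler only when the request carries a valid token."""
    @functools.wraps(handler)
    def wrapper(request):
        auth = request.get("headers", {}).get("Authorization", "")
        scheme, _, token = auth.partition(" ")
        user = VALID_TOKENS.get(token) if scheme == "Bearer" else None
        if user is None:
            # Verification failed: return an error code.
            return {"status": 401, "body": "invalid or missing token"}
        return handler(request, user)
    return wrapper

@token_filter
def get_profile(request, user):
    return {"status": 200, "body": f"profile of {user}"}

print(get_profile({"headers": {"Authorization": "Bearer token-abc"}}))
print(get_profile({"headers": {}}))
```

Keeping the check in one wrapper means individual handlers never deal with unauthenticated requests.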

Tokens can resist CSRF; cookie + session cannot

Suppose the user is logged in to a bank's website and the site has no protection against CSRF attacks. An attacker can inject an image whose src is http://www.bank.com/api/transfer?count=1000&to=Tom. With session + cookie, the user has already transferred 1,000 yuan to Tom by the time the page containing the image opens. This is because once the session is established, the cookie is shared by the current page and every page under the cookie's domain and path, so when the img request is made the browser automatically adds the cookie to the request header. A token is different: the developer adds the token to each request by hand. When the page loads and requests the img, there is no token in the request header, so the request fails authentication. (This holds only when the token is sent in a header by script; a token kept in a cookie is attached automatically like any other cookie.)
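The difference can be made concrete with a toy model. Everything here is a simplification for illustration: the Browser class, the bank_accepts check, and the literal values "s1" and "t1" are hypothetical, and a real browser's cookie rules (SameSite and so on) are ignored.

```python
class Browser:
    """Toy browser: cookies are attached automatically, tokens are not."""

    def __init__(self):
        self.cookies = {}        # domain -> cookie string
        self.local_storage = {}  # readable only by the site's own scripts

    def request(self, domain, path, extra_headers=None):
        headers = dict(extra_headers or {})
        if domain in self.cookies:
            # The browser adds the cookie on every request to the domain.
            headers["Cookie"] = self.cookies[domain]
        return {"domain": domain, "path": path, "headers": headers}

def bank_accepts(req, mode):
    """Hypothetical server check for the two authentication styles."""
    if mode == "cookie":
        return req["headers"].get("Cookie") == "sessionId=s1"
    return req["headers"].get("Authorization") == "Bearer t1"

browser = Browser()
browser.cookies["www.bank.com"] = "sessionId=s1"
browser.local_storage["token"] = "t1"

# Request triggered by the attacker's <img src=...>; no script of the
# bank's own page runs, so nothing adds the token.
forged = browser.request("www.bank.com", "/api/transfer?count=1000&to=Tom")
print(bank_accepts(forged, "cookie"))  # True: the transfer goes through
print(bank_accepts(forged, "token"))   # False: no token in the headers

# Request made by the bank's own page, which adds the token by hand.
legit = browser.request(
    "www.bank.com", "/api/transfer?count=1000&to=Tom",
    {"Authorization": "Bearer " + browser.local_storage["token"]},
)
print(bank_accepts(legit, "token"))    # True
```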

Session and token in distributed case

We already know that sessions are stateful and are generally stored in the server's memory or on its disk. When the servers are distributed or clustered, sessions run into problems with load balancing.

  • With several servers behind a load balancer, it is hard to tell whether the current user is logged in, because the servers do not share sessions. The problem can be solved by keeping all sessions on a single server, but then the benefit of load balancing is not fully achieved.

The token is stateless: all user information is stored in the token string

  • The client logs in and sends its information to the server. The server encrypts the user information into a token and sends it to the client, which stores it in a container such as localStorage. The client sends the token on every request, and the server decrypts it to find out who the user is. The server spends CPU time on encryption and decryption instead of storage space on sessions, which solves the load-balancing problem across multiple servers. This approach is known as JWT (JSON Web Token). Strictly speaking, a standard JWT is signed rather than encrypted: its payload can be read by anyone, but it cannot be altered without invalidating the signature.

To sum up

  • The session is stored on the server. It can be thought of as a state table keyed by a unique identifier, the sessionId, which is usually stored in a cookie. On receiving the cookie, the server extracts the sessionId and looks up the corresponding session in the table. Sessions therefore depend on cookies
  • A cookie is like a pass: it carries the sessionId, is stored on the client, and is usually added to requests automatically by the browser.
  • A token is also like a pass, but it is stateless: the user information is encrypted into the token, and on receiving it the server decrypts it to learn which user it is. It must be added to requests manually by the developer.
  • JWT is simply a cross-domain authentication scheme

 

 

Session solution in load balancing cluster

Preface

Once a Web site is put behind a load balancer, an important issue we have to face is how to handle Sessions. Whether the site is written in PHP, Python, Ruby, or Java, as long as Sessions are stored on the server we need to think about them when setting up load balancing.

 

Contents:

  1. Where is the problem? How to deal with it?

  2. Session persistence (Case: Nginx, Haproxy)

  3. Session replication (Case: Tomcat)

  4. Session sharing (Case: Memcached, Redis)

 


Where is the problem?

Seen from the user's side: on the first visit, the load balancer proxies the user to back-end server A, the user logs in, and the login information is kept on server A. When the user sends another request, the load-balancing policy may proxy it to a different back-end server, say server B. Server B has no login information for this user, so the user has to log in again. Users find this unacceptable, so when implementing load balancing we must consider the Session problem.
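The situation is easy to reproduce in miniature. In the sketch below the AppServer class and the round-robin policy are assumptions chosen for illustration; each server keeps its sessions in its own memory, as described above.

```python
import itertools

class AppServer:
    """Back-end server that keeps sessions in its own memory only."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)

servers = [AppServer("A"), AppServer("B")]
round_robin = itertools.cycle(servers)  # a simple load-balancing policy

first = next(round_robin)    # the first request lands on server A
first.login("sid-1", "alice")

second = next(round_robin)   # the next request lands on server B
print(first.name, first.whoami("sid-1"))    # A alice
print(second.name, second.whoami("sid-1"))  # B None
```

Server B returns None for the same sessionId, which from the user's point of view means being asked to log in again.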

In load balancing, for Session processing, we generally have the following methods:

    • Session retention

    • Session replication

    • Session sharing

 

1. Session retention


Session retention (also called session persistence) is one of the terms we see most often. With session retention, the load balancer distributes requests so that each client is always sent to the same back-end application server. Every load balancer has some implementation of session retention, and it solves the Session problem at the load-balancing layer.

Nginx does load balancing session retention

With Nginx, load balancing can be configured to use session retention. The Nginx upstream currently supports five distribution methods, two of which are commonly used for Sessions: ip_hash and url_hash. Note: the latter is not an official module and must be installed separately.

ip_hash

Each request is assigned according to the hash of the client IP, so each visitor always reaches the same back-end server, which achieves session retention.

Example:

upstream bakend {
   # pin each client IP to one back-end server
   ip_hash;
   server 192.168.0.11:80;
   server 192.168.0.12:80;
 }
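The idea behind ip_hash can be sketched in Python. This is not nginx's actual hash function: crc32 is an assumption chosen for illustration. What it does mirror is that, for IPv4, nginx keys on the first three octets of the client address.

```python
import zlib

BACKENDS = ["192.168.0.11:80", "192.168.0.12:80"]

def pick_backend(client_ip, backends=BACKENDS):
    """Map a client IPv4 address to one fixed back-end server."""
    # Key on the first three octets only, so clients from the same /24
    # network land on the same server.
    key = ".".join(client_ip.split(".")[:3])
    # crc32 is an illustrative stand-in; nginx uses its own hash.
    return backends[zlib.crc32(key.encode()) % len(backends)]

print(pick_backend("10.0.0.5"))
print(pick_backend("10.0.0.200"))  # same /24 network, same back end
```

Because the choice depends only on the address, many users behind one NAT or proxy all land on the same server, which is one source of the uneven load discussed under the disadvantages of session retention.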

Haproxy does load balancing session retention

    As an excellent reverse proxy and load balancer, Haproxy also provides several methods of session retention. The two most commonly used are listed below:

Source address hash ( does not guarantee balanced load )

Haproxy hashes the user's IP and assigns it to a fixed real server (similar to nginx's ip_hash directive)

Configuration directive: balance source

Use cookies for identification ( obviously this is insecure and not reliable )

That is, Haproxy inserts a cookie into the user's browser on the first visit, and on later visits the browser sends this cookie back so that Haproxy can identify the user.

Configuration directive: cookie  SESSION_COOKIE  insert indirect nocache

The configuration example is as follows:

cookie SERVERID insert indirect nocache
server web01 192.168.56.11:8080 check cookie web01
server web02 192.168.56.12:8080 check cookie web02
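The cookie-insertion behaviour configured above can be modelled as follows. This is a simplified sketch of the idea rather than of Haproxy's implementation: the route function is hypothetical, and round-robin is assumed as the policy for clients that arrive without a cookie.

```python
import itertools

# Server names and addresses taken from the configuration example.
SERVERS = {"web01": "192.168.56.11:8080", "web02": "192.168.56.12:8080"}
_round_robin = itertools.cycle(SERVERS)

def route(request_cookies):
    """Return (server_name, cookies_to_set) for one incoming request."""
    server_id = request_cookies.get("SERVERID")
    if server_id in SERVERS:
        # Returning visitor: honour the cookie, set nothing new.
        return server_id, {}
    # First visit (or unknown value): pick a server and insert the cookie.
    server_id = next(_round_robin)
    return server_id, {"SERVERID": server_id}

first_server, set_cookies = route({})
again, nothing_new = route(set_cookies)
print(first_server, set_cookies)  # web01 {'SERVERID': 'web01'}
print(again, nothing_new)         # web01 {}
```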

Disadvantages of session retention:

Session retention seems to solve the session synchronization problem, but it brings some other problems:

  • Unbalanced load: with Session retention in use, an evenly balanced load clearly cannot be guaranteed.

  • The problem is not completely solved: if a back-end server goes down, the Sessions on that server are lost, and the users assigned to it still have to log in again.

 


2. Session replication

Since our goal is for the user's Session to be available on every server, would it be enough to copy the Session information on each application server to the other server nodes? This is the second way of handling Sessions: session replication.

 Tomcat supports session replication. By default it uses IP multicast to discover the other cluster members and then copies sessions between them. Tomcat's session replication comes in two types:

  • Global session replication: uses DeltaManager to replicate the changed parts of a session to all other nodes in the cluster.

  • Non-global replication: uses BackupManager, which replicates the Session to one designated backup node.

    However, I am not going to explain the Tomcat configuration for session replication here; if you need it, refer to the official Tomcat documentation. The main reason is that session replication is not suitable for large clusters. In the author's experience in production, various problems appear once a cluster exceeds 6 nodes, and it is not recommended for production use (synchronization may introduce delay).
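A rough model shows why all-to-all replication gets expensive as the cluster grows. The sketch below is not Tomcat code: the Node class and both functions are hypothetical, whole sessions are copied instead of deltas, and choosing the next node in the list as the backup is an assumption made for the example.

```python
class Node:
    """Cluster node holding its own copy of some sessions."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}

    def store(self, session_id, data):
        self.sessions[session_id] = dict(data)

def replicate_to_all(cluster, origin, session_id, data):
    """All-to-all, in the spirit of DeltaManager: every node gets a copy."""
    for node in cluster:
        node.store(session_id, data)
    return len(cluster) - 1  # copies sent over the network per write

def replicate_to_backup(cluster, origin, session_id, data):
    """Primary plus one backup, in the spirit of BackupManager."""
    origin.store(session_id, data)
    backup = cluster[(cluster.index(origin) + 1) % len(cluster)]
    backup.store(session_id, data)
    return 1  # one copy sent over the network per write

cluster = [Node(f"node{i}") for i in range(6)]
print(replicate_to_all(cluster, cluster[0], "sid-1", {"user": "alice"}))     # 5
print(replicate_to_backup(cluster, cluster[0], "sid-2", {"user": "bob"}))    # 1
```

With all-to-all replication every session write costs one message per other node and every node holds every session, so both network traffic and memory use grow with cluster size.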

 

3. Session sharing


Since neither session retention nor session replication is perfect, why not put Sessions in one shared place, so that every node in the cluster reads and writes Sessions there? That solves the problem.

    Where is the session stored?

Sessions are accessed very frequently. Although you can store them in a database, in a real production environment I recommend a faster distributed key-value store such as Memcached or Redis.
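The shape of the solution is the same whichever store is chosen. In the sketch below a small in-process class stands in for Memcached or Redis so that the example runs on its own; the class names, the TTL of 1800 seconds, and the JSON serialization are all assumptions for illustration.

```python
import json
import time

class SharedSessionStore:
    """Stand-in for Memcached/Redis: one store that every app server uses."""

    def __init__(self):
        self._data = {}  # session_id -> (expires_at, serialized session)

    def save(self, session_id, session, ttl=1800, now=None):
        now = time.time() if now is None else now
        self._data[session_id] = (now + ttl, json.dumps(session))

    def load(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(session_id)
        if entry is None or entry[0] <= now:
            self._data.pop(session_id, None)  # expired or missing
            return None
        return json.loads(entry[1])

class AppServer:
    """App server with no local session state at all."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, user):
        self.store.save(session_id, {"user": user})

    def whoami(self, session_id):
        session = self.store.load(session_id)
        return session["user"] if session else None

store = SharedSessionStore()
server_a, server_b = AppServer("A", store), AppServer("B", store)
server_a.login("sid-1", "alice")
print(server_b.whoami("sid-1"))  # alice
```

The user logs in on server A and is still recognized on server B, because neither server owns the session.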

 

PHP settings session sharing

If you are using PHP, congratulations: the configuration is very simple. PHP can store Sessions in Memcached or Redis with two lines of configuration. Of course, Memcached or Redis has to be set up in advance. Modify php.ini:

session.save_handler = memcache
session.save_path = "tcp://192.168.56.11:11211"

Use Redis to store Session

session.save_handler = redis
session.save_path = "tcp://localhost:6379"

Reminder: Don't forget to install memcache or redis plugins for PHP.

Tomcat sets up session sharing

We can use MSM (Memcached Session Manager) to store Sessions in Memcached as well. Its GitHub address is https://github.com/magro/memcached-session-manager, and it currently supports Tomcat 6.x, 7.x, and 8.x.

If you want to use Redis, there is also an open-source project available, but unfortunately it does not yet support Tomcat 8.x: https://github.com/jcoleman/tomcat-redis-session-manager

 

Django settings session sharing

In Django, Sessions are managed by a middleware. To use Sessions in your application, add 'django.contrib.sessions.middleware.SessionMiddleware' to the MIDDLEWARE_CLASSES variable in settings.py (named MIDDLEWARE since Django 1.10). Django's Session engine can store Sessions in three places: database, cache, and file.

Use the database to save the Session ( not efficient )

To use database-backed sessions, add 'django.contrib.sessions' to your INSTALLED_APPS setting. Once that is configured, run manage.py migrate to create the database table that stores session data.

Use cache to keep Session

For a simple cached session:

You can set SESSION_ENGINE to "django.contrib.sessions.backends.cache". At this point, the session data will be stored directly in your cache. However, the cached data may not be durable: if the cache fills up or the cache server restarts, the cached data may be cleaned up.
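A settings.py fragment for this setup might look like the following. The Memcached address is reused from the PHP example above and is only a placeholder. The PyMemcacheCache backend assumes Django 3.2 or later with the pymemcache package installed; other cache backends can be substituted.

```python
# settings.py (fragment)

# Store session data directly in the cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
# Which entry of CACHES holds the sessions ("default" is Django's default).
SESSION_CACHE_ALIAS = "default"

CACHES = {
    "default": {
        # Assumption: Django >= 3.2 with the pymemcache package installed.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # Placeholder address, reused from the PHP example.
        "LOCATION": "192.168.56.11:11211",
    }
}
```

Every application server must point at the same cache for the sessions to be shared.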

  For a persistent cached session:

You can set SESSION_ENGINE to "django.contrib.sessions.backends.cached_db". It uses a write-through cache: every write goes to the cache and to the database as well. On a read, the database is consulted only if the data is not in the cache. Both session stores are quite fast, but the simple cache is faster because it gives up persistence. In most cases the cached_db backend is fast enough, but if you need to squeeze out the last bit of performance and can accept the risk of losing session data, you can use cache instead of cached_db
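The read and write paths just described can be sketched with two dictionaries standing in for the cache and the database. This illustrates the write-through pattern only and is not Django's implementation; the class and method names are hypothetical.

```python
class CachedDbSessions:
    """Write-through session store: the cache in front, the database behind."""

    def __init__(self):
        self.cache = {}  # fast but may lose data
        self.db = {}     # slower but persistent

    def save(self, key, data):
        # Every write goes to both the database and the cache.
        self.db[key] = dict(data)
        self.cache[key] = dict(data)

    def load(self, key):
        # Reads are served from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        data = self.db.get(key)
        if data is not None:
            self.cache[key] = dict(data)  # repopulate after a cache miss
        return data

store = CachedDbSessions()
store.save("sid-1", {"user": "alice"})
store.cache.clear()          # simulate a cache restart
print(store.load("sid-1"))   # {'user': 'alice'}
```

After the simulated cache restart the session is still found, which is exactly what the plain cache backend cannot promise.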

Use file to save session

Saving Sessions in files is outside the scope of this discussion, because files are hard to share between servers. PHP also stores Sessions in the /tmp directory by default.

 

 



Origin blog.csdn.net/zw764987243/article/details/115033765