Design and implementation of an IM message system

I. Glossary

  • Unicast: the server sends a message to a single client user
  • Batch unicast: the server sends a message to multiple client users
  • Multicast / Broadcast: the server sends a message to a group of clients; a group ID identifies the set of users
  • Uplink message: the client sends a message to the server
  • Downlink message: the server sends a message to the client

II. System architecture

  • proxy: deployed in edge data centers; clients connect to the nearest one through smart DNS
  • logicService: handles authentication, heartbeats, going online/offline, and joining/leaving groups
  • pushService: receives unicast and broadcast messages and forwards them to the comet (connection) layer, which then delivers them to clients
  • imService: chat server; handles one-to-one chat, group chat, and offline messages
  • consumerService: performs write diffusion of group messages asynchronously
  • authService: authentication service

Data structures

cacheService maintains the global set of online users as a two-level map: user_id -> conn_id -> server_id.

  • user_id: assigned by the business layer; uniquely identifies a user
  • conn_id: allocated by the access (proxy) process; uniquely identifies one of the user's connections
  • server_id: identifies the access process that the connection belongs to
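The two-level map can be sketched in Go as below. This is a minimal in-memory illustration only: the type name `OnlineCache`, its methods, string IDs, and the use of a `sync.RWMutex` are assumptions, not the author's cacheService. It shows why the second level matters: one user can hold several connections on different proxies, and the user entry disappears only when the last connection goes offline.

```go
package main

import (
	"fmt"
	"sync"
)

// OnlineCache sketches the cacheService map user_id -> conn_id -> server_id.
// Names and string IDs are hypothetical, chosen for illustration.
type OnlineCache struct {
	mu    sync.RWMutex
	users map[string]map[string]string // user_id -> conn_id -> server_id
}

func NewOnlineCache() *OnlineCache {
	return &OnlineCache{users: make(map[string]map[string]string)}
}

// Online records that connection connID of userID is held by serverID.
func (c *OnlineCache) Online(userID, connID, serverID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	conns, ok := c.users[userID]
	if !ok {
		conns = make(map[string]string)
		c.users[userID] = conns
	}
	conns[connID] = serverID
}

// Offline removes one connection and drops the user when none remain.
func (c *OnlineCache) Offline(userID, connID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	conns, ok := c.users[userID]
	if !ok {
		return
	}
	delete(conns, connID)
	if len(conns) == 0 {
		delete(c.users, userID)
	}
}

// Lookup returns a copy of conn_id -> server_id for userID (empty if offline).
func (c *OnlineCache) Lookup(userID string) map[string]string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make(map[string]string, len(c.users[userID]))
	for connID, serverID := range c.users[userID] {
		out[connID] = serverID
	}
	return out
}

func main() {
	cache := NewOnlineCache()
	cache.Online("u1", "c1", "proxy-a") // same user, two devices
	cache.Online("u1", "c2", "proxy-b")
	cache.Offline("u1", "c1")
	fmt.Println(cache.Lookup("u1")) // map[c2:proxy-b]
}
```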

Each access (proxy) process maintains its own online users: user_id + conn_id -> Connection

  • Connection: wraps the client connection and can push messages to it

Each proxy process also maintains the room membership of its connections: room_id -> ConnectionList
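The proxy-side state can be sketched the same way. In this hypothetical Go version, `Connection.Push` only enqueues onto a buffered channel (a real proxy would write to the socket), and `ConnectionList` is modeled as a set keyed by user_id + conn_id; all type and method names are assumptions. A room broadcast then touches only the connections this proxy holds locally.

```go
package main

import (
	"fmt"
	"sync"
)

// Connection wraps one client connection. Push only enqueues onto a buffered
// channel here; writing to a real socket is left out (assumption).
type Connection struct {
	UserID, ConnID string
	out            chan []byte
}

func NewConnection(userID, connID string, buf int) *Connection {
	return &Connection{UserID: userID, ConnID: connID, out: make(chan []byte, buf)}
}

// Push enqueues msg without blocking; it reports false if the queue is full.
func (c *Connection) Push(msg []byte) bool {
	select {
	case c.out <- msg:
		return true
	default:
		return false
	}
}

// connKey is the user_id + conn_id composite key.
type connKey struct{ userID, connID string }

// Proxy holds one access process's local state.
type Proxy struct {
	mu    sync.RWMutex
	conns map[connKey]*Connection            // user_id+conn_id -> Connection
	rooms map[string]map[connKey]*Connection // room_id -> ConnectionList (as a set)
}

func NewProxy() *Proxy {
	return &Proxy{
		conns: make(map[connKey]*Connection),
		rooms: make(map[string]map[connKey]*Connection),
	}
}

func (p *Proxy) Add(c *Connection) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.conns[connKey{c.UserID, c.ConnID}] = c
}

func (p *Proxy) JoinRoom(roomID string, c *Connection) {
	p.mu.Lock()
	defer p.mu.Unlock()
	members, ok := p.rooms[roomID]
	if !ok {
		members = make(map[connKey]*Connection)
		p.rooms[roomID] = members
	}
	members[connKey{c.UserID, c.ConnID}] = c
}

// BroadcastRoom pushes msg to every local connection in roomID and returns
// how many connections accepted it.
func (p *Proxy) BroadcastRoom(roomID string, msg []byte) int {
	p.mu.RLock()
	defer p.mu.RUnlock()
	delivered := 0
	for _, c := range p.rooms[roomID] {
		if c.Push(msg) {
			delivered++
		}
	}
	return delivered
}

func main() {
	p := NewProxy()
	a := NewConnection("u1", "c1", 8)
	b := NewConnection("u2", "c2", 8)
	p.Add(a)
	p.Add(b)
	p.JoinRoom("room-1", a)
	fmt.Println(p.BroadcastRoom("room-1", []byte("hello"))) // 1
	fmt.Println(string(<-a.out))                            // hello
}
```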

III. Message model

3.1 Read diffusion model

  • Advantages: each message is written only once, which greatly reduces writes, especially in group chats
  • Disadvantages: message synchronization logic is more complex; the receiver must read each session once, which amplifies reads and produces many wasted requests
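A toy Go model of read diffusion makes the trade-off concrete. It assumes a single in-memory timeline per group and an integer read cursor per member; the `Group` type and its fields are hypothetical. `Send` costs one write regardless of group size, while every member must `Pull` each session separately, and a pull that returns nothing is exactly the wasted request described above.

```go
package main

import "fmt"

// Group sketches read diffusion: each message is stored once in a shared
// timeline, and every member keeps its own read cursor (an index here).
type Group struct {
	timeline []string
	cursor   map[string]int // member -> number of messages already read
	writes   int            // storage writes performed
}

func NewGroup(members ...string) *Group {
	g := &Group{cursor: make(map[string]int)}
	for _, m := range members {
		g.cursor[m] = 0
	}
	return g
}

// Send costs one write no matter how many members the group has.
func (g *Group) Send(msg string) {
	g.timeline = append(g.timeline, msg)
	g.writes++
}

// Pull returns member's unread messages and advances its cursor. A client
// must do this for every session it belongs to, even when nothing is new.
func (g *Group) Pull(member string) []string {
	start := g.cursor[member]
	unread := append([]string(nil), g.timeline[start:]...)
	g.cursor[member] = len(g.timeline)
	return unread
}

func main() {
	g := NewGroup("alice", "bob", "carol")
	g.Send("m1")
	g.Send("m2")
	fmt.Println(g.writes)           // 2
	fmt.Println(g.Pull("bob"))      // [m1 m2]
	fmt.Println(len(g.Pull("bob"))) // 0: a wasted request
}
```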

3.2 Write diffusion model

  • Advantages: simple logic for pulling messages
  • Disadvantages: write amplification; a one-to-one message is written twice, and a group message is written N times (once per member)
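For contrast, a toy Go model of write diffusion, assuming one in-memory inbox per user (the `Inboxes` type and its methods are hypothetical). A one-to-one message is copied into both participants' inboxes and a group message into every member's inbox, so the write count grows with N, while pulling is a single lookup of the caller's own inbox.

```go
package main

import "fmt"

// Inboxes sketches write diffusion: every user has a private inbox, and each
// message is copied into every inbox it belongs to.
type Inboxes struct {
	inbox  map[string][]string
	writes int
}

func NewInboxes() *Inboxes { return &Inboxes{inbox: make(map[string][]string)} }

func (s *Inboxes) write(user, msg string) {
	s.inbox[user] = append(s.inbox[user], msg)
	s.writes++
}

// SendSingle stores a one-to-one message for both sender and receiver: 2 writes.
func (s *Inboxes) SendSingle(from, to, msg string) {
	s.write(from, msg)
	s.write(to, msg)
}

// SendGroup stores a group message once per member: N writes for N members.
func (s *Inboxes) SendGroup(members []string, msg string) {
	for _, m := range members {
		s.write(m, msg)
	}
}

// Pull is a single lookup of the caller's own inbox, which is then cleared.
func (s *Inboxes) Pull(user string) []string {
	msgs := s.inbox[user]
	delete(s.inbox, user)
	return msgs
}

func main() {
	s := NewInboxes()
	s.SendSingle("alice", "bob", "hi")
	s.SendGroup([]string{"alice", "bob", "carol", "dave"}, "hello all")
	fmt.Println(s.writes)      // 6
	fmt.Println(s.Pull("bob")) // [hi hello all]
}
```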

IV. Implementation

4.1 One-to-one chat

4.1.1 Design goals

4.1.2 Online messages

4.1.3 Offline messages

4.1.4 Message loss detection

4.2 Group chat

4.2.1 Design goals

4.2.2 Small groups (write diffusion)

4.2.3 Large groups (read diffusion)

V. Performance analysis

Bottleneck ranking: CPU > bandwidth > memory

5.1 Capacity planning (Alibaba Cloud hosts: 16 cores, 32 GB RAM, 2.5 GHz; 50% headroom reserved)

  • 10,000 connections per proxy
  • 100 proxies
  • 50 logicService / cacheService / pushService instances
  • Or, with the improved architecture:
    • 10 logicService
    • 5 pushService
    • Kafka cluster
    • ZooKeeper cluster
    • 10 cacheService
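The proxy figures above imply a total capacity that is easy to check. The short Go program below does the arithmetic; reading "50% margin" as "planned load is half of what the hosts could sustain" is an assumption, and the constant names are hypothetical.

```go
package main

import "fmt"

// Figures from 5.1. Interpreting the 50% margin as "planned load is half of
// what the hosts could sustain" is an assumption.
const (
	connsPerProxy = 10_000
	proxyCount    = 100
	marginRatio   = 0.5
)

// plannedConnections is the concurrent-connection capacity the plan targets.
func plannedConnections() int { return connsPerProxy * proxyCount }

// ceilingConnections is the load the same hosts could absorb if the reserved
// margin were fully used.
func ceilingConnections() int {
	return int(float64(plannedConnections()) / (1 - marginRatio))
}

func main() {
	fmt.Println(plannedConnections()) // 1000000
	fmt.Println(ceilingConnections()) // 2000000
}
```

In other words, the plan targets about one million concurrent connections, with roughly the same amount again held in reserve.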

5.2 Internal communication paths without bottlenecks (horizontally scalable):

  • Client-initiated RPC: mobile -> proxy -> micro
  • Online / offline / room switching / heartbeat: mobile -> proxy -> logicService -> cacheService
  • Unicast: micro -> logicService (-> cacheService) -> pushService -> proxy -> mobile
  • Online status queries
    • Query online status / room by user or session
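On the unicast path, logicService gets a user's conn_id -> server_id entries from cacheService and must route each connection to the proxy that holds it. The hypothetical Go helper below regroups those entries by server_id, so that one push request per proxy can be sent instead of one per connection; whether the original system batches this way is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// groupByServer converts one user's conn_id -> server_id entries (as returned
// by cacheService) into server_id -> conn_ids, so pushService can be handed
// one request per proxy instead of one per connection.
func groupByServer(conns map[string]string) map[string][]string {
	byServer := make(map[string][]string)
	for connID, serverID := range conns {
		byServer[serverID] = append(byServer[serverID], connID)
	}
	for _, ids := range byServer {
		sort.Strings(ids) // map iteration order is random; sort for stable output
	}
	return byServer
}

func main() {
	conns := map[string]string{"c1": "proxy-a", "c2": "proxy-b", "c3": "proxy-a"}
	fmt.Println(groupByServer(conns)) // map[proxy-a:[c1 c3] proxy-b:[c2]]
}
```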

5.3 Internal communication paths that may become bottlenecks:

  • Batch unicast: micro -> logicService ((N in parallel) -> router) -> pushService -> proxy -> mobile
    • Limitation: the total number of target users cannot be too large
  • Broadcast: micro -> logicService -> pushService -> proxy -> mobile
    • Limitation: because pushService periodically pulls the room list from every proxy, the number of pushService instances cannot be too large
    • Improvement: decouple logicService and pushService through Kafka. Since pushService consumes very little CPU compared with proxy / logicService / cacheService, only a few pushService instances are needed
  • Online status queries
    • Total online count: because logicService periodically pulls room user counts from cacheService, the periodic counter can be enabled on only a limited number of logicService instances
    • Query users / counts by room
    • Traversal / listing: debug interfaces for other services

5.4 Proxy performance bottlenecks

5.5 RPC performance bottlenecks

VI. High availability analysis

Provide uninterrupted 24/7 service. As development iterates, internal modules and business services must be upgradable and scalable without users noticing.

  • proxy: stateless; on restart or upgrade, the client detects the disconnection and automatically reconnects to another proxy
  • logicService: stateless; on restart or upgrade, the proxy automatically switches to the next logicService
  • pushService: stateless; on restart or upgrade, other pushService instances continue serving
  • cacheService: stateful; on restart or upgrade, the standby cacheService takes over; once the upgrade completes, service switches back to the primary cacheService
  • imService: stateless; on restart or upgrade, other imService instances continue serving
  • MySQL: availability ensured by a master-master setup
  • Redis: availability ensured by Sentinel

VII. Exception handling

  • How to prevent message loss (the receiver reports the largest message ID it has received; on anomalies, the server retransmits)
  • Non-contiguous auto-increment IDs caused by Redis primary/replica failover
  • How to improve proxy broadcast performance
  • How to avoid the single-connection bottleneck in RPC
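The first two points interact: if retransmission depends on the receiver's reported maximum ID, the server should treat "greater than the acknowledged ID" as unacknowledged, rather than expecting the next ID to be exactly max + 1, because a Redis failover can leave gaps in the sequence. The hypothetical Go `Outbox` below sketches this. It assumes IDs are strictly increasing per user and that messages arrive in order on a connection, so receiving ID k implies everything earlier was received.

```go
package main

import (
	"fmt"
	"sort"
)

// Message carries a strictly increasing, but not necessarily contiguous, ID.
type Message struct {
	ID   int64
	Body string
}

// Outbox is a hypothetical per-user server-side record of messages that were
// pushed but not yet acknowledged, kept sorted by ID.
type Outbox struct {
	pending []Message
}

// Send records a pushed message. IDs must increase, but gaps are allowed.
func (o *Outbox) Send(m Message) { o.pending = append(o.pending, m) }

// Ack handles the receiver's report of the largest ID it has received.
// Everything <= maxID is dropped; everything > maxID is returned for
// retransmission. Comparing with > means ID gaps are never mistaken for loss.
func (o *Outbox) Ack(maxID int64) []Message {
	i := sort.Search(len(o.pending), func(i int) bool { return o.pending[i].ID > maxID })
	o.pending = o.pending[i:]
	return append([]Message(nil), o.pending...)
}

func main() {
	o := &Outbox{}
	for _, id := range []int64{1, 2, 5, 9} { // 2 -> 5 gap, e.g. after a failover
		o.Send(Message{ID: id, Body: fmt.Sprintf("m%d", id)})
	}
	fmt.Println(o.Ack(2))      // [{5 m5} {9 m9}]
	fmt.Println(len(o.Ack(9))) // 0
}
```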

VIII. Low cost and security

  • Almost no external dependencies, so operation and maintenance costs are low
  • High-performance code saves on server costs
  • Integrated authentication; HTTPS is also supported

(End)

Source: juejin.im/post/5e12e80a5188253a821082ba