Evolution of the Atom intelligent visualization system's service architecture

Author: Bump Man - Manjiz

What is Atom? Atom is a platform that brings together experienced e-commerce designers from across the industry and provides one-stop professional smart-page and mini-program design services. After two years of intensive iteration, the project kept growing, requirements kept changing and being refined, the internal logic became complicated, and maintenance costs rose sharply. At the same time, Atom would be carrying more and more business and serving more internal users and merchants. To adapt to these changes, an architecture upgrade became an urgent matter. We decided to break the server side into modules to make the services lightweight and modular, and to make it easier to expand into new business scenarios.

The Atom server has gone through three iterations; this article focuses on the third version.

Architecture 1.0

This is the oldest version of Atom. In this version, only channel-page functionality was planned; the goal was to free developers from tedious channel-page development. Because the purpose was narrow, system complexity was low. The server was built directly on the Koa framework as a monolithic service, with all the code running in a single process.

In terms of deployment, a very primitive manual process was used: a developer logged into a machine, pulled the code, installed dependencies and started the service just as in a local environment, and then repeated this on every other machine.

In addition, the old version of Quark used named components. Named components limit Quark's own scalability to a certain extent, but that is beyond the scope of this article.

Architecture 2.0

It took Atom less than a year to go from a channel-page building platform to a multi-scenario page building platform: richer components, more templates, more scenes, more participating designers, more users, and an increasingly specialized product. Simple manual operation and maintenance no longer fit, so both the front end and the server went through a major overhaul. The server was rebuilt with Salak, an excellent server framework that, among other things, automatically generates interface documentation. Both the front end and the server now rely on Talos (an internal containerized deployment platform) for deployment. The server had gradually entered the industrial age.

However, at this stage the rough-and-ready development style had not been addressed, and the lack of overall planning increasingly exposed the following problems:

  • Over-concentration

    More than 90% of the services were concentrated in a single codebase. As the business grew more complex and the amount of code increased, the readability, maintainability and scalability of the code declined, the cost of onboarding new developers rose, and the cost of expanding the business grew exponentially, making continuous delivery hard to sustain. With more and more users, the concurrency the application had to bear kept rising, but the concurrency a monolithic application can handle is limited. And as the system's complexity increased, testing became harder and harder.

  • High coupling

    The modules inside the monolith depended on each other, affected each other and interfered with each other, resulting in low code reuse. New features were often rewritten from scratch for fear of hidden pitfalls in the coupled logic, which is not what we wanted to see.

  • Tangled logic

    Besides the confusion caused by coupling, Atom, as a platform that grew from zero, had accumulated a lot of historical requirements, some no longer used and some hardly ever used. This legacy code posed a great challenge to developers: nobody dared to change it lightly while maintaining it. On top of that, each iteration had to remain backward compatible, leaving the server with a heavy historical burden.

  • Code redundancy

    Because the framework did not define coding standards early on and code checks were not strictly enforced during development, logic and constants were defined repeatedly across the codebase. This also made the project hard to maintain: changing one thing meant modifying several places at the same time.

New architecture goals

Based on the strengths and weaknesses of the original architecture, we set the following goals for this upgrade:

  • Service modularization
  • Service generalization
  • Pluggable sites
  • Pluggable scenes
  • Standards and specifications

Glossary:

  • Site: decouples the server from the platform. Instead of the original one-service-per-platform model, the same service now serves multiple platforms that are isolated from each other.

[Figure: Site]

  • Scene: a concept introduced to handle different business types. Different scenes have different management methods and workflows.

Overall structure

The overall architecture is divided into four parts: the web application layer, the interface layer, the service layer and the data layer. Splitting it this way gives a unified entry point, makes publishing more convenient through single-point deployment, and reduces the impact on the overall service through independent deployment:

  • Web application layer: includes the Atom platform and other platform applications
  • Interface layer: provides the gateway service; application-layer requests are controlled and forwarded through the gateway
  • Service layer:
    • Service communication: MQ for asynchronous communication and HTTP for RPC-style calls
    • Business modules: the core code, split into many small module applications
    • Basic services: unified management of users and permissions
    • Service governance: improves the stability, robustness and flexibility of the services
  • Data layer: core data storage

[Figure: Overall structure]

The gateway serves as the traffic entrance of the entire server side and handles all traffic: it intercepts illegal requests, parses the login state and passes it downstream, verifies interface permissions, handles timeouts, and so on. Controlling all of this in one place also reduces the pressure on downstream services.
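Below is a minimal sketch of these gateway responsibilities written as Koa-style middleware. The article does not specify the gateway's framework or helper APIs, so Koa is used only for illustration, and parseLoginState / hasPermission are hypothetical stand-ins for the real login and permission services.

```typescript
import Koa from 'koa';

interface LoginUser { id: string }

// Hypothetical helpers, not Atom's real code
async function parseLoginState(cookie: string): Promise<LoginUser | null> {
  return cookie ? { id: 'demo-user' } : null;
}
function hasPermission(user: LoginUser, method: string, path: string): boolean {
  return true;
}

const TIMEOUT_MS = 10_000;
const app = new Koa();

app.use(async (ctx, next) => {
  // 1. Parse the login state and reject illegal requests
  const user = await parseLoginState(ctx.get('cookie'));
  if (!user) {
    ctx.status = 401;
    ctx.body = { code: 401, message: 'not logged in' };
    return;
  }
  // 2. Verify interface permission before forwarding downstream
  if (!hasPermission(user, ctx.method, ctx.path)) {
    ctx.status = 403;
    ctx.body = { code: 403, message: 'no permission' };
    return;
  }
  // 3. Pass the parsed identity on so downstream services need not re-parse it
  ctx.state.user = user;
  // 4. Cut off slow downstream calls with a timeout
  await Promise.race([
    next(),
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('gateway timeout')), TIMEOUT_MS),
    ),
  ]);
});

app.listen(3000);
```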

[Figure: Gateway]

Implementation

Planning / preparation / evaluation

Before formally starting the upgrade, the team discussed its necessity and feasibility in a meeting. The direct trigger was the platform's new site and scene requirements, which would only have added more coupling to the already tangled logic. The indirect reason, that is, the real necessity of the upgrade, was to make the system modular, standardized and generalized, to make its logic clearer, and to improve the maintainability of the whole system.

After repeated discussion, we divided the original system by function, then further divided it by generality, added the supporting work required by the new architecture, estimated the workload and schedule for each piece, and finally distributed the tasks.

Execution

Modularization

Why modularization? As the platform grows, we want the functions of each part to be more independent and clearly defined, to minimize the impact between parts, and to operate and maintain each part separately, avoiding situations where touching one part shakes the whole system.

This upgrade divided the project into 10+ modules by function and generality: for example, a module dedicated to compilation, a module dedicated to template management, a module responsible for scheduled tasks, a gateway serving as the entry point, and so on.

Among them, several general-purpose services were split out. A general-purpose service is independent of the Atom system and can serve both Atom and other systems.

[Figure: Module division]

The most troublesome part of splitting the project is cutting the associated logic. Stripping modules apart and patching them up inevitably causes one problem: the same code appears in different modules. To solve this, we moved such code into a shared utility npm package, including constants, TypeScript type definitions, permission mappings, Mongoose Schema definitions, Salak plugins, utility methods, and so on.
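As a rough illustration (with entirely hypothetical names, not Atom's real package), such a shared package might export constants, TypeScript types and a Mongoose schema that every module reuses instead of redefining:

```typescript
// shared package index: constants, types and schema definitions reused across modules
// (assuming Mongoose >= 6 for the typed Schema constructor)
import { Schema } from 'mongoose';

// Shared constants, previously duplicated in several modules
export const SITE_HEADER = 'x-atom-site';
export enum ProjectStatus { Draft = 0, Published = 1, Offline = 2 }

// Shared TypeScript types
export interface ProjectDoc {
  name: string;
  site: string;
  status: ProjectStatus;
  createdAt: Date;
}

// Shared Mongoose schema so every module stores projects the same way
export const ProjectSchema = new Schema<ProjectDoc>({
  name: { type: String, required: true },
  site: { type: String, required: true, index: true },
  status: { type: Number, default: ProjectStatus.Draft },
  createdAt: { type: Date, default: Date.now },
});
```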

Another problem: in the original architecture, modules could call each other directly in code. How do we "restore" this capability in the new architecture? To preserve the decoupling, only the few functions that need an immediate response are called directly between modules through interfaces; everything else communicates through the MQ message queue and the database.

Compilation is a good example of MQ communication. Server-side compilation usually takes a long time; holding a connection open that long hurts service performance, and the compilation result does not need a synchronous response. For the compilation module, accepting every incoming request unconditionally would put considerable pressure on the service. So we decided to use the message queue for communication between the modules involved (a code sketch follows the steps below):

  1. The project module calls the publishing module directly through its interface to initiate a publish operation;
  2. The publishing module pushes an "I want to compile" message into the message pool;
  3. When the compilation module receives the message, it decides based on its own load whether it can start compiling; otherwise it simply does not respond yet;
  4. Each state of the compilation is also pushed as a message;
  5. Finally, the project module performs the corresponding processing when it receives the compilation status messages.
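Here is a sketch of this publish/consume flow. The article does not name the broker, so this assumes a RabbitMQ-style queue accessed via amqplib, with hypothetical queue names and message shapes:

```typescript
import amqp from 'amqplib';

const COMPILE_QUEUE = 'atom.compile.request'; // hypothetical queue names
const STATUS_QUEUE = 'atom.compile.status';

// Publishing module: push "I want to compile" instead of calling the compiler directly
export async function requestCompile(projectId: string) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue(COMPILE_QUEUE, { durable: true });
  ch.sendToQueue(COMPILE_QUEUE, Buffer.from(JSON.stringify({ projectId })), { persistent: true });
  await ch.close();
  await conn.close();
}

// Compilation module: consume at its own pace and report each state as a message
export async function startCompileWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue(COMPILE_QUEUE, { durable: true });
  await ch.assertQueue(STATUS_QUEUE, { durable: true });
  ch.prefetch(1); // only take a new job when the current one is done
  await ch.consume(COMPILE_QUEUE, async (msg) => {
    if (!msg) return;
    const { projectId } = JSON.parse(msg.content.toString());
    ch.sendToQueue(STATUS_QUEUE, Buffer.from(JSON.stringify({ projectId, status: 'building' })));
    // ... run the actual build here ...
    ch.sendToQueue(STATUS_QUEUE, Buffer.from(JSON.stringify({ projectId, status: 'done' })));
    ch.ack(msg);
  });
}
```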

[Figure: Compile messages]

Generalization

As mentioned earlier, during modularization we split out four general-purpose service modules. A general-purpose service is independent of the Atom system and can serve both Atom and other systems. The generalization of modules is based on two considerations:

  1. Enrich the department's service offerings and reduce duplicated development of the same functions
  2. Remove non-core code from Atom and slim the system down

A related question is worth considering: how do we decide whether a function is worth generalizing? We should avoid the misconception that modularizing a system means splitting it as finely as possible; if the split is too fine, the operation and maintenance workload inevitably increases. When splitting modules, we consider whether the functions inside a module are complete and independent, and how much demand the department or company has for that common service, so that we truly achieve low coupling and high cohesion.

Standardization

At the code level, the following is a simple comparison:

Aspect             Old architecture      New architecture
Main language      JavaScript            TypeScript
Code linting       Not enforced          Mandatory
Interface naming   All over the place    Unified
Interface output   All over the place    Unified

TypeScript's benefits are well known among front-end developers: auto-completion, an optional type system, earlier access to new JavaScript features, and so on; for more, see "Why Choose TypeScript". What about the other three rows? The old architecture went through a zero-to-one process: the project lacked planning at the beginning and never had enough time to correct course in the middle and later stages, and the combination of time pressure and changing requirements led to code silting up.

For this reason we emphasized code standardization in the new architecture. Every commit must pass code checks, and the previously inconsistent interfaces were unified:

  • Unified interface paths: in the old architecture, a list interface might live at /xxx/list, /xxx/xxxes, and so on. In the new architecture, interfaces are defined uniformly following RESTful conventions: noun-based resource paths combined with the semantics of the HTTP methods;
  • Unified parameter names: the page-size parameter of a list might be called pageSize in one place and count in another, so we standardized these names and require developers to follow the convention;
  • Unified output: output data is pre-filtered for the front end, removing irrelevant fields such as _id and __v; the output format is also unified, for example every _id is exposed under the name id, and so on (see the sketch after this list).
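A minimal sketch of the unified-output idea: strip Mongo internals, rename _id to id, and wrap every response in one envelope. The envelope fields and helper names here are illustrative, not Atom's actual conventions.

```typescript
interface Envelope<T> { code: number; message: string; data: T }

// Drop __v, rename _id to id, keep everything else untouched
function toPlainOutput(doc: Record<string, any>): Record<string, any> {
  const { _id, __v, ...rest } = doc; // __v is intentionally discarded
  return _id !== undefined ? { id: String(_id), ...rest } : rest;
}

// One response shape for every interface
function ok<T>(data: T): Envelope<T> {
  return { code: 0, message: 'success', data };
}

// Usage before sending to the front end:
//   ctx.body = ok(toPlainOutput(projectDoc));
```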


The benefit of code standardization is that the code is easier to maintain, developers can quickly locate the code for a given interface, and the front end has fewer interface quirks to memorize.

Pluggable sites

As mentioned earlier, the direct trigger for this architecture upgrade was the site and scene requirements. Iterating the site requirements on the old architecture would only have increased the coupling further. So we added a site management module, added a site field to almost every data item, and added site parameters to almost every database query. Thanks to this work, launching a new site now only requires registering it through the site module and doing some initial configuration.
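The "site everywhere" convention can be pictured with a tiny helper (hypothetical, with illustrative field and model names) that scopes every query filter to the current site so data from different sites stays isolated:

```typescript
// Force every database query to carry the current site in its filter
function withSite<F extends Record<string, unknown>>(site: string, filter: F = {} as F) {
  return { ...filter, site };
}

// Usage, assuming a Mongoose model whose schema contains a `site` field:
//   ProjectModel.find(withSite(currentSite, { status: 1 }));
//   ProjectModel.updateOne(withSite(currentSite, { _id: id }), { $set: { name } });
```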

Besides demanding more from Atom's features, the site concept also posed a new challenge to the original permission system. Before the upgrade, a user had only one set of permissions. To give each site its own permissions, there were only two approaches:

  1. Split the meaning of permissions (provide a separate set of permissions for each site)
  2. Add a layer of abstraction to user permissions (a user's permissions become multiple sets, switched according to the site)

Comparing the two: splitting the meaning of permissions is easier to understand and requires fewer code changes, but it greatly increases the difficulty of maintaining the permission table, since every new site effectively needs a whole new set of permissions, so it cannot be pluggable. In the end, we added logic at the gateway layer that switches the permission set according to the site the user is accessing.
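A sketch of the chosen approach, with illustrative names: user permissions become a map keyed by site, and the gateway resolves the right set for the site being accessed.

```typescript
type Permission = string; // e.g. 'project:read', 'template:publish' (illustrative)

interface User {
  id: string;
  // one permission set per site instead of a single global set
  permissionsBySite: Record<string, Permission[]>;
}

function resolvePermissions(user: User, site: string): Permission[] {
  return user.permissionsBySite[site] ?? [];
}

function can(user: User, site: string, permission: Permission): boolean {
  return resolvePermissions(user, site).includes(permission);
}

// At the gateway, assuming the site is carried in a request header:
//   const allowed = can(user, ctx.get('x-atom-site'), 'project:read');
```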

Pluggable scenes

A scene is a dimension below the site. The existing major scenes are activities, channels, psychological tests, SNS and shops. Under the old architecture, adding a new scene meant scheduling development work and scattering scene-specific if-else branches through the code. To make scenes easier to extend and maintain, we pulled the scene-related code apart from the perspective of resource management.

Under each scene, Atom's resources fall into four main categories: templates, projects, tags and permissions:

Tags          Pages
 |              |
Templates ---> Projects

Permissions

The new architecture introduces a directory module per resource. Within a module, the code is organized following the strategy pattern: the business logic of each scene is split into its own file and invoked only through a scheduler, so the logic of different scenes does not get mixed together.

  • The scheduler file is named base_{resource}_service
  • The scene strategy file is named {scene}_{resource}_service (scene name in lowercase)
  • The common strategy file is named common_{resource}_service

When a request comes in, the scheduler calls the method in the corresponding strategy file according to the query conditions (in general, calling the strategy of a specific scene directly is not allowed, unless it is certain that no data of other scenes is involved). When a scene has no strategy of its own, the call falls back to the default common_service logic, so every scene strategy must inherit from common_service. Taking the page management service as an example: the scheduler of the src/service/page directory is base_page_service, the common logic lives in common_page_service, and the channel-page scene logic lives in ch_page_service.

To unify the abstraction of methods shared across scenes, the commonly used CRUD interfaces of the service methods are placed in an AbstractServiceClass file. A sketch of this layout follows the directory tree below.

├── src
│   ├── service
│   │   └── {resource}
│   │       ├── base_{resource}_service     strategy scheduler, called directly by controller/mq
│   │       ├── common_{resource}_service   common strategy file, e.g. shared parameter handling for list queries
│   │       └── {scene}_{resource}_service  scene strategy file, for scene-specific logic
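Below is a minimal TypeScript sketch of this strategy layout with simplified, hypothetical class and method names (the real services inherit from an AbstractServiceClass with full CRUD methods, which is omitted here):

```typescript
interface PageQuery { scene: string; site: string; keyword?: string }

// common_page_service: default logic shared by every scene
class CommonPageService {
  async list(query: PageQuery) {
    return { scene: query.scene, items: [] as unknown[] };
  }
}

// ch_page_service: channel-page scene, overriding only what differs from the common logic
class ChPageService extends CommonPageService {
  async list(query: PageQuery) {
    const result = await super.list(query);
    // channel-specific filtering would go here
    return result;
  }
}

// base_page_service: the scheduler, the only thing controllers / MQ handlers call
class BasePageService {
  private strategies: Record<string, CommonPageService> = {
    common: new CommonPageService(),
    ch: new ChPageService(),
  };

  async list(query: PageQuery) {
    // pick the scene strategy, falling back to the common one
    const strategy = this.strategies[query.scene] ?? this.strategies.common;
    return strategy.list(query);
  }
}
```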

Deployment

Data migration

Given how drastic the changes in this upgrade were, we had to be careful when switching between the old and new versions. Besides the extensive joint debugging between the front end and the server, we also migrated the data for compatibility: the old data was processed as required by the new architecture and then written into the new database.

Uninterrupted deployment

Under the monolithic architecture, every release and deployment left the service unavailable for a few minutes.

[Figure: Before uninterrupted deployment]

To avoid this, in the production environment we make sure every module has at least two containers. During deployment, some of the containers are first removed from the load balancer; each of them is then polled until no more traffic arrives, updated and restarted, and added back to the load balancer; the same procedure is then applied to the remaining containers. This keeps the service uninterrupted throughout the deployment and avoids any gap.
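As a rough sketch of the rolling idea only (the hypothetical LoadBalancer and Container interfaces stand in for Talos' internal APIs, which are not described in the article):

```typescript
interface LoadBalancer {
  remove(container: string): Promise<void>;
  add(container: string): Promise<void>;
}
interface Container {
  name: string;
  activeConnections(): Promise<number>;
  updateAndRestart(): Promise<void>;
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function rollingDeploy(lb: LoadBalancer, containers: Container[]) {
  for (const c of containers) {
    await lb.remove(c.name);                    // stop new traffic to this container
    while ((await c.activeConnections()) > 0) { // wait for in-flight requests to drain
      await sleep(1000);
    }
    await c.updateAndRestart();                 // deploy the new version
    await lb.add(c.name);                       // rejoin the pool before the next one
  }
}
```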

[Figure: Uninterrupted deployment]

O & M

To avoid repeating the poor operation-and-maintenance experience and project code management of the old architecture, we put together an O&M document for the new architecture, covering quick onboarding, development, debugging and deployment.

[Figure: O&M document directory]

We also added monitoring to the system to track the performance and availability of every interface.

[Figure: Method performance monitoring]

Results

After this upgrade, the planned results were basically achieved:

  • Clarity: logic combed through, redundancy removed, rewritten in TS, ESNext features
  • Modularity: decoupled into 10+ independently operated modules; multiple communication channels such as HTTP, MQ and the data layer
  • Standardization: strict code conventions; unified interfaces; unified responses
  • Generalization: 4+ platform-independent general-purpose modules; shared libraries, configurations, plugins, middleware, etc. extracted
  • Easy migration: one-click initialization; one-click, single-point, independent deployment; unified entry point
  • Easy to extend: site expansion capability added; scene expansion streamlined; 95%+ of labor time saved
  • Easy to maintain: logging added; one-click deployment; uninterrupted deployment
  • Easy to integrate: complete Joi documentation; detailed interface change records; compatible with old interfaces as far as possible

Tools / Methods / Collaboration

Tools have a big impact on how smoothly a project goes, so we tried out a variety of tools during this upgrade.

To make sure project members clearly understood the modules they were responsible for and how those modules would be transformed, the team introduced flowchart tools to sort out the modules of the old architecture and divide the work, to map out the internal logic of the new architecture's modules, and so on.

[Figure: Internal logic diagram]

For scheduling we used a Gantt chart in practice: tasks were split by module, assigned to the responsible person along with a planned schedule, and overall progress was synchronized every day. The Gantt chart gives a clear picture of the project's resource allocation and scheduling, as well as the gap between the plan and the actual situation, which helps control the overall progress of the project.

[Figure: Gantt chart]

The Gantt chart gave an initial breakdown of the upgrade tasks; for a finer-grained breakdown we used the IssueBoard. The IssueBoard is like a simplified task board, but it was more than enough for us. Another reason for choosing it is that it integrates with git commits, which suits developers well: each commit can close its corresponding issue.

[Figure: IssueBoard]

Summary and reflection

The upgrade also exposed some shortcomings, mainly in scheduling and expectations and in early communication.

  • Scheduling and expectations

    The schedule at the start of the upgrade was too optimistic and was never revised during the process. This was partly due to objective constraints: the team had to finish the upgrade within a limited window between feature requests to avoid maintaining two versions at the same time. The consequence was that the team had to spend more time every day than planned.

  • Communication

    When the server side was upgraded, the specific details were not communicated to the front end, and since this upgrade is not fully backward compatible, it caused the front end some trouble and inconvenience during joint debugging.

Reference

Original address: https://aotu.io/notes/2020/04/21/atom-services-upgrade/


Welcome to the Bump Lab blog: aotu.io

Or follow the AOTULabs WeChat public account (AOTULabs), where articles are pushed from time to time.

