Harbin Institute of Technology Software Architecture and Middleware Assignment 3

"Software Architecture and Middleware" homework 3 report

Assignment 3: KWIC Web Application for 1 Billion Users Worldwide

Name: Shi Zhuofan    Student ID: 120L021011

Table of contents

Distributed Architecture Design Solution - KWIC Web Application for 1 Billion Users Worldwide

Summary

Architecture Overview

1. Load Balancer

2. API Gateway

3. Microservices

4. Message Queue

5. Cache

6. Elastic Computing Resources

7. Monitoring and Logging

8. Security

Conclusion

Distributed architecture design solution - KWIC Web application for 1 billion users worldwide

Summary

To serve a KWIC (Key Word in Context) web application to 1 billion users worldwide, this report proposes a distributed architecture design. The design focuses on the computation layer and does not address data management. Users must be able to access the application from a variety of devices and upload files for processing over the network, and a single file may exceed 10,000 lines. The design also accounts for non-functional requirements such as high performance, high scalability, and high availability.

Architecture overview

The distributed architecture design scheme of the KWIC Web application for 1 billion users worldwide includes the following key components:

  1. Load balancer
  2. API gateway
  3. Microservices
  4. Message queue
  5. Cache
  6. Elastic computing resources
  7. Monitoring and logging
  8. Security

The following sections detail the design and role of each component.

1. Load balancer

To provide a high-performance, highly available service on a global scale, a load balancer will distribute client requests. It can route each request to the most suitable server based on geographic location, server load, and other factors, which helps keep response latency low for users anywhere in the world. For this solution, a global load-balancing service is recommended, such as Google Cloud's global load balancer or AWS Route 53 (which steers traffic at the DNS level).
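The routing decision described above can be illustrated with a small sketch. It is not how any particular cloud load balancer is implemented; the `Region` type, the `pick_region` function, and the `max_load` cutoff are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical view of one serving region as seen from a client."""
    name: str
    latency_ms: float   # measured round-trip time from the client to this region
    load: float         # current utilization, 0.0 (idle) to 1.0 (saturated)
    healthy: bool = True


def pick_region(regions: list[Region], max_load: float = 0.9) -> Region:
    """Return the healthy, non-saturated region with the lowest latency."""
    candidates = [r for r in regions if r.healthy and r.load < max_load]
    if not candidates:
        # Every healthy region is saturated: degrade gracefully instead of failing.
        candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    # Prefer low latency; break ties by lower load.
    return min(candidates, key=lambda r: (r.latency_ms, r.load))
```

In this sketch a nearby but overloaded region is skipped in favor of the next-closest one, which is the trade-off between geographic location and server load mentioned above.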

2. API Gateway

The API gateway is the interface layer between the client and the server. It handles all client requests and routes them to the corresponding backend services. It can also take responsibility for cross-cutting functions such as authentication, authorization, and rate limiting. This can be implemented with an open source API gateway (such as Kong or Ambassador) or a managed gateway from a cloud provider (such as AWS API Gateway or Google Cloud API Gateway).
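As an illustration of the rate-limiting function a gateway can provide, the following is a minimal token-bucket sketch. The token bucket is one common technique; the report does not prescribe it, and the `TokenBucket` class and its parameters are assumptions made for this example rather than the API of Kong or any cloud gateway.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per client or API key and answer rejected requests with HTTP status 429 (Too Many Requests).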

3. Microservices

In order to achieve high scalability and high availability, a microservice architecture will be adopted. This architecture allows individual services to be scaled and deployed independently as needed. A microservice architecture can include the following key services:

  • File upload service: receives files uploaded by users and stores them in a distributed file system or object store (such as HDFS, Amazon S3, or Google Cloud Storage).
  • KWIC processing service: processes the KWIC tasks submitted by users. This service can scale automatically based on the size and complexity of tasks.
  • Notification service: when a KWIC task completes, notifies the user and provides a link to download the result.

These microservices can be managed with container orchestration tools such as Kubernetes or Docker Swarm. Containerization makes the application easier to deploy, scale, and maintain.
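The core of the KWIC processing service can be sketched as follows. This is a minimal illustration under stated assumptions: every word is treated as a keyword, shifts are sorted case-insensitively, and the function names `kwic` and `kwic_chunked` are hypothetical. The chunked variant shows one way a file of more than 10,000 lines could be split so that each chunk is sorted separately (potentially by a different worker) and the sorted parts merged afterwards.

```python
import heapq


def kwic(lines):
    """Return every circular shift of every line, sorted case-insensitively."""
    shifts = []
    for line in lines:
        words = line.split()
        for i in range(len(words)):
            # Rotate the line so that word i comes first.
            shifts.append(" ".join(words[i:] + words[:i]))
    return sorted(shifts, key=str.lower)


def kwic_chunked(lines, chunk_size=10_000):
    """Process the input chunk by chunk, then merge the sorted partial results."""
    parts = [
        kwic(lines[start:start + chunk_size])
        for start in range(0, len(lines), chunk_size)
    ]
    # heapq.merge lazily merges inputs that are already sorted by the same key.
    return list(heapq.merge(*parts, key=str.lower))
```

For example, `kwic(["the quick fox"])` yields the three rotations "fox the quick", "quick fox the", and "the quick fox" in that order.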

4. Message queue

To achieve high performance and high availability, message queues will be used to decouple the services. When users upload files and submit KWIC tasks, the tasks are added to a message queue. The KWIC processing service then fetches tasks from the queue and processes them. Because the queue buffers bursts of work, the upload path can stay responsive under high load while processing workers catch up or are scaled out. This can be implemented with an open source message queue such as RabbitMQ or Apache Kafka, or with a managed queuing service such as AWS SQS or Google Cloud Pub/Sub.
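The producer and consumer roles described above can be sketched with Python's standard in-process `queue.Queue` standing in for a real broker such as RabbitMQ or Kafka. The `run_workers` function and the `(task_id, payload)` task format are assumptions made for this example.

```python
import queue
import threading


def run_workers(tasks, handler, num_workers=4):
    """Push (task_id, payload) pairs through a queue served by worker threads."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:           # sentinel: no more work for this worker
                q.task_done()
                return
            task_id, payload = item
            try:
                outcome = handler(payload)
            except Exception as exc:   # record the failure instead of killing the worker
                outcome = exc
            with lock:
                results[task_id] = outcome
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:                 # producer side: enqueue submitted tasks
        q.put(task)
    for _ in threads:                  # one sentinel per worker
        q.put(None)
    q.join()                           # wait until every queued item is handled
    for t in threads:
        t.join()
    return results
```

The producer only enqueues and never waits for a specific worker, which is the decoupling the design relies on. With a real broker the queue would also survive process restarts, which this in-memory sketch does not.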

5. Cache

To further improve system performance, a cache can store frequently accessed data. For example, the results of a user's KWIC task can be cached so that they are returned quickly when the user requests them again. This can be implemented with open source caching technologies such as Redis or Memcached, or with managed caching services such as AWS ElastiCache or Google Cloud Memorystore.
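One possible way to cache KWIC results is sketched below: the cache key is a hash of the uploaded file's content, so an identical upload maps to the same entry, and the least recently used entry is evicted when the cache is full. The `ResultCache` class is hypothetical and in-process; with Redis or Memcached the same keying idea would apply, but their client APIs are not shown here.

```python
import hashlib
from collections import OrderedDict


class ResultCache:
    """In-memory LRU cache keyed by the SHA-256 digest of the input file."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._store = OrderedDict()

    @staticmethod
    def key_for(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def get(self, content: bytes):
        """Return the cached result, or None on a cache miss."""
        key = self.key_for(content)
        if key not in self._store:
            return None
        self._store.move_to_end(key)          # mark as most recently used
        return self._store[key]

    def put(self, content: bytes, result) -> None:
        key = self.key_for(content)
        self._store[key] = result
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)   # evict the least recently used entry
```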

6. Elastic computing resources

In order to achieve high scalability, elastic computing resources will be used to dynamically adjust the computing power of the system. When the load increases, computing resources can be automatically expanded to meet demand; when the load decreases, resources can be reduced to save costs. This can be achieved using elastic computing services provided by cloud service providers such as AWS EC2 Auto Scaling or Google Cloud Compute Engine Autoscaler.
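The scale-out and scale-in decision can be illustrated with a proportional rule: the replica count is scaled by the ratio of observed to target utilization and then clamped to configured limits. This is a sketch of the general idea, not the exact algorithm of AWS EC2 Auto Scaling or the Compute Engine autoscaler; the function name, parameters, and default limits are assumptions.

```python
import math


def desired_replicas(current, current_util, target_util,
                     min_replicas=1, max_replicas=100):
    """Return the replica count that would bring utilization back to the target.

    `current_util` and `target_util` use the same unit, for example CPU percent.
    """
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    raw = math.ceil(current * current_util / target_util)
    # Clamp so the system neither scales to zero nor grows without bound.
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas running at 90 percent CPU against a 60 percent target give `desired_replicas(4, 90, 60) == 6`, while 10 replicas at 20 percent give 4, which releases resources when load decreases.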

7. Monitoring and logging

In order to ensure the stable operation of the system and to detect potential problems in time, monitoring and logging need to be implemented. This can help understand the system's health, performance metrics, and fault information.

7.1 Monitoring

Monitoring tools can track key performance indicators of the system such as response time, error rate, throughput, and resource usage. This can be implemented with open source tools such as Prometheus (metrics collection and alerting) together with Grafana (dashboards), or with managed monitoring services such as AWS CloudWatch or Google Cloud Monitoring. With appropriate thresholds and alert rules in place, operators receive timely notifications and can respond when a key indicator becomes abnormal.
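A threshold-based alert rule of the kind described above can be sketched as follows. The example tracks the error rate over a sliding window of recent requests and reports when it exceeds a threshold. The `ErrorRateAlert` class, the window size, and the threshold are hypothetical; a real deployment would express the same rule in the monitoring tool's own configuration.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, window=100, threshold=0.05):
        self.samples = deque(maxlen=window)   # old samples fall off automatically
        self.threshold = threshold

    def error_rate(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def record(self, ok: bool) -> bool:
        """Record one request outcome; return True if the alert should fire."""
        self.samples.append(0 if ok else 1)
        return self.error_rate() > self.threshold
```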

7.2 Logging

Logging is critical to system maintenance and troubleshooting. It is necessary to record the running logs of key services so that problems can be quickly located and resolved. This can be achieved using open source logging tools such as ELK Stack or Fluentd, or log services provided by cloud service providers such as AWS CloudWatch Logs or Google Cloud Logging. These tools can help collect, store, analyze and retrieve log data.
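Log collectors are generally easier to use when each service emits structured records. The sketch below uses Python's standard `logging` module to write one JSON object per log line. The field names (`service`, `task_id`) are assumptions chosen for this example, not a format required by ELK or Fluentd.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,          # logger name doubles as service name
            "message": record.getMessage(),
        }
        # Optional correlation field, supplied via logging's `extra` argument.
        task_id = getattr(record, "task_id", None)
        if task_id is not None:
            entry["task_id"] = task_id
        return json.dumps(entry)


def make_logger(service_name):
    """Create a logger for one service that writes JSON lines to stderr."""
    logger = logging.getLogger(service_name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A service could then call `make_logger("kwic-processing").info("task finished", extra={"task_id": "t-1"})`, and the task identifier makes it possible to follow one upload across services when searching the collected logs.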

8. Security

Security is an important factor that must be considered when building distributed web applications. The following are some recommended measures for improving system security:

8.1 Transport Layer Security

HTTPS should be used to encrypt communication between client and server. This protects against eavesdropping and man-in-the-middle attacks. TLS certificates should be renewed regularly, and servers should be configured to use current protocol versions and cipher suites.
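As a small illustration of restricting the protocol version, the sketch below builds a server-side TLS context with Python's standard `ssl` module and refuses anything older than TLS 1.2. The function name and the choice of TLS 1.2 as the floor are assumptions for this example; in the proposed architecture TLS would more likely be terminated at the load balancer or API gateway, configured through that product's own settings.

```python
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build a server-side TLS context that rejects protocol versions below TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # Paths to the certificate chain and private key are supplied by the caller.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx
```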

8.2 Identity authentication and authorization

Strong authentication and authorization policies should be implemented to ensure that only legitimate users can access system resources. This can be achieved using OAuth 2.0, OpenID Connect or other authentication standards. Additionally, role-based access control (RBAC) policies should be implemented to restrict user access to resources.
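A role-based access check can be as simple as a mapping from roles to permissions, as in the following sketch. The role names, the permission strings, and the `is_allowed` function are hypothetical examples, not part of OAuth 2.0 or OpenID Connect; in practice the user's roles would come from a verified token issued by the identity provider.

```python
# Hypothetical role-to-permission table for the KWIC application.
ROLE_PERMISSIONS = {
    "user": {"task:submit", "result:read"},
    "admin": {"task:submit", "result:read", "task:cancel", "metrics:read"},
}


def is_allowed(roles, permission):
    """Return True if any of the caller's roles grants the requested permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Unknown roles grant nothing, so the check denies by default.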

8.3 Data protection

To keep user data safe, sensitive data should be encrypted. When storing data, either symmetric encryption or asymmetric encryption can be used. When transferring data, TLS or other transport layer security technology should be used.

8.4 Periodic Security Audits

Security audits should be conducted on a regular basis to check the system for security vulnerabilities and potential risks. This can include code reviews, dependency checks, network security scans

Origin blog.csdn.net/qq_35798433/article/details/130687527