Summary of solutions for big data and high concurrency

1. Solutions for Massive Data

1. Use a cache:
  How to use it:
  1. Store data directly in memory from the program, mainly with a Map, especially ConcurrentHashMap.
  2. Use a caching framework. Commonly used frameworks: Ehcache, Memcached, Redis, etc.
  The most critical questions are when to create a cache entry and how to invalidate it.
  Caching empty data: it is better to store an empty result as a specific marker value, so that "the data is empty" can be distinguished from "the data is not cached" and repeated lookups for missing data do not keep reaching the database.
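A minimal sketch of the first approach (an in-process ConcurrentHashMap cache), assuming a TTL-based invalidation policy and a hypothetical `loader` function standing in for the database. It stores a private `EMPTY` marker for missing data, so "known to be empty" and "not cached" stay distinct. `TtlCache` and its methods are illustrative names, not part of any library.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// In-process cache sketch: TTL-based invalidation plus a marker for "known to be empty".
class TtlCache<K, V> {
    // Stored when the loader finds nothing, so "empty" is distinguishable from "not cached".
    private static final Object EMPTY = new Object();

    private static final class Entry {
        final Object value;     // a real value or EMPTY
        final long expiresAtMs; // absolute expiry time in epoch milliseconds

        Entry(Object value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final ConcurrentHashMap<K, Entry> map = new ConcurrentHashMap<>();
    private final long ttlMs;
    private final Function<K, V> loader; // hypothetical data source; returns null when nothing exists

    TtlCache(long ttlMs, Function<K, V> loader) {
        this.ttlMs = ttlMs;
        this.loader = loader;
    }

    // Loads on a miss or after expiry; compute() runs at most one load per key at a time.
    @SuppressWarnings("unchecked")
    Optional<V> get(K key) {
        long now = System.currentTimeMillis();
        Entry e = map.compute(key, (k, old) -> {
            if (old != null && old.expiresAtMs > now) {
                return old; // still fresh
            }
            V loaded = loader.apply(k);
            return new Entry(loaded == null ? EMPTY : loaded, now + ttlMs);
        });
        return e.value == EMPTY ? Optional.empty() : Optional.of((V) e.value);
    }

    // Explicit invalidation, e.g. right after the underlying record changes.
    void invalidate(K key) {
        map.remove(key);
    }
}

class TtlCacheDemo {
    public static void main(String[] args) {
        // Hypothetical source in which only user 1 exists.
        TtlCache<Integer, String> cache = new TtlCache<>(60_000, id -> id == 1 ? "alice" : null);
        System.out.println(cache.get(1)); // Optional[alice]
        System.out.println(cache.get(2)); // Optional.empty, and the miss itself is now cached
    }
}
```

Using `compute` means concurrent misses on the same key trigger only one load; the trade-off is that other updates to the same map bin wait while the loader runs, so a very slow loader would call for a different design.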
 
2. Database optimization:
  1. Optimize the table structure.
  2. SQL statement optimization: optimize both the syntax and the processing logic. Record the execution time of each statement so that slow statements can be analyzed in a targeted way.
  3. Partitioning
  4. Table splitting (sub-tables)
  5. Index optimization
  6. Use stored procedures instead of issuing operations directly
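Partitioning and splitting tables both need a rule that maps a row to its physical location. One common choice, assumed here for illustration, is to take the user id modulo a fixed number of tables. `TableRouter` and the `orders_N` naming are hypothetical.

```java
// Hypothetical horizontal split: rows of one logical table spread over N physical tables.
class TableRouter {
    private final String baseTable;
    private final int tableCount;

    TableRouter(String baseTable, int tableCount) {
        if (tableCount <= 0) {
            throw new IllegalArgumentException("tableCount must be positive");
        }
        this.baseTable = baseTable;
        this.tableCount = tableCount;
    }

    // The same user id always maps to the same physical table, e.g. orders_3.
    String tableFor(long userId) {
        long index = Math.floorMod(userId, (long) tableCount); // floorMod keeps negative ids in range
        return baseTable + "_" + index;
    }

    // Table names cannot be bound as JDBC parameters, so only the user id stays a placeholder.
    String selectByUser(long userId) {
        return "SELECT * FROM " + tableFor(userId) + " WHERE user_id = ?";
    }
}

class TableRouterDemo {
    public static void main(String[] args) {
        TableRouter router = new TableRouter("orders", 4);
        System.out.println(router.tableFor(10));    // orders_2
        System.out.println(router.selectByUser(7)); // SELECT * FROM orders_3 WHERE user_id = ?
    }
}
```

Changing the number of tables later sends most rows to a different table, which is why the count is usually fixed up front or a consistent-hashing scheme is used instead.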
3. Separate active data
  For example, split users into active and inactive users and store them separately, so that most queries only touch the smaller set of active data.
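One way to separate active data, sketched under assumed names: treat users who logged in within a configurable window as active and keep them in a separate, smaller table. The 30-day window and the `users_active` / `users_inactive` table names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical hot/cold split: users seen recently live in a small "active" table.
class ActivityRouter {
    private final Duration activeWindow; // e.g. 30 days; the threshold is an assumption

    ActivityRouter(Duration activeWindow) {
        this.activeWindow = activeWindow;
    }

    boolean isActive(Instant lastLogin, Instant now) {
        return !lastLogin.isBefore(now.minus(activeWindow));
    }

    // Table names are illustrative only.
    String tableFor(Instant lastLogin, Instant now) {
        return isActive(lastLogin, now) ? "users_active" : "users_inactive";
    }
}

class ActivityRouterDemo {
    public static void main(String[] args) {
        ActivityRouter router = new ActivityRouter(Duration.ofDays(30));
        Instant now = Instant.parse("2024-03-01T00:00:00Z");
        System.out.println(router.tableFor(now.minus(Duration.ofDays(3)), now));  // users_active
        System.out.println(router.tableFor(now.minus(Duration.ofDays(90)), now)); // users_inactive
    }
}
```

A periodic job (not shown) would still have to move users between the two tables as their status changes.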
4. Batch reads and delayed writes
  Under high concurrency, multiple query requests can be merged into one.
  Data that is modified frequently under high concurrency can be held in the cache temporarily and written to the database later.
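The delayed-write idea can be sketched as a write-behind counter: increments for a frequently modified value (for example, a page-view count) accumulate in memory and are written to the database in one batch. `WriteBehindCounter` and the `batchWriter` callback are hypothetical; in practice the batch might become a single multi-row UPDATE.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Delayed modification sketch: hot counters accumulate in memory and reach the DB in batches.
class WriteBehindCounter {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();
    private final Consumer<Map<String, Long>> batchWriter; // hypothetical: one batched UPDATE per flush

    WriteBehindCounter(Consumer<Map<String, Long>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    // Called on every request; touches memory only. merge() is atomic per key.
    void increment(String key) {
        pending.merge(key, 1L, Long::sum);
    }

    // Called periodically, e.g. from a ScheduledExecutorService.
    void flush() {
        Map<String, Long> batch = new HashMap<>();
        for (String key : pending.keySet()) {
            Long delta = pending.remove(key); // atomic; later increments start a fresh entry
            if (delta != null) {
                batch.put(key, delta);
            }
        }
        if (!batch.isEmpty()) {
            batchWriter.accept(batch);
        }
    }
}

class WriteBehindDemo {
    public static void main(String[] args) {
        WriteBehindCounter views = new WriteBehindCounter(
                batch -> System.out.println("UPDATE in one batch: " + batch));
        for (int i = 0; i < 1000; i++) {
            views.increment("article:42");
        }
        views.increment("article:7");
        views.flush(); // one write instead of 1001
    }
}
```

The trade-off is durability: increments not yet flushed are lost if the process crashes, so this pattern suits values that can tolerate small losses.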
5. Read/write separation
  Configure multiple database servers in a master-slave setup: the master database handles writes and the slave databases handle reads.
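Read/write separation is usually implemented by a routing layer in front of the connection pools. A minimal sketch, assuming one master and a list of replicas with round-robin selection among the replicas; `ReadWriteRouter` is a hypothetical name and `D` can be any connection handle.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Read/write splitting sketch: D stands for whatever handle is used, e.g. a javax.sql.DataSource.
class ReadWriteRouter<D> {
    private final D master;
    private final List<D> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(D master, List<D> replicas) {
        this.master = master;
        this.replicas = List.copyOf(replicas);
    }

    // INSERT/UPDATE/DELETE always go to the master.
    D forWrite() {
        return master;
    }

    // SELECTs rotate over the replicas; with no replicas, fall back to the master.
    D forRead() {
        if (replicas.isEmpty()) {
            return master;
        }
        return replicas.get(Math.floorMod(next.getAndIncrement(), replicas.size()));
    }
}

class ReadWriteDemo {
    public static void main(String[] args) {
        ReadWriteRouter<String> router = new ReadWriteRouter<>("master", List.of("replica-1", "replica-2"));
        System.out.println(router.forWrite()); // master
        System.out.println(router.forRead());  // replica-1
        System.out.println(router.forRead());  // replica-2
        System.out.println(router.forRead());  // replica-1
    }
}
```

Replication from master to slaves is typically asynchronous, so a read issued right after a write may not see it yet; routing such reads to the master is a common workaround.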
6. Distributed database
  Store different tables in different databases and place those databases on different servers. This introduces some complex problems, such as transaction processing across databases and multi-table queries.
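The multi-table query problem can be made concrete with a table-to-server map. This sketch assumes a simple vertical split in which each table lives on exactly one server; the JDBC URLs and the `TableLocator` class are hypothetical.

```java
import java.util.Map;

// Vertical split sketch: each table is assigned to one database server (URLs are hypothetical).
class TableLocator {
    private final Map<String, String> tableToUrl;

    TableLocator(Map<String, String> tableToUrl) {
        this.tableToUrl = Map.copyOf(tableToUrl);
    }

    String urlFor(String table) {
        String url = tableToUrl.get(table);
        if (url == null) {
            throw new IllegalArgumentException("unmapped table: " + table);
        }
        return url;
    }

    // A SQL join only works if both tables live on the same server;
    // otherwise the application has to query each server and merge the results itself.
    boolean canJoinLocally(String a, String b) {
        return urlFor(a).equals(urlFor(b));
    }
}

class TableLocatorDemo {
    public static void main(String[] args) {
        TableLocator locator = new TableLocator(Map.of(
                "users", "jdbc:mysql://db-users:3306/app",
                "orders", "jdbc:mysql://db-orders:3306/app",
                "order_items", "jdbc:mysql://db-orders:3306/app"));
        System.out.println(locator.canJoinLocally("orders", "order_items")); // true
        System.out.println(locator.canJoinLocally("users", "orders"));       // false
    }
}
```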
7. NoSQL and Hadoop
  NoSQL means "not only SQL". It has fewer restrictions than a relational database, which can make it more flexible and efficient.
  Hadoop splits the data of a table into multiple blocks and stores them on multiple nodes (distributed). Each block is replicated on several nodes (cluster). The cluster can therefore process the data in parallel, and the replicas preserve the integrity of the data.
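The Hadoop description above can be illustrated with a toy placement function: it splits a data set into fixed-size blocks and assigns each block to several distinct nodes. The round-robin policy is a simplification assumed for the example; HDFS itself uses a rack-aware placement policy, and recent versions default to 128 MB blocks with a replication factor of 3.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Hadoop-style storage: split data into fixed-size blocks, replicate each block.
class BlockPlacement {
    // Returns, for each block index, the nodes holding a replica of that block.
    static List<List<String>> place(long dataBytes, long blockBytes, List<String> nodes, int replication) {
        if (blockBytes <= 0 || replication <= 0 || replication > nodes.size()) {
            throw new IllegalArgumentException("invalid block size or replication factor");
        }
        long blockCount = (dataBytes + blockBytes - 1) / blockBytes; // ceiling division
        List<List<String>> placement = new ArrayList<>();
        for (long b = 0; b < blockCount; b++) {
            List<String> holders = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // Round-robin keeps the replicas of one block on distinct nodes.
                holders.add(nodes.get((int) ((b + r) % nodes.size())));
            }
            placement.add(holders);
        }
        return placement;
    }
}

class BlockPlacementDemo {
    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<List<String>> p = BlockPlacement.place(300 * mb, 128 * mb, List.of("n1", "n2", "n3", "n4"), 3);
        System.out.println(p); // [[n1, n2, n3], [n2, n3, n4], [n3, n4, n1]]
    }
}
```

Losing any single node still leaves at least two copies of every block, which is what lets the cluster keep serving and re-replicate the missing copies.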


2. Solutions for High Concurrency

1. Application and static resources are separated.
  Put static resources (js, css, images, etc.) on a dedicated server.
2. Page caching
  Caching the pages generated by the application can save a lot of CPU resources.
  For parts of a page whose data changes frequently, you can load that data separately with Ajax.
3. Clustering and Distributing
  In a cluster, multiple servers have the same function and mainly serve to spread the load.
  In a distributed setup, different services are placed on different servers, and handling one request may involve several servers, which improves the processing speed of a single request.
  Clusters are further divided into static resource clusters and application clusters. The latter are more complex and often require dealing with problems such as session synchronization.
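The session synchronization problem in an application cluster disappears if no node keeps session state locally. A minimal sketch, assuming a shared session store (in production typically something like Redis; here just an in-process map) and hypothetical `AppNode` and `SharedSessionStore` classes: the login lands on one node, the next request on another, and both see the same user.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a shared session store such as Redis; here it is an in-process map (assumption).
class SharedSessionStore {
    private final Map<String, Map<String, String>> sessions = new ConcurrentHashMap<>();

    Map<String, String> session(String sessionId) {
        return sessions.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>());
    }
}

// One node of an application cluster; it keeps no session state of its own.
class AppNode {
    private final String name;
    private final SharedSessionStore store;

    AppNode(String name, SharedSessionStore store) {
        this.name = name;
        this.store = store;
    }

    String handle(String sessionId, String path) {
        Map<String, String> session = store.session(sessionId);
        if (path.equals("/login")) {
            session.put("user", "alice"); // hypothetical login
        }
        return name + " sees user=" + session.getOrDefault("user", "anonymous");
    }
}

class ClusterDemo {
    public static void main(String[] args) {
        SharedSessionStore store = new SharedSessionStore();
        AppNode a = new AppNode("app-1", store);
        AppNode b = new AppNode("app-2", store);
        // The load balancer sends the login to one node and the next request to another.
        System.out.println(a.handle("s-123", "/login")); // app-1 sees user=alice
        System.out.println(b.handle("s-123", "/home"));  // app-2 sees user=alice
    }
}
```

Another common option is sticky sessions, where the load balancer always routes a given client to the same node; that avoids sharing state but loses the session if that node goes down.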
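4. Reverse proxy
  The server the client accesses directly is not the server that actually provides the service: it fetches resources from other servers and then returns the result to the user.
  Proxy servers vs. reverse proxy servers:
  A proxy server fetches resources on our behalf and returns the result, for example, a proxy server used to access external networks. With a reverse proxy, we access a server normally, and that server itself calls other servers behind the scenes.
  We use a proxy server deliberately, and it works on our behalf; it does not need its own domain name. A reverse proxy is used by the server side without our knowledge, and it does have its own domain name.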
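To make the reverse-proxy flow concrete, here is a minimal GET-only sketch built on the JDK's `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` (Java 11+). The client only ever sees the proxy's address; the backend address stays hidden. `MiniReverseProxy` and the demo backend are hypothetical, and the sketch copies only the status code and body, not headers, so a production setup would use a dedicated reverse-proxy server instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// GET-only reverse proxy sketch: clients talk to this server, which fetches from a hidden backend.
class MiniReverseProxy {
    static HttpServer start(int port, String backendBaseUrl) throws IOException {
        HttpClient client = HttpClient.newHttpClient();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", exchange -> {
            try {
                HttpResponse<byte[]> upstream;
                try {
                    // Forward the original path and query string to the backend.
                    URI target = URI.create(backendBaseUrl + exchange.getRequestURI());
                    upstream = client.send(HttpRequest.newBuilder(target).GET().build(),
                            HttpResponse.BodyHandlers.ofByteArray());
                } catch (IOException | InterruptedException e) {
                    if (e instanceof InterruptedException) {
                        Thread.currentThread().interrupt();
                    }
                    exchange.sendResponseHeaders(502, -1); // Bad Gateway: backend unreachable
                    return;
                }
                byte[] body = upstream.body();
                // -1 means "no response body"; otherwise send the exact length.
                exchange.sendResponseHeaders(upstream.statusCode(), body.length == 0 ? -1 : body.length);
                if (body.length > 0) {
                    exchange.getResponseBody().write(body);
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server;
    }
}

class ReverseProxyDemo {
    // Starts a hypothetical backend and the proxy on free ports, then requests `path` via the proxy.
    static String fetchThroughProxy(String path) throws Exception {
        HttpServer backend = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        backend.createContext("/", ex -> {
            byte[] msg = ("backend served " + ex.getRequestURI().getPath()).getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, msg.length);
            ex.getResponseBody().write(msg);
            ex.close();
        });
        backend.start();
        HttpServer proxy = MiniReverseProxy.start(0, "http://127.0.0.1:" + backend.getAddress().getPort());
        try {
            URI uri = URI.create("http://127.0.0.1:" + proxy.getAddress().getPort() + path);
            return HttpClient.newHttpClient()
                    .send(HttpRequest.newBuilder(uri).build(), HttpResponse.BodyHandlers.ofString())
                    .body();
        } finally {
            proxy.stop(0);
            backend.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchThroughProxy("/hello")); // backend served /hello
    }
}
```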
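5. CDN
  A CDN is a special kind of clustered page-cache servers. Compared with the multiple page-cache servers of an ordinary cluster, the main differences are where the servers are located and how requests are assigned to them.
  CDN servers are distributed across the country. When a request arrives, it is assigned to the most suitable CDN node to fetch the data. Each CDN node is a page-cache server.
  Assignment method: this is not ordinary load balancing. Instead, the node is chosen when the domain name is resolved, by a dedicated CDN DNS server. The usual approach is: at the ISP, a CNAME record maps the domain to a specific domain name, and that domain name is then resolved by the dedicated CDN DNS server to the appropriate CDN node (the address is returned to the browser, which then accesses that node). Each node may itself be a cluster of several servers.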
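The two-step CNAME resolution can be modelled in a few lines. In this toy sketch, the site's DNS maps `www.example.com` to a CDN-managed name, and the CDN's DNS then answers with an edge node for the client's region. All hostnames, region labels and addresses (taken from the RFC 5737 documentation ranges) are hypothetical, and real CDN DNS servers typically infer the client's location from the resolver's IP address rather than receiving a region label.

```java
import java.util.Map;

// Toy model of CDN-style DNS resolution; every hostname, region and IP below is hypothetical.
class CdnDnsModel {
    private final Map<String, String> cnameRecords; // step 1: site domain -> CDN-managed name
    private final String cdnManagedName;            // the name the CDN's DNS is authoritative for
    private final Map<String, String> edgeByRegion; // step 2: client region -> edge node IP
    private final String defaultEdge;

    CdnDnsModel(Map<String, String> cnameRecords, String cdnManagedName,
                Map<String, String> edgeByRegion, String defaultEdge) {
        this.cnameRecords = Map.copyOf(cnameRecords);
        this.cdnManagedName = cdnManagedName;
        this.edgeByRegion = Map.copyOf(edgeByRegion);
        this.defaultEdge = defaultEdge;
    }

    String resolve(String host, String clientRegion) {
        String name = host;
        // Follow the CNAME chain, bounded to guard against loops.
        for (int hops = 0; hops < 8 && cnameRecords.containsKey(name); hops++) {
            name = cnameRecords.get(name);
        }
        if (!name.equals(cdnManagedName)) {
            throw new IllegalStateException(host + " is not served through the CDN");
        }
        // The CDN's DNS answers with the edge node best suited to this client.
        return edgeByRegion.getOrDefault(clientRegion, defaultEdge);
    }
}

class CdnDemo {
    public static void main(String[] args) {
        CdnDnsModel dns = new CdnDnsModel(
                Map.of("www.example.com", "www.example.com.cdn.example.net"),
                "www.example.com.cdn.example.net",
                Map.of("north", "192.0.2.10", "east", "198.51.100.10"),
                "203.0.113.10");
        System.out.println(dns.resolve("www.example.com", "east"));  // 198.51.100.10
        System.out.println(dns.resolve("www.example.com", "south")); // 203.0.113.10
    }
}
```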
 
Summary:

  At the very least, you should know that handling high concurrency comes down to:
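  • Front end: asynchronous requests + static generation of resources + CDN
  • Back end: request queue + round-robin dispatch + load balancing + shared cache
  • Data layer: Redis cache + table sharding + write queue
  • Storage: RAID arrays + hot standby
  • Network: DNS round-robin + DDoS protection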
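The "request queue" item on the back-end line can be sketched with a standard `ThreadPoolExecutor`: a fixed number of workers, a bounded `ArrayBlockingQueue` that absorbs short bursts, and an `AbortPolicy` that rejects requests once the queue is full, so the caller can answer "busy" immediately instead of piling up work. The class name and the sizes are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Request-queue sketch: fixed workers, a bounded waiting queue, and fast rejection when full.
class BoundedRequestQueue {
    private final ThreadPoolExecutor pool;

    BoundedRequestQueue(int workers, int queueCapacity) {
        pool = new ThreadPoolExecutor(workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // throw instead of queueing without bound
    }

    // Returns false when saturated, so the caller can reply "busy, try later" at once.
    boolean trySubmit(Runnable request) {
        try {
            pool.execute(request);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    // Stops accepting new requests and waits briefly for queued ones to finish.
    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}

class RequestQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch gate = new CountDownLatch(1);
        BoundedRequestQueue q = new BoundedRequestQueue(1, 1); // 1 worker, room for 1 waiting request
        Runnable slow = () -> {
            try {
                gate.await(); // simulates a slow request occupying the only worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println(q.trySubmit(slow));      // true: handed straight to the worker
        System.out.println(q.trySubmit(() -> { })); // true: waits in the queue
        System.out.println(q.trySubmit(() -> { })); // false: queue full, rejected fast
        gate.countDown();
        q.shutdown();
    }
}
```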
  
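  The whole evolution of website architecture revolves around big data and high concurrency. The solutions fall into two main types: caching and multiple resources. Multiple resources means multiple storage devices, multiple CPUs and multiple networks; a request can be handled by a single resource or by several.
  Before adopting a complex framework, make sure the project's business logic is well optimized. This is the most basic foundation and the top priority!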
 
  Architectures and protocols are not sacred and inviolable.





