Java Web High-Concurrency Solutions

A small website can be built from plain static HTML pages, with a few images for decoration and every page stored in one directory. Such a site places very few demands on system architecture or performance. As Internet services have grown richer, website technology has split into many fine-grained specialties over the years. Large sites in particular draw on a very wide range of technologies and make high demands on everything from hardware to software: programming languages, databases, web servers, firewalls, and more. They are no longer comparable to the original simple static HTML site.

  Large websites, such as portals, face huge numbers of visits and highly concurrent requests. The basic solutions concentrate on a few areas: high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. To a certain extent, these solutions also mean greater investment.

 

1. Static HTML

  As we all know, purely static HTML pages are the most efficient and the cheapest to serve, so we try to implement as many of a site's pages as possible as static pages. This simplest method is actually the most effective one. However, a site with a large amount of frequently updated content cannot be maintained by hand page by page, which is why the content management system (CMS) appeared. The news channels of the portals we visit every day, and even their other channels, are managed and published through a CMS. At its simplest, a CMS accepts content entry and automatically generates static pages; it can also provide channel management, permission management, automatic crawling, and similar functions. For a large website, an efficient and manageable CMS is essential.

  Beyond portals and information-publishing sites, community sites with high interactivity requirements should also be made as static as possible to improve performance. Rendering posts and articles to static pages in real time, and re-rendering them when they are updated, is a widely used strategy. Mop's hodgepodge uses it, as does the NetEase community.

  Static HTML is also a technique used by some caching strategies. For parts of a system that query the database frequently but whose content rarely changes, consider rendering them to static HTML. Take a forum's public settings as an example: mainstream forum software lets you manage these settings in the admin back end and stores them in the database. The front-end program reads them constantly, yet they are updated very rarely. Regenerating this content as static files whenever it is changed in the back end avoids a large number of database requests.
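As a rough illustration of the "generate a static page when the content changes" idea, here is a minimal Java sketch. The class name `StaticPagePublisher`, the `<id>.html` file naming, and the tiny inline template are all assumptions made for this example; a real CMS would use its own template engine and storage layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: renders an article to a static .html file whenever it is saved. */
final class StaticPagePublisher {
    private final Path outputDir;

    StaticPagePublisher(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Escapes the characters that are unsafe inside HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds the complete page from a very small inline template. */
    static String render(String title, String body) {
        return "<!DOCTYPE html>\n"
                + "<html><head><meta charset=\"utf-8\"><title>" + escape(title) + "</title></head>\n"
                + "<body><h1>" + escape(title) + "</h1>\n"
                + "<p>" + escape(body) + "</p></body></html>\n";
    }

    /** Writes to a temp file first and then moves it into place, so a half-written page is unlikely to be served. */
    Path publish(long articleId, String title, String body) throws IOException {
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(articleId + ".html");
        Path tmp = Files.createTempFile(outputDir, "page-", ".tmp");
        Files.write(tmp, render(title, body).getBytes(StandardCharsets.UTF_8));
        return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("static-site");
        StaticPagePublisher publisher = new StaticPagePublisher(dir);
        System.out.println("Generated " + publisher.publish(42L, "Hello", "First post"));
    }
}
```

The back end would call `publish` after each successful save, and the web server would then serve the generated file without touching the database.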

2. Image server separation

  For a web server, whether Apache, IIS, or another container, images consume the most resources, so it is worth separating images from pages. Almost all large websites adopt this strategy and run one or more dedicated image servers. This architecture reduces the pressure on the servers that handle page requests and ensures that image problems cannot bring the whole system down.

  The application servers and image servers can then be tuned differently. For example, Apache on an image server can be configured to support as few ContentTypes and load as few modules (LoadModule) as possible, which keeps resource consumption low and execution efficiency high.
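On the application side, separating images mostly means that pages emit image URLs pointing at the image servers rather than at the page server. The sketch below shows one possible way to do that in Java; the class name `ImageUrlResolver` and the `img1.example.com` style host names are placeholders invented for this example. Hashing the path keeps each image on the same host, which tends to help browser and proxy caches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: maps an image path to one of several dedicated image hosts. */
final class ImageUrlResolver {
    private final List<String> imageHosts;

    ImageUrlResolver(List<String> imageHosts) {
        if (imageHosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one image host");
        }
        this.imageHosts = new ArrayList<>(imageHosts);
    }

    /** The same path always resolves to the same host, because String.hashCode is deterministic. */
    String resolve(String imagePath) {
        int index = Math.floorMod(imagePath.hashCode(), imageHosts.size());
        return "http://" + imageHosts.get(index) + imagePath;
    }

    public static void main(String[] args) {
        // Host names are placeholders, not real servers.
        ImageUrlResolver resolver = new ImageUrlResolver(
                Arrays.asList("img1.example.com", "img2.example.com"));
        System.out.println(resolver.resolve("/upload/2011/avatar.png"));
    }
}
```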

3. Database clustering and database/table hashing

  Large websites run complex applications, and those applications depend on databases. Under heavy traffic the database quickly becomes the bottleneck, and a single database soon cannot keep up. At that point we need database clustering or database/table hashing.

  For database clustering, many databases ship their own solutions. Oracle, Sybase, and others have good ones, and the commonly used Master/Slave replication in MySQL is a similar approach. Whichever DB you use, follow its corresponding solution.

  The database clustering described above is constrained by the DB product in use when it comes to architecture, cost, and scalability. We therefore also need to improve the architecture from the application side, where database/table hashing is the most common and most effective approach.

  We split the database by business, application, or functional module, so that different modules map to different databases or tables. Within a given page or function we then hash into smaller databases or tables according to some rule; for example, the user table can be hashed by user ID. This improves system performance at low cost and scales well.

  Sohu's forum uses this structure. It separates the forum's users, settings, posts, and other data into different databases, then hashes the post and user databases and tables by board and ID. In the end, a simple change to the configuration file is enough to add another low-cost database to the system at any time and boost its performance.
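Hashing the user table by user ID can be sketched as a small routing function. The following Java example assumes a simple modulo scheme and invented names (`UserShardRouter`, `user_db_N`, `user_N`); it is not Sohu's actual implementation, only one way the idea could look.

```java
/** Hypothetical router: maps a user ID to a database and table using modulo hashing. */
final class UserShardRouter {
    private final int databaseCount;
    private final int tablesPerDatabase;

    UserShardRouter(int databaseCount, int tablesPerDatabase) {
        if (databaseCount <= 0 || tablesPerDatabase <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.databaseCount = databaseCount;
        this.tablesPerDatabase = tablesPerDatabase;
    }

    /** Slot in [0, databaseCount * tablesPerDatabase); floorMod keeps it non-negative. */
    private long slot(long userId) {
        return Math.floorMod(userId, (long) databaseCount * tablesPerDatabase);
    }

    /** Name of the database that holds this user (naming scheme is an assumption). */
    String databaseFor(long userId) {
        return "user_db_" + (slot(userId) / tablesPerDatabase);
    }

    /** Name of the table inside that database that holds this user. */
    String tableFor(long userId) {
        return "user_" + (slot(userId) % tablesPerDatabase);
    }

    public static void main(String[] args) {
        UserShardRouter router = new UserShardRouter(2, 4);
        long userId = 10L;
        System.out.println(router.databaseFor(userId) + "." + router.tableFor(userId));
    }
}
```

One caveat with plain modulo hashing: changing the number of databases or tables moves most IDs to a different slot, so existing rows have to be migrated when capacity is added.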

4. Cache

  Anyone who works in technology has come across the term cache, and caches are used in many places. Caching is also very important in website architecture and website development. Only the two most basic kinds are covered here; advanced and distributed caching are described later.

  For caching at the architecture level, those familiar with Apache know that it provides its own caching module. Squid can also be added in front of it as an extra caching layer. Both effectively improve Apache's ability to respond to requests.

  For caching in application code, the Memory Cache available on Linux is a commonly used cache interface for web development. For example, a Java program can call MemoryCache to cache and share some data, and some large communities use this structure. In addition, most web languages have their own cache modules and methods: PHP has PEAR's Cache module, Java has even more options, and although I am not very familiar with .NET, I am sure it has them too.
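To make the application-level caching idea concrete, here is a minimal in-process cache with a time-to-live, written with only the JDK. It is a sketch under stated assumptions: the class `TtlCache` and its methods are invented for this example and are not the API of the Memory Cache mentioned above, and a shared cache across several web servers would need an external cache service instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-process cache: values expire a fixed time after they are loaded. */
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached value, calling the loader (for example a database query) on a miss or after expiry. */
    V get(K key, Function<K, V> loader) {
        return get(key, loader, System.currentTimeMillis());
    }

    /** Same as above with an explicit clock value, which makes the expiry logic easy to test. */
    V get(K key, Function<K, V> loader, long nowMillis) {
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAtMillis > nowMillis)
                        ? old
                        : new Entry<>(loader.apply(k), nowMillis + ttlMillis));
        return entry.value;
    }

    public static void main(String[] args) {
        TtlCache<String, String> cache = new TtlCache<>(60_000L);
        // The lambda stands in for an expensive lookup such as a database query.
        System.out.println(cache.get("forum.settings", key -> "loaded " + key));
    }
}
```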

5. Mirroring

  Mirroring is a method large websites often use to improve performance and data safety. It evens out the differences in access speed caused by users' different network providers and regions. For example, the gap between ChinaNet and EduNet has prompted many websites to build mirror sites inside the education network, with data updated on a schedule or in real time. I will not go deep into the details of mirroring here. There are many professional, ready-made architectures and products to choose from, as well as cheap software approaches, such as rsync on Linux.

6. Load balancing

  Load balancing is the high-end solution large websites use to handle heavy load and large numbers of concurrent requests.
  Load balancing technology has been developing for many years, and there are many professional vendors and products to choose from. I have personally worked with a few solutions, and two of these architectures are described here for reference.

(1) Hardware Layer 4 switching

  Layer 4 switching uses the header information of Layer 3 and Layer 4 packets to identify traffic flows by application range, and assigns all the flows in a range to a suitable application server for processing.

  A Layer 4 switch acts like a virtual IP that points to physical servers. The traffic it carries follows a variety of protocols, including HTTP, FTP, NFS, and Telnet. Because these services run on a pool of physical servers, they require sophisticated load balancing algorithms. In the IP world, the service type is determined by the destination TCP or UDP port, while the application range in Layer 4 switching is determined by the source and destination IP addresses together with the TCP and UDP ports.
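The text does not say which load balancing algorithms are meant, so the following Java sketch simply illustrates two widely used ones, round-robin and least-connections. The class `BackendSelector` and the addresses in `main` are hypothetical; a hardware switch implements this kind of logic in its own firmware, not in Java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical selector showing two common ways to pick a real server behind a virtual IP. */
final class BackendSelector {
    private final List<String> backends;
    private final AtomicLong counter = new AtomicLong();
    private final Map<String, AtomicInteger> activeConnections = new ConcurrentHashMap<>();

    BackendSelector(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("need at least one backend");
        }
        this.backends = new ArrayList<>(backends);
        for (String backend : this.backends) {
            activeConnections.put(backend, new AtomicInteger());
        }
    }

    /** Round-robin: hand out backends in turn, wrapping around at the end of the list. */
    String nextRoundRobin() {
        int index = (int) Math.floorMod(counter.getAndIncrement(), (long) backends.size());
        return backends.get(index);
    }

    /** Least-connections: pick the backend with the fewest open connections (first one wins ties). */
    String leastConnections() {
        String best = backends.get(0);
        for (String backend : backends) {
            if (activeConnections.get(backend).get() < activeConnections.get(best).get()) {
                best = backend;
            }
        }
        return best;
    }

    void connectionOpened(String backend) {
        activeConnections.get(backend).incrementAndGet();
    }

    void connectionClosed(String backend) {
        activeConnections.get(backend).decrementAndGet();
    }

    public static void main(String[] args) {
        // Addresses are placeholders for real servers behind the virtual IP.
        BackendSelector selector = new BackendSelector(
                Arrays.asList("10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"));
        String chosen = selector.nextRoundRobin();
        selector.connectionOpened(chosen);
        System.out.println("round-robin chose " + chosen
                + ", least-connections would now choose " + selector.leastConnections());
    }
}
```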

  There are well-known hardware Layer 4 switching products to choose from, such as Alteon and F5. They are expensive but worth the money, offering excellent performance and flexible management. Yahoo China initially ran nearly 2,000 servers behind just three or four Alteons.

(2) Software Layer 4 switching

  Once the principle of hardware Layer 4 switching was understood, software Layer 4 switching based on the OSI model followed. The principle is the same, but performance is somewhat lower. It still copes easily with a moderate load, and some argue that the software approach is actually more flexible, with its processing capacity depending entirely on how well you know the configuration.

  For software Layer 4 switching on Linux we can use the widely deployed LVS (Linux Virtual Server). It offers heartbeat-based real-time failover, which improves the robustness of the system, together with flexible virtual IP (VIP) configuration and management that can satisfy several application requirements at once. This is essential for distributed systems.

  A typical load balancing strategy is to build a Squid cluster on top of software or hardware Layer 4 switching. Many large websites, including search engines, use this design. It is low-cost, high-performance, and highly scalable, and nodes can easily be added to or removed from it at any time.

  A large website may use every one of the methods above at the same time. This introduction is fairly brief, and many implementation details have to be learned through hands-on experience. Sometimes a single small Squid or Apache parameter has a large effect on system performance.

7. The latest: CDN acceleration technology

What is a CDN?

  CDN stands for Content Delivery Network. It adds a new layer of network architecture on top of the existing Internet and publishes a website's content to the network "edge" closest to users, so that users can fetch the content they need from nearby and the site responds faster.

  A CDN differs from mirroring in that it is smarter. As an analogy: CDN = smarter mirroring + caching + traffic steering. A CDN can therefore markedly improve the efficiency of information flow on the Internet. Technically, it addresses limited network bandwidth, heavy user traffic, and unevenly distributed points of presence, and so improves the response speed users experience.

Types of CDNs

   There are three types of CDN implementations: mirroring, caching, and dedicated lines.

  Mirror sites are the most common type. They allow content to be published directly and suit static and quasi-dynamic data synchronization. However, buying and maintaining the extra servers is relatively expensive, mirror servers must be set up in every region, and professional staff are needed to manage and maintain them. For large sites, the bandwidth cost of pushing updates also rises significantly.

  Caching is low-cost and suits static content. Internet statistics show that more than 80% of users regularly visit only 20% of a site's content. Under this rule, cache servers can handle most static requests, while the origin server only needs to handle the roughly 20% of requests that are not cached, plus the dynamic requests. This greatly shortens response times for clients and reduces the load on the origin server.
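Because a small share of the content receives most of the visits, a cache server only needs to keep the recently used "hot" objects and can evict the rest. A common way to express this is a least-recently-used (LRU) cache. The sketch below uses the JDK's `LinkedHashMap` in access order; the class name `LruCache` is an assumption, and real cache servers such as Squid use their own, more elaborate replacement policies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: keeps at most `capacity` entries and evicts the least recently used one. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true makes get() move an entry to the most-recently-used end.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insertion; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("/index.html", "home");
        cache.put("/about.html", "about");
        cache.get("/index.html");          // touch the hot page
        cache.put("/news.html", "news");   // the cache is full, so the coldest page is evicted
        System.out.println(cache.keySet());
    }
}
```

This class is not thread-safe; a server handling concurrent requests would need to wrap it, for example with `Collections.synchronizedMap`.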

  CDN services generally place cache servers on key nodes across the country.

  A dedicated line lets users access the data source directly, which enables dynamic data synchronization.

Examples of CDNs

  For example, when a user visits a website, the site uses global load balancing to direct the visit to the nearest working cache server, which answers the request directly.

  When a user visits a site that already uses a CDN service, the biggest difference from traditional name resolution is that the site's authoritative name server does not answer the local DNS server's query by traditional round-robin. Instead, it takes into account where the request originated and the network conditions at that moment, and directs the request to the node cache server that is closest to the user and relatively lightly loaded.

  By combining the results of a user-location algorithm and a server health-check algorithm, requests can be directed to nearby cache servers distributed at the network "edge", so that users get a more timely and reliable response.
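The selection rule described here (healthy, close to the user, lightly loaded) can be sketched in a few lines of Java. Everything in this example is an assumption made for illustration: the `EdgeNode` fields, the use of a region string as a stand-in for real user positioning, and the rule that locality is compared before load. A real CDN's resolver uses far richer network measurements.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical description of one cache node at the network edge. */
final class EdgeNode {
    final String host;
    final String region;
    final boolean healthy;
    final double load; // 0.0 = idle, 1.0 = fully loaded

    EdgeNode(String host, String region, boolean healthy, double load) {
        this.host = host;
        this.region = region;
        this.healthy = healthy;
        this.load = load;
    }
}

/** Hypothetical selector: prefer healthy nodes in the user's region, then the lightest load. */
final class EdgeNodeSelector {
    static Optional<EdgeNode> select(List<EdgeNode> nodes, String userRegion) {
        Comparator<EdgeNode> byLocalityThenLoad = Comparator
                // false sorts before true, so nodes in the user's region come first
                .comparing((EdgeNode n) -> !n.region.equals(userRegion))
                .thenComparingDouble(n -> n.load);
        return nodes.stream()
                .filter(n -> n.healthy) // the health check removes failed nodes
                .min(byLocalityThenLoad);
    }

    public static void main(String[] args) {
        List<EdgeNode> nodes = Arrays.asList(
                new EdgeNode("cache-north-1", "north", true, 0.9),
                new EdgeNode("cache-north-2", "north", true, 0.3),
                new EdgeNode("cache-south-1", "south", true, 0.1));
        select(nodes, "north").ifPresent(n -> System.out.println("resolved to " + n.host));
    }
}
```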

  Because a large share of user requests is answered directly by CDN cache nodes at the network edge, access quality improves for users and the load on the origin server is effectively reduced.

Attachment: Service Description of a CDN Service Provider

 


Using GCDN acceleration

  With GCDN acceleration, a GCDN server is added between visitors and your server. When visitors access your site, ordinary static data such as images and multimedia files is read directly from the GCDN server, which greatly reduces the static data read from the main server.

  A high-speed compressed VPN channel is added specially for VIP virtual hosts. It uses compressed cross-network dedicated lines such as Telecom <==> Netcom, Telecom <==> International (HK), and Netcom <==> International (HK), with intelligent multi-line routing that automatically picks the fastest path. This gives very fast real-time responses for concurrent dynamic requests, keeps a site's dynamic scripts synchronized in real time, and produces a more noticeable speed-up for dynamic websites.

  Each network operator (Telecom, China Netcom, China Railcom, Education Network) hosts a GCDN server for your server, so GCDN lets your server respond at its fastest no matter where visitors come from. In addition, we back up your data in real time to make it more secure.

 

Reposted from: http://blog.csdn.net/y_h_t/article/details/6322823
