How to Ensure DNS Security from an Operational Perspective

 

As a basic Internet service, DNS plays a vital role in keeping the entire Internet running. Attackers understand this as well, and they keep trying to disrupt DNS resolution services through various attack methods.

How to operate DNS services safely and efficiently at every level is a question that every DNSPod engineer keeps thinking about. We believe the work should start from the following aspects:

Status monitoring

DNS has very high real-time requirements, and an accurate, comprehensive monitoring system is the foundation for operating the whole service. To this end, we have designed a complete monitoring system that covers network traffic monitoring, a server kernel monitoring module, resolution monitoring, server cluster monitoring, and more. It watches the DNS resolution service from different levels and angles so that engineers learn about its running status as soon as something changes. In terms of technology selection, we use the relatively mature SNMP-based Nagios/Cacti monitoring on the one hand, and on the other hand we have developed a monitoring module that is closely integrated with the resolution service and tailored to the characteristics of DNS, so that different kinds of monitored objects are covered.
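
As one illustration of a probe tied closely to the resolution service, the sketch below sends a single DNS query over UDP and reports the latency and response code. It is a minimal sketch under stated assumptions, not DNSPod's actual module: the function names (`build_query`, `response_rcode`, `probe`) are hypothetical, and only the Python standard library is used.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Encode a minimal DNS query (RFC 1035); qtype 1 asks for an A record."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending with a zero byte.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def response_rcode(packet: bytes) -> int:
    """Return the RCODE, which is the low 4 bits of the flags field."""
    _query_id, flags = struct.unpack("!HH", packet[:4])
    return flags & 0x000F


def probe(server: str, name: str, timeout: float = 2.0) -> dict:
    """Send one query over UDP and report latency and response code."""
    query = build_query(name)
    started = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            packet, _addr = sock.recvfrom(512)
        except OSError as exc:  # timeouts are a subclass of OSError
            return {"ok": False, "error": type(exc).__name__}
    elapsed_ms = (time.monotonic() - started) * 1000
    rcode = response_rcode(packet)
    return {"ok": rcode == 0, "rcode": rcode, "latency_ms": round(elapsed_ms, 1)}
```

A scheduler could call `probe()` against each authoritative server at a fixed interval and raise an alert when `ok` is false or the latency exceeds a chosen threshold.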

Information alert

All kinds of situations arise while a DNS service is running. The same event has to be reported to different people in charge, and each of them needs different information. For example, once an attack on a domain name is detected, an alarm is sent immediately to the operations engineers, showing traffic data at each level. Technical support receives a summary of the attack and the extent of its impact, so that users get up-to-date information when they ask about the situation. For VIP customers, the attack data and the handling status are also sent to the relevant sales staff, who contact the customers directly. Especially serious attacks are additionally reported to marketing staff, developers, technical leads, and even the general manager, so that information is delivered and incidents are handled in time. To meet these diverse needs, we have built a dedicated notification platform. It exposes a consistent API for every program to call and supports notification channels such as email, WeChat, SMS, and voice.
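
The routing described above can be sketched as a function that turns one event into role-specific messages, plus a dispatcher that sends each message through a pluggable channel. This is only an illustration: `AttackEvent`, `build_notifications`, `dispatch`, the role names, and the message wording are all hypothetical and do not describe the real platform's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AttackEvent:
    domain: str
    peak_qps: int          # peak queries per second seen during the attack
    affected_users: int
    vip: bool = False      # the domain belongs to a VIP customer
    severe: bool = False   # especially serious incident


def build_notifications(event: AttackEvent) -> List[Tuple[str, str]]:
    """Return (role, message) pairs; each role gets the detail it needs."""
    notes = [
        ("ops", f"ALERT {event.domain}: peak {event.peak_qps} qps"),
        ("support", f"{event.domain} under attack, "
                    f"{event.affected_users} users affected"),
    ]
    if event.vip:
        notes.append(("sales", f"VIP domain {event.domain} attacked; "
                               "please contact the customer"))
    if event.severe:
        summary = f"Severe attack on {event.domain}"
        for role in ("marketing", "development", "tech_lead", "general_manager"):
            notes.append((role, summary))
    return notes


def dispatch(
    notes: List[Tuple[str, str]],
    channels: Dict[str, Callable[[str, str], None]],
    routing: Dict[str, str],
) -> int:
    """Send each note through the channel configured for its role."""
    sent = 0
    for role, message in notes:
        send = channels[routing.get(role, "email")]  # email is the fallback
        send(role, message)
        sent += 1
    return sent
```

Keeping the channel functions behind one call signature is what lets callers stay unaware of whether a message ends up as an email, an SMS, or a voice call.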

Event handling

To respond to incidents promptly and give users continuous, high-quality service, we run a 24-hour on-duty system, with experienced technicians available at all times to handle emergencies. To improve response efficiency further, automated operations handling is essential. For example, we have studied DNS attacks over a long period and developed protection methods such as domain name blocking/unblocking, protection algorithms, and traffic diversion. These are activated automatically according to the actual state of an attack, so that a high-volume DNS attack can be contained within a short time and its impact kept to a minimum.
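
As a rough sketch of how automatic blocking and unblocking might be driven by observed traffic, the class below blocks a domain when its query rate crosses one threshold and releases it only after the rate falls below a lower one, so the state does not flap. The class name `DomainBlocker`, both thresholds, and the returned action strings are assumptions made for illustration; the source does not describe the actual algorithm.

```python
class DomainBlocker:
    """Block and unblock domains by query rate, with two thresholds."""

    def __init__(self, block_qps: int, release_qps: int) -> None:
        if release_qps >= block_qps:
            raise ValueError("release_qps must be lower than block_qps")
        self.block_qps = block_qps      # block at or above this rate
        self.release_qps = release_qps  # unblock at or below this rate
        self.blocked = set()

    def observe(self, domain: str, qps: int) -> str:
        """Record one rate sample and return the action taken."""
        if domain not in self.blocked and qps >= self.block_qps:
            self.blocked.add(domain)
            return "block"
        if domain in self.blocked and qps <= self.release_qps:
            self.blocked.discard(domain)
            return "unblock"
        return "no_change"
```

For instance, with `DomainBlocker(block_qps=1000, release_qps=200)`, a domain blocked at 1500 qps stays blocked while the rate sits at 600 qps and is released once it drops to 200 qps or less.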

Data recording

Finishing the handling of an event is not the end of the work. Records must be kept so that the event can be reviewed and analyzed later. The basic data includes switch traffic data, packet captures from network interfaces, event handling records, and so on. We record, back up, sort, and archive all of this data, so that every problem is documented and the material is ready for further statistical analysis. Because the data is large in volume and varied in form, we rely mostly on Redis and MongoDB, whose NoSQL characteristics suit this situation particularly well.
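
To illustrate why schemaless storage fits this kind of data, the sketch below wraps records of very different shapes in one common envelope and serializes them as JSON documents. It uses only the standard library and does not talk to Redis or MongoDB; `make_record`, the field names, and the sample values are hypothetical placeholders.

```python
import json
import time
from typing import Optional


def make_record(kind: str, payload: dict, timestamp: Optional[float] = None) -> dict:
    """Wrap an arbitrary payload in a small common envelope."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return {"kind": kind, "ts": ts, "payload": payload}


def to_json_lines(records: list) -> str:
    """Serialize records one JSON document per line, ready for archiving."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Placeholder samples: each kind carries its own fields.
records = [
    make_record("switch_traffic", {"port": "uplink-1", "bits_in": 1000}, 1700000000),
    make_record("event_handling", {"domain": "example.com", "action": "block"}, 1700000060),
]
print(to_json_lines(records))
```

Because only the envelope is fixed, a new kind of record can be added without changing a schema, which is the property the paragraph above attributes to NoSQL stores.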

Comprehensive operational data analysis

Beyond short-term responses to individual events, operations also require long-term data recording and analysis. We present daily operations in the form of reports, and we track and analyze longer-term trends in data such as domain name resolution volume, number of users, and attacks. For example, we increase investment in attack prevention based on the attack trend, and we ask sales staff to follow up based on users' domain transfer-in/transfer-out activity. For charts we use Graphite, and D3.js also performs well for drawing reports.
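
As a small example of feeding such trend data to Graphite, the sketch below formats samples in Graphite's plaintext protocol (one `path value timestamp` line per sample) and sends them to a Carbon listener, whose default plaintext port is 2003. The metric path and host used in the usage note are assumptions, and `graphite_line` and `send_metrics` are hypothetical helper names.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[float] = None) -> str:
    """Format one sample as 'path value timestamp' plus a newline."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines: Iterable[str], host: str, port: int = 2003,
                 timeout: float = 3.0) -> None:
    """Send preformatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A daily report job could call `send_metrics([graphite_line("dns.attacks.daily_count", count)], "graphite.example.internal")`, after which the series can be plotted in Graphite alongside resolution and user counts.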

In general, DNS services have their own complexity and particularities. DNSPod has focused on the DNS resolution business for a long time and has built up rich experience in this field. We hope that what we have shared here is useful to everyone who cares about DNS, and that together we can create a better Internet environment.
