The Past and Present of Coremail's AI Technology

In the early morning of March 15, 2023, OpenAI released GPT-4, its large multimodal model, a launch widely seen as marking a new "golden age" of AI. As an email security vendor, Coremail has been asking itself: at a time of such rapid technological change, how can large multimodal models be put to work in concrete email security protection?

On March 23, Pan Qingfeng (hereinafter "Big P"), Chief Architect of the Coremail Email Security Artificial Intelligence Laboratory, walked the live-stream audience through the development of Coremail's AI technology and demonstrated the closed-loop structure that links the CAC email security big data center with Coremail's application products. Like-minded readers are welcome to join the discussion and share their views with us.

The embryonic period of intelligent algorithms (around 2000)

Pan Qingfeng is a veteran who joined Coremail in 2000 to develop the mail system, and colleagues at Coremail affectionately call him Big P.

According to Big P, after developing its first mail system in 1999, Coremail began research on anti-spam technology the following year.

The period from 2000 to 2010 was one of unchecked growth and fierce competition among Internet e-commerce companies, and some of them promoted their apps through advertising emails.

Once a user clicked the URL in such an email and completed a purchase, the sender of the advertising email earned a commission.

The same model spread to games, websites, and apps: whenever an end user made an in-game purchase, registered an account, or downloaded an app through a link in an advertising email, the sender earned a sizable commission.

Faced with this flood of spam, Coremail at this stage relied mainly on specific rules to intercept it, matching on information such as keywords, IP addresses, and sender and recipient addresses.
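
As a rough illustration of this kind of rule-based interception, here is a minimal Python sketch. It is not Coremail's implementation: the `RuleSet` class, the `matches_static_rules` function, and every keyword, address, and IP range in it are hypothetical values chosen for the example.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class RuleSet:
    """Hypothetical static rules; every value below is illustrative."""
    keywords: set = field(default_factory=set)
    blocked_networks: list = field(default_factory=list)
    blocked_senders: set = field(default_factory=set)


def matches_static_rules(rules, sender, source_ip, subject, body):
    """Return True if the message matches any sender, IP, or keyword rule."""
    if sender.lower() in rules.blocked_senders:
        return True
    addr = ip_address(source_ip)
    if any(addr in net for net in rules.blocked_networks):
        return True
    text = f"{subject} {body}".lower()
    return any(keyword in text for keyword in rules.keywords)


# Assumed example rules (documentation-only domains and IP ranges).
RULES = RuleSet(
    keywords={"free commission", "click to buy"},
    blocked_networks=[ip_network("203.0.113.0/24")],
    blocked_senders={"promo@ads.example"},
)
```

A single matching rule is enough to block a message here, which makes this approach simple but easy to evade: a new sending IP or a reworded subject line slips straight through.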

As technology developed and spam volumes grew, Coremail gradually began using a variety of intelligent algorithms to analyze and filter email, including the Bayes algorithm, fingerprint algorithms, and an email scoring algorithm based on rule weights.
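
To make the idea of scoring by rule weight concrete, the sketch below adds up the weights of every rule a message triggers and compares the total with a threshold. The rule names, weights, and threshold are assumptions made up for this example and do not come from Coremail.

```python
import re

# Hypothetical (name, test, weight) rules; the weights are illustrative only.
WEIGHTED_RULES = [
    ("money_words",
     lambda t: bool(re.search(r"\b(commission|discount|free)\b", t, re.I)),
     2.0),
    ("three_or_more_links",
     lambda t: len(re.findall(r"https?://\S+", t)) >= 3,
     1.5),
    ("urgent_punctuation", lambda t: "!!!" in t, 1.0),
]
SPAM_THRESHOLD = 3.0  # assumed cut-off


def score_email(text):
    """Return (total score, names of the rules that fired) for one message."""
    hits = [(name, weight) for name, test, weight in WEIGHTED_RULES if test(text)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


def looks_like_spam(text):
    """Flag the message when its total score reaches the threshold."""
    return score_email(text)[0] >= SPAM_THRESHOLD
```

Unlike a hard block list, no single weak signal condemns a message here; several signals have to add up before it is flagged.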

Intelligent algorithm development period (around 2010)

Fast forward to 2010-2020: spam senders had formed a complete industrial chain. They tried to bypass mail system vendors' existing detection by purchasing large numbers of IP addresses, adding random backgrounds to images of email text, or padding spam with large amounts of normal text to interfere with the Bayes algorithm.
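
The padding trick is easy to reproduce with a toy naive Bayes filter. The sketch below is a simplified illustration rather than Coremail's algorithm: the two tiny training corpora are invented, equal class priors are assumed, and add-one smoothing is used for unseen words.

```python
import math
from collections import Counter


def train(spam_docs, ham_docs):
    """Count word occurrences per class from two toy corpora."""
    spam = Counter(w for d in spam_docs for w in d.lower().split())
    ham = Counter(w for d in ham_docs for w in d.lower().split())
    return spam, ham


def spam_log_odds(text, spam, ham):
    """Sum per-word log likelihood ratios with add-one smoothing.

    A positive result means the text looks more like spam than normal mail.
    """
    vocab = len(set(spam) | set(ham))
    spam_total, ham_total = sum(spam.values()), sum(ham.values())
    odds = 0.0
    for word in text.lower().split():
        p_spam = (spam[word] + 1) / (spam_total + vocab)
        p_ham = (ham[word] + 1) / (ham_total + vocab)
        odds += math.log(p_spam / p_ham)
    return odds


# Invented training data, far too small for real use.
SPAM = ["win cash now", "cheap pills win prize"]
HAM = ["meeting agenda attached", "project report attached for review"]
spam_counts, ham_counts = train(SPAM, HAM)

plain = "win cash prize"
# The same spam, padded with words that are typical of normal mail.
padded = plain + " meeting agenda project report attached for review"

plain_odds = spam_log_odds(plain, spam_counts, ham_counts)
padded_odds = spam_log_odds(padded, spam_counts, ham_counts)
```

With this toy data, `plain_odds` is positive while `padded_odds` is negative: each normal word contributes a negative term, so enough padding drags the total below zero even though the spam content is unchanged.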

Faced with these challenges, Coremail gradually introduced big data technology over that decade, established the Coremail Email Security Big Data Center (CAC Center), launched the cloud CAC service, and strengthened the anti-spam capabilities of every Coremail system through real-time inspection and the distribution of feature rules.

The CAC center combines feature engineering with traditional artificial intelligence spam recognition algorithms, such as SVM and shallow neural networks, to improve on the filtering performance of the original, simpler algorithm based on email scoring.

Coremail built a centralized inspection process for newly emerging spam techniques, such as spam whose text is delivered as images. Given the limits on computing power at the time, it developed a dedicated non-OCR algorithm for image spam and applied for related patents.

After 2015, deep learning developed rapidly, and many high-performing models appeared in computer vision and natural language processing. The CAC center also experimented with applying deep learning algorithms to phishing email detection and other areas.

Large-scale application period (2020-present)

Since the outbreak of COVID-19 in 2020, attackers have become increasingly active. Coremail found that the malicious emails in circulation, such as phishing, fraud, extortion, and Business Email Compromise (BEC), have developed into increasingly complex attack combinations that are still evolving, and the financial losses they cause far exceed the nuisance that advertising spam inflicts on users.

With the continued explosive growth of deep learning, AI technologies such as natural language processing, anomaly detection, transfer learning, and large pre-trained models are also emerging rapidly.

Fortunately, in this contest between attack and defense, Coremail can draw on the massive volume of high-quality data that the email security big data center continuously accumulates. On that basis, Coremail has built a closed loop for email samples: intelligent collection, identification, storage, feedback, and self-learning training that improves the capabilities of the algorithm models.

Coremail has experimented with spam detection, abnormal login detection, semantic analysis, and other areas, with some success. Some of the resulting deep learning algorithms have been applied directly in our products, such as CAC 2.0 anti-phishing and account-theft protection, the CACTER email security gateway, and the security management center SMC2.

Going forward, Coremail AI LAB will stay committed to a long-term approach, increase investment in cloud computing, artificial intelligence, and big data research, build the results into the Coremail product line, apply them to real business scenarios, and help advance the email security industry as a whole.

Origin blog.csdn.net/CACTER_S/article/details/130089528