500 GB+ of top academic and industry datasets open for download: a guide to a competition for scientific research and academia

Important things are worth saying three times:

Only half a month left until the registration deadline!

Only half a month left until the registration deadline!

Only half a month left until the registration deadline!


Competition official website registration link: https://sourl.cn/G5RJKD

[Top perk: live-streamed, substance-packed talks on the competition]

From September 14 to 16, two experts will stream live each day for three consecutive days, explaining the competition in detail and sharing practical knowledge you won't want to miss.

Live link: https://live.bilibili.com/25865198


1 Overview of the arena-track problems (500 GB+ of academic datasets open for download)

- Arena track · Exceptionally high academic value -

The organizer has invited top experts and professors in each field to set the problems. Drawing on current academic research, they have designed rigorous, forward-looking problems that target the fundamental algorithms needed to meet major national needs.

Question 1. Image Analysis and Recognition of Ancient Books and Documents

Problem description: To address the challenge of digitizing China's vast collection of ancient books, this problem solicits advanced artificial intelligence algorithms for high-precision text detection, text-line recognition, and end-to-end recognition in ancient books. The goal is to advance ancient-book OCR and to provide AI support for protecting, organizing, and making use of these works.

Ancient Book Image OCR Dataset

The training, validation, and test sets each contain 1,000 images of ancient book documents (3,000 images in total), drawn from sources such as the Siku Quanshu, rare ancient books from successive dynasties, and the Qianlong Tripitaka.

Question 2. Application tuning algorithms for pre-trained language models

Problem description: In recent years, pre-trained language models have greatly advanced the field of natural language processing. Building on a pre-trained language model, good performance can be reached on many downstream tasks with only a small number of labeled samples. However, because of operating costs and commercial considerations, the parameters of many large-scale language models are not released; the models are instead offered to users through inference APIs. How to complete common natural language processing tasks solely by calling a language model's inference API has therefore become an important research direction. This problem focuses on tuning large-scale pre-trained language models: participating teams must tune a model for six few-shot natural language understanding tasks while using only the inference capabilities of the pre-trained language model.

Language Classification Datasets
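To make the "inference API only" setting concrete, here is a minimal sketch of derivative-free prompt selection. It is only an illustration, not the competition's method: `toy_inference_api` is a hypothetical stand-in for a remote model, and the candidate prompts and few-shot examples are invented for the example. The point is that the search touches nothing but the model's outputs.

```python
# Hypothetical stand-in for a remote inference API: returns a score in [0, 1]
# for the "positive" label. No parameters or gradients are exposed.
def toy_inference_api(prompt: str, text: str) -> float:
    positive_words = {"great", "good", "love"}
    hits = sum(word in positive_words for word in text.lower().split())
    # The toy model only "understands" the task if the prompt mentions sentiment.
    if "sentiment" not in prompt.lower():
        return 0.5
    return 1.0 if hits > 0 else 0.0


def accuracy(prompt, examples, api) -> float:
    """Few-shot accuracy of one prompt, measured through API calls only."""
    correct = 0
    for text, label in examples:
        predicted = 1 if api(prompt, text) > 0.5 else 0
        correct += predicted == label
    return correct / len(examples)


def select_prompt(candidates, examples, api) -> str:
    """Derivative-free tuning: keep the candidate prompt that scores best."""
    return max(candidates, key=lambda p: accuracy(p, examples, api))


# Invented few-shot data: (text, label) with 1 = positive, 0 = negative.
few_shot = [
    ("I love this film", 1),
    ("a dull and boring plot", 0),
    ("great acting", 1),
    ("terrible pacing", 0),
]
candidates = ["Summarize the text:", "What is the sentiment of this review?"]
best = select_prompt(candidates, few_shot, toy_inference_api)
print(best)
```

A real entry would search a much larger space (for example continuous prompt vectors optimized with an evolutionary strategy), but the constraint is the same: every update is driven by API outputs alone.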

SST-2 is a movie review dataset with sentiment annotation. The Yelp sentiment analysis dataset is built based on the comments on the Yelp website. The AG's News topic classification dataset includes a large number of news corpora collected from more than two thousand news sources. TREC is a question classification dataset, MRPC is a sentence pair classification dataset, and SNLI is a natural language inference dataset.

Question 3. Algorithm design for data selection and label correction

Problem description: Deep neural networks easily overfit to noisy labels in the training set, which leads to poor performance on the test set. This limits how well deep neural networks perform on real-world problems. To bring deep learning to more real application scenarios, new classification algorithms are needed so that a deep neural network trained on data with label noise still performs well on the test set. This is a very important and fundamental scientific question for the post-deep-learning era.

Drawing on the characteristics of label noise, this challenge asks teams to develop an efficient, concise, and general-purpose classification algorithm for the noisy-label problem.

CIFAR-10 and CIFAR-100 Small-Image Classification Datasets

The benchmark datasets for this task are CIFAR-10, CIFAR-100, Tiny ImageNet, Twitter, and SST, covering experimental tasks with both simulated and real label noise. The specific task formats and data will be announced in the final.
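As a rough illustration of what "simulated" label noise can look like, the sketch below injects symmetric noise: each label is flipped, with a chosen probability, to a different class picked uniformly at random. The function name, the noise model, and the parameters are assumptions made for this example; the announcement does not specify how the competition's noisy data is generated.

```python
import random


def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different, uniformly chosen class with
    probability `noise_rate` (a common simulated-noise setup)."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Draw from the num_classes - 1 classes that are not y.
            other = rng.randrange(num_classes - 1)
            noisy.append(other if other < y else other + 1)
        else:
            noisy.append(y)
    return noisy


# Toy label vector with 10 classes, as in CIFAR-10.
clean = [i % 10 for i in range(10000)]
noisy = add_symmetric_noise(clean, num_classes=10, noise_rate=0.4)
observed_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
print(f"observed noise rate: {observed_rate:.3f}")
```

A data-selection or label-correction method would then be judged by how well a model trained on `noisy` generalizes to a clean test set.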

Question 4. Singular value decomposition and inversion of approximately low-rank matrices

Problem description: Matrix computation is the most basic computing task in information processing and is also one of the "seven giant problems" of big data computing. Research on singular value decomposition (SVD) and inversion algorithms for approximately low-rank matrices contributes significantly to the basic theory of information processing and big data, and can drive innovation in related core technologies.

This problem focuses on a particularly significant class of SVD and inversion problems for approximately low-rank matrices. Given a matrix and a constraint on the maximum proportion of non-zero singular values it may have, teams are asked to develop fast, efficient algorithms for its singular value decomposition and inversion.

Approximate Low-Rank Matrix Singular Value Datasets

Matrix computation is the most basic computing task in information processing and is recognized as one of the "seven giant problems" of big data computing. Every advance in the theory of matrix computation has a wide-ranging impact on big data analysis, information and communications, and related industries, and can trigger a series of technological changes and greatly boost productivity.
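For readers new to the topic, the following is a baseline sketch of a rank-limited SVD and the pseudo-inverse built from it, using NumPy's dense `np.linalg.svd`. The matrix sizes, the rank of 3, and the helper names are assumptions chosen for illustration; a competitive entry would presumably replace the full SVD with something much faster, such as a randomized or iterative method.

```python
import numpy as np


def truncated_svd(A, rank):
    """Keep only the `rank` largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]


def low_rank_pinv(A, rank):
    """Pseudo-inverse of the best rank-`rank` approximation of A:
    V_r * diag(1 / s_r) * U_r^T."""
    U, s, Vt = truncated_svd(A, rank)
    return (Vt.T / s) @ U.T


rng = np.random.default_rng(0)
# An exactly rank-3 matrix plus small noise gives an approximately low-rank A.
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
A = A + 1e-6 * rng.standard_normal((50, 40))

U, s, Vt = truncated_svd(A, 3)
A3 = (U * s) @ Vt                      # best rank-3 approximation of A
rel_err = np.linalg.norm(A - A3) / np.linalg.norm(A)
A_pinv = low_rank_pinv(A, 3)
print(f"relative approximation error: {rel_err:.2e}")
```

Truncating before inverting matters here: inverting the tiny noise-level singular values as well would amplify the noise enormously.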

Question 5. Robust defense algorithms for deep learning models

Problem description: Widely used deep learning models are not robust enough under some natural variations in the data, and they can be fooled by adversarial examples whose perturbations are imperceptible to the human eye, leading to incorrect predictions. The aim is to improve the robustness of deep learning models and to develop a new generation of safe and reliable deep learning. This challenge targets image classification and seeks more efficient adversarial defense techniques that improve the robustness of computer vision models under adversarial attack.

ImageNet Computer Vision Dataset

The ImageNet dataset used in the competition is a classic dataset for visual recognition tasks, created under the leadership of Professor Fei-Fei Li of Stanford University. The recommended dataset for this problem is the ImageNet classification subset, which is the standard training and test data of the annual ILSVRC image recognition competition. ImageNet and ILSVRC have been of great significance to the development of computer vision and deep learning models. This problem aims to further explore the robustness of deep learning models on a large dataset and a classic image classification task.
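To show the kind of input a defense has to withstand, here is a toy sketch of the fast gradient sign method (FGSM), a well-known attack that the announcement itself does not name. It is applied to a hand-written logistic-regression "model" rather than an image classifier; the weights, the input, and the perturbation budget `epsilon` are all invented for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x, y, w, b, epsilon):
    """One FGSM step against a logistic-regression model.

    For binary cross-entropy loss, the gradient with respect to the
    input x is (p - y) * w, where p is the predicted probability."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    # Move every coordinate by epsilon in the direction that raises the loss.
    return x + epsilon * np.sign(grad_x)


def predict(x, w, b):
    return int(sigmoid(w @ x + b) > 0.5)


# Invented toy model and input.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, 0.1, 0.2])
y = 1  # true label

x_adv = fgsm(x, y, w, b, epsilon=0.3)
clean_pred = predict(x, w, b)
adv_pred = predict(x_adv, w, b)
print(clean_pred, adv_pred)
```

The perturbation is bounded by `epsilon` in every coordinate, yet it changes the prediction. On real images the budget is far smaller relative to the pixel range, which is why such changes can go unnoticed by a person.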

2 A 10 million prize pool to attract talent

The competition offers substantial prizes to attract artificial intelligence talent and teams from around the world, with the aim of forming internationally competitive, innovative AI industry clusters. The total prize pool is 10 million, with up to 1 million awarded per problem, which may make this the algorithm competition with the highest prize money to date.

3 Participation Instructions

1. Competition registration period: August 6th - October 7th

Competition time: August 6th - November 15th

2. The competition is open to the public. Individuals, universities, research institutes, enterprises, maker teams, and others may register. Each contestant may join only one team, and each team may have at most 5 members.

Note:

(1) Members of the arena competition's organizing committee, and anyone involved in writing the problems or handling the data, are prohibited from participating in the competition;

(2) Staff of the organizer and of the units behind each track may enter the competition but will not be included in the rankings.

3. Scan the official QR code of the competition or visit the official event page of Pazhou Laboratory (Huangpu): https://sourl.cn/G5RJKD

In the problem selection area, click the "Register Now" button for your chosen problem and fill in the registration information to complete your registration.

Note: Make sure that the registration and team information is accurate and valid. If an alternate account or a false name is discovered, eligibility, results, and prize money will be revoked.

4 Organizational Structure of the Competition

Guiding Units: Pengcheng Laboratory; Guangzhou Municipal Bureau of Science and Technology; Guangzhou Municipal Bureau of Industry and Information Technology

Supporting units: People's Government of Huangpu District, Guangzhou; Management Committee of Guangzhou Development Zone; Management Committee of Guangzhou High-tech Zone

Sponsor: Pazhou Laboratory (Huangpu)

Co-organizers: China Society for Industrial and Applied Mathematics (Big Data and Artificial Intelligence Professional Committee); China Computer Federation; Chinese Institute of Command and Control; Chinese Association for Artificial Intelligence; The Fifth Electronics Research Institute of the Ministry of Industry and Information Technology; Guangzhou Institute of Technology, Xidian University


Origin blog.csdn.net/Extremevision/article/details/126847447