2023 Zhiyuan Conference Agenda Released丨Vision and Multimodal Large Model Forum


On June 9th, the 2023 Beijing Zhiyuan Conference will bring together explorers and practitioners in the field of AI, along with everyone who cares about the science of intelligence, to raise the curtain on the stage of the future. Are you ready? Distinguished guests include Turing Award winners Yann LeCun, Geoffrey Hinton, and Joseph Sifakis; OpenAI founder Sam Altman; Nobel laureate Arieh Warshel; Future of Life Institute founder Max Tegmark; 2021 Breakthrough Prize winner David Baker; academician Zheng Nanning, winner of the 2022 Wu Wenjun Supreme Achievement Award; and Zhang Bo, academician of the Chinese Academy of Sciences. Online registration is now officially open, and the conference will be live-streamed worldwide.

Countdown to Beijing Zhiyuan Conference: 5 days

Vision and Multimodal Large Models丨Afternoon of June 9th

In recent years, large language models and multimodal large models have emerged in rapid succession, opening a broad new stage for researchers and profoundly affecting human society. In 2023, a series of "large vision models" represented by SAM and SegGPT have appeared one after another, and follow-up work building on them has exploded. It is foreseeable that large vision models will be an unavoidable topic in computer vision for some time to come. This forum has invited outstanding scholars from well-known enterprises, universities, and research institutions including NVIDIA, Nanyang Technological University, Beijing Jiaotong University, the Zhiyuan Research Institute, and Moore Threads to discuss the theory, technology, and applications of large vision models, with the aim of spreading knowledge, sharing views, and jointly building a large-vision-model ecosystem that contributes to the development of this field.

Forum agenda


Forum Chair


Yan Shuicheng, Visiting Chief Scientist of Zhiyuan Research Institute

Prof. Yan is currently Visiting Chief Scientist at Beijing Academy of Artificial Intelligence (non-profit organization), and former Group Chief Scientist of Sea Group.

Prof. Yan Shuicheng is a Fellow of the Academy of Engineering, Singapore, as well as of AAAI, ACM, IEEE, and IAPR. His research areas include computer vision, machine learning, and multimedia analysis. To date, Prof. Yan has published over 600 papers in top international journals and conferences, with an H-index of 130+, and has been named among the world's Highly Cited Researchers eight times.

Prof. Yan's team has won winner or honorable-mention prizes ten times at the two core competitions, Pascal VOC and ImageNet (ILSVRC), deemed the "World Cup" of the computer vision community. His team has also received more than ten best paper and best student paper awards, notably a grand slam at ACM Multimedia, the top-tier conference in multimedia: three Best Paper Awards, two Best Student Paper Awards, and one Best Demo Award.

Host


Wei Yunchao, Professor and Doctoral Supervisor of Beijing Jiaotong University

He has conducted research at the National University of Singapore, the University of Illinois at Urbana-Champaign, and the University of Technology Sydney. He has been selected for MIT TR35 China and Baidu's Global High-Potential Chinese Young Scholars, named one of The Australian's Top 40 Rising Stars, and is project leader of a National Key R&D Program Young Scientists project. He has won the first prize of the Ministry of Education Natural Science Award, the first prize of the China Society of Image and Graphics Science and Technology Award, the ImageNet object detection championship in the "World Cup" of computer vision, and multiple CVPR competition championships. He has published more than 100 papers in top journals and conferences such as TPAMI and CVPR, with more than 15,000 Google Scholar citations. His main research directions include visual perception from imperfect data and multimodal data analysis.

Speech Topics and Guest Introductions

1. Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Introduction to the topic: Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this talk, we will introduce a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative GAN features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity.
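To make the point-tracking component concrete, here is a minimal, illustrative sketch of the nearest-neighbor-in-feature-space idea the abstract describes: after each motion step, the handle point is re-localized by searching a small window around its previous position for the pixel whose feature vector best matches the handle's original feature template. All names and the window-search strategy are assumptions for illustration, not DragGAN's actual implementation.

```python
import math
import random

def track_handle_point(feat_map, template, prev_pos, radius=3):
    """Re-localize a handle point: scan a (2*radius+1)^2 window around the
    previous position and return the pixel whose feature vector is closest
    (Euclidean distance) to the original handle feature template."""
    H, W = len(feat_map), len(feat_map[0])
    y0, x0 = prev_pos
    best_d, best_pos = float("inf"), prev_pos
    for y in range(max(0, y0 - radius), min(H, y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(W, x0 + radius + 1)):
            d = math.dist(feat_map[y][x], template)
            if d < best_d:
                best_d, best_pos = d, (y, x)
    return best_pos

# Toy example: plant the template feature at (5, 7) in a random feature map,
# then track from a nearby guess (4, 6).
rng = random.Random(0)
feats = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)] for _ in range(16)]
template = list(feats[5][7])
assert track_handle_point(feats, template, (4, 6)) == (5, 7)
```

In the real method the features come from intermediate GAN layers, so matching them is discriminative enough to follow the handle point as the image deforms.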


Xingang Pan, Assistant Professor, Department of Computer Science and Engineering, Nanyang Technological University

He is affiliated with MMLab-NTU and S-Lab. His research focuses on generative artificial intelligence and neural rendering; his major works include DragGAN, Deep Generative Prior, and GAN2Shape. Before joining Nanyang Technological University, he was a postdoctoral researcher in Professor Christian Theobalt's group at the Max Planck Institute for Informatics. He received his Ph.D. from MMLab at The Chinese University of Hong Kong under the supervision of Prof. Tang Xiaoou, and his Bachelor's degree from Tsinghua University.

2. Machine Learning for 3D Content Creation

Introduction to the topic: With the increasing demand for creating large-scale 3D virtual worlds in many industries, there is an immense need for diverse and high-quality 3D content. Machine learning is increasingly enabling this quest. In this talk, I will discuss how combining differentiable iso-surfacing with differentiable rendering can enable 3D content creation at scale and make real-world impact. Towards this end, we first introduce a differentiable 3D representation based on a tetrahedral grid to enable high-quality recovery of 3D meshes with arbitrary topology. By incorporating differentiable rendering, we further design a generative model capable of producing 3D shapes with complex textures and materials for mesh generation. Our framework further paves the way for innovative high-quality 3D mesh creation from text prompts leveraging 2D diffusion models, which democratizes 3D content creation for novice users.
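The core of differentiable iso-surfacing on a tetrahedral grid can be illustrated with one operation: placing a mesh vertex where the signed distance field (SDF) changes sign along a grid edge, via linear interpolation of the SDF values. Because the vertex position is a smooth function of the SDF values, gradients from a rendering loss can flow back to the shape. This is a minimal sketch under assumed names, not the speaker's actual code.

```python
def zero_crossing_vertex(p_a, p_b, s_a, s_b):
    """Place a surface vertex on the edge (p_a, p_b) where the SDF crosses
    zero, by linearly interpolating the signed distances s_a and s_b.
    Requires s_a and s_b to have opposite signs."""
    t = s_a / (s_a - s_b)  # fraction along the edge where the SDF equals 0
    return tuple(a + t * (b - a) for a, b in zip(p_a, p_b))

# Edge from the origin to (1, 0, 0); the SDF is -0.5 at one end and +0.5 at
# the other, so the surface crosses exactly at the midpoint.
v = zero_crossing_vertex((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), -0.5, 0.5)
assert v == (0.5, 0.0, 0.0)
```

Repeating this for every sign-changing edge of every tetrahedron yields a watertight triangle mesh whose vertices remain differentiable with respect to the underlying field.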


Jun Gao, Research Scientist, NVIDIA

Jun Gao is a Ph.D. candidate at the University of Toronto and a research scientist at NVIDIA. His research focuses on 3D computer vision and graphics, particularly the application of machine learning to large-scale 3D content generation. His representative works include GET3D, Magic3D, and DefTet, many of which have been integrated into NVIDIA products, including NVIDIA Picasso, GANVerse3D, Neural DriveSim, and the Toronto Annotation Suite. He will serve as an Area Chair for NeurIPS 2023.

3. A Preliminary Study of General Vision Models


Wang Xinlong, Researcher of Zhiyuan Research Institute

Wang Xinlong, a researcher at the Vision Model Research Center of the Zhiyuan Research Institute, received his Ph.D. from the University of Adelaide, Australia. His research fields are computer vision and foundation models. His work in recent years includes SOLO, SOLOv2, DenseCL, EVA, Painter, and SegGPT. His awards include a Google PhD Fellowship, the National Award for Outstanding Self-Financed Students Abroad, and the University of Adelaide's Doctoral Research Medal.

4. Image, Video and 3D Content Creation with Diffusion Models

Introduction to the topic: Denoising diffusion-based generative models have led to multiple breakthroughs in deep generative learning. In this talk, we will provide an overview of recent works by NVIDIA on diffusion models and their applications for image, video, and 3D content creation. We will start with a short introduction to diffusion models and then discuss large-scale text-to-image generation. Next, we will highlight different efforts on 3D generative modeling. This includes both object-centric 3D synthesis as well as full scene-level generation. Finally, we will discuss our recent work on high-resolution video generation with video latent diffusion models. We turn the state-of-the-art text-to-image model Stable Diffusion into a high-resolution text-to-video generator, and we also demonstrate the simulation of real in-the-wild driving scene videos.
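As background for the talk's "short introduction to diffusion models," the forward (noising) process has a closed form: x_t is a noise-scaled mixture of the clean sample x_0 and Gaussian noise, and a trained network's job during reverse diffusion is to estimate that noise. The sketch below, with assumed names and stdlib randomness only, shows the forward step and how knowing the noise recovers x_0 exactly:

```python
import math
import random

def diffuse(x0, alpha_bar_t, rng):
    """Forward diffusion at noise level alpha_bar_t (in (0, 1)):
    x_t = sqrt(a_bar)*x_0 + sqrt(1 - a_bar)*eps, with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
          for x, e in zip(x0, eps)]
    return xt, eps

def estimate_x0(xt, eps, alpha_bar_t):
    """Invert the forward step given the noise — the quantity a trained
    denoising network predicts during reverse diffusion."""
    return [(x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
            for x, e in zip(xt, eps)]

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(16)]
xt, eps = diffuse(x0, 0.3, rng)
x0_hat = estimate_x0(xt, eps, 0.3)
assert all(abs(a - b) < 1e-9 for a, b in zip(x0_hat, x0))
```

Latent diffusion models, such as the video models mentioned in the abstract, apply this same process in a learned latent space rather than in pixel space, which is what makes high-resolution generation tractable.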


Karsten Kreis, Research Scientist, NVIDIA

Karsten Kreis is a senior research scientist at NVIDIA’s Toronto AI Lab. Prior to joining NVIDIA, he worked on deep generative modeling at D-Wave Systems and co-founded Variational AI, a startup utilizing generative models for drug discovery. Before switching to deep learning, Karsten did his M.Sc. in quantum information theory at the Max Planck Institute for the Science of Light and his Ph.D. in computational and statistical physics at the Max Planck Institute for Polymer Research. Currently, Karsten’s research focuses on developing novel generative learning methods, primarily diffusion models, and on applying deep generative models on problems in areas such as computer vision, graphics and digital artistry, as well as in the natural sciences.


Huan Ling, Research Scientist, NVIDIA

Huan Ling is a research scientist at NVIDIA's Toronto AI Lab, a Ph.D. graduate of the University of Toronto, and a member of the Vector Institute in Toronto. During his Ph.D., he was advised by Professor Sanja Fidler, published more than 10 papers, and holds a number of related patents. His research focuses on large-scale image and video generation models and the application of generative models in computer vision. His representative works include PolyRNN++, DatasetGAN, EditGAN, and the recent Align Your Latents (Video LDM).

5. Roundtable Discussion

Roundtable guests:

 Wei Yunchao: Professor of Beijing Jiaotong University

 Pan Xingang: Assistant Professor, Department of Computer Science and Engineering, Nanyang Technological University

 Jun Gao: Research Scientist at NVIDIA

 Wang Xinlong: Researcher of Zhiyuan Research Institute

 Xia Wei: Vice President of AI, Moore Threads


Xia Wei, Vice President of R&D, Moore Threads

He received his Ph.D. from the National University of Singapore and has been a visiting researcher at Panasonic's Singapore research institute and at Lund University in Europe. He has published more than 30 papers in international journals and conferences, holds more than 30 US patents, and has won first and second places in the Pascal VOC and ImageNet challenges. He co-founded Orbeus, a Silicon Valley AI company, and launched the Rekognition intelligent recognition platform and PhotoTime, the first smart photo album in the US market. After the company was acquired by Amazon, he served as a Principal Scientist at AWS AI, responsible for the research and development of the AWS AI cloud services Rekognition, Textract, and other products. During his time at AWS, he and his team pioneered the new research field of machine learning model compatibility.

Scan the QR code or click "Read the original text" to register for offline attendance & the online live stream.



Origin blog.csdn.net/BAAIBeijing/article/details/131058858