Live Registration | Meituan Technology Salon Issue 56: Meituan Computer Vision and Multimedia Technology Practice - ACM MM 2020 Special Session

【Meituan Technology Salon】Hosted by the Meituan technical team and the Meituan Association for Science and Technology, each salon invites technical experts from Meituan and other Internet companies to share frontline practical experience, covering all major technical fields.

Starting in September 2020, the Meituan Technology Salon has also launched a series of academic activities, including top-conference paper sharing and academic hot-topic discussions, inviting industry and academia to explore cutting-edge topics together.

Event time: October 31, 2020, 14:00-17:00

Event venue: online

Event registration: click here to sign up

/ Producer /

Ma Lin|Meituan Researcher

He is currently a researcher at the Visual Intelligence Center of Meituan's AI Platform Department. He was previously an expert researcher at Tencent AI Lab and a researcher at the Noah's Ark Lab in Hong Kong. His work focuses on deep learning, computer vision, and video analysis and understanding, with a particular emphasis on multi-modal deep learning across vision and language. He has published many papers in top conferences and journals such as CVPR, ICCV, ECCV, NIPS, ICML, ACL, EMNLP, and TPAMI.

| Event Introduction

As the premier international conference in the field of multimedia technology, ACM MM attracts close attention from both academia and industry. This salon will present the work published by Meituan and its partners at ACM MM 2020, and we look forward to exchanging ideas with fellow practitioners in the industry.

| Schedule

| Talk Introductions

Topic 1: Applying a Hybrid Attention Model to Video Summarization

Wang Junyan|PhD student, University of New South Wales, Sydney

He is currently a PhD candidate at the University of New South Wales in Sydney and an intern at the Meituan Visual Intelligence Center. His research direction is video understanding and medical image processing, and his research interests include graph networks and meta-learning.

Brief Introduction

This work uses a self-attention network as its basic structure and proposes a hybrid attention layer built from secondary auxiliary query features and a hybrid attention distribution. Combined with a "temporal-spatial" dual-channel feature extraction scheme and a single-video meta-learning training strategy, the resulting dual-channel hybrid attention network better alleviates the softmax bottleneck problem on small datasets and achieves stronger summary extraction ability.
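The abstract does not spell out the exact layer design, so the following is only a rough illustration of how a hybrid attention distribution might be formed: two softmax attention maps, one from the primary query and one from an auxiliary query, are mixed per position (a mixture-of-softmaxes style construction, one common way to ease the softmax bottleneck). All module and parameter names here are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttention(nn.Module):
    """Illustrative hybrid attention: mixes two softmax attention
    distributions produced by a primary and an auxiliary query."""

    def __init__(self, dim):
        super().__init__()
        self.q_main = nn.Linear(dim, dim)   # primary query projection
        self.q_aux = nn.Linear(dim, dim)    # secondary auxiliary query projection
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.mix = nn.Linear(dim, 2)        # per-position mixture weights
        self.scale = dim ** -0.5

    def forward(self, x):                   # x: (batch, frames, dim)
        k, v = self.k(x), self.v(x)
        attn_main = F.softmax(self.q_main(x) @ k.transpose(-2, -1) * self.scale, dim=-1)
        attn_aux = F.softmax(self.q_aux(x) @ k.transpose(-2, -1) * self.scale, dim=-1)
        w = F.softmax(self.mix(x), dim=-1)  # (batch, frames, 2)
        # Hybrid distribution: a convex combination of two softmax maps is
        # more expressive than a single softmax (the softmax bottleneck).
        attn = w[..., :1] * attn_main + w[..., 1:] * attn_aux
        return attn @ v

frame_feats = torch.randn(2, 64, 256)            # 64 frame features of dimension 256
print(HybridAttention(256)(frame_feats).shape)   # torch.Size([2, 64, 256])
```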

Topic 2: Large-scale food recognition based on a stacked global-local attention network

Wang Zhiling|Meituan Research Project Intern

He is studying for a master's degree in computer technology at the University of Chinese Academy of Sciences and is currently an intern at the Visual Intelligence Center of the AI Platform Department. His main research direction is fine-grained recognition of dish images.

Brief Introduction

This talk introduces the dish dataset ISIA Food500 (500 categories, 399,726 images) proposed in the paper, which exceeds existing benchmark datasets in both the number of categories and the number of images. It also presents our latest dish image recognition network, SGLANet, which jointly learns global and local visual features of dish images and reaches a leading level on multiple dish benchmark datasets.
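SGLANet's actual architecture is not detailed in this abstract; the sketch below only illustrates the general global-local idea: one branch pools the whole feature map, another pools it under a learned spatial attention map, and the two features are classified jointly. The backbone, layer sizes, and names are all placeholders.

```python
import torch
import torch.nn as nn

class GlobalLocalNet(nn.Module):
    """Illustrative global-local network: a global branch average-pools the
    feature map, a local branch pools it with a learned spatial attention
    map, and both features are fused for classification."""

    def __init__(self, num_classes=500, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(             # stand-in for a CNN backbone
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.Conv2d(dim, 1, 1)            # spatial attention for the local branch
        self.fc = nn.Linear(2 * dim, num_classes)   # joint classifier over both branches

    def forward(self, img):                         # img: (batch, 3, H, W)
        fmap = self.backbone(img)                   # (batch, dim, h, w)
        g = fmap.mean(dim=(2, 3))                   # global feature: average pooling
        a = torch.softmax(self.attn(fmap).flatten(2), dim=-1)   # (batch, 1, h*w)
        l = (fmap.flatten(2) * a).sum(dim=-1)       # local feature: attention-weighted pooling
        return self.fc(torch.cat([g, l], dim=1))    # joint global-local prediction

logits = GlobalLocalNet()(torch.randn(2, 3, 224, 224))
print(logits.shape)                                 # torch.Size([2, 500])
```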

Topic 3: Research on "Language-Vision" Information Fusion in Dialogue Tasks

Xu Zipeng|Postgraduate student, Beijing University of Posts and Telecommunications

He is a postgraduate student in Intelligence Science and Technology at Beijing University of Posts and Telecommunications. His research direction is visual dialogue, and his research interests include dialogue systems and vision-and-language.

Brief Introduction

This work emphasizes the role of the "answer" in goal-oriented visual dialogue and proposes an answer-driven visual state estimator that fuses dialogue history and image information. A focused attention mechanism effectively strengthens the answer information, while a conditional visual information fusion mechanism adaptively selects between global and difference information. The estimator can be used both for question generation and for target guessing. Experimental results on GuessWhat?!, a public visual dialogue benchmark, show that the model achieves a leading level on both question generation and target guessing.
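The precise form of the conditional visual information fusion is not given in this abstract; as a minimal sketch of the idea (all names and the gating form are assumptions), a dialogue-state-dependent gate can interpolate between global and difference visual features:

```python
import torch
import torch.nn as nn

class ConditionalVisualFusion(nn.Module):
    """Illustrative conditional fusion: a gate computed from the dialogue
    state decides, per dimension, how much of the global image feature and
    how much of the difference feature to keep."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)

    def forward(self, dialogue_state, global_feat, diff_feat):
        g = torch.sigmoid(self.gate(dialogue_state))   # (batch, dim), values in (0, 1)
        # Adaptive selection between global and difference information.
        return g * global_feat + (1 - g) * diff_feat

dim = 256
fusion = ConditionalVisualFusion(dim)
state, vg, vd = torch.randn(2, dim), torch.randn(2, dim), torch.randn(2, dim)
print(fusion(state, vg, vd).shape)                     # torch.Size([2, 256])
```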

Topic 4: Unpaired image enhancement based on a quality attention generative adversarial network

Ni Zhangkai|PhD student, City University of Hong Kong

He is a PhD candidate in the Department of Computer Science at City University of Hong Kong. His research covers generative models, unsupervised learning, and image/video quality assessment. He has published more than ten papers in journals and conferences such as TIP, TCSVT, and ACM MM.

Brief Introduction

Image aesthetic enhancement is a fundamental yet challenging task. Existing supervised image quality enhancement models still have a number of limitations: paired training data is costly and time-consuming to obtain, and, more importantly, the resulting "high-quality" images are not necessarily preferred by every user. Motivated by this, we propose a quality-attention-based generative model that can effectively learn a user-oriented image aesthetic enhancement model from unpaired data.

Topic 5: Video description generation based on example sentences

Yuan Yitian|PhD student, Tsinghua University

He is a PhD student at Tsinghua University. His research direction is multimedia analysis and understanding, with a focus on the joint analysis of video and text.

Brief Introduction

In this work, we propose a challenging new task: generating syntax-controllable video descriptions based on example sentences. Specifically, given a video and any grammatically correct example sentence, the task is to generate a natural language description that both captures the semantic content of the video and follows the syntactic form of the given example sentence. To solve this exemplar-based video description problem, we propose a novel syntax-modulated video description generator. The generator takes the video's semantic representation as input and conditionally modulates the gating vectors of a long short-term memory (LSTM) network on the syntactic information of the given example sentence, thereby controlling the hidden-state updates used for word prediction and producing syntax-customized video descriptions. Extensive experimental results demonstrate the effectiveness of our method in generating video descriptions with controllable syntax and accurate semantics. By providing different example sentences, our method can generate description sentences with diverse syntactic structures, offering a novel and effective way to enhance the diversity of video descriptions.
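The exact modulation mechanism is not reproduced in this abstract; purely as a sketch of the idea (module names, dimensions, and the multiplicative form are assumptions), an LSTM cell whose gates are rescaled by a syntax embedding could look like this:

```python
import torch
import torch.nn as nn

class SyntaxModulatedLSTMCell(nn.Module):
    """Illustrative LSTM cell whose input/forget/output gates are rescaled
    by a vector derived from the example sentence's syntax embedding,
    steering the hidden-state update toward the exemplar's syntactic form."""

    def __init__(self, input_dim, hidden_dim, syntax_dim):
        super().__init__()
        self.x2h = nn.Linear(input_dim, 4 * hidden_dim)
        self.h2h = nn.Linear(hidden_dim, 4 * hidden_dim)
        self.s2g = nn.Linear(syntax_dim, 3 * hidden_dim)   # modulation of i, f, o gates

    def forward(self, x, h, c, syntax_emb):
        i, f, g, o = (self.x2h(x) + self.h2h(h)).chunk(4, dim=-1)
        mi, mf, mo = torch.sigmoid(self.s2g(syntax_emb)).chunk(3, dim=-1)
        # The syntax embedding modulates the gating vectors, controlling how
        # the hidden state is updated at each word-prediction step.
        i, f, o = torch.sigmoid(i) * mi, torch.sigmoid(f) * mf, torch.sigmoid(o) * mo
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

cell = SyntaxModulatedLSTMCell(input_dim=300, hidden_dim=512, syntax_dim=128)
h = c = torch.zeros(2, 512)
h, c = cell(torch.randn(2, 300), h, c, torch.randn(2, 128))
print(h.shape)   # torch.Size([2, 512])
```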

| Acknowledgments

Event organizers: Meituan technical team, Meituan Association for Science and Technology

Promotion partner: Huodongxing

| Registration method

"Meituan Technology Salon Issue 56: Meituan Computer Vision and Multimedia Technology Practice-ACM MM 2020 Special" registration please poke: sign up .

| Important reminder

Add the event assistant Meimei on WeChat (MTDPtech05) and reply "1031" to join the event WeChat group and communicate with the speakers and fellow attendees.

For slides and videos from past events, scan the QR code below to follow the Meituan technical team's official WeChat account (meituantech), then view them under [Technical Salon] in the menu bar.

Source: blog.csdn.net/MeituanTech/article/details/109108419