Sogou's Big Data director and Polarr's co-founder discuss deep learning

Architect group exchange meeting: each session picks one of the most popular technical topics for hands-on experience sharing.

Gong Enhao, co-founder of Polarr; Gao Jun, director of Sogou Big Data; and Peng Yao, head of the Qiniu Cloud AI Lab, were invited to discuss the selection of deep learning frameworks and future trends.

Free Communication

Polarr Gong Enhao

I am Gong Enhao. I am currently studying at Stanford, doing deep learning research, especially research related to medical imaging. I am also a co-founder of a startup, known as "Spicy Retouching" in China and Polarr in the United States. We work with large-scale image data in the cloud, on mobile phones, and on PCs. Our app collects data and builds optimization algorithms on top of it. In the cloud we provide image sorting, image selection, and image information recognition; the trained models are then compressed so that the entire deep learning pipeline fits inside the mobile app, enabling on-device image assessment, recognition, selection, and rendering. In China the product may be known as Thunder Album. I am mainly responsible for all of the AI work.



Sogou Gao Jun

I am Gao Jun. I am currently in charge of algorithm research and big-data-related R&D at Sogou. Among Sogou's user-facing products, two rely heavily on deep learning: one is speech recognition, used in the input method to convert voice input to text, and the other is image search. In my own team, deep learning is mainly applied in the advertising field, for tasks such as CTR estimation, ad retrieval, and ad relevance evaluation. Going forward, we hope to do some valuable work on NLU, and also to make progress in the direction of network compression.



Qiniu Peng Yao

I am Peng Yao, head of the Qiniu Cloud AI Lab. Qiniu is a company that started with cloud storage. Our cloud platform holds a wide range of image, video, and audio data, and we have many rich-media customers. The main responsibility of our AI Lab is therefore to analyze large amounts of rich-media data and to build applications in related fields such as content auditing and recognition, in order to serve the customers on our platform.



Topic exchange
Moderator: Can you share something about network compression?

Polarr Gong Enhao: Network compression has several parts.

The first part is finding the most suitable architecture. Personally, I think this has to be decided according to the specific application and its performance requirements.

The second part is network compression proper: reducing the model's parameters as much as possible without changing its accuracy. A classmate of mine did the work called Deep Compression, and I have joined him in some new deep learning algorithm research that further optimizes models on top of it. Recent studies have found that a deep model can often be compressed by tens or even hundreds of times, which means the whole network contains a great deal of redundancy. Based on this, we can ask: can we choose appropriate models that trade off size against accuracy, keeping performance while significantly shrinking the model? For example, take a dense network, compress it into a sparse network, and then grow it back into a dense network, optimizing step by step. You can picture the network getting thinner and fatter in turn, finally reaching a state with a better balance of size and performance. Personally, my main work is the statistical analysis of this method.
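The dense-to-sparse step described here is typically implemented as magnitude pruning: the smallest-magnitude weights are zeroed and the survivors are retrained. A minimal NumPy sketch (the function name and the pruning ratio are illustrative, not taken from the Deep Compression paper):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Returns the pruned weights and a boolean mask of survivors; in DSD-style
    training the mask is kept so zeroed positions stay zero during the
    sparse retraining phase.
    """
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)                 # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask
```

In the dense regrowth phase the mask is simply discarded and all weights are trained again.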



The third part is model encoding. Our company is trying to put an image recognition network on the phone, so the first step is model compression. The concrete approach is: at each iteration, a small fraction of the weights is set to zero, then the model is re-optimized iteratively; the final model, with a few further optimizations, ends up much smaller. On top of that, you can optimize the encoding of the on-device network: in my experiments, weights can be converted from 32-bit floats to 16-bit, halving the size, or even to 8-bit, which shrinks the encoding much further. Combining these (optimizing the model structure, thresholding to make the model sparse, and encoding to reduce storage), you can keep compressing. But it mainly depends on the requirements: in the cloud you may not need to compress very aggressively, but on mobile you have to compress much harder, otherwise the app becomes too large. Also, after compressing you have to decompress, and that takes a certain amount of time.
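The 8-bit step can be sketched as simple linear (min-max) quantization: store one byte per weight plus a per-tensor scale and offset. This is a generic illustration, not Polarr's actual encoder:

```python
import numpy as np

def quantize_uint8(w):
    """Map float32 weights linearly onto 0..255; store codes + scale + offset."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.round((w - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float32 weights; rounding error is at most ~scale/2."""
    return codes.astype(np.float32) * scale + lo
```

A tensor stored this way is 4x smaller than 32-bit floats; 16-bit halves the size, as mentioned above.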

Sogou Gao Jun: This year I saw a paper on teacher-student networks, and it gave me an idea, because the feature space in the advertising field is enormous. For a very large network, we would like to bring it down to the order of millions of parameters while still maintaining good performance.
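Teacher-student training here presumably refers to knowledge distillation in the sense of Hinton et al.: the small student is trained against the large teacher's softened outputs as well as the true labels. A NumPy sketch of the loss (the temperature `T` and mixing weight `alpha` are illustrative choices):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy to the teacher's softened outputs (scaled by T^2, as in
    the distillation paper) mixed with ordinary cross-entropy on hard labels."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = -np.sum(p_t * np.log(p_s + 1e-12), axis=-1).mean() * (T * T)
    hard_probs = softmax(student_logits)[np.arange(len(labels)), labels]
    hard = -np.log(hard_probs + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

A student whose logits track the teacher's incurs a much lower loss than one that contradicts it, which is what drives the compression.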

Polarr Gong Enhao: Let me comment on that. First of all, I don't think a smaller model is necessarily faster; it may depend on the architecture. I would try some simple examples first. You can start from smaller, faster models that others have already verified, and see whether building on them is reasonable and meets your needs. If not, sacrifice some accuracy; how much depends on the specific accuracy and performance your application can tolerate.

Moderator: Dr. Gong, you are doing model compression mainly for mobile phones. After compression, can the computing power consumption be reduced while keeping a certain accuracy?

Polarr Gong Enhao: In terms of compute, if you use the framework directly, it is actually the same. But you can hack things to improve it, for example by using low-precision multiplication.



I think Metal on iOS is great. For example, AlexNet can run on the phone at 30 to 42 fps, and the Inception model at about 10 fps. Some of this has only just been optimized, so I think deep learning on mobile has a real future; many companies will use the mobile device itself to solve problems, which is very promising.

Moderator: Which framework are you using?

Polarr Gong Enhao: On iOS you have to use Apple's Metal framework, which is actually quite good. For the back end there are many options; I have worked with both Caffe and TensorFlow.

Moderator: Is Sogou's advertising recommendation mainly based on structured data or unstructured data?

Sogou Gao Jun: There are two kinds of problems: search advertising and display advertising. Search advertising has an explicit query term, which you can treat as structured, assuming text counts as structured. Display advertising is much more complicated: to improve online CTR you need to work out the user's interests, and the data involved in modeling user interests is very varied. You will certainly use search behavior, but also browsing behavior within customers' sites; for example, we obtain all the data inside a customer's site, so the overall data sources are very complicated. For display advertising, therefore, essentially all the data we process is heterogeneous, which you can understand as an unstructured problem.



Moderator: How is deep learning applied in the advertising field?

Sogou Gao Jun: Work in this field is actually quite different from images, largely because academia pays little attention to advertising. Another important reason is that there is not that much advertising data available, so it is hard to find papers that specifically address deep learning in advertising; the industry is crossing the river by feeling for stones. In my case, deep learning on the ranking problem is at least 10 points better than our existing strategy baseline. In advertising, Baidu adopted deep learning relatively early; Alibaba is now developing it rapidly, with many applications in product recommendation. So from an application standpoint I do see benefits, but at present the investment is not proportional to the return.



In the advertising field, what we see is that GPU machines have not shown an advantage in speedup, perhaps because advertising, unlike images, does not involve large numbers of CNNs. I have compared speedup ratios on a small scale in advertising, and GPU machines show no advantage, so I have always had a question: why do people choose GPUs for images and speech? Is it because of convolutional networks? They don't seem to consider the CPU at all.

Moderator: The core is that these functions and equations come down to a large number of matrix calculations, and the CPU has no advantage there. Because the GPU can process many data elements at the same time, its advantage is obvious. So in image and speech, including NLP, the GPU's advantage is clear and the CPU contributes very little compute. Many advertising workloads are not matrix calculations, so it is normal that the speedup is low; the GPU may not even be faster than the CPU.

Qiniu Peng Yao: Actually, I have run some tests on CPU here. Some customers who used our porn detection system said at first that they could not purchase GPU machines, so I tested a round on CPU, and the efficiency was very low; a single GPU is roughly 20 times faster than a CPU.



Sogou Gao Jun: I still have a small question: how large a cluster do you deal with when doing multi-machine parallel deep learning? When we were doing multi-machine parallelism, we migrated from TensorFlow to MXNet, because we found TensorFlow's multi-machine efficiency seemed to have problems. I don't know whether the industry has anything better for effectively improving the multi-machine, multi-GPU speedup. Dr. Gong, on the US side, have you heard of any new developments on the parallelism question?



Polarr Gong Enhao: I once saw a Spark-based variant of the Caffe framework for deep learning on CPU clusters, but I didn't pay much attention to it. I think it may be feasible; Spark is used more for data processing. Personally I haven't worked with multiple machines and multiple GPUs yet. But given how hard Amazon is pushing MXNet, I expect they will definitely launch better multi-machine, multi-GPU support.



Qiniu Peng Yao: I have investigated multi-machine, multi-GPU setups before, including TensorFlow and Caffe. TensorFlow itself does not ship a ready-made parameter server design; the framework lets you design a parameter server suited to your own application. I feel that Poseidon (a Caffe variant) provides a good parameter server design, including how matrices are transmitted during synchronization and how they are transformed to make them smaller, so synchronization is more efficient.
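One idea Poseidon is known for, which may be what "transforming the matrix to make it smaller" refers to, is sufficient-factor broadcasting: for a fully connected layer the weight gradient is the outer product of the backpropagated error and the input activation, so workers can exchange the two vectors (m + n floats) instead of the full m×n matrix. A hedged sketch (the function names are mine):

```python
import numpy as np

def compress_fc_gradient(delta, activation):
    """A fully connected layer's weight gradient equals outer(delta, activation),
    so the two factor vectors are 'sufficient' to rebuild it on the receiver."""
    return delta, activation

def reconstruct_gradient(delta, activation):
    """Rebuild the full m x n gradient matrix from its factors."""
    return np.outer(delta, activation)
```

For a 64×128 layer this means sending 192 floats instead of 8192, and the saving grows with layer size.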



Do you feel that training with TensorFlow is much slower than with MXNet or Caffe? Have you run into that problem?



Sogou Gao Jun: Yes, we have, and the multi-machine gap is very large, so we reimplemented a small part of it ourselves in the traditional way, with a multi-level parallelism strategy. The changes were not large, but running on CPU, the results we saw at the time were not bad.



Qiniu Peng Yao: Has anyone used Torch? I have heard from friends that Torch achieves better convergence speed and accuracy than Caffe on the same dataset and network; maybe there are some tricks in its low-level algorithms.



Polarr Gong Enhao: I used Torch for the DSD research, building on a Torch implementation of ResNet. My first impression of Torch is that it is rather troublesome, because too few people use it, so it is hard to get questions answered. But it has advantages. For example, if I want to change some regularization or weights during the iteration process, that is relatively convenient in Torch, because many of its low-level operations are exposed, which is easier than making such changes in Caffe. If we want to make an adjustment at every step and pick up the latest adjustment, we can do it in Torch; in that respect it is similar to Python and easier to work with. That's my impression.

Moderator: What do you think about the development of deep learning in the application field?

Qiniu Peng Yao: Content review, for example porn detection: identifying pornographic images and videos, which greatly simplifies the work of human reviewers. There is also content tagging, especially for social networking sites: we tag content for social networks, live streaming, and short video, helping customers understand what is in their images and videos.



Sogou Gao Jun: A small question: you mentioned doing work for social networking sites. Have you done anything in the direction of video understanding?



Qiniu Peng Yao: For example, at customers' request we built face detection to check whether uploaded photos contain a face: if a user's uploaded photos contain no face, that user is probably a bad actor. Another example: we collect a social network's pictures, which are in fact completely disorganized, and build an application that labels them all, including face clustering and scene recognition. The social network customer can then organize albums according to our tags and do data analysis on top, such as counting each user's selfies on the site. It's essentially crowd analysis from the image side.



Sogou Gao Jun: One of the more interesting deep learning applications I heard about this year is video recommendation. Traditional video recommendation works on text, but Kuaishou has very little text information; it is all user-uploaded video. So this year they used deep learning to understand the video content itself and then make recommendations, which is quite interesting.



Qiniu Peng Yao: I think this amounts to labeling unstructured data for customers. Once it is labeled, you can do a great deal: classification, search, recommendation, and more. You can even tag individual segments, for example one tag per 10-second slice of video, and then do many things with that. News program editing, for instance, means labeling every part of the broadcast: in a news segment the host appears, then the system detects the on-screen topic text, runs it through OCR, and labels the news piece by piece, which makes editing, post-processing, and so on much easier.



Sogou Gao Jun: So Qiniu's AI is mainly a to-B service, that is, helping enterprises solve their internal needs with machine learning?



Qiniu Peng Yao: We started with a porn-detection content review system, and later developed various labeling systems and customized recognition applications.



Sogou Gao Jun: Under this Qiniu AI model, do you consider it viable as a long-term business model? I have been in contact with some companies in Beijing, even large ones such as China Merchants Bank, and I have not seen strong willingness to pay. It is hard for them to articulate a problem that needs machine learning to solve, and also hard for them to put a valuation on it. I have always been curious: can this model really be profitable?



Qiniu Peng Yao: It depends on the customer group. Porn detection, for example, saves customers a lot of cost: they used to need a great deal of entry-level manpower, and the labor cost is very high, so they are quite happy to pay for this. Human porn reviewers are particularly hard to keep: they have to become skilled at the job, and after half a year or a year they quit, so the labor cost is really high. Our other applications are similar; we focus on applications that save substantial labor costs.

Moderator: Does Qiniu have an AI strategy?

Qiniu Peng Yao: In the later stage we will mainly work in the video direction, including video analysis and some general video detection. We will focus our research investment on solving the actual problems of customers on our platform. Video analysis is the main field, because we store a great deal of video, and fine-grained video detection is also one of our key directions.

Moderator: What is your vision for deep learning?

Sogou Gao Jun: Let me ask a small open-ended question. After deep learning took off, Amazon built the Echo. In five years, will there really be a household assistant like the one in Iron Man? Just as the original iPhone redefined all mobile phones, will something like that happen within five years? What do you all think?



Amazon's Echo now provides a very full API for connecting devices in the home, or functions in your apps. So I have been wondering: if this trend continues, it could become a must-have household device. In that scenario it can spawn many services; for example, it can connect to cameras, it has voice, and it could become almost omnipotent, doing everything we currently do ourselves. I keep wondering whether this will happen, because it could totally change life.

Moderator: If it's just smart home, I think it should be possible. If you're a super geek, putting smart lights or a robot in your home is no problem. But many people care about privacy protection and may not be happy to put a robot in their home. So I think small-scale adoption is fine, but large-scale adoption will still run into problems.

Polarr Gong Enhao: Echo is very popular recently, but I think in the future everyone can have this kind of service on their phone, which is actually more direct. Many startups now are building personal assistants; their main idea is to become AI assistants, for example hailing a taxi for me so I don't have to bother doing it myself, with the mobile assistant connecting to Internet services through AI. I think all of this is possible in the near future.

Sogou Gao Jun: Are there many companies in the United States working in this direction?

Polarr Gong Enhao: I have seen some recently, including in China. A classmate of mine went back to China to build a personal assistant. Ultimately that is what he wants to do: it is speech recognition, and it is artificial intelligence. I think it is still an early direction, and he wants to pursue it in AI.

Sogou Gao Jun: I remember there are similar teams in China, very similar to the Amazon Echo. There are even in-car rear-view mirrors that seem to be heading in this direction.



I usually use Microsoft's XiaoIce (Xiaobing), and sometimes use it to set up small programs and do little things.

Moderator: Personally, among the general public, how many people will actually use these things? I suspect not many.

Polarr Gong Enhao: I think there are a few main problems. One is recognition accuracy. The other is integration with other services: for example, it can't send something in WeChat for you, and things like payment are still very tricky at present.

Moderator: Let's discuss chatbots. At present I don't think there is a particularly good application, and the algorithms may not be particularly mature.

Sogou Gao Jun: A friend told me about chatbots before. He mentioned that the corpus is a very troublesome thing; I don't know how you deal with it.

Moderator: The core is the construction of the knowledge graph. For chatbots, it is not a technical problem but a problem of raw material: how you build a chat knowledge graph for a professional field and apply it in depth within the industry. That is the future trend. There is no technical barrier; a handful of people can start a chatbot company.

Sogou Gao Jun: For automatic question answering in a vertical field, a domain-level knowledge base may help a lot with these problems. Something like XiaoIce is very broad. I have always been curious about one question: there is a huge amount of dialogue in movies and TV series; is there value in using that dialogue to help chat? Would chatbot algorithms get better? From a pure QA perspective, collecting these question-answer pairs is very labor-intensive. But sometimes a chatbot just needs to feel human to people, so why not harvest large amounts of dialogue from TV shows and movies?

Qiniu Peng Yao: I think customer service robots are relatively easy to build, but making one seem human is much harder. I came across an example: letting the robot learn from everyone's everyday chat. For instance, given "I'm sick, I'm not feeling well today" and five manually marked candidate answers, one of which is "what's wrong?", several groups of annotators were asked to choose, and most chose "what's wrong?". But "what's wrong?" fits almost any context, so the robot ends up answering "what's wrong?" to everything. It's still not at the point of being contextual and able to understand everything.

Moderator: Let's explore some new areas.

Polarr Gong Enhao: Outside the company, my personal research focuses mainly on medical imaging, a relatively new application: using deep learning to help doctors make diagnoses, or to see diagnoses that people cannot see, and also to improve the quality of the images, for example by enhancement. I also think NLP can be used in medical diagnosis; recently some people seem to be using all kinds of unstructured data for prediction. I am personally quite interested in this and may be making some relatively small attempts.

Sogou Gao Jun: My PhD lab was a CV lab, and many of my lab mates are doing image-related startups. The medical imaging Dr. Gong just mentioned is something I am currently following, and I am genuinely interested, because there are several small startups in China, such as DeepCare, that seem to be doing similar work, apparently along the lines of IBM's Watson, which many people are imitating. One group of companies in China is doing that, and another group is indeed using NLP methods for diagnosis and triage; those are the two directions in medicine. There seem to be more startups now, though I haven't heard much about pharmaceuticals yet. Medical imaging is what I pay the most attention to outside of advertising, chatting with people and listening to their thoughts, because this seems to have great commercial value.

Qiniu Peng Yao: These projects are generally very large, yet they actually address quite general problems. Even if you only solve part of the problems of one department, such as medical imaging, you are in effect solving a very general problem.

Sogou Gao Jun: But I am not optimistic about doing this in China at the moment. My basic judgment is that getting hospitals to hand over usable data is not very reliable. I was told of one case where they obtained tens of thousands of cases and the related data, but after cleaning, only a few thousand records were usable. My feeling at the time was that with so little data this industry can't use deep learning; you can't even fit a logistic regression, the data volume is too small. In practice it is hard to get that much time in China to do this, so it may be rarely done for now, but I think there should be a long-term opportunity in China.

Polarr Gong Enhao: Actually there are still many opportunities of this kind in China, mainly because hospitals and universities can cooperate. Tsinghua, for example, has a lot of resources in this area. If this is done in the future it will start from individual patients, and the number of patients seen in China in a bit more than a week can rival what the United States sees in a month or even a year.

Qiniu Peng Yao: Yes, medical research institutes like those at Tsinghua and at Zhejiang and Jiaotong universities have many affiliated hospitals, and there is still a lot of data. As Dr. Gong just said, data goes to certain universities, and the imaging centers at those universities can make it available, so there are actually many opportunities here. The question is which diseases make the best breakthrough points; that can be explored further.

Polarr Gong Enhao: Recently there has been progress in CNN-based segmentation, which can be used in many medical applications.

Sogou Gao Jun: But with medical imaging, even if we can get good data, it is unlikely to become the primary method, right? After all, it has an error rate, and if you let a machine give the main advice, any accident becomes very troublesome. That is how I feel: in the future this tool will only serve as a reference for doctors. I don't know whether the rest of you have greater expectations?

Polarr Gong Enhao: On the medical ethics and liability side, no matter how well the machine does, it can't take responsibility for decisions; someone has to sign off in the end. But consider a doctor who has to look through many layers of images, many different slices of a scan: if you can tell him "look at this layer, this is the key one", that reduces his workload, which in essence is very, very valuable. A few days ago I was talking with a medical school teacher about this, and he felt he needed exactly this kind of tool.
