The battle between multi-view and multi-modal

This week I watched the sixth installment of the CCF-AI series of talks held at universities, in which Associate Professor Zhang Changqing of Tianjin University gave a talk on multi-view learning. During the Q&A session, an interesting question came up: what is the difference between multi-view and multi-modal? Traditional machine learning generally models and analyzes data from a single view, and some scholars treat multi-view and multi-modal as the same thing. Below is my own understanding; it is only a personal view and may or may not be right.

First, let's look at the answers from the experts:

This question was raised by Professor Wang Xizhao of Shenzhen University. Professor Wang offered an indirect answer by quoting one given by a doctoral graduate of Shenzhen University: data obtained by multiple sensors is multi-modal, while data obtained by a single sensor at different locations is multi-view.

Professor Zhang's answer: multi-view includes multi-modal. Multi-view is closer to machine learning and more abstract; multi-modal is closer to applications and tied to a concrete use case.

My own understanding:

Generally speaking, there is no big difference between the two, and they can be used interchangeably in many places, for example multi-view multi-clustering, or multi-view multi-instance multi-label learning versus multi-modal multi-instance multi-label learning. If one insists on drawing a line between them, I lean towards Professor Zhang's answer; my view is consistent with his, though not identical.

Recently I have been surveying representation learning, and I will explain the difference between the two from that perspective, which may make it a little clearer. In representation learning, graph embedding and network embedding are often used interchangeably, just like multi-view and multi-modal. Consider a question: why does the computer science course "Data Structures" introduce the graph structure rather than a network structure? Because a graph is an abstraction of reality. When we talk about networks, we usually mean a particular kind of network, such as a social network or a citation network. We then use the abstract structure and techniques of graphs to model and analyze these real networks.

Now for the difference between graph embedding and network embedding. The purpose of graph embedding is dimensionality reduction: the learned low-dimensional embedding should support reconstruction, that is, it can be mapped back to the original form of the data. Network embedding requires not only reconstruction but also support for inference tasks such as node classification, link prediction, and community detection.
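To make the distinction above more concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under my own assumptions, not a standard method from any particular paper: the toy network, the use of a truncated SVD of the adjacency matrix as the embedding, and all function names (`svd_embedding`, `reconstruction_error`, `rank_missing_links`) are hypothetical choices made for this example. The first two functions cover the "graph embedding" goal (reduce dimensionality, then reconstruct), while the last one adds an inference task in the "network embedding" sense (a naive link prediction score).

```python
import numpy as np

# Hypothetical toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N_NODES = 6


def adjacency(edges, n):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def svd_embedding(adj, dim):
    """Return (source, target) embeddings from a rank-`dim` truncated SVD.

    The product source @ target.T is the best rank-`dim` approximation
    of `adj` in the Frobenius norm.
    """
    u, s, vh = np.linalg.svd(adj)
    scale = np.sqrt(s[:dim])
    return u[:, :dim] * scale, vh[:dim].T * scale


def reconstruction_error(adj, src, dst):
    """Graph-embedding view: how well do the embeddings restore the data?"""
    return float(np.linalg.norm(adj - src @ dst.T))


def rank_missing_links(adj, src, dst):
    """Network-embedding view: score unconnected pairs for link prediction."""
    scores = src @ dst.T
    n = adj.shape[0]
    candidates = [
        (i, j, float(scores[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i, j] == 0
    ]
    # Highest reconstructed score first: the most plausible missing edges.
    return sorted(candidates, key=lambda item: item[2], reverse=True)


ADJ = adjacency(EDGES, N_NODES)
SRC, DST = svd_embedding(ADJ, dim=2)
print("reconstruction error (dim=2):", reconstruction_error(ADJ, SRC, DST))
print("top link candidates:", rank_missing_links(ADJ, SRC, DST)[:3])
```

The same embedding serves both purposes here; what changes is the question asked of it. Reconstruction only checks that the original data can be recovered, while link prediction uses the embedding to say something about edges that were never observed.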

So how can the graph embedding versus network embedding analogy be used to analyze the difference between multi-view and multi-modal? Multi-view leans towards the "data structure" side: it is more abstract and more convenient for modeling and analysis. It is not only a data-structure-oriented notion but also a machine learning paradigm. Multi-modal, by contrast, is oriented towards solving practical applications: it is not just about solving a problem in general, but names a specific solution.
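The idea that multi-view is the more abstract "data structure" can be sketched in a few lines of Python. This is only my own illustration: the class `MultiViewDataset`, its methods, the sensor names, and the feature values are all hypothetical. The point is that the same container holds both a multi-modal dataset (different sensors) and a multi-view dataset in the narrower sense (one kind of sensor at different positions), because the abstraction only requires that every view describes the same samples.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MultiViewDataset:
    """Hypothetical container: one feature matrix per view, shared samples."""

    views: Dict[str, List[List[float]]]

    def __post_init__(self):
        if not self.views:
            raise ValueError("at least one view is required")
        sizes = {len(rows) for rows in self.views.values()}
        if len(sizes) != 1:
            raise ValueError("all views must describe the same samples")

    @property
    def n_samples(self) -> int:
        return len(next(iter(self.views.values())))

    def sample(self, index: int) -> Dict[str, List[float]]:
        """Return every view's features for one sample."""
        return {name: rows[index] for name, rows in self.views.items()}

    def concatenated(self) -> List[List[float]]:
        """Naive early fusion: join the features of all views per sample."""
        return [
            [x for rows in self.views.values() for x in rows[i]]
            for i in range(self.n_samples)
        ]


# Multi-modal in the sensor sense: two different kinds of sensor.
multi_modal = MultiViewDataset({
    "camera": [[0.1, 0.9], [0.4, 0.6]],
    "microphone": [[0.3], [0.7]],
})

# Multi-view in the narrower sense: the same kind of sensor at two positions.
multi_position = MultiViewDataset({
    "camera_front": [[0.1, 0.9], [0.4, 0.6]],
    "camera_side": [[0.2, 0.8], [0.5, 0.5]],
})

print(multi_modal.sample(0))
print(multi_position.concatenated())
```

Nothing in the container knows whether a view came from a microphone or a second camera. That knowledge belongs to the application, which is where the term multi-modal naturally appears.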

The above is just one person's opinion. If I have misunderstood anything, please bear with me (if any image infringes copyright, contact me and I will remove it).

Origin blog.csdn.net/qq_39463175/article/details/106992307