From Big Data Investigation to the Evolution of Artificial Intelligence


Source: http://infolaw.fyfz.cn/b/944054?from=groupmessage

Author: Xie Junze

Thanks to the author for authorizing the repost!

I have been following big data research for several years, but my interest is confined to big data modeling: without good modeling, big data cannot deliver any value, and every other question becomes moot. Friends who study big data modeling should find this issue especially interesting.

My topic here was prompted by a recent experience. I took part in an expert review on the construction of a big data investigation platform and found that many negative views of big data investigation had surfaced. For example, some leaders suggested that once the platform is built, there will be far too much interfering information and very little effective information. Some police officers also pointed out that the bigger the data, the harder the case becomes to solve; simple keyword analysis would locate a suspect faster. Earlier, Professor Chen Gang of the Public Security University remarked that a good investigative method must be "good and fast." Hence a negative trend in big data investigation: the bigger the data, the lower the efficiency and the worse the results.

My thinking is this: the natural attributes of big data constrain the social function of investigation.

We know that more and more data can now be used in investigations. We claim that a big data platform can mine data relationships six levels deep, or even without limit, yet very little of the resulting information is meaningful. Why? This reflects the first natural attribute of big data: massiveness. Our big data investigation outputs "large" data from massive data, rather than the "small" data that actually has investigative value. Second is the attribute of mixedness: valuable investigative information is blended with a huge amount of irrelevant information, and when we cannot extract the valuable part from the mixed whole, that mixedness becomes enormous "interference." Third, the "correlation" of big data also causes trouble: if irrelevant information cannot be eliminated, correlation turns negative and becomes mere "irrelevance." These attributes are the root causes of the negative trend in big data investigation.

So we need to ask a further question: what distinguishes big data in the investigative field from big data in other fields?

We all say that big data can predict the future, but unfortunately the social function of investigation is not to predict the future; it is to reconstruct the past. For this reason, someone once proposed that the investigative field has never really had "big data prediction," only "big data verification of the past," which is somewhat reasonable. Second, as Professor Chen Gang of the Public Security University noted earlier, good investigative methods must be "good and fast." This reflects another social attribute, the need to balance efficiency and effectiveness, which also sets investigation apart from other fields: if big data investigation takes too long, it is likely to be abandoned. Finally, since big data investigation belongs to the judicial field, it cares not about merely relevant facts but about behavioral facts. A big data platform may be able to trace a suspect's family relations back eighteen generations, but if that information has nothing to do with the facts of his criminal behavior, it is a "relevant fact" with no investigative value. What we care about is whom the suspect contacted before and after the crime, where he went and what he did: information about the facts of his behavior. For example, at today's meeting, if a big data platform tells me only that Mr. Chen Gang and I know each other, that means nothing. I need the platform to tell me: at what time and in what place did I do what together with Mr. Chen Gang. In short, if a big data platform cannot separate behavioral facts from merely relevant facts, it cannot be a successful big data investigation platform.

So, how do we break through these "bottlenecks" of big data investigation?

In fact, Mr. Chen Gang already pointed to the answer just now when he discussed video investigation: turn "static relationships" into "dynamic processes." While he was speaking, I paid close attention to the details he mentioned. When the amount of information in video footage is very large, static information alone shows no investigative value; only when the "concatenation method" and "association method" let the information dynamically display the process of behavior does it acquire investigative value. Clearly, both big data investigation and video investigation follow the same line of thinking as traditional investigation; there is no essential difference. What we care most about is the dynamic process of behavior, not the static relationships between people. So in the future, the big data investigation platform should not tell us who knows whom; it should tell us the time and place of the incident, who was in contact with whom, and what happened.
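To make the idea concrete, here is a minimal, hypothetical Python sketch (the record format, names, and data are my own illustrative assumptions, not from any real platform) of turning static contact records into a dynamic timeline of behavior around an incident:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    person: str
    other: str       # who they were in contact with
    time: datetime
    place: str
    action: str

def behavior_timeline(events, window_start, window_end):
    """Keep only events inside the incident window and order them
    chronologically, so a static 'who knows whom' graph becomes a
    dynamic record of who met whom, when, where, and what happened."""
    relevant = [e for e in events if window_start <= e.time <= window_end]
    return sorted(relevant, key=lambda e: e.time)

events = [
    Event("A", "B", datetime(2018, 5, 1, 9, 0), "station", "met"),
    Event("A", "C", datetime(2018, 4, 1, 8, 0), "cafe", "called"),  # outside window
]
timeline = behavior_timeline(events, datetime(2018, 4, 30), datetime(2018, 5, 2))
```

Only the contact inside the incident window survives, in time order; the mere fact that A also knows C is discarded as a static relationship.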

So, how can the big data investigation platform achieve this target effect? This is a modeling problem.

Models are the production line of big data! Here let me first share my research experience in cyber-jurisprudence over recent years, which may help you understand this issue. Everyone here, whether you study big data, artificial intelligence, network law, or electronic evidence, has surely heard the phrase: law must be combined with technology. So let me ask: who here knows how to combine law and technology? My research conclusion is this: the combination of law and technology must be mediated by "behavior." That is, if you want to combine law and technology well, you must study behavior. Why? The reason is simple: what technology changes is the way people behave, and when people's behavior changes, the law's rules for evaluating behavior must change with it. If a new type of case arises in the Internet field and you insist on citing existing laws and regulations, you will find that no matter how you interpret them, the fit is poor. That is the flaw in a purely legal way of thinking. The correct approach is to first consider how network technology affects people's behavior in the new case, and whether that behavior matches the behavioral pattern assumed by the original legislation. If it does, the original legal rules can be applied; if not, active interpretation through legal hermeneutics should be considered, and it may even be necessary to revise the existing provisions. Of course, the jurisprudence of legislation and adjudication is also involved here, and we may have the chance to discuss it in depth another time. As part of judicial activity, investigation follows the same principle. How does investigation combine with technology? Through behavior: only by using technology to extract behavior-related information from massive data can we effectively draw investigative conclusions.

To pose a problem well is half of solving it. Let us discuss how to resolve the negative effects that big data's "correlation," "mixedness," and "massiveness" bring to big data investigation. First, correlation. Many people know that big data offers only correlation, but correlation alone cannot serve as evidence in judicial proof. The reason is simple: judicial proof requires causality, and without causality, mere correlation cannot become judicial evidence. How, then, do we bridge the correlation of big data and the causality of judicial proof? The answer is to mediate through behavioral relationships. For example, many Internet companies now face a new form of crime called "brushing": brushed ad clicks, brushed registrations, brushed reviews, and so on. I have handled several such cases in the past two years, and my approach has two parts. First, big data used for conviction must form a closed loop of behavior that establishes a causal relationship. Second, big data that forms no closed behavioral loop, and hence no causal relationship, may be used only as a sentencing circumstance. This is how behavior turns the correlation of big data into causality.
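As a rough sketch of this two-part rule (the step names, and the reading of a "closed loop" as a complete chain of fraud steps all tied to one actor, are my own illustrative assumptions), one might write:

```python
# Hypothetical steps of a "brushing" (click-fraud) act. A closed loop
# of behavior is taken to mean the actor's records cover every step,
# from account registration through to receiving the payoff.
REQUIRED_STEPS = {"register_account", "issue_click", "receive_payment"}

def classify_evidence(actor_steps):
    """Apply the author's two-part rule: a closed behavioral loop
    supports the causal proof needed for conviction; an open loop is
    at most a sentencing circumstance."""
    if REQUIRED_STEPS <= set(actor_steps):
        return "conviction (causal closed loop)"
    return "sentencing circumstance only (correlation, no closed loop)"
```

A full chain of steps yields conviction-grade evidence; isolated clicks with no traceable payoff remain mere correlation.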

The second question: how do we turn "mixed" data into "small" data with investigative value? As most of you know, there are two main modeling methods in the big data field: data-driven modeling and demand-driven modeling. Data-driven modeling says: I do not care what your specific needs are; I simply tell you what conclusions big data analysis yields right now, and whether those conclusions meet your needs, I do not know. Its main function is to predict the future. Clearly, data-driven modeling is not suited to investigation; crime prevention, by contrast, may well use it. Demand-driven modeling is easier to understand: the "one-click search" of current investigation systems is an example, and its demand is: tell me all the information related to this person or this mobile phone number. But everyone must understand one truth: the demand of investigation and the demand of "one-click search" are entirely different. The demand of "one-click search" is clear-cut: find all information related to a specified person or a specified phone number. The demand of investigation is to find the perpetrator of the case, which is abstract and unclear. Since the demand itself is unclear, big data investigation obviously should not and cannot adopt demand-driven modeling. It follows that neither data-driven nor demand-driven modeling, the methods commonly used in other big data fields, is feasible in the investigative field.

So, how should big data be modeled in the investigative field?

The conclusion: mediate through behavioral features. This follows from the fact that the nature of big data investigation is the pursuit of behavioral facts rather than merely relevant facts. It leads to a modeling approach unique to big data investigation: behavior modeling. In the industry, behavior modeling has two levels of meaning. One is the investigative model, which is really just the "technical method" for various types of cases. The investigative model is passive: it is used to "find" data and information only when a case demands it, and it generally cannot achieve active, machine-driven investigation. The other is the criminological model, also called the criminal behavior model, which must reflect the behavioral features, patterns, and laws of a given type of crime. With such a behavioral model, a big data platform can actively and automatically compute and match, and output "small" data conclusions with investigative value. When we speak of big data investigative modeling, we mean criminological models, not investigative models.

Second, behavior modeling pays great attention to decomposing the elements of behavior, and the specific decomposition depends closely on the type of case. In public security cases of "finding the person from the case," for example, the time and space elements of behavior deserve the most attention: find out who was in contact with the victim at the time and place of the crime. In corruption and bribery cases, the object and result elements of behavior matter more: find the official's illicit money and the associated bribe-payers. In any case, to quickly and effectively discover causal data within a mass of merely correlated data, we must evolve from investigative models to criminological models.
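A minimal sketch of the time-and-space decomposition in the "finding the person from the case" example might look like this (the record format, names, and tolerance window are hypothetical):

```python
from datetime import datetime

def co_present(records, crime_time, crime_place, tolerance_s=600):
    """Decompose behavior into its time and space elements: return every
    person whose record places them at the crime scene within the
    tolerance window around the crime time."""
    return {
        person
        for person, t, place in records
        if place == crime_place
        and abs((t - crime_time).total_seconds()) <= tolerance_s
    }

crime_time = datetime(2018, 5, 1, 9, 0)
records = [
    ("X", datetime(2018, 5, 1, 9, 5), "market"),   # at the scene, 5 min later
    ("Y", datetime(2018, 5, 1, 12, 0), "market"),  # same place, wrong time
    ("Z", datetime(2018, 5, 1, 9, 0), "station"),  # right time, wrong place
]
suspects = co_present(records, crime_time, "market")
```

Only the record matching both the time element and the space element survives; the platform returns "small" data rather than everything loosely related to the victim.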

To give everyone a more vivid sense of the appeal of criminological behavior modeling, let me offer a concrete success story: catching pickpockets with bus big data. The first step of this model is to extract normal behavioral features: on the route from hotspot A to hotspot B, the vast majority of passengers choose the optimal mode of travel (shortest time or distance, or fewest transfers). Of course, the behavioral features of different traveler groups, such as commuters, shoppers, and tourists, must also be considered. The second step is to build an abnormal-behavior model: if a person chooses the route A->C->D->B, and enough abnormal features accumulate, he is very likely a pickpocket. This is a simple and effective criminological behavior model; with it, the police no longer wear themselves out catching thieves.
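The two steps above can be sketched as follows; the scoring rule and the threshold are illustrative assumptions of mine, not the actual model the team built:

```python
def anomaly_score(observed_route, optimal_route):
    """Count abnormal features of an observed trip against the optimal
    route: extra stops, plus legs that detour off the optimal path."""
    extra_stops = len(observed_route) - len(optimal_route)
    detours = sum(1 for stop in observed_route if stop not in optimal_route)
    return extra_stops + detours

def likely_pickpocket(observed_route, optimal_route, threshold=3):
    """Flag a rider whose trip accumulates enough abnormal features."""
    return anomaly_score(observed_route, optimal_route) >= threshold

commuter = likely_pickpocket(["A", "B"], ["A", "B"])           # direct trip
suspect = likely_pickpocket(["A", "C", "D", "B"], ["A", "B"])  # roundabout trip
```

The commuter's direct A->B trip scores zero; the A->C->D->B trip accumulates enough abnormal features to be flagged.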

It is worth noting that the talent behind behavior modeling has never been, and cannot be, the mythical "algorithmists" of other big data fields. The masters are at the grassroots and among the people. This is because those who best understand the features and laws of criminal behavior are front-line police officers, and the folk experts with the "wisdom" of crime itself. I am sure everyone has at some point wondered how to steal something without being caught. The bus big data pickpocket model, too, was developed by a "civilian" research team.

Finally, as my senior colleague Pan Guanyuan just said: today, artificial intelligence refers not to a single technology but to a method and a way of thinking. By optimizing behavior modeling, big data investigation can be fully upgraded to active, intelligent investigation and complete its evolution into artificial intelligence. Accordingly, I summarize this investigative mode as: artificial intelligence investigation based on big data!



About the author: Xie Junze, from Wenzhou, Zhejiang, is currently Secretary-General of the Cybercrime and Security Research Center of Renmin University of China and Deputy Director of the university's Physical Evidence Technology Identification Center. With a background spanning information technology and law, he has long worked at their intersection: computer network forensics, electronic evidence forensics, informatized investigation, cybercrime, cyber criminal law, cybersecurity law, cyber jurisprudence, and related fields.


