Quantitative Investment: Synthetic Data

1. What is synthetic data?
Data that is manufactured by a computer, rather than collected by measuring real-world events, is called synthetic data. Synthetic data can still be derived from real measured data: the real data is anonymized, and user-specified parameters are used to generate data whose attributes are as close as possible to those of the real-world scenario. One way to create synthetic data is to train a generative model on real-world data. The model learns from the real data and can then produce a dataset that approximates the real data's attributes.
The quality of synthetic data is judged by the difference between the synthetic data and the real data: the goal is to make that difference as small as possible.
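
To make "measuring the difference" concrete, here is a minimal Python sketch, assuming the simplest possible generative model: a normal distribution fitted to daily returns. The gap between the real and synthetic samples is measured with the two-sample Kolmogorov-Smirnov statistic, which is one common choice of distance, not the only one. The names `fit_normal`, `generate` and `ks_distance` are hypothetical, and `real_returns` is a randomly generated stand-in, not real market data.

```python
import bisect
import random
import statistics


def fit_normal(real):
    """'Learn' a normal model from real data: estimate its mean and volatility."""
    return statistics.fmean(real), statistics.stdev(real)


def generate(mu, sigma, n, seed=0):
    """Draw n synthetic values from the fitted normal model."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the two empirical CDFs (0 = identical samples, 1 = no overlap at all)."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Stand-in for real daily returns (randomly generated here, not market data).
rng = random.Random(42)
real_returns = [rng.gauss(0.0005, 0.01) for _ in range(1000)]

mu, sigma = fit_normal(real_returns)
synthetic = generate(mu, sigma, n=1000, seed=1)
print(f"KS distance real vs synthetic: {ks_distance(real_returns, synthetic):.3f}")
```

A value near 0 means the synthetic sample is hard to tell apart from the real one on this single measure; a fuller check would also compare tails and autocorrelation.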

2. Advantages of synthetic data
In most cases, creating synthetic data is more efficient and cheaper than collecting real-world data. It can be created to specification whenever it is needed, instead of waiting for real events to occur and then collecting the data. Synthetic data can also supplement real-world data: even when the real dataset contains no good examples of a situation, every conceivable variable can still be tested. This can accelerate system training, performance testing, and the development of new systems.
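
As a sketch of supplementing real data with situations it lacks, the Python below (standard library only) generates a synthetic return path and then injects a user-specified one-day crash, so that a risk measure can be tested on a scenario that may never appear in the historical record. The normal return model, the -20% shock on day 100, and the helper names `synthetic_path`, `inject_shock` and `max_drawdown` are all illustrative assumptions.

```python
import random


def synthetic_path(mu, sigma, n, seed):
    """Daily returns drawn from an assumed normal model."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def inject_shock(returns, day, shock):
    """Copy of `returns` with a one-day shock (e.g. -0.20 for a 20% crash)
    compounded onto the return of the given day."""
    out = list(returns)
    out[day] = (1 + out[day]) * (1 + shock) - 1
    return out


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


# Hypothetical parameters: one trading year of returns, crash on day 100.
base = synthetic_path(0.0005, 0.01, 250, seed=7)
stressed = inject_shock(base, day=100, shock=-0.20)
print(f"max drawdown without shock: {max_drawdown(base):.1%}")
print(f"max drawdown with shock:    {max_drawdown(stressed):.1%}")
```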

3. Shortcomings of synthetic data
Creating high-quality synthetic data is challenging, especially for complex systems. The model used to synthesize the data must be sound; otherwise the quality of the generated data suffers. If the synthetic data differs substantially from the real dataset, decisions based on it will also suffer. Even very good synthetic data is still only a copy of specific attributes of the real dataset. The model looks for trends to reproduce, and may therefore miss some of the random behavior in the real data.
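
The last point can be illustrated with a small Python experiment, assuming a hypothetical "real" return series built as a mixture of calm and turbulent days (a standard way to produce fat tails). A model that captures only the mean and volatility reproduces the overall trend, but its synthetic returns have almost no excess kurtosis, so the extreme moves of the real series are largely missing.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis: about 0 for normal data, clearly > 0 for fat tails."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / var ** 2 - 3


rng = random.Random(0)

# Hypothetical "real" returns: 95% calm days (1% volatility) and 5% turbulent
# days (5% volatility). The mixture has much fatter tails than a normal.
real = [rng.gauss(0.0, 0.05 if rng.random() < 0.05 else 0.01) for _ in range(20000)]

# Model that copies only the trend-level attributes: mean and volatility.
mu, sigma = statistics.fmean(real), statistics.pstdev(real)
synthetic = [rng.gauss(mu, sigma) for _ in range(20000)]

print(f"excess kurtosis, real:      {excess_kurtosis(real):.2f}")
print(f"excess kurtosis, synthetic: {excess_kurtosis(synthetic):.2f}")
```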

4. Synthetic data is "any production data applicable to a given situation that are not obtained by direct measurement."

5. Synthetic data is a training tool for improving the accuracy of deep learning systems.

6. Fully synthetic data: the dataset contains none of the original data. This makes re-identifying any single unit almost impossible, while all variables remain fully available.
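
A minimal Python sketch of full synthesis, assuming a toy table of hypothetical client records and a deliberately simple model: an independent normal distribution per column. Every field of every output row is sampled from the fitted models, so no original value is copied, while all columns remain available. The function name `fully_synthetic` and the fields `age` and `balance` are illustrative only.

```python
import random
import statistics

# Toy, hypothetical client records standing in for a sensitive real dataset.
real = [
    {"age": 25, "balance": 1200.0},
    {"age": 34, "balance": 5400.0},
    {"age": 41, "balance": 8100.0},
    {"age": 52, "balance": 15000.0},
    {"age": 63, "balance": 22000.0},
]


def fully_synthetic(rows, n, seed=0):
    """Fit an independent normal to each column, then sample every field of
    every new row from those models; no original value is copied.
    Independent columns are a simplifying assumption: correlations are lost."""
    rng = random.Random(seed)
    params = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        params[col] = (statistics.fmean(values), statistics.stdev(values))
    return [
        {col: rng.gauss(mu, sd) for col, (mu, sd) in params.items()}
        for _ in range(n)
    ]


synthetic = fully_synthetic(real, 5)
print(synthetic[0])
```

Dropping the correlation between age and balance is exactly the kind of model weakness described in section 3; a real project would use a richer generative model.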

7. Partially synthetic data: only the sensitive values are replaced with synthetic data. The replaced values depend heavily on the imputation model, but because most of the dataset stays real, the result as a whole depends less on the model than fully synthetic data does. The trade-off is that the true values retained in the dataset mean some disclosure is still possible.
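
For contrast, a Python sketch of partial synthesis on the same kind of toy table, assuming `balance` is the sensitive column. The imputation model here is a simple least-squares line of `balance` on `age` plus normal noise with the residual spread, a stand-in for whatever imputation model a real project would use; `partially_synthetic` is a hypothetical name. The real `age` values are kept unchanged, which is why some disclosure risk remains.

```python
import random
import statistics

# Toy, hypothetical records; "balance" is treated as the sensitive column.
real = [
    {"age": 25, "balance": 1200.0},
    {"age": 34, "balance": 5400.0},
    {"age": 41, "balance": 8100.0},
    {"age": 52, "balance": 15000.0},
    {"age": 63, "balance": 22000.0},
]


def partially_synthetic(rows, seed=0):
    """Keep the real 'age' values; replace only 'balance' with draws from an
    imputation model: least-squares line balance ~ age plus residual noise."""
    rng = random.Random(seed)
    ages = [r["age"] for r in rows]
    bals = [r["balance"] for r in rows]
    ma, mb = statistics.fmean(ages), statistics.fmean(bals)
    slope = sum((a - ma) * (b - mb) for a, b in zip(ages, bals)) / sum(
        (a - ma) ** 2 for a in ages
    )
    intercept = mb - slope * ma
    resid_sd = statistics.pstdev([b - (intercept + slope * a) for a, b in zip(ages, bals)])
    return [
        {"age": a, "balance": intercept + slope * a + rng.gauss(0, resid_sd)}
        for a in ages
    ]


partial = partially_synthetic(real)
print(partial)
```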

8. Two general strategies for constructing synthetic data:
Drawing numbers from a distribution: this method works by observing the real statistical distribution and reproducing fake data that follows it. It can also include building generative models.
Agent-based modeling: to produce synthetic data with this method, a model is built that explains the observed behavior, and the same model is then used to reproduce random data. It emphasizes understanding how interactions between agents affect the system as a whole.
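
The agent-based strategy can be sketched in Python with a toy market, assuming two hypothetical agent types: fundamentalists, who buy below an assumed fundamental value and sell above it, and chartists, who follow the latest price move. The agents interact only through the shared price, which moves in proportion to their net demand. Every parameter (agent count, reaction strengths, noise, price impact) is an illustrative assumption, not a calibrated model.

```python
import random


class Agent:
    """One trader. kind is 'fundamentalist' or 'chartist'; strength is how
    strongly this particular agent reacts (drawn once, so agents differ)."""

    def __init__(self, kind, strength):
        self.kind = kind
        self.strength = strength

    def demand(self, price, last_price, fundamental, rng):
        if self.kind == "fundamentalist":
            signal = fundamental - price  # buy when cheap, sell when dear
        else:
            signal = price - last_price  # follow the latest move
        return self.strength * signal + rng.gauss(0, 0.5)  # idiosyncratic noise


def simulate_market(n_steps, n_agents=100, fundamental=100.0, impact=0.01, seed=0):
    """Return n_steps synthetic prices produced by the interacting agents."""
    rng = random.Random(seed)
    agents = [
        Agent(rng.choice(["fundamentalist", "chartist"]), rng.uniform(0.0, 0.1))
        for _ in range(n_agents)
    ]
    prices = [fundamental, fundamental]
    for _ in range(n_steps):
        p, prev = prices[-1], prices[-2]
        excess = sum(a.demand(p, prev, fundamental, rng) for a in agents)
        prices.append(max(p + impact * excess, 0.01))  # price responds to net demand
    return prices[2:]


path = simulate_market(250)
print(f"first: {path[0]:.2f}  last: {path[-1]:.2f}")
```

Different seeds give many synthetic price paths from the same behavioral model; in practice the parameters would be tuned until simulated paths match statistics of the observed market.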

Source: www.cnblogs.com/noah0532/p/11494981.html