[Produced by Aiqi] [Nankai Computer Science] "Introduction to Data Science" Fall 2019 Final Exam reference material

[Disclaimer] The content of this blog is for non-commercial use. If any of it infringes your rights, please let me know and I will delete it.

If I do not reply in time, or if anything is unclear, please add me on WeChat (island68) or QQ (823173334). If possible, please mention that you came from CSDN.

I hope to communicate with you through the platform of CSDN

Keep for own use

"Introduction to Data Science" at the end of autumn 19

1. The regression equation between the output (X, pieces) and the unit cost (Y, yuan/piece) of a commodity is Ŷ = 100 - 1.2X, which means ().
The unit cost increases by 100 yuan for each additional unit produced | The unit cost decreases by 1.2 yuan for each additional unit produced | The unit cost decreases by an average of 1.2 yuan for each additional unit produced
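
To make the slope reading concrete, here is a minimal sketch (the X values 10 and 11 are arbitrary, chosen only for illustration) that evaluates the fitted line at two adjacent output levels:

```python
# Evaluate the fitted regression line Y-hat = 100 - 1.2*X at two adjacent outputs.
def predicted_unit_cost(x):
    return 100 - 1.2 * x

print(predicted_unit_cost(10))   # 88.0 yuan/piece
print(predicted_unit_cost(11))   # 86.8 yuan/piece -> on average 1.2 yuan lower per extra unit
```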

2. The relationship between variables can generally be divided into deterministic relationship and ().
Non-deterministic relationship | linear relationship | function relationship | correlation relationship

3. According to the type of mapping relationship, regression can be divided into linear regression and ().
Logarithmic regression | Nonlinear regression | Logistic regression | Multiple regression

4. The applicable data type for K-means clustering is ().
Numerical data | Character data | Voice data | All data

5. Clustering is a kind of ().
Supervised learning | Unsupervised learning | Reinforcement learning | Semi-supervised learning

6. In the univariate linear regression model, the residual terms follow () distribution.
Poisson | Normal | Linear | Nonlinear

7. When using the least squares method to estimate the parameters of a multiple linear regression, the goal is to ().
Minimize the variance | Minimize the standard deviation | Minimize the residual sum of squares | Maximize the information entropy
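
As a minimal sketch of what minimizing the residual sum of squares looks like in practice (the data below is made up, and NumPy is assumed to be available; this is an illustration, not part of the exam):

```python
import numpy as np

# Ordinary least squares picks the coefficients b that minimize ||y - Xb||^2.
X = np.array([[1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0]])   # intercept column + one predictor
y = np.array([2.1, 3.9, 6.2, 8.1])

b, rss, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(b)    # estimated intercept and slope
print(rss)  # the minimized residual sum of squares
```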

8. Mapping an attribute value such as salary income to [-1,1] or [0,1] is an example of which kind of data transformation ()
Simple function transformation | Normalization | Attribute construction | Discretization of continuous attributes
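
A minimal sketch of the [0,1] mapping named in the question, using min-max normalization on made-up salary values:

```python
# Min-max normalization: x' = (x - min) / (max - min) maps every value into [0, 1].
salaries = [3000, 4500, 8000, 12000]           # toy values
lo, hi = min(salaries), max(salaries)
scaled = [(s - lo) / (hi - lo) for s in salaries]
print(scaled)                                   # [0.0, 0.166..., 0.555..., 1.0]
```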

9. Two associated tables in a database both store a user's personal information, but when that information changes only one of the tables is updated, leaving inconsistent data in the two tables. This belongs to ()
Outliers | Missing values | Inconsistent values | Duplicate values

10. The single-layer perceptron is composed of () layer neurons.
One | two | three | four

11. BFR clustering is used to cluster data in () Euclidean space.
High-dimensional | Medium-dimensional | Low-dimensional | Medium-to-high-dimensional

12. The simplest and most basic method of clustering is ().
Partition clustering | hierarchical clustering | density clustering | distance clustering

13. A binary attribute for which only non-zero values are important is called a (); market-basket data is an example of this kind of attribute.
Counting attribute | Discrete attribute | Asymmetric binary attribute | Symmetric attribute

14. A single-layer perceptron has () layers of functional neurons.
One | two | three | four

15. Constructing a new indicator, the line loss rate, so that when it exceeds its normal range we can judge that users on that line may be engaging in abnormal behavior such as electricity theft or leakage, belongs to which kind of data transformation ()
Simple function transformation | Normalization | Attribute construction | Discretization of continuous attributes

16. In regression analysis, the independent variable is () and the dependent variable is ().
Discrete variables, discrete variables | continuous variables, discrete variables | discrete variables, continuous variables | continuous variables, continuous variables

17. Confidence is an interestingness measure that gauges the () of a rule.
Simplicity | Certainty | Practicality | Novelty

18. The main task of data quality inspection is to check whether the raw data contains "dirty data". In general, dirty data does not include which of the following ()
Ordinary values | Outliers | Inconsistent values | Duplicate values

19. A network with bias and at least () sigmoid (S-type) hidden layer(s) plus one () output layer can approximate any rational function.
1, linear | 2, linear | 1, nonlinear | 2, nonlinear

20. Which of the following is not a data transformation ()
simple function transformation | normalization | attribute merging | continuous attribute discretization

21. The computational complexity of the Apriori algorithm is affected by ().
Support Threshold | Number of Items | Number of Transactions | Average Width of Transactions

22. The methods of hierarchical clustering are ()
Agglomerative method | Divisive method | Combination method | Comparison method

23. The learning process of multi-layer perceptron includes ().
Forward propagation of signal | Back propagation of signal | Forward propagation of error | Back propagation of error

24. The method of selecting K value in K-means clustering is ().
Density classification | elbow method | thigh method | random selection
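
A minimal sketch of the elbow method (random toy data; scikit-learn is assumed to be available): fit K-means for several candidate K and watch where the within-cluster sum of squares stops dropping sharply.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # toy data

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)                      # inertia = within-cluster sum of squares
```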

25. Which of the following examples belong to classification ()
Detecting whether there is a face in an image | Classifying customers according to the size of their loan risk | Recognizing handwritten digits | Estimating shopping mall foot traffic

26. The evaluation metrics of association rules mainly include: ().
Support | Confidence | Accuracy | Error rate
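
A minimal sketch of how the two metrics named above are computed (the transactions are made up):

```python
# support(A -> B) = fraction of transactions containing both A and B
# confidence(A -> B) = support(A and B) / support(A)
transactions = [{"beer", "diapers"}, {"beer", "bread"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

print(support({"beer", "diapers"}))        # 0.5
print(confidence({"beer"}, {"diapers"}))   # 0.666...
```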

27. The basic elements of the k-nearest neighbor method include ().
Distance measurement | k value selection | sample size | classification decision rule
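
A minimal k-nearest-neighbor sketch (toy points, plain Python) that makes the basic elements visible: a distance measure, the choice of k, and a classification decision rule (majority vote here).

```python
import math
from collections import Counter

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]   # toy labeled points

def knn_predict(x, k=3):
    # distance measure: Euclidean; k value: parameter; decision rule: majority vote
    nearest = sorted(train, key=lambda pt: math.dist(x, pt[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((1.1, 1.0)))   # "A"
```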

28. Under what circumstances does a node not need to be divided further ()
All samples contained in the current node belong to the same category | The current attribute set is empty, or all samples take the same value on all attributes | There are subsets that cannot be basically correctly classified

29. The basic characteristics of system log collection are ()
high availability | high reliability | scalability | high efficiency

30. Which of the following are sets of objects maintained by BFR ()
Discard set | Temporary set | Compression set | Retained set

31. For multilayer perceptrons, the () layer has functional neurons with activation functions.
Input layer | Hidden layer | Output layer

32. The parameter estimation method used in univariate linear regression is ().
Maximum likelihood method | Distance estimation method | Least squares method | Euclidean distance method

33. The properties of data science include ()
Effectiveness | Availability | Unexpectedness | Understandability

34. The main methods of clustering are ().
Partition clustering | hierarchical clustering | density clustering | distance clustering

35. According to the direction of correlation, correlation can be divided into ().
Positive correlation | Negative correlation | Left correlation | Right correlation

36. Crosstabs can help people discover the interactions between variables.
Right | wrong

37. The standard BP algorithm is one that updates the parameters only after reading the entire training set.
Right | wrong

38. Association rules can be widely used in the fields of communications, finance, transportation, health care, and web user behavior analysis.
Right | wrong

39. When the characteristics are discrete, information gain can be used as an evaluation statistic.
Right | wrong
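
A minimal sketch of using information gain with a discrete feature (toy labels and a made-up split): the gain is the parent entropy minus the weighted entropy of the children.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

parent = ["yes", "yes", "no", "no", "yes", "no"]                       # toy labels
split = {"sunny": ["yes", "no", "no"], "rainy": ["yes", "yes", "no"]}  # split on a discrete feature

gain = entropy(parent) - sum(len(v) / len(parent) * entropy(v) for v in split.values())
print(gain)   # about 0.082
```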

40. Given a dataset, if there exists a hyperplane S that can correctly divide some of the positive and negative instance points of the dataset onto the two sides of the hyperplane, the dataset is said to be linearly separable.
Right | wrong

41. Association rules that do not satisfy the given evaluation metrics are uninteresting.
Right | wrong

42. The more similar the two objects are, the higher their dissimilarity.
Right | wrong

43. A decision tree can also represent the conditional probability distribution of classes given the features. This distribution is defined on a partition of the feature space: the feature space is divided into disjoint units or regions, and the class probability distribution defined on each unit together constitutes the conditional probability distribution.
Right | wrong

44. The greater the information gain of a feature, the less important it is.
Right | wrong

45. The greater the information entropy, the lower the purity of the sample set.
Right | wrong

46. If at least one subset of a candidate itemset is infrequent, then by the anti-monotone property of support the candidate itemset is certainly infrequent.
Right | wrong

47. EDA can maximize the analyst's insight into the dataset and the underlying structure of the dataset, and provide the analyst with all kinds of information contained in the dataset.
Right | wrong

48. As the dimension increases, the volume of the feature space increases rapidly, making the available data dense.
Right | wrong

49. In multiple linear regression models, standardized partial regression coefficients have no units.
Right | wrong

50. When a decision tree performs classification, the instances at a node are forcibly assigned to the class with the larger conditional probability.
Right | wrong

51. The prior probability of each class can be estimated by the proportion of training records belonging to that class.
Right | wrong

52. The K-Means algorithm is density clustering.
Right | wrong

53. Association rules can be generated by enumeration.
Right | wrong

54. Acquiring data provides material and basis for data analysis. The data here includes only directly acquired data.
Right | wrong

55. The story of beer and diapers is a typical example of cluster analysis.
Right | wrong

56. The basic structure of a decision tree consists of nodes and directed edges. What are the two types of nodes and what do they mean? What is the basic idea of the decision tree?
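
Not a model answer, but a small illustration of the two node types (scikit-learn and its bundled iris data are assumed, purely for demonstration): the printed tree shows internal nodes that test a feature against a threshold and leaf nodes that assign a class, which reflects the recursive divide-and-conquer idea behind decision trees.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(tree, feature_names=list(iris.feature_names)))
# Lines with "<=" are internal (feature-test) nodes; "class: ..." lines are leaf nodes.
```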


57. What kind of problems do single-layer and multi-layer perceptrons solve?
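
Not a model answer, but a minimal sketch (NumPy assumed, toy truth-table data) of the distinction the question points at: a single-layer perceptron can learn a linearly separable function such as AND, whereas XOR is not linearly separable and needs at least one hidden layer, i.e. a multi-layer perceptron.

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)            # threshold activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])             # AND is linearly separable; XOR is not

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                        # classic perceptron learning rule
    for xi, yi in zip(X, y_and):
        err = yi - step(xi @ w + b)
        w += lr * err * xi
        b += lr * err

print(step(X @ w + b))                     # [0 0 0 1] -> AND learned
```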


58. What is a neural network? What are the most basic components of neural networks?
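
Not a model answer, but a minimal sketch of the basic building block the question asks about: an M-P style neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function (sigmoid here; the numbers are made up).

```python
import math

def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias   # weighted sum + bias
    return 1.0 / (1.0 + math.exp(-z))                        # sigmoid activation

print(neuron([0.5, 0.8], [0.4, -0.6], 0.1))   # about 0.455
```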


 
