"Introduction to Data Science", final exam, autumn 2019
1. The regression equation between the output (X, pieces) and the unit cost (Y, yuan/piece) of a commodity is Ŷ = 100 - 1.2X, which means ().
The unit cost increases by 100 yuan for each additional production unit | The unit cost decreases by 1.2 yuan for each additional production unit | The unit cost decreases by an average of 1.2 yuan for each additional production unit
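As a minimal sketch of how to read the fitted equation, the snippet below evaluates Ŷ = 100 - 1.2X at two consecutive output levels. The function name `predicted_unit_cost` and the output levels 10 and 11 are illustrative assumptions, not part of the original question.

```python
def predicted_unit_cost(output_pieces: float) -> float:
    """Fitted regression line from the question: Y_hat = 100 - 1.2 * X."""
    return 100 - 1.2 * output_pieces


# Change in the predicted (average) unit cost when output rises by one piece.
delta = predicted_unit_cost(11) - predicted_unit_cost(10)
print(round(delta, 2))
```

The difference equals the slope of the line, whatever pair of consecutive output levels is chosen; the intercept 100 plays no part in it.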
2. The relationship between variables can generally be divided into deterministic relationship and ().
Non-deterministic relationship | linear relationship | function relationship | correlation relationship
3. Depending on the form of the mapping, regression can be divided into linear regression and ().
Logarithmic regression | Nonlinear regression | Logistic regression | Multiple regression
4. The applicable data type for K-means clustering is ().
Numerical data | Character data | Voice data | All data
5. Clustering is a kind of ().
Supervised learning | Unsupervised learning | Reinforcement learning | Semi-supervised learning
6. In the univariate linear regression model, the residual terms follow a () distribution.
Poisson | Normal | Linear | Nonlinear
7. When using least squares method to estimate the parameters of multiple linear regression, the goal is ().
Minimize variance | Minimize standard deviation | Minimize residual sum of squares | Maximize information entropy
8. Mapping the values of a salary attribute to [-1,1] or [0,1] belongs to which kind of data transformation ()
Simple function transformation | normalization | attribute construction | discretization of continuous attributes
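The mapping described in question 8 can be sketched as a linear rescaling. This is only an illustration: the helper name `min_max_scale` and the salary figures are made up, and the handling of a constant column is a convention chosen here.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to the interval [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound
        # (a convention chosen for this sketch).
        return [float(new_min) for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


salaries = [3000, 4500, 6000, 9000]  # made-up salary values
unit_range = min_max_scale(salaries)            # mapped to [0, 1]
signed_range = min_max_scale(salaries, -1, 1)   # mapped to [-1, 1]
```

The smallest salary always lands on the lower bound and the largest on the upper bound, so a single extreme value can compress all the others into a narrow band.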
9. Two related tables in a database both store a user's personal information, but when that information changes, only the data in one table is updated. The two tables then hold inconsistent data, which is a case of ()
Outliers | Missing Values | Inconsistent Values | Duplicate Values
10. A single-layer perceptron is composed of () layers of neurons.
One | two | three | four
11. BFR clustering is used to cluster data in a () Euclidean space.
High-dimensional | Medium-dimensional | Low-dimensional | Medium-to-high-dimensional
12. The simplest and most basic method of clustering is ().
Partition clustering | hierarchical clustering | density clustering | distance clustering
13. A binary attribute for which only non-zero values are important is called a (); shopping basket data is of this type.
Counting attribute | Discrete attribute | Asymmetric binary attribute | Symmetric attribute
14. A single-layer perceptron has () layers of functional neurons.
One | two | three | four
15. Constructing a new indicator, the line loss rate, makes it possible to judge that users on a line may be engaging in abnormal behavior such as electricity theft or leakage whenever the indicator exceeds its normal range. This belongs to which kind of data transformation ()
Simple function transformation | normalization | attribute construction | discretization of continuous attributes
16. In regression analysis, the independent variable is () and the dependent variable is ().
Discrete variables, discrete variables | continuous variables, discrete variables | discrete variables, continuous variables | continuous variables, continuous variables
17. Confidence is an indicator of which aspect of a rule's interestingness ().
Simplicity | Certainty | Utility | Novelty
18. The main task of data quality inspection is to check whether the original data contains "dirty data". Dirty data does not include ()
ordinary values | outliers | inconsistent values | duplicate values
19. A network with biases, at least () sigmoid hidden layer(s), and one () output layer can approximate any rational function.
1, linear | 2, linear | 1, nonlinear | 2, nonlinear
20. Which of the following is not a data transformation ()
simple function transformation | normalization | attribute merging | continuous attribute discretization
21. The computational complexity of the Apriori algorithm is affected by ().
Support Threshold | Number of Items | Number of Transactions | Average Width of Transactions
22. The methods of hierarchical clustering are ()
Agglomerative method | divisive method | combination method | comparison method
23. The learning process of multi-layer perceptron includes ().
Forward propagation of signal | Back propagation of signal | Forward propagation of error | Back propagation of error
24. The method of selecting K value in K-means clustering is ().
Density classification | elbow method | thigh method | random selection
25. Which of the following examples belong to the category of classification ()
Detecting whether there is a face in an image | classifying customers according to their loan risk | recognizing handwritten digits | estimating shopping mall traffic
26. The evaluation metrics of association rules mainly include: ().
Support | Confidence | Accuracy | Error rate
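To make the two standard association rule metrics concrete, here is a small sketch that computes them on a toy transaction list. The baskets and the function names are invented for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base


baskets = [  # made-up shopping baskets
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "bread"},
]
rule_support = support(baskets, {"beer", "diapers"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
```

Here the rule diapers -> beer appears in 2 of 4 baskets, and 2 of the 3 baskets containing diapers also contain beer.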
27. The basic elements of the k-nearest neighbor method include ().
Distance measurement | k value selection | sample size | classification decision rule
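A rough sketch of a k-nearest-neighbor classifier shows where a distance measure, a value of k, and a decision rule enter the method. Euclidean distance and majority voting are assumptions made for this example, and the points and labels are invented.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Distance measure: Euclidean distance (other metrics could be substituted).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Classification decision rule: majority vote among the k neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "a", "b", "b"]
prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
```

`math.dist` requires Python 3.8 or later.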
28. Under what circumstances does a node not need to be split further ()
All samples in the current node belong to the same category | the current attribute set is empty, or all samples take the same value on every attribute | | There are still subsets that cannot be basically correctly classified
29. The basic characteristics of system log collection are ()
high availability | high reliability | scalability | high efficiency
30. Which of the following are sets of objects maintained by BFR ()
discarded sets | temporary sets | compressed sets | retained sets
31. For multilayer perceptrons, the () layer has functional neurons with activation functions.
Input layer | Hidden layer | Output layer
32. The methods for estimating the parameters of a simple (one-variable) linear regression are ().
Maximum likelihood method | Distance estimation method | Least squares method | Euclidean distance method
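For reference, the least squares estimate for one-variable regression has a closed form, sketched below in plain Python. The function name is illustrative, and the sample data are generated from the line Y = 100 - 1.2X used in question 1, so they contain no noise.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1, 2, 3, 4, 5]
ys = [98.8, 97.6, 96.4, 95.2, 94.0]  # generated from y = 100 - 1.2x
intercept, slope = fit_simple_linear_regression(xs, ys)
```

Because the points lie exactly on a line, the fit recovers that line up to floating-point error; with noisy data it returns the line with the smallest residual sum of squares.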
33. What are the properties of data science ()
Effectiveness | availability | unexpectedness | understandability
34. The main methods of clustering are ().
Partition clustering | hierarchical clustering | density clustering | distance clustering
35. By direction, correlation can be classified into ().
Positive correlation | Negative correlation | Left correlation | Right correlation
36. Crosstabs can help people discover the interactions between variables.
Right | wrong
37. The standard BP algorithm updates the parameters once, after reading the entire data set.
Right | wrong
38. Association rules can be widely used in the fields of communications, finance, transportation, health care, and web user behavior analysis.
Right | wrong
39. When the features are discrete, information gain can be used as an evaluation statistic.
Right | wrong
40. Given a data set, if there is a hyperplane S that can correctly divide part of the positive and negative instance points of the data set to both sides of the hyperplane, the data set is said to be a linearly separable data set.
Right | wrong
41. Association rules that do not satisfy a given evaluation metric are uninteresting.
Right | wrong
42. The more similar the two objects are, the higher their dissimilarity.
Right | wrong
43. A decision tree can also represent the conditional probability distribution of the class given the features. This distribution is defined on a partition of the feature space: the feature space is divided into disjoint units or regions, and a class probability distribution is defined on each unit, which together constitute a conditional probability distribution.
Right | wrong
44. The greater the information gain of a feature, the less important it is.
Right | wrong
45. The ... the information entropy, the lower the purity of the sample set.
Right | wrong
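As a quick numerical check of how entropy relates to purity, the sketch below computes the Shannon entropy of a pure sample set and of an evenly mixed one. The label lists are invented, and base-2 logarithms are an assumption of this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a sample set."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


pure = entropy(["yes"] * 8)                 # every sample has the same class
mixed = entropy(["yes"] * 4 + ["no"] * 4)   # evenly split between two classes
```

A set containing a single class has entropy 0, while an even two-class split reaches the two-class maximum of 1 bit.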
46. If at least one subset of a candidate itemset is infrequent, the candidate itemset is definitely infrequent, by the anti-monotone property of support.
Right | wrong
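The pruning idea in question 46 can be sketched as a subset check performed before a candidate is counted. The frequent 2-itemsets below are hypothetical, and `has_infrequent_subset` is a name chosen for this example rather than a library function.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """True if any (k-1)-subset of `candidate` is not a known frequent itemset."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(sorted(candidate), k - 1)
    )


# Hypothetical frequent 2-itemsets; {"B", "C"} is deliberately absent.
frequent_2 = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"A", "D"}),
    frozenset({"B", "D"}),
}
pruned = has_infrequent_subset({"A", "B", "C"}, frequent_2)    # {"B", "C"} is missing
kept = not has_infrequent_subset({"A", "B", "D"}, frequent_2)  # all 2-subsets present
```

A pruned candidate never needs its support counted against the transaction data, which is where the savings come from.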
47. EDA can maximize the analyst's insight into the dataset and the underlying structure of the dataset, and provide the analyst with all kinds of information contained in the dataset.
Right | wrong
48. As the dimension increases, the volume of the feature space increases rapidly, making the available data dense.
Right | wrong
49. In multiple linear regression models, standardized partial regression coefficients have no units.
Right | wrong
50. When a decision tree classifies, the instances at a node are forcibly assigned to the class with the larger conditional probability.
Right | wrong
51. The prior probability of each class can be estimated by the proportion of training records belonging to that class.
Right | wrong
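The estimate described in question 51 amounts to counting labels. A minimal sketch, with made-up training labels and an illustrative function name:

```python
from collections import Counter


def estimate_priors(labels):
    """Estimate P(class) as the fraction of training records with that label."""
    total = len(labels)
    if total == 0:
        raise ValueError("no training records")
    return {cls: count / total for cls, count in Counter(labels).items()}


training_labels = ["spam", "ham", "ham", "spam", "ham"]  # made-up labels
priors = estimate_priors(training_labels)
```

With 2 of 5 records labeled "spam", the estimated prior for that class is 0.4, and the priors over all classes sum to 1.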
52. The K-Means algorithm is density clustering.
Right | wrong
53. Association rules can be generated by enumeration.
Right | wrong
54. Acquiring data provides material and basis for data analysis. The data here includes only directly acquired data.
Right | wrong
55. The story of beer and diapers is a typical example of cluster analysis.
Right | wrong
56. A decision tree consists of nodes and directed edges. What are the two types of nodes, and what does each represent? What is the basic idea of a decision tree?
57. What kind of problems do single-layer and multi-layer perceptrons solve?
58. What is a neural network? What are the most basic components of neural networks?