Project Practice 3: Kaggle Diabetes Prediction

1. Data introduction

        The data modeled in this chapter is the Pima Indians Diabetes Database, downloaded from Kaggle.

        Data link: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

        Dataset introduction: This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict diagnostically whether a patient has diabetes, based on the diagnostic measurements contained in the dataset. All patients are women of Pima Indian heritage who are at least 21 years old. The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the patient's number of pregnancies, BMI, insulin level, age, and so on.

Figure 1 Diabetes database (data preview) 

        The variables in the dataset have the following meanings:

        Pregnancies: number of pregnancies

        Glucose: 2-hour plasma glucose concentration during oral glucose tolerance test

        BloodPressure: diastolic blood pressure (mm Hg)

        SkinThickness: triceps skinfold thickness (mm)

        Insulin: 2-hour serum insulin (μU/ml)

        BMI: body mass index (weight in kilograms/(height in meters)^2)

        DiabetesPedigreeFunction: diabetes pedigree function (a score that summarizes the likelihood of diabetes based on family history)

        Age: age (years)

        Outcome: target variable (0 or 1). 0 means the patient does not have diabetes, and 1 means the patient has diabetes. Of the 768 records in the dataset, 268 have Outcome 1 and 500 have Outcome 0.
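
        The class counts given for Outcome mean the target is moderately imbalanced, which is worth keeping in mind when judging a model's accuracy later. The following is a minimal sketch of that arithmetic in Python. It uses only the two counts stated above; the variable names are illustrative, not taken from the original code.

```python
# Class counts for Outcome, as stated in the variable description above.
counts = {0: 500, 1: 268}

total = sum(counts.values())  # total number of records
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class (0).
# Any real model should be compared against this baseline.
majority_baseline = max(counts.values()) / total

for label, share in proportions.items():
    print(f"Outcome={label}: {counts[label]} rows ({share:.1%})")
print(f"Majority-class baseline accuracy: {majority_baseline:.1%}")
```

        In other words, a model that always answers "no diabetes" is already right for roughly 65% of the records, so accuracy figures near that value indicate little real predictive power.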

2. Modeling steps

        (1) Read the CSV data

        (2) Convert the string data to floating-point values
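
        Steps (1) and (2) can be sketched as follows. This is a minimal sketch using only the Python standard library, not necessarily the code used in the original project. The helper names load_csv and rows_to_float, the file name diabetes.csv, and the presence of a header row are assumptions. The two inline sample rows only illustrate the column layout described in section 1.

```python
import csv
import io


def load_csv(source, has_header=True):
    """Read CSV rows from an open text stream.

    Returns (header, rows); every field is still a string at this point.
    """
    rows = [row for row in csv.reader(source) if row]  # skip blank lines
    if has_header:
        return rows[0], rows[1:]
    return None, rows


def rows_to_float(rows):
    """Convert every field of every row from str to float."""
    return [[float(value.strip()) for value in row] for row in rows]


# Inline sample in the column layout from section 1 (illustration only).
# With the real file (assumed name), use instead:
#     with open("diabetes.csv", newline="") as f:
#         header, raw_rows = load_csv(f)
sample = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)

header, raw_rows = load_csv(sample)   # step (1): read the CSV data
data = rows_to_float(raw_rows)        # step (2): strings -> floats

print(header)
print(data[0])
```

        Note that the conversion also turns the Outcome column into 0.0 or 1.0. If a later step needs integer class labels, that column can be cast back with int().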

