Welcome to Foodatascience !!!

This blog is about food data science. Here I show how I analyse data sets on health and food, and how you can apply the same methods to your own data.

I will do exploratory data analysis, machine learning and data visualization with R and Python.

I've moved this blog here https://lsaa2014.github.io/

EDA of Pima Indians Diabetes

  • saalaure
  • 25 Feb 2018
  • Reading time: 2 min

Diabetes is a disease caused by a malfunction of the hormone insulin. The pancreas releases insulin to help our body store and use carbohydrates and fat from the food we consume. Diabetes may occur when the pancreas produces very little or no insulin, or when our body does not respond appropriately to insulin. Although it is beyond the scope of this analysis, there are three main types of diabetes: type 1, type 2 and gestational.


This is an exploratory analysis of the Pima Indians Diabetes data set. The data is from the

The data set consists of 768 records with 9 variables, collected from females of Pima Indian heritage who are at least 21 years old. The variables are:

1. Number of times pregnant

2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test

3. Diastolic blood pressure (mm Hg)

4. Triceps skin fold thickness (mm)

5. 2-Hour serum insulin (mu U/ml)

6. Body mass index (weight in kg/(height in m)^2)

7. Diabetes pedigree function

8. Age (years)

9. Class variable (0 or 1) (class value 1 is interpreted as "tested positive for diabetes")
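As a minimal sketch, the nine variables above could be loaded into a pandas data frame like this. The short column names and the `load_pima` helper are hypothetical, chosen only for illustration, and the sketch assumes the data is stored as a comma-separated file without a header row.

```python
import io

import pandas as pd

# Hypothetical short names, in the same order as the nine variables listed above.
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "triceps_skin",
    "insulin", "bmi", "pedigree", "age", "outcome",
]


def load_pima(source):
    """Read a headerless CSV (a path or a file-like object) with the nine variables."""
    return pd.read_csv(source, header=None, names=COLUMNS)
```

Calling `load_pima("your_file.csv").shape` on the full data set should give 768 rows and 9 columns if the file matches the description above.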

The class variable is the dependent (response) variable, with 0 meaning tested negative for diabetes and 1 meaning tested positive. I want to answer a few questions here:

1. Can the diabetes pedigree function tell us something?

2. Can BMI and the number of times pregnant influence the diabetes test?

3. Is there a relationship between BMI and triceps skin fold thickness?

The original data has 500 records that tested negative and 268 that tested positive. After removing the records with missing values, 262 negatives and 130 positives remain.
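The post does not say how missing values were identified. A common assumption for this data set is that a zero in glucose, blood pressure, triceps skin fold, insulin or BMI means "not recorded", since none of these can be zero in a living person. Under that assumption, and with hypothetical column names, the filtering step might look like this sketch:

```python
import numpy as np
import pandas as pd

# Assumption: a zero in these columns stands for a missing measurement.
# The column names are hypothetical.
ZERO_AS_MISSING = ["glucose", "blood_pressure", "triceps_skin", "insulin", "bmi"]


def drop_missing(df):
    """Return only the rows where none of the listed columns is zero or NaN."""
    cleaned = df.copy()  # leave the caller's data frame untouched
    cleaned[ZERO_AS_MISSING] = cleaned[ZERO_AS_MISSING].replace(0, np.nan)
    return cleaned.dropna(subset=ZERO_AS_MISSING)


def class_counts(df):
    """Count records per class (0 = tested negative, 1 = tested positive)."""
    return df["outcome"].value_counts().sort_index()
```

Comparing `class_counts(df)` with `class_counts(drop_missing(df))` shows how many negatives and positives survive the filter.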


Can the Diabetes pedigree function influence the diabetes test?

Most females who tested negative for diabetes have a diabetes pedigree function below 0.80. The diabetes pedigree function summarises the history of diabetes mellitus in an individual's family. So a person with a high value is more likely to test positive for diabetes.
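One way to put a number on the 0.80 observation is to compute, for each class, the share of records below that value. This is only a sketch: the column names `pedigree` and `outcome` and the helper `share_below` are hypothetical.

```python
import pandas as pd


def share_below(df, threshold=0.80):
    """Fraction of records in each class whose pedigree value is below threshold."""
    below = df["pedigree"] < threshold          # boolean Series
    return below.groupby(df["outcome"]).mean()  # mean of booleans = proportion
```

The result is a Series indexed by class, so `share_below(df).loc[0]` is the proportion of negatives under the threshold.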


Can Bmi, number of times pregnant influence the diabetes test?

This plot shows that females who tested positive have a higher BMI and a higher number of pregnancies than those who tested negative.
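The same comparison can be checked numerically with group means. The sketch below assumes hypothetical column names `bmi`, `pregnancies` and `outcome`; `mean_by_class` is an illustrative helper, not the code behind the plot.

```python
import pandas as pd


def mean_by_class(df, columns=("bmi", "pregnancies")):
    """Mean of the chosen columns for each class (0 = negative, 1 = positive)."""
    return df.groupby("outcome")[list(columns)].mean()
```

Swapping `.mean()` for `.median()` gives a summary that is less sensitive to outliers.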


Is there a relationship between Bmi and Triceps skin fold thickness?

This plot shows that there is a strong relationship between BMI and triceps skin fold thickness.

As BMI increases, triceps skin fold thickness also increases. The correlation is higher for females who tested negative than for those who tested positive, but there are some points that could be considered outliers.
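The per-class correlation can be computed directly. This sketch uses the Pearson coefficient, which is an assumption since the post does not name the measure, and hypothetical column names `bmi`, `triceps_skin` and `outcome`.

```python
import pandas as pd


def bmi_triceps_corr(df):
    """Pearson correlation between BMI and triceps skin fold, per class."""
    return pd.Series({
        label: group["bmi"].corr(group["triceps_skin"])
        for label, group in df.groupby("outcome")
    })
```

Because Pearson correlation is sensitive to outliers, it may be worth comparing it with `corr(..., method="spearman")` on the same columns.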


Conclusion

From this analysis, we can see that BMI, the diabetes pedigree function, the number of times pregnant and triceps skin fold thickness can all help to predict whether a woman will test positive for diabetes. There were many missing values and outliers, so these should be taken into account when building a prediction model.


© 2023 by Saalaure. Proudly created with Wix.com
