Stroke Prediction Analysis

Stroke is among the leading causes of death for American women and men and a leading cause of serious long-term disability. During a stroke, an estimated 32,000 brain cells die every second. Early identification of individuals at risk of stroke is crucial for implementing preventive measures and interventions and for avoiding lasting disability.

Software Used
Weka, Python

Year
Fall 2023

Primary goal:

Create a machine learning model that accurately predicts the likelihood of an individual experiencing a stroke based on behavioral, environmental, metabolic, and demographic risk factors.
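Logistic regression, one of the two methods named in the conclusion, expresses this likelihood directly as a probability. Its standard form is shown below; the symbols are generic (x_1 ... x_k stand for the encoded risk factors and β_0 ... β_k for fitted coefficients), not values from this project:

```latex
P(\text{stroke} = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\ln \frac{P(\text{stroke} = 1 \mid \mathbf{x})}{1 - P(\text{stroke} = 1 \mid \mathbf{x})} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
```

A positive coefficient β_j (for example, on BMI or average glucose level) means a higher value of that factor raises the modeled odds of stroke, by a factor of e^(β_j) per unit increase with the other factors held fixed.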

Risk Factor Clusters:

Behavioral – smoking, work type

Environmental – residence type (exposure to pollution)

Metabolic – high BMI, previous stroke, history of heart disease or hypertension

  • 4 in 5 strokes are preventable.

    Stroke Awareness Foundation

  • Every 40 seconds, someone in the United States has a stroke.

    Stroke Awareness Foundation

  • 795,000 Americans have a stroke each year.

    Stroke Awareness Foundation

  • $56.5 billion is spent annually on stroke-related costs.

    Stroke Awareness Foundation

Data Source and Description

Open-Source Dataset

  • Kaggle

  • 5,110 samples

  • 12 variables

  • 249 had a stroke (about 4.9% of samples)
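With 249 positive cases out of 5,110, strokes make up under 5% of the samples. The sketch below computes that prevalence and "balanced" class weights using the formula behind scikit-learn's `class_weight="balanced"` option, n_samples / (n_classes * class_count). Reweighting along these lines is one common way to counter class imbalance; it is an illustration, not necessarily what this project did, and the function name is hypothetical.

```python
def balanced_class_weights(counts: dict[int, int]) -> dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Class counts from the dataset description: 5,110 samples, 249 strokes.
counts = {0: 5110 - 249, 1: 249}  # 0 = no stroke, 1 = stroke
prevalence = counts[1] / sum(counts.values())
weights = balanced_class_weights(counts)

print(f"stroke prevalence: {prevalence:.1%}")
print(f"class weights: {weights}")
```

With these weights each class carries the same total weight (2,555 here), so misclassifying one stroke case costs about as much as misclassifying roughly 19.5 non-stroke cases.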

Risk Factors

Binary Variables

Ever Married (Binary)

Stroke (Binary; target variable)

Hypertension (Binary)

Heart Disease (Binary)

Behavioral

Smoking Status: 

•Formerly smoked

•Never smoked

•Smokes

•Unknown

Work Type: 

•Private

•Self-Employed

•Government Job

•Children

•Never worked

Profile

ID

Gender (Male, Female)

Age

Metabolic

Average Glucose Level

BMI

Environmental

Residence Type (Urban, Rural)
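Before a model like logistic regression can use the categorical variables above, each one is typically converted into 0/1 indicator columns, and the ID column is dropped because it identifies a record rather than describing a risk factor. The sketch below does this in plain Python. The snake_case column names are an assumption about the Kaggle CSV headers, the function name is hypothetical, and the two input records are made up for illustration only. Weka handles nominal attributes directly for decision trees, and its NominalToBinary filter performs a similar conversion.

```python
# Categorical columns from the variable list; the names are assumed to match
# the Kaggle CSV headers and may need adjusting.
CATEGORICAL = ["gender", "ever_married", "work_type", "Residence_type", "smoking_status"]
DROP = ["id"]  # an identifier, not a predictor


def one_hot_encode(rows: list[dict], categorical: list[str], drop: list[str]) -> list[dict]:
    """Replace each categorical column with 0/1 indicator columns."""
    # Collect the observed levels of each categorical column, in sorted order.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}
    encoded = []
    for row in rows:
        # Keep numeric columns as-is; skip categorical and dropped columns.
        out = {k: v for k, v in row.items() if k not in categorical and k not in drop}
        for col in categorical:
            for level in levels[col]:
                out[f"{col}={level}"] = int(row[col] == level)
        encoded.append(out)
    return encoded


# Two made-up records for illustration only (not rows from the dataset).
sample = [
    {"id": 1, "gender": "Male", "age": 67.0, "ever_married": "Yes",
     "work_type": "Private", "Residence_type": "Urban",
     "smoking_status": "formerly smoked"},
    {"id": 2, "gender": "Female", "age": 49.0, "ever_married": "No",
     "work_type": "Self-employed", "Residence_type": "Rural",
     "smoking_status": "never smoked"},
]
for record in one_hot_encode(sample, CATEGORICAL, DROP):
    print(record)
```

In practice the levels should be learned from the training split only, and for logistic regression one indicator per variable is usually dropped (or regularization used) so the columns are not perfectly collinear with the intercept.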

Limitations

  • Data collection method is confidential (HIPAA policy)

Conclusion

The dataset had too few positive stroke instances (249 of 5,110) for the models to identify stroke patients reliably; however, they were able to identify non-stroke patients. Maintaining a healthy BMI and glucose level and controlling hypertension are associated with lower stroke risk, as indicated by the logistic regression and decision tree analyses.
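The gap between identifying non-stroke patients and missing stroke patients shows up clearly when performance is reported per class. The sketch below (the function name is hypothetical) scores a trivial classifier that predicts "no stroke" for everyone, using only the dataset's class counts. It reaches about 95% accuracy and perfect specificity while detecting no strokes, which is why overall accuracy can hide the problem. These numbers describe that trivial baseline, not the project's models.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Accuracy, sensitivity (recall on strokes) and specificity (recall on non-strokes)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Baseline that labels every patient "no stroke": all 249 strokes are missed
# (false negatives) and all 4,861 non-strokes are correct (true negatives).
baseline = classification_metrics(tp=0, fn=249, tn=4861, fp=0)
print(baseline)
```

Reporting sensitivity alongside accuracy makes it visible when a model, like this baseline, succeeds only on the majority class.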

Recommendation

Discuss these factors with your doctor and monitor them on your chart: heart disease, hypertension, smoking status, age, BMI, and average glucose level.