Stroke Prediction Analysis
Stroke is the fifth leading cause of death for American females, the leading cause for American males, and the leading cause of serious long-term disability. During a stroke, 32,000 brain cells die every second. Identifying individuals at risk of stroke early is crucial for implementing preventive measures and interventions and for avoiding lasting complications.
Software Utilized
Weka, Python
Fall 2023
Primary goal:
Create a machine learning model capable of accurately predicting the likelihood that an individual will experience a stroke, based on risk factors and demographic factors.
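To make "predicting the likelihood" concrete, here is a minimal sketch of how a logistic model turns risk factors into a probability. The coefficient values, the intercept, the feature names, and both patient records are hypothetical and chosen only for illustration; they are not estimates from this project's data or models.

```python
import math

# Hypothetical coefficients, chosen only for illustration; they are NOT
# estimates from this project's dataset or from any fitted model.
ILLUSTRATIVE_WEIGHTS = {
    "age": 0.07,
    "hypertension": 0.4,
    "heart_disease": 0.3,
    "avg_glucose_level": 0.004,
}
ILLUSTRATIVE_INTERCEPT = -7.5


def stroke_probability(patient):
    """Return a logistic-model probability for one patient record."""
    # Linear score: intercept plus the weighted sum of the risk factors.
    score = ILLUSTRATIVE_INTERCEPT + sum(
        weight * patient[name] for name, weight in ILLUSTRATIVE_WEIGHTS.items()
    )
    # The sigmoid maps any real-valued score into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Two hypothetical patients, not records from the dataset.
younger = {"age": 30, "hypertension": 0, "heart_disease": 0, "avg_glucose_level": 90}
older = {"age": 75, "hypertension": 1, "heart_disease": 1, "avg_glucose_level": 200}

print(round(stroke_probability(younger), 4))
print(round(stroke_probability(older), 4))
```

With these made-up weights, the older patient with hypertension and heart disease receives the higher probability. A fitted model would learn its own coefficients from the data.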
Risk Factor Clusters:
•Behavioral – smoking, work type
•Environmental – residence type (exposure to pollution)
•Metabolic – high BMI, previous stroke, history of heart disease or hypertension
4 in 5 strokes are preventable. (Stroke Awareness Foundation)
Every 40 seconds, one person in America has a stroke. (Stroke Awareness Foundation)
795,000 Americans have a stroke each year. (Stroke Awareness Foundation)
$56.5 billion is spent annually on stroke. (Stroke Awareness Foundation)
Data Source and Description
Open Source Dataset
5,110 samples
12 variables
249 had a stroke
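The sample counts above imply a strongly imbalanced target variable. The short calculation below is a sketch that uses only the two figures reported here (5,110 samples and 249 strokes). The variable names are illustrative.

```python
total_samples = 5110   # from the dataset description above
stroke_cases = 249     # from the dataset description above

non_stroke_cases = total_samples - stroke_cases
stroke_rate = stroke_cases / total_samples
imbalance_ratio = non_stroke_cases / stroke_cases

print(f"Non-stroke cases: {non_stroke_cases}")
print(f"Stroke rate: {stroke_rate:.2%}")
print(f"Non-stroke to stroke ratio: {imbalance_ratio:.1f} : 1")
```

Fewer than 1 in 20 samples is a positive case, which matters when choosing evaluation metrics for any classifier trained on this data.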
Risk Factors
Ever Married (Binary)
Stroke (Binary)
Hypertension (Binary)
Heart Disease (Binary)
Smoking Status:
•Formerly smoked
•Never smoked
Work Type:
•Govt Job
•Self employed
•Never worked
Gender (Male, Female)
Avg Glucose Level
Residence Type (Urban, Rural)
Method of data collection confidential (HIPAA policy)
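Several of the variables listed above are categorical and usually need to be converted to numbers before a model such as logistic regression can use them. The following is a minimal one-hot encoding sketch in plain Python. The column names and the example patient are hypothetical, and the category levels are only those named above; the actual dataset may contain more.

```python
# Category levels exactly as listed above; the actual dataset may contain more.
# Column names are assumptions made for this sketch.
CATEGORIES = {
    "gender": ["Male", "Female"],
    "residence_type": ["Urban", "Rural"],
    "smoking_status": ["Formerly smoked", "Never smoked"],
    "work_type": ["Govt Job", "Self employed", "Never worked"],
}


def one_hot_encode(record):
    """Return a dict with one 0/1 indicator per (column, level) pair."""
    encoded = {}
    for column, levels in CATEGORIES.items():
        for level in levels:
            # A level not listed in CATEGORIES leaves all indicators at 0.
            encoded[f"{column}={level}"] = 1 if record.get(column) == level else 0
    return encoded


# Hypothetical patient record, not taken from the dataset.
example_patient = {
    "gender": "Female",
    "residence_type": "Urban",
    "smoking_status": "Never smoked",
    "work_type": "Govt Job",
}

encoded = one_hot_encode(example_patient)
for name, value in encoded.items():
    print(name, value)
```

Binary variables such as hypertension and heart disease are already 0/1 and can be passed through unchanged.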
The dataset contained too few positive stroke instances for our analysis to identify stroke patients reliably; however, we were able to identify non-stroke patients. Maintaining a healthy BMI and glucose level and keeping hypertension under control are associated with lower stroke risk, as indicated by the logistic regression and decision tree analyses.
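One way to see why a shortage of positive instances limits the analysis is to score a hypothetical baseline that labels every patient as non-stroke. The sketch below uses only the class counts from the data description (5,110 samples, 249 strokes). It does not reproduce the logistic regression or decision tree results, and the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # stroke cases found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0       # non-stroke cases found
    return {"accuracy": accuracy, "recall": recall, "specificity": specificity}


# Hypothetical baseline: predict "no stroke" for all 5,110 samples.
# 4,861 non-stroke cases become true negatives; 249 stroke cases become
# false negatives.
baseline = classification_metrics(tp=0, fp=0, tn=4861, fn=249)

for name, value in baseline.items():
    print(f"{name}: {value:.2%}")
```

This baseline reaches about 95% accuracy while detecting no strokes at all, so accuracy alone can make a model look strong when it only identifies non-stroke patients. Recall on the stroke class is the more informative number for this task.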
Discuss these factors with your doctor and monitor them on your chart: heart disease, hypertension, smoking status, age, BMI, and average glucose level.