AI Enabled Heart Disease Prediction

University of Victoria, ECE 470 Project
January 2024 - April 2024
Machine Learning and Healthcare

Overview

As part of our ECE 470 course, my team and I developed an AI-enabled heart disease prediction model tailored for British Columbia. Our objective was to apply machine learning to predict heart disease in patients, supporting earlier detection and more efficient management of this common condition.

Technical Approach

System Components:

  1. Dataset: 1025 instances with 13 features obtained from a public source on Kaggle.

  2. Machine Learning Model: Logistic regression.

  3. Software Tools: Python, with scikit-learn for model development, training, and evaluation.

Development Process:

  • Initial Research and Problem Formulation: We started by reviewing existing research and identifying the key challenges in heart disease prediction. Our goal was to build a model that could accurately identify patients likely to have heart disease from the available clinical features.

  • Dataset Selection and Preprocessing: We selected a publicly available dataset from Kaggle and preprocessed it by detecting and removing outliers, standardizing the features, and splitting it into training and testing sets.

  • Model Development: We developed a logistic regression model using scikit-learn, trained it on the preprocessed dataset, and evaluated it with k-fold cross-validation to check for overfitting.

  • Feature Engineering: We conducted feature importance analysis to identify and select the most significant features contributing to heart disease prediction.
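
The split, training, and feature-ranking steps above might look roughly like the following in scikit-learn. This is a minimal sketch, not the project's actual code: synthetic data from make_classification stands in for the Kaggle dataset, the feature names f0 to f12 are placeholders, and ranking features by the absolute value of their standardized logistic regression coefficients is one assumed way of doing the feature importance analysis. The 80/20 split and keeping 8 features are also illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the project's data (1025 x 13).
X, y = make_classification(n_samples=1025, n_features=13, n_informative=6,
                           random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # placeholder names

# Hold out 20% as a test set (assumed ratio), stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Standardizing first puts the coefficients on a comparable scale.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Rank features by absolute coefficient (one possible importance measure).
coefs = model.named_steps["logisticregression"].coef_[0]
ranking = sorted(zip(feature_names, np.abs(coefs)),
                 key=lambda item: item[1], reverse=True)

# Keep the top k features; k = 8 is an arbitrary illustrative choice.
k = 8
selected = [name for name, _ in ranking[:k]]
X_train_sel = X_train[:, [feature_names.index(n) for n in selected]]
print("selected features:", selected)
```

In practice the model would then be refit on the selected columns and compared against the full-feature model before committing to the smaller set.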

Challenges and Solutions

Dataset Selection:

  • Challenge: Finding a dataset with relevant and sufficient data for accurate prediction.

  • Solution: After evaluating multiple datasets, we chose one from Kaggle that provided a good balance of features and instances.

Outlier Detection and Data Normalization:

  • Challenge: Ensuring the dataset was clean and standardized to improve model accuracy.

  • Solution: We removed outliers using z-score thresholding and standardized the features to a common scale (zero mean, unit variance).
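
A sketch of z-score outlier removal followed by standardization, written with NumPy. The threshold of 3 standard deviations, the rule of dropping a row when any feature exceeds it, and computing the scaling statistics on the training split only are assumptions for illustration; remove_outliers_zscore and standardize are hypothetical helper names. scikit-learn's StandardScaler performs the same standardization step.

```python
import numpy as np


def remove_outliers_zscore(X, y, threshold=3.0):
    """Drop every row in which any feature has |z| > threshold (assumed rule)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    z = np.abs((X - mean) / std)
    keep = (z <= threshold).all(axis=1)
    return X[keep], y[keep]


def standardize(X_train, X_test):
    """Scale to zero mean and unit variance using training-set statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0
    return (X_train - mean) / std, (X_test - mean) / std


# Usage on toy data with one planted outlier in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0, 0] = 50.0
y = rng.integers(0, 2, size=200)

X_clean, y_clean = remove_outliers_zscore(X, y)
X_train_s, X_test_s = standardize(X_clean[:150], X_clean[150:])
print(X.shape[0] - X_clean.shape[0], "rows removed")
```

Reusing the training mean and standard deviation for the test split keeps test data from influencing the preprocessing.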

Model Selection and Evaluation:

  • Challenge: Choosing the right model and avoiding overfitting.

  • Solution: We selected logistic regression for its simplicity and effectiveness in binary classification, and used k-fold cross-validation to estimate how well it generalizes to unseen data.
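
The k-fold evaluation described above can be sketched with scikit-learn's cross_val_score. Synthetic data again replaces the real dataset, and k = 5 with stratified, shuffled folds is an assumed configuration rather than the project's documented setting. Placing the scaler inside the pipeline means it is refit on each training fold, so the held-out fold never influences the scaling statistics.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 1025 x 13 dataset.
X, y = make_classification(n_samples=1025, n_features=13, random_state=0)

# The scaler is refit inside every training fold, avoiding leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5 stratified folds (assumed k), shuffled with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("accuracy per fold:", scores.round(3))
print(f"mean +/- std: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A large spread across folds, or a training accuracy far above the cross-validated mean, would be the warning sign of overfitting.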

Project Outcomes

Current Functionality:

  • The logistic regression model achieved 85.56% accuracy on the held-out test set after preprocessing and feature engineering.

  • The model can predict the likelihood of heart disease in patients based on input features like age, sex, cholesterol levels, and blood pressure.
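
A sketch of how such a model reports test accuracy and a probability for a single patient. Synthetic data stands in for the patient features, and treating label 1 as "heart disease present" is an assumption about the label coding. The manual check at the end shows that the probability from logistic regression is the sigmoid of its linear score, p = 1 / (1 + exp(-(w·x + b))).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for patient features; label 1 = disease (assumed coding).
X, y = make_classification(n_samples=1025, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Accuracy on the held-out test set.
test_acc = accuracy_score(y_test, model.predict(X_test))

# Probability of class 1 for one patient (columns follow model.classes_).
patient = X_test[:1]
p_disease = model.predict_proba(patient)[0, 1]

# The same probability recovered from the linear score via the sigmoid.
score = model.decision_function(patient)[0]
p_manual = 1.0 / (1.0 + np.exp(-score))

print(f"test accuracy: {test_acc:.3f}, P(disease) = {p_disease:.3f}")
```

Reporting the probability rather than only the 0/1 label lets a clinician choose a decision threshold other than 0.5.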

Limitations:

  • The model's accuracy is dependent on the quality and completeness of the input data.

  • Logistic regression is linear in its input features, so it may miss non-linear relationships and feature interactions that more flexible models can capture.
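
This limitation can be illustrated on a toy problem, not the heart disease data: two concentric rings of points cannot be separated by a straight line, so plain logistic regression reaches only a modest accuracy, while the same model given hand-made quadratic features separates them almost perfectly. The dataset and the degree-2 expansion are purely illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data: an inner and an outer ring, one class each.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=0)

# Plain logistic regression: a single straight-line decision boundary.
linear = make_pipeline(StandardScaler(), LogisticRegression())

# Same model on degree-2 features (x1^2, x1*x2, x2^2, ...): the boundary can be a circle.
quadratic = make_pipeline(PolynomialFeatures(degree=2), StandardScaler(),
                          LogisticRegression())

linear_acc = cross_val_score(linear, X, y, cv=5).mean()
quadratic_acc = cross_val_score(quadratic, X, y, cv=5).mean()
print(f"linear: {linear_acc:.2f}, quadratic features: {quadratic_acc:.2f}")
```

Hand-crafting such features requires knowing the shape of the relationship in advance, which is why the more flexible models listed under Future Improvements are worth exploring.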

Future Improvements

  • Advanced Models: Exploring more complex models such as Random Forest, XGBoost, or support vector machines (SVM) to improve prediction accuracy.

  • Larger Dataset: Acquiring a larger, more diverse dataset to enhance the model's generalizability and robustness.

  • Feature Expansion: Including additional relevant features to improve the model's predictive power.
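
A sketch of how the candidate models could be compared on identical cross-validation folds. RandomForestClassifier and SVC are scikit-learn classes; XGBoost lives in the separate xgboost package and is left out so the example runs with scikit-learn alone. The synthetic data and the hyperparameters shown are assumptions, so the printed numbers say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; replace with the real features and labels.
X, y = make_classification(n_samples=1025, n_features=13, n_informative=6,
                           random_state=0)

# Same folds for every model so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean cross-validated accuracy per model.
results = {name: cross_val_score(est, X, y, cv=cv).mean()
           for name, est in candidates.items()}

for name, acc in sorted(results.items(), key=lambda t: t[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Keeping logistic regression in the comparison gives a baseline, so any gain from a more complex model can be weighed against its loss of interpretability.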

Conclusion

The AI Enabled Heart Disease Prediction project demonstrated the potential of machine learning in healthcare diagnostics. By combining logistic regression with careful data preprocessing, we developed a model that could aid in the early detection of heart disease and potentially improve patient outcomes in British Columbia.

© Copyright 2024. All rights Reserved.

Made by

Rudra Aryan Potluri
