Learn Diabetes Prediction Using Azure Machine Learning: A Complete End-to-End Example

This guide walks through a complete diabetes prediction project on Azure Machine Learning (Azure ML). We train a machine learning model that predicts whether a person has diabetes from their health data, covering every step from setting up the Azure environment to deploying the model.

Explanation:

  • Azure ML Workspace: This represents the Azure ML environment where resources are managed.
  • Data Preparation: Involves uploading and splitting the diabetes dataset.
  • Model Training: The process of training the logistic regression model and evaluating it.
  • Model Registration: The trained model is registered in Azure ML for tracking and deployment.
  • Model Deployment: The model is deployed as a web service using Azure Container Instances (ACI).
  • Model Testing: Involves testing the deployed model using HTTP requests.
  • Monitoring & Cleanup: Tracking the performance of the deployed model and cleaning up resources after testing.

This diagram gives an overview of the components in the Azure ML diabetes prediction project and how they interact.


Step 1: Set Up Azure Machine Learning Workspace

1.1 Create an Azure Account:

  • If you don't already have an Azure account, sign up on the Azure website.

1.2 Create a New Azure ML Workspace:

  1. Go to the Azure Portal.
  2. Search for Machine Learning and click Create.
  3. Fill in the necessary details (e.g., subscription, resource group, workspace name, and region).
  4. Click Review + Create and then Create.

1.3 Install Azure ML SDK:

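To interact with Azure Machine Learning from your local environment, install the Azure ML SDK for Python. The code in this guide uses the v1 SDK (the azureml.core API), which is published as the azureml-sdk package.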

pip install azureml-sdk

Step 2: Set Up Development Environment

2.1 Log In to Azure:

Log in to your Azure account using the Azure CLI.

az login

2.2 Create a Compute Instance:

You can create a compute instance for training models in Azure.

  1. Open Azure Machine Learning studio (launched from the workspace page in the Azure Portal) and navigate to Compute > Compute Instances.
  2. Click New and select a virtual machine configuration (e.g., Standard_DS3_v2).

Step 3: Prepare the Data

We'll use the Pima Indians Diabetes Dataset for this example.

3.1 Upload Dataset to Azure:

First, download the Pima Indians Diabetes Dataset from Kaggle or use a sample dataset and upload it to Azure.

3.2 Code to Upload Dataset to Azure:

from azureml.core import Workspace, Dataset

# Load the Azure ML workspace
ws = Workspace.from_config()

# Get the default datastore (where we will upload the dataset)
datastore = ws.get_default_datastore()

# Upload the dataset from your local machine
datastore.upload_files(files=["./data/diabetes.csv"], target_path="diabetes/")

# Register the dataset
diabetes_data = Dataset.Tabular.from_delimited_files(path=(datastore, "diabetes/diabetes.csv"))
diabetes_data.register(workspace=ws, name="diabetes_dataset")
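
Workspace.from_config() reads the workspace details from a config.json file, which by default it looks for in the current directory or a .azureml subfolder. The sketch below writes such a file using only the standard library. The helper name write_workspace_config and the three placeholder values are assumptions for illustration; replace the placeholders with your own subscription ID, resource group, and workspace name.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three keys Workspace.from_config() expects."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    target = Path(directory) / ".azureml" / "config.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2))
    return target


# Placeholder values (assumptions): replace them with your own details.
config_path = write_workspace_config(
    ".", "<subscription-id>", "<resource-group>", "<workspace-name>"
)
print(f"Wrote {config_path}")
```

The same file can also be downloaded from the workspace's overview page in the Azure Portal, which avoids typing the values by hand.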

3.3 Load and Split the Data:

You can use the dataset to split it into training and test sets.

train_data, test_data = diabetes_data.random_split(percentage=0.8)
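
random_split returns two datasets, with roughly the requested fraction of rows in the first one. The idea can be sketched locally with the standard library. The helper name split_rows and the sample rows are assumptions for illustration and are not part of the Azure ML API. A fixed seed makes the split reproducible; random_split accepts an optional seed argument for the same purpose.

```python
import random


def split_rows(rows, percentage, seed=42):
    """Shuffle rows reproducibly, then cut them into two lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * percentage)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for dataset rows
train_rows, test_rows = split_rows(rows, percentage=0.8)
print(len(train_rows), len(test_rows))
```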

Step 4: Train the Model

4.1 Prepare the Training Script:

We’ll create a simple training script (train.py) that trains a Logistic Regression model for diabetes prediction.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib

# Load the data (adjust path as needed)
data = pd.read_csv('diabetes.csv')

# Split data into features (X) and target (y)
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Logistic Regression model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Evaluate the model
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Model Accuracy: {accuracy}")

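# Save the trained model under outputs/, the folder Azure ML uploads with the run,
# so that the path used when registering the model (outputs/diabetes_model.pkl) exists
import os
os.makedirs('outputs', exist_ok=True)
joblib.dump(model, 'outputs/diabetes_model.pkl')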
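
Accuracy alone can be misleading when one class is more common than the other, so it often helps to look at precision and recall for the positive (diabetic) class as well. The following is a minimal sketch in plain Python; the function name and the sample labels are made up for illustration. Inside train.py, sklearn.metrics.classification_report(y_test, predictions) reports the same quantities.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"Precision: {precision}, Recall: {recall}")
```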

4.2 Run the Training Script:

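Submit the training script as an experiment in Azure ML. The ScriptRunConfig below does not set compute_target, so the run executes on local compute; pass compute_target to run it on Azure compute instead. The ws object is the workspace loaded in Step 3.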

from azureml.core import Experiment, ScriptRunConfig
from azureml.core import Environment

# Define the environment (use a conda environment or custom dependencies)
env = Environment.from_conda_specification(name="diabetes-env", file_path="./environment.yml")

# Create the experiment
experiment = Experiment(workspace=ws, name="diabetes-prediction-experiment")

# Define the configuration for the experiment
config = ScriptRunConfig(source_directory="./", script="train.py", environment=env)

# Submit the experiment
run = experiment.submit(config)
run.wait_for_completion(show_output=True)
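
The submission code loads ./environment.yml, which this guide does not show. The sketch below writes one possible version of that file. The package list and the Python version are assumptions, not the author's original file; azureml-defaults is included because Azure ML needs it to host the model when the same environment is reused for deployment in Step 6.

```python
from pathlib import Path

# Assumed contents: adjust packages and versions to match your project.
ENVIRONMENT_YML = """\
name: diabetes-env
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - joblib
  - pip
  - pip:
      - azureml-defaults
"""


def write_environment_file(path="environment.yml"):
    """Write the conda specification next to train.py and return its path."""
    target = Path(path)
    target.write_text(ENVIRONMENT_YML)
    return target


environment_path = write_environment_file()
print(f"Wrote {environment_path}")
```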

Step 5: Register the Model

Once the model is trained, we need to register it so we can use it for deployment later.

from azureml.core import Model

# Register the trained model; model_path refers to a file uploaded with the run
model = run.register_model(model_name="diabetes_model", model_path="outputs/diabetes_model.pkl")

Step 6: Deploy the Model

6.1 Prepare the Scoring Script (score.py):

Create a score.py script that will be used for scoring (inference) when the model is deployed.

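import os
import joblib
import json
import numpy as np

def init():
    global model
    # AZUREML_MODEL_DIR points to the folder that holds the registered model files
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'diabetes_model.pkl')
    model = joblib.load(model_path)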

def run(input_data):
    try:
        # Parse input data and make predictions
        data = np.array(json.loads(input_data)['data'])
        result = model.predict(data)
        return json.dumps(result.tolist())
    except Exception as e:
        return str(e)
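
The run function above passes whatever it receives straight to the model, so a malformed request surfaces only as an exception message. One option is to validate the payload first. The sketch below is a hypothetical helper, not part of the original score.py; the column names are assumed to match the Kaggle copy of the Pima Indians Diabetes Dataset, with the Outcome column removed.

```python
import json

# Assumed feature order: the dataset's columns without the 'Outcome' target
FEATURE_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def parse_rows(input_data):
    """Parse the request body and check that every row has one number per feature."""
    payload = json.loads(input_data)
    rows = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(rows, list) or not rows:
        raise ValueError('Expected a JSON object with a non-empty "data" list')
    for index, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != len(FEATURE_COLUMNS):
            raise ValueError(f"Row {index} must contain {len(FEATURE_COLUMNS)} values")
        for value in row:
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"Row {index} contains a non-numeric value")
    return rows


rows = parse_rows('{"data": [[6, 148, 72, 35, 0, 33.6, 0.627, 50]]}')
print(rows)
```

In score.py, the result of parse_rows(input_data) could be wrapped in np.array(...) and passed to model.predict, with the existing try/except returning the validation message to the caller.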

6.2 Deploy the Model as a Web Service:

from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Create an inference configuration using the score.py script
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Define deployment configuration (e.g., Azure Container Instance)
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# Deploy the model as a web service
service = Model.deploy(workspace=ws, name="diabetes-prediction-service", models=[model],
                       inference_config=inference_config, deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)
print(f"Service state: {service.state}")

Step 7: Test the Deployed Model

Once the model is deployed, you can test it by sending HTTP requests.

import requests
import json

# Example input data for prediction (replace with real data)
input_data = json.dumps({"data": [[6, 148, 72, 35, 0, 33.6, 0.627, 50]]})

# Set headers for the HTTP request
headers = {'Content-Type': 'application/json'}

# Make a POST request to the deployed service
response = requests.post(service.scoring_uri, data=input_data, headers=headers)

# Print the result (predicted outcome)
print(response.json())
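
Because score.py returns the output of json.dumps, the service may wrap that string in a second layer of JSON. In that case response.json() yields a string such as "[1]" rather than a list, and an error from the scoring script arrives as a plain message. The helper below is a hypothetical sketch that handles both cases; pass it response.text.

```python
import json


def decode_prediction(body):
    """Decode a body that is either JSON or JSON wrapped in a JSON string."""
    value = json.loads(body)
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            # Not nested JSON: treat it as an error message from score.py
            raise RuntimeError(f"Scoring failed: {value}") from None
    return value


# Simulated response bodies (made up for illustration)
plain_body = "[1]"
wrapped_body = json.dumps(json.dumps([1]))
print(decode_prediction(plain_body), decode_prediction(wrapped_body))
```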

Step 8: Monitor and Clean Up

8.1 Monitor the Web Service:

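You can monitor the logs and performance of the deployed model in the Azure portal, or programmatically: service.get_logs() returns the container logs, and service.update(enable_app_insights=True) turns on Application Insights telemetry for the service.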

8.2 Delete the Web Service (After Testing):

service.delete()

8.3 Delete the Workspace (Optional):

If you no longer need the workspace, you can delete it from the Azure portal.


Conclusion

You have now successfully completed the end-to-end diabetes prediction project using Azure Machine Learning (Azure ML), including:

  1. Setting up the Azure ML workspace.
  2. Preparing the data.
  3. Training a classification model.
  4. Registering and deploying the model as a web service.
  5. Testing the deployed model via HTTP requests.

You can reuse the same workflow (prepare data, train, register, deploy, test) as a starting point for other machine learning projects on Azure ML.
