Azure Function Batch Processing: Remote File Handling with Python and Azure SQL

Steps in the Diagram

  1. Start: Timer trigger activates the Azure Function based on the schedule.
  2. File Download: Downloads the file from the remote URL and checks if the download is successful.
  3. Batch Processing:
    • Reads data in batches from the downloaded file.
    • Inserts each batch into the Azure SQL Database.
  4. Decision Points:
    • Checks for the success of the file download.
    • Loops through the file until all data is processed.
  5. End: Logs success or error and stops the process.

To create an Azure Function that reads files from a remote location, processes them in batches, and inserts data into an Azure SQL Database on a schedule, you can follow these steps:

Prerequisites

  1. Azure Function Tools: Install the Azure Functions Core Tools.
  2. Python Libraries:
    • requests for HTTP operations.
    • pyodbc for Azure SQL Database connectivity. Install them using pip install requests pyodbc.
  3. Azure SQL Database:
    • Ensure the table exists in your database.
    • Example table:
      CREATE TABLE SampleData (
          Id INT IDENTITY PRIMARY KEY,
          Name NVARCHAR(100),
          Age INT
      );

Step-by-Step Code

1. Set Up the Timer-Triggered Azure Function

Azure Function code for reading a file, processing, and inserting data.

Directory Structure:

MyAzureFunction/ __init__.py function.json requirements.txt

function.json (Timer Trigger Configuration):

{
  "bindings": [
    {
      "name": "timer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 * * * *"
    }
  ]
}

The schedule 0 0 * * * * runs the function every minute (adjust as needed).


__init__.py (Function Logic):

import logging
import requests
import pyodbc
import os

# Azure SQL Database connection string (use Azure Function App settings for production)
CONNECTION_STRING = os.getenv("SQL_CONNECTION_STRING")
REMOTE_FILE_URL = os.getenv("REMOTE_FILE_URL")

def main(timer: dict):
    logging.info("Azure Function triggered.")

    # Step 1: Download the remote file
    local_file_path = "/tmp/data.csv"  # Temporary storage in Azure Function environment
    try:
        download_file(REMOTE_FILE_URL, local_file_path)
        logging.info(f"File downloaded to {local_file_path}")
    except Exception as e:
        logging.error(f"Failed to download file: {e}")
        return

    # Step 2: Process the file and insert data into Azure SQL
    try:
        process_and_insert_file(local_file_path)
        logging.info("File processed and data inserted successfully.")
    except Exception as e:
        logging.error(f"Error during file processing or database insertion: {e}")


def download_file(url, local_path):
    """Downloads a file from a remote URL."""
    response = requests.get(url, stream=True)
    if response.status_code == 200:
        with open(local_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    else:
        raise Exception(f"Failed to download file. Status code: {response.status_code}")


def process_and_insert_file(file_path):
    """Reads the file in batches and inserts data into Azure SQL."""
    batch_size = 100
    batch = []

    with open(file_path, 'r') as file:
        # Skip the header
        next(file)

        for line in file:
            name, age = line.strip().split(",")  # Assuming CSV format: Name,Age
            batch.append((name, int(age)))

            if len(batch) >= batch_size:
                insert_batch_to_db(batch)
                batch = []

        # Insert any remaining rows
        if batch:
            insert_batch_to_db(batch)


def insert_batch_to_db(batch):
    """Inserts a batch of data into Azure SQL Database."""
    with pyodbc.connect(CONNECTION_STRING) as conn:
        cursor = conn.cursor()
        cursor.executemany("INSERT INTO SampleData (Name, Age) VALUES (?, ?)", batch)
        conn.commit()

requirements.txt:

azure-functions
requests
pyodbc

Step 2: Deploy the Function

  1. Publish the Function to Azure:

    func azure functionapp publish <YOUR_FUNCTION_APP_NAME>
  2. Set Environment Variables in Azure Portal:

    • SQL_CONNECTION_STRING: Your Azure SQL connection string.
    • REMOTE_FILE_URL: URL of the file to download.

Step 3: Test the End-to-End Workflow

  1. Ensure the remote file exists (e.g., a CSV file like below):

    Name,Age
    John Doe,30
    Jane Smith,25
    Bob Johnson,40
  2. Trigger the function manually or wait for the schedule:

    func azure functionapp invoke <FUNCTION_NAME>
  3. Verify the data in your Azure SQL Database:

    SELECT * FROM SampleData;

What Happens?

  1. The timer trigger activates the Azure Function on schedule.
  2. The file is downloaded from the remote URL.
  3. The file is processed in batches, and data is inserted into the Azure SQL Database.

Comments

Popular posts from this blog

Spring Boot OpenAI Integration: Step-by-Step Guide

Orchestration-Based Saga Architecture and Spring Boot Microservices Implementation Guide

Spring Boot 3 + Angular 15 + Material - Full Stack CRUD Application Example