Use Case 023: Application Health Monitoring

Overview

| Property | Value |
| --- | --- |
| Use Case ID | UC-023 |
| Use Case Name | Application Health Monitoring |
| Module | Operations - Health Monitoring |
| Priority | High |
| Status | Implemented |
| Version | 1.0 |
| Last Updated | April 3, 2026 |

Description

This use case describes how Application Manager monitors the health and availability of registered applications through automated endpoint polling and on-demand health checks. A centralized dashboard shows the latest status of every monitored application, a per-application view shows detailed statistics (uptime percentage, average response time, check history), and administrators can trigger a manual check for immediate verification. A background service runs scheduled health checks every 60 seconds and classifies each application as Healthy, Degraded, Unhealthy, or Unknown based on HTTP status code and response time. Consecutive failures are tracked so that an alert is logged once an application reaches the failure threshold (default: 3).

Actors

| Actor | Description | Role |
| --- | --- | --- |
| Administrator | Authenticated admin monitoring application health | Primary |
| System (Background Service) | HealthCheckBackgroundService running periodic checks | Primary |
| Registered Application | Application exposing a health check endpoint | Supporting |
| Monitoring System | External system consuming health API | Secondary |

Preconditions

  1. Application Manager is running and accessible
  2. User authenticated with valid session
  3. At least one application registered with BaseUrl and HealthCheckEndpoint configured
  4. HealthCheckBackgroundService is running (started on application startup)
  5. IHttpClientFactory registered in DI container

Postconditions

Success Postconditions

  1. Health check record persisted with status, response time, and HTTP status code
  2. Dashboard displays current health for all applications
  3. Statistics calculated (uptime %, average response time, check counts)
  4. Consecutive failure counter updated appropriately
  5. Alert threshold evaluation completed

Failure Postconditions

  1. Application marked as Unhealthy with error message
  2. Consecutive failure counter incremented
  3. Alert logged if threshold reached (default: 3 consecutive failures)
  4. Previous healthy status preserved in history for trend analysis

Triggers

  • Background service timer fires (every 60 seconds)
  • Administrator navigates to health monitoring dashboard
  • Administrator triggers manual health check for specific application
  • External monitoring system queries health API endpoint
  • Application deployment completed and verification needed
  • Reported issue requires immediate health verification

Basic Flow (Happy Path — Background Health Monitoring)

sequenceDiagram
    participant BG as HealthCheckBackgroundService
    participant Svc as ApplicationHealthMonitoringService
    participant Repo as IRegisteredApplicationRepository
    participant HTTP as HttpClient
    participant App as Registered Application
    participant DB as Identity DB

    Note over BG,DB: Scheduled Health Check Flow (every 60 seconds)

    BG->>Svc: PerformAllHealthChecksAsync()
    Svc->>Repo: GetAllAsync(isActive: true)
    Repo->>Svc: Active applications with health endpoints

    loop For each application with BaseUrl + HealthCheckEndpoint
        Svc->>HTTP: GET {BaseUrl}/{HealthCheckEndpoint} (10s timeout)
        HTTP->>App: HTTP GET /health
        App->>HTTP: 200 OK (150ms)
        HTTP->>Svc: Response (statusCode, responseTime)

        alt Response OK and responseTime < 2000ms
            Svc->>Svc: healthCheck.RecordSuccess(150, 200)
            Note over Svc: Status = Healthy, ConsecutiveFailures = 0
        else Response OK but responseTime >= 2000ms
            Svc->>Svc: healthCheck.RecordDegraded(2500, 200, reason)
            Note over Svc: Status = Degraded
        else HTTP error or timeout
            Svc->>Svc: healthCheck.RecordFailure(errorMessage)
            Note over Svc: Status = Unhealthy, ConsecutiveFailures++
        end

        Svc->>DB: SaveChangesAsync()
        Svc->>Svc: ShouldTriggerAlert(threshold: 3)?
    end

    BG->>BG: Log summary (total, healthy, degraded, unhealthy)
    BG->>BG: Wait 60 seconds

Alternative Flows

Alt-1: Manual Health Check

Trigger: Administrator triggers on-demand health check

Flow:

  1. Administrator views application health details page
  2. Administrator clicks "Trigger Health Check" button
  3. POST /applications/health/{applicationId}/check with anti-forgery token
  4. Controller calls PerformHealthCheckAsync(applicationId)
  5. Service performs HTTP GET to application's health endpoint
  6. Health check result persisted to database
  7. TempData["SuccessMessage"] set with status result
  8. Administrator redirected to application details page
  9. Updated health status displayed

Alt-2: Health Dashboard Overview

Trigger: Administrator navigates to health monitoring dashboard

Flow:

  1. GET /applications/health/
  2. Controller calls GetAllApplicationsHealthOverviewAsync()
  3. Service iterates all active applications
  4. For each application, retrieves latest health check record
  5. Returns ApplicationHealthOverview with: name, status, response time, last checked, failure count
  6. View displays table with status badges:
    • Green: Healthy
    • Yellow: Degraded
    • Red: Unhealthy
    • Grey: Unknown (no endpoint configured or never checked)
  7. Each row has "View Details" and "Trigger Check" action buttons

Alt-3: Detailed Health Statistics

Trigger: Administrator views detailed health for specific application

Flow:

  1. GET /applications/health/{applicationId}
  2. Controller calls GetHealthStatisticsAsync(applicationId, hours: 24)
  3. Service computes from 24-hour history:
    • Total checks performed
    • Healthy/Degraded/Unhealthy counts
    • Uptime percentage: (healthy + degraded) / total × 100
    • Average response time (excluding Unknown status and zero-response checks)
    • Latest status and check time
  4. Controller calls GetHealthHistoryAsync(applicationId, hours: 24)
  5. View displays statistics cards and chronological check history table

Alt-4: Application Without Health Endpoint

Trigger: Application has no BaseUrl or HealthCheckEndpoint configured

Flow:

  1. Background service or manual check triggered
  2. Service detects missing health endpoint configuration
  3. Health check recorded with:
    • Status = Unknown
    • ErrorMessage = "No health check endpoint configured"
  4. Application appears with grey "Unknown" badge on dashboard
  5. No HTTP request attempted

Alt-5: Health Check Timeout

Trigger: Application's health endpoint does not respond within 10 seconds

Flow:

  1. Service makes HTTP GET request to health endpoint
  2. HttpClient timeout (10 seconds) triggers TaskCanceledException
  3. Service records failure:
    • Status = Unhealthy
    • ErrorMessage = "Health check timed out after 10 seconds"
    • ConsecutiveFailures incremented
  4. If ConsecutiveFailures >= 3, alert logged: "ALERT: Application has consecutive failures"
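
The timeout path above can be sketched as follows. This is a language-neutral illustration in Python, not the service's actual code: `run_check`, the injected `fetch` callable, and the shape of the returned dictionary are hypothetical names introduced here. Only the 10-second timeout, the error message, the threshold of 3, and the alert text come from this use case.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("health")

TIMEOUT_SECONDS = 10   # BR-023-06
ALERT_THRESHOLD = 3    # BR-023-11 (default)


def run_check(fetch: Callable[[str, int], Tuple[int, int]],
              url: str,
              previous_failures: int) -> dict:
    """Run one check; `fetch(url, timeout)` is a hypothetical HTTP helper that
    returns (http_status, elapsed_ms) or raises TimeoutError on timeout."""
    try:
        http_status, elapsed_ms = fetch(url, TIMEOUT_SECONDS)
    except TimeoutError:
        failures = previous_failures + 1
        alert = failures >= ALERT_THRESHOLD
        if alert:
            logger.warning("ALERT: Application has consecutive failures (%d)", failures)
        return {
            "status": "Unhealthy",
            "error_message": f"Health check timed out after {TIMEOUT_SECONDS} seconds",
            "consecutive_failures": failures,
            "alert": alert,
        }
    # Non-timeout responses are returned raw; classification happens elsewhere.
    return {"http_status": http_status, "response_time_ms": elapsed_ms}
```

Injecting `fetch` keeps the sketch testable without a network; in the real service the equivalent role is played by the HttpClient timeout and TaskCanceledException.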

Alt-6: API Health Status Query

Trigger: External monitoring system queries health API

Flow:

  1. GET /applications/health/api/{applicationId}
  2. Controller calls GetLatestHealthStatusAsync(applicationId)
  3. If no health data exists, returns 404 with error JSON
  4. Otherwise returns JSON with: applicationId, status, responseTimeMs, checkedAt, errorMessage, consecutiveFailures
  5. External system processes response for monitoring dashboards
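
As an illustration of the response described above, the following Python sketch builds the 404 and 200 payloads. The field names are the ones listed in step 4; everything else is an assumption made for the example: the function name, the `{"error": ...}` body shape, the error text, serializing status as a string, and the ISO 8601 timestamp format.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple


def health_api_response(application_id: str, latest: Optional[dict]) -> Tuple[int, str]:
    """Return (http_status, json_body) for the health status API (hypothetical shape)."""
    if latest is None:
        # Step 3: no health data yet. The error body shape is assumed.
        return 404, json.dumps({"error": "No health data available"})
    body = {
        "applicationId": application_id,
        "status": latest["status"],
        "responseTimeMs": latest["response_time_ms"],
        "checkedAt": latest["checked_at"].isoformat(),
        "errorMessage": latest.get("error_message"),
        "consecutiveFailures": latest["consecutive_failures"],
    }
    return 200, json.dumps(body)


example = {
    "status": "Healthy",
    "response_time_ms": 150,
    "checked_at": datetime(2026, 4, 3, 12, 0, tzinfo=timezone.utc),
    "error_message": None,
    "consecutive_failures": 0,
}
code, payload = health_api_response("app-1", example)
```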

Business Rules

Health Status Classification

  • BR-023-01: Healthy — HTTP 2xx response with response time < 2000ms
  • BR-023-02: Degraded — HTTP 2xx response with response time ≥ 2000ms
  • BR-023-03: Unhealthy — HTTP error status, timeout, or network failure
  • BR-023-04: Unknown — No health endpoint configured or never checked
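
The four classification rules can be expressed as a single decision function. The sketch below is in Python for brevity and is not taken from the service; the function name and its parameters are hypothetical. The enum values mirror the HealthStatus enum in the Data Model section.

```python
from enum import IntEnum
from typing import Optional


class HealthStatus(IntEnum):
    UNKNOWN = 0
    HEALTHY = 1
    DEGRADED = 2
    UNHEALTHY = 3


DEGRADED_THRESHOLD_MS = 2000  # boundary between BR-023-01 and BR-023-02


def classify(http_status: Optional[int],
             response_time_ms: Optional[int],
             endpoint_configured: bool = True) -> HealthStatus:
    """Map one check result to a status; None inputs model a timeout or network failure."""
    if not endpoint_configured:
        return HealthStatus.UNKNOWN        # BR-023-04
    if http_status is None or response_time_ms is None:
        return HealthStatus.UNHEALTHY      # BR-023-03: timeout or network failure
    if not 200 <= http_status <= 299:
        return HealthStatus.UNHEALTHY      # BR-023-03: HTTP error status
    if response_time_ms < DEGRADED_THRESHOLD_MS:
        return HealthStatus.HEALTHY        # BR-023-01
    return HealthStatus.DEGRADED           # BR-023-02
```

Note that a response time of exactly 2000 ms falls on the Degraded side of the boundary.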

Monitoring Rules

  • BR-023-05: Background service checks every 60 seconds (after 30-second initial delay)
  • BR-023-06: Health check HTTP timeout: 10 seconds
  • BR-023-07: Only active applications with configured endpoints are checked automatically
  • BR-023-08: Manual checks can be triggered for any application regardless of endpoint config; without a configured endpoint the check is recorded as Unknown (see Alt-4)
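
The timing in BR-023-05 (30-second initial delay, then one pass every 60 seconds) can be sketched as a simple loop. This Python/asyncio version is illustrative only; `run_periodically`, `perform_all_checks`, and `max_runs` are hypothetical names, and continuing after a failed pass is an assumption rather than documented behavior.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("health")


async def run_periodically(perform_all_checks: Callable[[], Awaitable[None]],
                           initial_delay_s: float = 30.0,
                           interval_s: float = 60.0,
                           max_runs: Optional[int] = None) -> int:
    """Wait once, then run a check pass every interval. Returns the number of passes."""
    await asyncio.sleep(initial_delay_s)
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await perform_all_checks()
        except Exception:
            # Assumption: a failed pass is logged and the loop keeps going.
            logger.exception("Health check pass failed")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        await asyncio.sleep(interval_s)
    return runs
```

`max_runs` exists only so the loop can be exercised in tests with zero delays; a long-running service would leave it as `None`.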

Failure Tracking Rules

  • BR-023-09: ConsecutiveFailures incremented on each Unhealthy check
  • BR-023-10: ConsecutiveFailures reset to 0 on Healthy check
  • BR-023-11: Alert threshold: 3 consecutive failures (configurable)
  • BR-023-12: StatusChangedAt updated only when status transitions between states
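
A minimal sketch of the failure-tracking rules, written in Python with method names that mirror RecordSuccess, RecordFailure, and ShouldTriggerAlert. It is an illustration under stated assumptions, not the entity's actual code. RecordDegraded is omitted because this document does not say how a Degraded result affects the counter.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class HealthCheck:
    status: str = "Unknown"
    response_time_ms: int = 0
    http_status_code: Optional[int] = None
    error_message: Optional[str] = None
    consecutive_failures: int = 0
    checked_at: Optional[datetime] = None
    status_changed_at: Optional[datetime] = None

    def _transition(self, new_status: str, now: datetime) -> None:
        # BR-023-12: StatusChangedAt moves only when the status actually changes.
        if new_status != self.status:
            self.status_changed_at = now
        self.status = new_status
        self.checked_at = now

    def record_success(self, response_time_ms: int, http_status_code: int,
                       now: Optional[datetime] = None) -> None:
        self._transition("Healthy", now or datetime.now(timezone.utc))
        self.response_time_ms = response_time_ms
        self.http_status_code = http_status_code
        self.error_message = None
        self.consecutive_failures = 0          # BR-023-10

    def record_failure(self, error_message: str,
                       now: Optional[datetime] = None) -> None:
        self._transition("Unhealthy", now or datetime.now(timezone.utc))
        self.error_message = error_message
        self.consecutive_failures += 1         # BR-023-09

    def should_trigger_alert(self, threshold: int = 3) -> bool:
        return self.consecutive_failures >= threshold   # BR-023-11
```

The optional `now` parameter is a testing convenience added for this sketch; it lets a test pin timestamps and verify that repeated failures leave `status_changed_at` untouched.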

Statistics Rules

  • BR-023-13: Statistics computed over 24-hour sliding window by default
  • BR-023-14: Uptime percentage = (Healthy + Degraded) / Total × 100
  • BR-023-15: Average response time excludes Unknown status and zero-response checks
  • BR-023-16: History ordered by CheckedAt descending (most recent first)
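
The statistics rules can be worked through on a small data set. The Python sketch below is illustrative; the `Check` record and `compute_statistics` are hypothetical names. Two behaviors are assumptions because the rules do not cover them: Unknown checks count toward the total, and an empty window reports 0.0 for uptime and average.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Check:
    status: str
    response_time_ms: int
    checked_at: datetime


def compute_statistics(checks: List[Check], now: datetime, hours: int = 24) -> dict:
    cutoff = now - timedelta(hours=hours)                       # BR-023-13
    window = sorted((c for c in checks if c.checked_at >= cutoff),
                    key=lambda c: c.checked_at, reverse=True)   # BR-023-16
    total = len(window)
    healthy = sum(c.status == "Healthy" for c in window)
    degraded = sum(c.status == "Degraded" for c in window)
    unhealthy = sum(c.status == "Unhealthy" for c in window)
    # BR-023-14; 0.0 for an empty window is an assumption.
    uptime = (healthy + degraded) / total * 100 if total else 0.0
    # BR-023-15: skip Unknown checks and checks with no measured response time.
    timed = [c.response_time_ms for c in window
             if c.status != "Unknown" and c.response_time_ms > 0]
    average = sum(timed) / len(timed) if timed else 0.0
    return {
        "total_checks": total,
        "healthy_count": healthy,
        "degraded_count": degraded,
        "unhealthy_count": unhealthy,
        "uptime_percentage": uptime,
        "average_response_time_ms": average,
        "latest_status": window[0].status if window else "Unknown",
    }


now = datetime(2026, 4, 3, 12, 0, tzinfo=timezone.utc)
sample = [
    Check("Healthy", 100, now - timedelta(hours=1)),
    Check("Degraded", 2500, now - timedelta(hours=2)),
    Check("Unhealthy", 0, now - timedelta(hours=3)),
    Check("Unknown", 0, now - timedelta(hours=4)),
    Check("Healthy", 300, now - timedelta(hours=30)),  # outside the 24-hour window
]
stats = compute_statistics(sample, now)
```

For this sample, four checks fall inside the window, uptime is (1 + 1) / 4 × 100 = 50%, and the average response time is (100 + 2500) / 2 = 1300 ms.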

Data Rules

  • BR-023-17: Each health check creates or updates the latest record for that application
  • BR-023-18: Health metadata stored as optional JSON for endpoint-specific data
  • BR-023-19: URL construction: BaseUrl (trimmed trailing /) + "/" + HealthCheckEndpoint (trimmed leading /)
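
The URL construction rule in BR-023-19 amounts to a one-line join. A Python sketch follows; `build_health_url` is a hypothetical helper name, and the example hosts are placeholders.

```python
def build_health_url(base_url: str, health_check_endpoint: str) -> str:
    """Join the two parts with exactly one slash between them (BR-023-19)."""
    return base_url.rstrip("/") + "/" + health_check_endpoint.lstrip("/")


urls = [
    build_health_url("https://app.example.com/", "/health"),
    build_health_url("https://app.example.com", "health"),
    build_health_url("https://app.example.com/api/", "health/ready"),
]
```

Trimming both sides means the result is the same whether or not the configured values carry a slash, which avoids `//health` and missing-slash URLs.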

Data Model

ApplicationHealthCheck Entity

| Field | Type | Description |
| --- | --- | --- |
| Id | Guid (PK) | Unique identifier |
| ApplicationId | Guid | Application being monitored |
| Status | HealthStatus | Healthy, Degraded, Unhealthy, Unknown |
| ResponseTimeMs | int | Response time in milliseconds |
| HttpStatusCode | int? | HTTP status code from endpoint |
| ErrorMessage | string? | Failure description |
| ConsecutiveFailures | int | Count of consecutive failures |
| CheckedAt | DateTimeOffset | When check occurred (UTC) |
| StatusChangedAt | DateTimeOffset? | When status last transitioned |
| HealthMetadata | string? | Additional JSON metadata |

HealthStatus Enum

| Value | Int | Description |
| --- | --- | --- |
| Unknown | 0 | Not yet checked or no endpoint configured |
| Healthy | 1 | Responding normally |
| Degraded | 2 | Responding but slow (≥ 2000ms) |
| Unhealthy | 3 | Not responding or returning errors |

Supporting Classes

  • HealthStatistics: ApplicationId, TotalChecks, HealthyCount, DegradedCount, UnhealthyCount, UptimePercentage, AverageResponseTimeMs, LatestStatus, LatestCheckTime
  • ApplicationHealthOverview: ApplicationId, ApplicationName, DisplayName, Status, LastChecked, ResponseTimeMs, ConsecutiveFailures

API Endpoints

| Method | Route | Description |
| --- | --- | --- |
| GET | /applications/health/ | Health dashboard (all applications) |
| GET | /applications/health/{applicationId} | Detailed health stats and history |
| POST | /applications/health/{applicationId}/check | Trigger manual health check |
| GET | /applications/health/api/{applicationId} | JSON API for health status |

Test Scenarios

| ID | Scenario | Expected Result |
| --- | --- | --- |
| TS-023-01 | RecordSuccess sets Healthy status | Status = Healthy, ConsecutiveFailures = 0 |
| TS-023-02 | RecordDegraded sets Degraded status | Status = Degraded, ErrorMessage set |
| TS-023-03 | RecordFailure increments ConsecutiveFailures | ConsecutiveFailures++, Status = Unhealthy |
| TS-023-04 | RecordSuccess after failures resets counter | ConsecutiveFailures = 0 |
| TS-023-05 | ShouldTriggerAlert with 3+ failures | Returns true |
| TS-023-06 | ShouldTriggerAlert with < 3 failures | Returns false |
| TS-023-07 | StatusChangedAt updates on state transition | Updated when status changes |
| TS-023-08 | StatusChangedAt unchanged on same status | Not updated when status same |
| TS-023-09 | New ApplicationHealthCheck defaults | Status = Unknown, ConsecutiveFailures = 0 |
| TS-023-10 | Health dashboard loads with no applications | Empty list returned |
| TS-023-11 | Manual health check for missing application | InvalidOperationException thrown |

Dependencies

  • IHttpClientFactory (for health endpoint HTTP calls)
  • IRegisteredApplicationRepository (for application lookups)
  • IdentityDbContext (for health check persistence)
  • HealthCheckBackgroundService (for scheduled monitoring)