Use Case 023: Application Health Monitoring

Overview

Property	Value
Use Case ID	UC-023
Use Case Name	Application Health Monitoring
Module	Operations — Health Monitoring
Priority	High
Status	Implemented
Version	1.0
Last Updated	April 3, 2026

Description

This use case describes how Application Manager monitors the health and availability of registered applications through automated endpoint polling and on-demand health checks. The system provides a centralized health dashboard showing the current status of every monitored application, detailed per-application health statistics (uptime percentage, average response time, check history), and manual health check triggers for immediate verification. A background service runs scheduled health checks every 60 seconds and classifies each application as Healthy, Degraded, Unhealthy, or Unknown based on response time thresholds and HTTP status codes. Consecutive failures are tracked so that an alert is logged when an application reaches the failure threshold (default: 3 consecutive failures).

Actors

Actor	Description	Role
Administrator	Authenticated admin monitoring application health	Primary
System (Background Service)	HealthCheckBackgroundService running periodic checks	Primary
Registered Application	Application exposing a health check endpoint	Supporting
Monitoring System	External system consuming health API	Secondary

Preconditions

Application Manager is running and accessible
User authenticated with valid session
At least one application registered with BaseUrl and HealthCheckEndpoint configured
HealthCheckBackgroundService is running (started on application startup)
IHttpClientFactory registered in DI container

Postconditions

Success Postconditions

Health check record persisted with status, response time, and HTTP status code
Dashboard displays current health for all applications
Statistics calculated (uptime %, average response time, check counts)
Consecutive failure counter updated appropriately
Alert threshold evaluation completed

Failure Postconditions

Application marked as Unhealthy with error message
Consecutive failure counter incremented
Alert logged once the threshold is reached (default: 3 consecutive failures)
Previous healthy status preserved in history for trend analysis

Triggers

Background service timer fires (every 60 seconds)
Administrator navigates to health monitoring dashboard
Administrator triggers manual health check for specific application
External monitoring system queries health API endpoint
Application deployment completed and verification needed
Reported issue requires immediate health verification

Basic Flow (Happy Path — Background Health Monitoring)

sequenceDiagram
    participant BG as HealthCheckBackgroundService
    participant Svc as ApplicationHealthMonitoringService
    participant Repo as IRegisteredApplicationRepository
    participant HTTP as HttpClient
    participant App as Registered Application
    participant DB as Identity DB
    Note over BG,DB: Scheduled Health Check Flow (every 60 seconds)
    BG->>Svc: PerformAllHealthChecksAsync()
    Svc->>Repo: GetAllAsync(isActive: true)
    Repo->>Svc: Active applications with health endpoints
    loop For each application with BaseUrl + HealthCheckEndpoint
        Svc->>HTTP: GET {BaseUrl}/{HealthCheckEndpoint} (10s timeout)
        HTTP->>App: HTTP GET /health
        App->>HTTP: 200 OK (150ms)
        HTTP->>Svc: Response (statusCode, responseTime)
        alt Response OK and responseTime < 2000ms
            Svc->>Svc: healthCheck.RecordSuccess(150, 200)
            Note over Svc: Status = Healthy, ConsecutiveFailures = 0
        else Response OK but responseTime >= 2000ms
            Svc->>Svc: healthCheck.RecordDegraded(2500, 200, reason)
            Note over Svc: Status = Degraded
        else HTTP error or timeout
            Svc->>Svc: healthCheck.RecordFailure(errorMessage)
            Note over Svc: Status = Unhealthy, ConsecutiveFailures++
        end
        Svc->>DB: SaveChangesAsync()
        Svc->>Svc: ShouldTriggerAlert(threshold: 3)?
    end
    BG->>BG: Log summary (total, healthy, degraded, unhealthy)
    BG->>BG: Wait 60 seconds
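
The outer loop of the scheduled flow (filter the applications, check each one, log a summary) can be sketched as follows. This is an illustrative Python sketch, not the service's actual code: `RegisteredApp`, `perform_all_health_checks`, and the injected `check_one` callable are hypothetical names, and the per-application check is passed in so the sketch stays independent of any HTTP library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class RegisteredApp:
    """Hypothetical stand-in for a registered application record."""
    name: str
    is_active: bool = True
    base_url: Optional[str] = None
    health_check_endpoint: Optional[str] = None


def perform_all_health_checks(
    apps: Iterable[RegisteredApp],
    check_one: Callable[[RegisteredApp], str],
) -> Dict[str, int]:
    """Check every active app that has an endpoint and return summary counts."""
    summary = {"total": 0, "healthy": 0, "degraded": 0, "unhealthy": 0}
    for app in apps:
        # Only active applications with BaseUrl + HealthCheckEndpoint are polled.
        if not app.is_active:
            continue
        if not (app.base_url and app.health_check_endpoint):
            continue
        status = check_one(app)  # e.g. "Healthy", "Degraded", "Unhealthy"
        summary["total"] += 1
        key = status.lower()
        summary[key] = summary.get(key, 0) + 1
    return summary
```

The returned dictionary corresponds to the "Log summary (total, healthy, degraded, unhealthy)" step at the end of the diagram.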

Alternative Flows

Alt-1: Manual Health Check

Trigger: Administrator triggers on-demand health check

Flow:

Administrator views application health details page
Administrator clicks "Trigger Health Check" button
POST /applications/health/{applicationId}/check with anti-forgery token
Controller calls PerformHealthCheckAsync(applicationId)
Service performs HTTP GET to application's health endpoint
Health check result persisted to database
TempData["SuccessMessage"] set with status result
Administrator redirected to application details page
Updated health status displayed

Alt-2: Health Dashboard Overview

Trigger: Administrator navigates to health monitoring dashboard

Flow:

GET /applications/health/
Controller calls GetAllApplicationsHealthOverviewAsync()
Service iterates all active applications
For each application, retrieves latest health check record
Returns ApplicationHealthOverview with: name, status, response time, last checked, failure count
View displays table with status badges:
- Green: Healthy
- Yellow: Degraded
- Red: Unhealthy
- Grey: Unknown (no endpoint configured or never checked)
Each row has "View Details" and "Trigger Check" action buttons

Alt-3: Detailed Health Statistics

Trigger: Administrator views detailed health for specific application

Flow:

GET /applications/health/{applicationId}
Controller calls GetHealthStatisticsAsync(applicationId, hours: 24)
Service computes from 24-hour history:
- Total checks performed
- Healthy/Degraded/Unhealthy counts
- Uptime percentage: (healthy + degraded) / total × 100
- Average response time (excluding Unknown status and zero-response checks)
- Latest status and check time
Controller calls GetHealthHistoryAsync(applicationId, hours: 24)
View displays statistics cards and chronological check history table

Alt-4: Application Without Health Endpoint

Trigger: Application has no BaseUrl or HealthCheckEndpoint configured

Flow:

Background service or manual check triggered
Service detects missing health endpoint configuration
Health check recorded with:
- Status = Unknown
- ErrorMessage = "No health check endpoint configured"
Application appears with grey "Unknown" badge on dashboard
No HTTP request attempted

Alt-5: Health Check Timeout

Trigger: Application's health endpoint does not respond within 10 seconds

Flow:

Service makes HTTP GET request to health endpoint
HttpClient timeout (10 seconds) triggers TaskCanceledException
Service records failure:
- Status = Unhealthy
- ErrorMessage = "Health check timed out after 10 seconds"
- ConsecutiveFailures incremented
If ConsecutiveFailures >= 3, alert logged: "ALERT: Application has consecutive failures"
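
A minimal sketch of the timeout path, written in Python for illustration only. The `check_once` function and the injected `fetch` callable are hypothetical; `fetch` is assumed to raise `TimeoutError` when the endpoint does not answer in time, which plays the role that `TaskCanceledException` plays in the flow above.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("health_monitoring")

TIMEOUT_SECONDS = 10  # health check HTTP timeout


def check_once(
    fetch: Callable[[float], Tuple[int, int]],
    consecutive_failures: int,
    alert_threshold: int = 3,
) -> dict:
    """Run one probe. `fetch(timeout)` returns (http_status, elapsed_ms)
    or raises TimeoutError when no response arrives within the timeout."""
    try:
        http_status, elapsed_ms = fetch(TIMEOUT_SECONDS)
    except TimeoutError:
        failures = consecutive_failures + 1
        if failures >= alert_threshold:
            logger.warning("ALERT: Application has consecutive failures")
        return {
            "timed_out": True,
            "status": "Unhealthy",
            "error_message": f"Health check timed out after {TIMEOUT_SECONDS} seconds",
            "consecutive_failures": failures,
        }
    # A response arrived in time; classifying it is outside this sketch.
    return {
        "timed_out": False,
        "http_status": http_status,
        "response_time_ms": elapsed_ms,
        "consecutive_failures": consecutive_failures,
    }
```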

Alt-6: API Health Status Query

Trigger: External monitoring system queries health API

Flow:

GET /applications/health/api/{applicationId}
Controller calls GetLatestHealthStatusAsync(applicationId)
If no health data exists, returns 404 with error JSON
Otherwise returns JSON with: applicationId, status, responseTimeMs, checkedAt, errorMessage, consecutiveFailures
External system processes response for monitoring dashboards
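
The response contract for this endpoint can be sketched as below, in Python for illustration. The field names come from the flow above; the function name `latest_health_response`, the input dictionary keys, and the shape of the 404 error body are assumptions, since the source only says that an error JSON is returned.

```python
import json
from typing import Optional, Tuple


def latest_health_response(application_id: str, latest: Optional[dict]) -> Tuple[int, str]:
    """Return (http_status, json_body) for the health status API."""
    if latest is None:
        # Assumed error shape; the source specifies only "404 with error JSON".
        error = {"error": f"No health data found for application {application_id}"}
        return 404, json.dumps(error)
    body = {
        "applicationId": application_id,
        "status": latest["status"],
        "responseTimeMs": latest["response_time_ms"],
        "checkedAt": latest["checked_at"],
        "errorMessage": latest.get("error_message"),
        "consecutiveFailures": latest["consecutive_failures"],
    }
    return 200, json.dumps(body)
```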

Business Rules

Health Status Classification

BR-023-01: Healthy — HTTP 2xx response with response time < 2000ms
BR-023-02: Degraded — HTTP 2xx response with response time ≥ 2000ms
BR-023-03: Unhealthy — HTTP error status, timeout, or network failure
BR-023-04: Unknown — No health endpoint configured or never checked
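
The four classification rules reduce to a small decision function. The sketch below is in Python for illustration; `classify` and its parameters are hypothetical names, and a status code of `None` is used here to stand for "no HTTP response" (timeout or network failure).

```python
from enum import IntEnum
from typing import Optional


class HealthStatus(IntEnum):
    UNKNOWN = 0
    HEALTHY = 1
    DEGRADED = 2
    UNHEALTHY = 3


DEGRADED_THRESHOLD_MS = 2000  # boundary between Healthy and Degraded


def classify(
    http_status: Optional[int],
    response_time_ms: int,
    endpoint_configured: bool = True,
) -> HealthStatus:
    """Map one check result to a HealthStatus per BR-023-01 to BR-023-04."""
    if not endpoint_configured:
        return HealthStatus.UNKNOWN
    # None means no response at all; any non-2xx code is an HTTP error.
    if http_status is None or not 200 <= http_status <= 299:
        return HealthStatus.UNHEALTHY
    if response_time_ms >= DEGRADED_THRESHOLD_MS:
        return HealthStatus.DEGRADED
    return HealthStatus.HEALTHY
```

Note that a response taking exactly 2000 ms is Degraded, because the Healthy rule uses a strict less-than comparison.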

Monitoring Rules

BR-023-05: Background service checks every 60 seconds (after 30-second initial delay)
BR-023-06: Health check HTTP timeout: 10 seconds
BR-023-07: Only active applications with configured endpoints are checked automatically
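BR-023-08: Manual checks can be triggered for any application regardless of endpoint config; without a configured endpoint the check is recorded as Unknown and no HTTP request is made (see Alt-4)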
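
The timing in BR-023-05 (a 30-second initial delay, then one pass every 60 seconds) can be sketched as an async loop. This is a Python illustration under assumptions: `run_monitor`, `max_cycles`, and the injectable `sleep` are hypothetical, and continuing after a failed pass is an assumption not stated in the rules.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("health_monitoring")

INITIAL_DELAY_SECONDS = 30
INTERVAL_SECONDS = 60


async def run_monitor(
    perform_all_checks: Callable[[], Awaitable[None]],
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> None:
    """Wait for the initial delay, then run one check pass per interval."""
    await sleep(INITIAL_DELAY_SECONDS)
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            await perform_all_checks()
        except Exception:
            # Assumption: a failed pass is logged and the loop keeps running.
            logger.exception("Health check pass failed")
        cycles += 1
        await sleep(INTERVAL_SECONDS)
```

Passing `sleep` as a parameter lets a test substitute a fake that records the requested delays instead of waiting.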

Failure Tracking Rules

BR-023-09: ConsecutiveFailures incremented on each Unhealthy check
BR-023-10: ConsecutiveFailures reset to 0 on Healthy check
BR-023-11: Alert threshold: 3 consecutive failures (configurable)
BR-023-12: StatusChangedAt updated only when status transitions between states
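
The failure tracking rules can be sketched as a small state holder. This is a Python illustration whose method names mirror `RecordSuccess`, `RecordDegraded`, `RecordFailure`, and `ShouldTriggerAlert` from the basic flow; it is not the entity's actual code. One assumption is labeled in the code: the rules do not say what a Degraded check does to the counter, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class HealthCheck:
    status: str = "Unknown"
    consecutive_failures: int = 0
    response_time_ms: int = 0
    http_status_code: Optional[int] = None
    error_message: Optional[str] = None
    checked_at: Optional[datetime] = None
    status_changed_at: Optional[datetime] = None

    def _apply(self, new_status: str, now: Optional[datetime]) -> None:
        now = now or datetime.now(timezone.utc)
        # BR-023-12: StatusChangedAt moves only on a real transition.
        if new_status != self.status:
            self.status_changed_at = now
        self.status = new_status
        self.checked_at = now

    def record_success(self, response_time_ms: int, http_status_code: int,
                       now: Optional[datetime] = None) -> None:
        self._apply("Healthy", now)
        self.response_time_ms = response_time_ms
        self.http_status_code = http_status_code
        self.error_message = None
        self.consecutive_failures = 0  # BR-023-10

    def record_degraded(self, response_time_ms: int, http_status_code: int,
                        reason: str, now: Optional[datetime] = None) -> None:
        self._apply("Degraded", now)
        self.response_time_ms = response_time_ms
        self.http_status_code = http_status_code
        self.error_message = reason
        # Assumption: the counter is left as-is; the rules do not cover this case.

    def record_failure(self, error_message: str,
                       now: Optional[datetime] = None) -> None:
        self._apply("Unhealthy", now)
        self.error_message = error_message
        self.consecutive_failures += 1  # BR-023-09

    def should_trigger_alert(self, threshold: int = 3) -> bool:
        return self.consecutive_failures >= threshold  # BR-023-11
```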

Statistics Rules

BR-023-13: Statistics computed over 24-hour sliding window by default
BR-023-14: Uptime percentage = (Healthy + Degraded) / Total × 100
BR-023-15: Average response time excludes Unknown status and zero-response checks
BR-023-16: History ordered by CheckedAt descending (most recent first)
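
A worked sketch of the statistics rules, in Python for illustration. `Check` and `compute_statistics` are hypothetical names. Two choices are assumptions rather than documented behavior: Unknown checks count toward the total, and an empty window yields 0.0 for both the uptime percentage and the average.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class Check:
    status: str
    response_time_ms: int
    checked_at: datetime


def compute_statistics(checks: Iterable[Check], now: datetime, hours: int = 24) -> dict:
    """Aggregate checks inside the sliding window ending at `now`."""
    cutoff = now - timedelta(hours=hours)
    # BR-023-16: most recent first.
    window = sorted(
        (c for c in checks if c.checked_at >= cutoff),
        key=lambda c: c.checked_at,
        reverse=True,
    )
    total = len(window)
    healthy = sum(1 for c in window if c.status == "Healthy")
    degraded = sum(1 for c in window if c.status == "Degraded")
    unhealthy = sum(1 for c in window if c.status == "Unhealthy")
    # BR-023-14: Degraded checks still count as "up".
    uptime = (healthy + degraded) / total * 100 if total else 0.0
    # BR-023-15: skip Unknown checks and checks with no measured response time.
    timed = [c.response_time_ms for c in window
             if c.status != "Unknown" and c.response_time_ms > 0]
    average = sum(timed) / len(timed) if timed else 0.0
    return {
        "total_checks": total,
        "healthy_count": healthy,
        "degraded_count": degraded,
        "unhealthy_count": unhealthy,
        "uptime_percentage": uptime,
        "average_response_time_ms": average,
        "latest_status": window[0].status if window else None,
        "latest_check_time": window[0].checked_at if window else None,
    }
```

For example, a window holding one Healthy check at 100 ms, one Degraded check at 2500 ms, one Unhealthy check, and one Unknown check gives an uptime of 2 / 4 × 100 = 50% and an average response time of (100 + 2500) / 2 = 1300 ms.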

Data Rules

BR-023-17: Each health check creates or updates the latest record for that application
BR-023-18: Health metadata stored as optional JSON for endpoint-specific data
BR-023-19: URL construction: BaseUrl (trimmed trailing /) + "/" + HealthCheckEndpoint (trimmed leading /)
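
BR-023-19 amounts to a one-line join. The Python sketch below illustrates it; `build_health_url` is a hypothetical name and the example host is made up.

```python
def build_health_url(base_url: str, health_check_endpoint: str) -> str:
    """Join BaseUrl and HealthCheckEndpoint with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + health_check_endpoint.lstrip("/")


# Both inputs may or may not carry the separating slash.
print(build_health_url("https://app.example.com/", "/health"))
print(build_health_url("https://app.example.com", "health"))
```

Both calls produce the same URL, so administrators do not need to worry about whether the stored values include the slash.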

Data Model

ApplicationHealthCheck Entity

Field	Type	Description
Id	Guid (PK)	Unique identifier
ApplicationId	Guid	Application being monitored
Status	HealthStatus	Healthy, Degraded, Unhealthy, Unknown
ResponseTimeMs	int	Response time in milliseconds
HttpStatusCode	int?	HTTP status code from endpoint
ErrorMessage	string?	Failure description
ConsecutiveFailures	int	Count of consecutive failures
CheckedAt	DateTimeOffset	When check occurred (UTC)
StatusChangedAt	DateTimeOffset?	When status last transitioned
HealthMetadata	string?	Additional JSON metadata

HealthStatus Enum

Value	Int	Description
Unknown	0	Not yet checked or no endpoint configured
Healthy	1	Responding normally
Degraded	2	Responding but slow (≥ 2000ms)
Unhealthy	3	Not responding or returning errors

Supporting Classes

HealthStatistics: ApplicationId, TotalChecks, HealthyCount, DegradedCount, UnhealthyCount, UptimePercentage, AverageResponseTimeMs, LatestStatus, LatestCheckTime
ApplicationHealthOverview: ApplicationId, ApplicationName, DisplayName, Status, LastChecked, ResponseTimeMs, ConsecutiveFailures

API Endpoints

Method	Route	Description
GET	`/applications/health/`	Health dashboard (all applications)
GET	`/applications/health/{applicationId}`	Detailed health stats and history
POST	`/applications/health/{applicationId}/check`	Trigger manual health check
GET	`/applications/health/api/{applicationId}`	JSON API for health status

Test Scenarios

ID	Scenario	Expected Result
TS-023-01	RecordSuccess sets Healthy status	Status = Healthy, ConsecutiveFailures = 0
TS-023-02	RecordDegraded sets Degraded status	Status = Degraded, ErrorMessage set
TS-023-03	RecordFailure increments ConsecutiveFailures	ConsecutiveFailures++, Status = Unhealthy
TS-023-04	RecordSuccess after failures resets counter	ConsecutiveFailures = 0
TS-023-05	ShouldTriggerAlert with 3+ failures	Returns true
TS-023-06	ShouldTriggerAlert with < 3 failures	Returns false
TS-023-07	StatusChangedAt updates on state transition	Updated when status changes
TS-023-08	StatusChangedAt unchanged on same status	Not updated when status same
TS-023-09	New ApplicationHealthCheck defaults	Status = Unknown, ConsecutiveFailures = 0
TS-023-10	Health dashboard loads with no applications	Empty list returned
TS-023-11	Manual health check for missing application	InvalidOperationException thrown

Dependencies

IHttpClientFactory (for health endpoint HTTP calls)
IRegisteredApplicationRepository (for application lookups)
IdentityDbContext (for health check persistence)
HealthCheckBackgroundService (for scheduled monitoring)