Use Case 023: Application Health Monitoring
Overview
| Property | Value |
|---|---|
| Use Case ID | UC-023 |
| Use Case Name | Application Health Monitoring |
| Module | Operations — Health Monitoring |
| Priority | High |
| Status | Implemented |
| Version | 1.0 |
| Last Updated | April 3, 2026 |
Description
This use case describes how Application Manager monitors the health and availability of registered applications through automated endpoint polling and on-demand health checks. The system provides a centralized health dashboard showing real-time status for all monitored applications, detailed per-application health statistics (uptime percentage, average response time, check history), and manual health check triggers for immediate verification. A background service performs scheduled health checks every 60 seconds, classifying applications as Healthy, Degraded, Unhealthy, or Unknown based on response time thresholds and HTTP status codes. Consecutive-failure tracking enables alerting once an application reaches the configured failure threshold.
Actors
| Actor | Description | Role |
|---|---|---|
| Administrator | Authenticated admin monitoring application health | Primary |
| System (Background Service) | HealthCheckBackgroundService running periodic checks | Primary |
| Registered Application | Application exposing a health check endpoint | Supporting |
| Monitoring System | External system consuming health API | Secondary |
Preconditions
- Application Manager is running and accessible
- User authenticated with valid session
- At least one application registered with BaseUrl and HealthCheckEndpoint configured
- HealthCheckBackgroundService is running (started on application startup)
- IHttpClientFactory registered in DI container
Postconditions
Success Postconditions
- Health check record persisted with status, response time, and HTTP status code
- Dashboard displays current health for all applications
- Statistics calculated (uptime %, average response time, check counts)
- Consecutive failure counter updated appropriately
- Alert threshold evaluation completed
Failure Postconditions
- Application marked as Unhealthy with error message
- Consecutive failure counter incremented
- Alert logged if threshold reached (default: 3 consecutive failures)
- Previous healthy status preserved in history for trend analysis
Triggers
- Background service timer fires (every 60 seconds)
- Administrator navigates to health monitoring dashboard
- Administrator triggers manual health check for specific application
- External monitoring system queries health API endpoint
- Application deployment completed and verification needed
- Reported issue requires immediate health verification
Basic Flow (Happy Path — Background Health Monitoring)
Alternative Flows
Alt-1: Manual Health Check
Trigger: Administrator triggers on-demand health check
Flow:
- Administrator views application health details page
- Administrator clicks "Trigger Health Check" button
- POST `/applications/health/{applicationId}/check` with anti-forgery token
- Controller calls `PerformHealthCheckAsync(applicationId)`
- Service performs HTTP GET to application's health endpoint
- Health check result persisted to database
- TempData["SuccessMessage"] set with status result
- Administrator redirected to application details page
- Updated health status displayed
Alt-2: Health Dashboard Overview
Trigger: Administrator navigates to health monitoring dashboard
Flow:
- GET `/applications/health/`
- Controller calls `GetAllApplicationsHealthOverviewAsync()`
- Service iterates all active applications
- For each application, retrieves latest health check record
- Returns `ApplicationHealthOverview` with: name, status, response time, last checked, failure count
- View displays table with status badges:
  - Green: Healthy
  - Yellow: Degraded
  - Red: Unhealthy
  - Grey: Unknown (no endpoint configured or never checked)
- Each row has "View Details" and "Trigger Check" action buttons
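The overview step above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the service's actual code: `build_overview`, `LatestCheck`, and `BADGE_COLORS` are hypothetical names, and the only behavior taken from the flow is that an application with no recorded check appears as Unknown with a grey badge.

```python
from dataclasses import dataclass
from typing import Dict, List

# Badge colors as listed in the flow above.
BADGE_COLORS = {
    "Healthy": "green",
    "Degraded": "yellow",
    "Unhealthy": "red",
    "Unknown": "grey",
}


@dataclass
class LatestCheck:
    """Hypothetical stand-in for the latest health check record."""
    status: str
    response_time_ms: int
    checked_at: str
    consecutive_failures: int


def build_overview(active_apps: Dict[str, str],
                   latest: Dict[str, LatestCheck]) -> List[dict]:
    """Build one dashboard row per active application.

    active_apps maps application id to name; latest maps application id
    to its most recent check, if any.
    """
    rows = []
    for app_id, name in active_apps.items():
        check = latest.get(app_id)
        # Never-checked applications are shown as Unknown.
        status = check.status if check else "Unknown"
        rows.append({
            "application_id": app_id,
            "name": name,
            "status": status,
            "badge": BADGE_COLORS[status],
            "response_time_ms": check.response_time_ms if check else None,
            "last_checked": check.checked_at if check else None,
            "consecutive_failures": check.consecutive_failures if check else 0,
        })
    return rows
```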
Alt-3: Detailed Health Statistics
Trigger: Administrator views detailed health for specific application
Flow:
- GET `/applications/health/{applicationId}`
- Controller calls `GetHealthStatisticsAsync(applicationId, hours: 24)`
- Service computes from 24-hour history:
  - Total checks performed
  - Healthy/Degraded/Unhealthy counts
  - Uptime percentage: `(healthy + degraded) / total × 100`
  - Average response time (excluding Unknown status)
  - Latest status and check time
- Controller calls `GetHealthHistoryAsync(applicationId, hours: 24)`
- View displays statistics cards and chronological check history table
Alt-4: Application Without Health Endpoint
Trigger: Application has no BaseUrl or HealthCheckEndpoint configured
Flow:
- Background service or manual check triggered
- Service detects missing health endpoint configuration
- Health check recorded with:
  - Status = Unknown
  - ErrorMessage = "No health check endpoint configured"
- Application appears with grey "Unknown" badge on dashboard
- No HTTP request attempted
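A minimal sketch of the guard in this flow, written in Python for illustration. The function name `missing_endpoint_result` and the dictionary shape are assumptions; treating whitespace-only values as missing is also an assumption. Only the status and error message come from the flow above.

```python
from typing import Optional


def missing_endpoint_result(base_url: Optional[str],
                            health_check_endpoint: Optional[str]) -> Optional[dict]:
    """Return an Unknown record when configuration is missing.

    Returns None when both values are present, which tells the caller
    to go ahead with the HTTP check. No request is made here.
    """
    def is_blank(value: Optional[str]) -> bool:
        # Assumption: None, empty, and whitespace-only all count as missing.
        return value is None or value.strip() == ""

    if is_blank(base_url) or is_blank(health_check_endpoint):
        return {
            "status": "Unknown",
            "error_message": "No health check endpoint configured",
            "http_status_code": None,
        }
    return None
```

Returning early like this keeps the HTTP client out of the path entirely, which matches the final step of the flow.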
Alt-5: Health Check Timeout
Trigger: Application's health endpoint does not respond within 10 seconds
Flow:
- Service makes HTTP GET request to health endpoint
- HttpClient timeout (10 seconds) triggers TaskCanceledException
- Service records failure:
  - Status = Unhealthy
  - ErrorMessage = "Health check timed out after 10 seconds"
  - ConsecutiveFailures incremented
- If ConsecutiveFailures >= 3, alert logged: "ALERT: Application has consecutive failures"
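The timeout path above might look roughly like the following Python sketch. It is illustrative only: `run_check` and the injected `http_get` callable are hypothetical, and Python's built-in `TimeoutError` stands in for the `TaskCanceledException` that the flow describes. The error message, the increment, and the `>= 3` alert condition are taken from the flow.

```python
import logging
from typing import Callable

logger = logging.getLogger("health")

TIMEOUT_SECONDS = 10   # health check HTTP timeout from the flow
ALERT_THRESHOLD = 3    # default consecutive-failure threshold


def run_check(url: str,
              http_get: Callable[[str, float], int],
              previous_failures: int) -> dict:
    """Call the health endpoint and handle the timeout case.

    http_get(url, timeout_seconds) is a hypothetical callable that
    returns the HTTP status code or raises TimeoutError.
    """
    try:
        status_code = http_get(url, TIMEOUT_SECONDS)
    except TimeoutError:
        failures = previous_failures + 1
        if failures >= ALERT_THRESHOLD:
            logger.warning("ALERT: Application has consecutive failures")
        return {
            "status": "Unhealthy",
            "error_message": f"Health check timed out after {TIMEOUT_SECONDS} seconds",
            "consecutive_failures": failures,
        }
    # Non-timeout outcomes are classified by status code and response
    # time elsewhere; this sketch only hands back the raw status code.
    return {"http_status_code": status_code}
```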
Alt-6: API Health Status Query
Trigger: External monitoring system queries health API
Flow:
- GET `/applications/health/api/{applicationId}`
- Controller calls `GetLatestHealthStatusAsync(applicationId)`
- If no health data exists, returns 404 with error JSON
- Otherwise returns JSON with: applicationId, status, responseTimeMs, checkedAt, errorMessage, consecutiveFailures
- External system processes response for monitoring dashboards
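As a rough illustration of the response contract in this flow, the Python sketch below returns a status code and JSON body. The success field names come from the flow; the function name `health_status_response`, the shape of the input record, and the wording of the 404 error body are assumptions, since the source does not specify them.

```python
import json
from typing import Optional, Tuple


def health_status_response(application_id: str,
                           latest: Optional[dict]) -> Tuple[int, str]:
    """Return (HTTP status code, JSON body) for the health API."""
    if latest is None:
        # Assumption: the exact error payload is not given in the source.
        return 404, json.dumps({"error": "No health data found"})
    body = {
        "applicationId": application_id,
        "status": latest["status"],
        "responseTimeMs": latest["response_time_ms"],
        "checkedAt": latest["checked_at"],
        "errorMessage": latest.get("error_message"),
        "consecutiveFailures": latest["consecutive_failures"],
    }
    return 200, json.dumps(body)
```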
Business Rules
Health Status Classification
- BR-023-01: Healthy — HTTP 2xx response with response time < 2000ms
- BR-023-02: Degraded — HTTP 2xx response with response time ≥ 2000ms
- BR-023-03: Unhealthy — HTTP error status, timeout, or network failure
- BR-023-04: Unknown — No health endpoint configured or never checked
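Rules BR-023-01 through BR-023-04 can be expressed as a single decision function. The Python sketch below is illustrative; the `classify` function and its parameters are hypothetical. One assumption is labeled in the code: any non-2xx status code is treated as an error, because the rules do not say how 3xx responses are handled.

```python
from enum import IntEnum
from typing import Optional


class HealthStatus(IntEnum):
    UNKNOWN = 0
    HEALTHY = 1
    DEGRADED = 2
    UNHEALTHY = 3


DEGRADED_THRESHOLD_MS = 2000  # boundary between Healthy and Degraded


def classify(endpoint_configured: bool,
             http_status: Optional[int],
             response_time_ms: int) -> HealthStatus:
    """Map one check outcome to a health status.

    http_status is None when the request timed out or failed at the
    network level.
    """
    if not endpoint_configured:
        return HealthStatus.UNKNOWN           # BR-023-04
    # Assumption: every non-2xx code counts as an HTTP error status.
    if http_status is None or not (200 <= http_status < 300):
        return HealthStatus.UNHEALTHY         # BR-023-03
    if response_time_ms < DEGRADED_THRESHOLD_MS:
        return HealthStatus.HEALTHY           # BR-023-01
    return HealthStatus.DEGRADED              # BR-023-02
```

Note that a response time of exactly 2000 ms falls on the Degraded side of the boundary.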
Monitoring Rules
- BR-023-05: Background service checks every 60 seconds (after 30-second initial delay)
- BR-023-06: Health check HTTP timeout: 10 seconds
- BR-023-07: Only active applications with configured endpoints are checked automatically
- BR-023-08: Manual checks can be triggered for any application regardless of endpoint config
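The scheduling rules above can be sketched with a simple loop. This Python example is an illustration under stated assumptions, not a description of HealthCheckBackgroundService: `select_checkable`, `run_monitor`, the dictionary keys, and the `max_cycles` parameter (added so the loop can be stopped in a test) are all hypothetical. Whether the 60-second interval is measured from the start or the end of a cycle is not stated in the source; this sketch sleeps after each cycle completes.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


def select_checkable(apps: List[dict]) -> List[dict]:
    """BR-023-07: keep only active applications with a configured endpoint."""
    return [
        a for a in apps
        if a.get("is_active") and a.get("base_url") and a.get("health_check_endpoint")
    ]


async def run_monitor(check_all: Callable[[], Awaitable[None]],
                      initial_delay: float = 30.0,
                      interval: float = 60.0,
                      max_cycles: Optional[int] = None) -> None:
    """Wait for the initial delay, then run check_all once per interval.

    max_cycles=None means run until cancelled.
    """
    await asyncio.sleep(initial_delay)        # BR-023-05 initial delay
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        await check_all()
        cycles += 1
        await asyncio.sleep(interval)         # BR-023-05 check interval
```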
Failure Tracking Rules
- BR-023-09: ConsecutiveFailures incremented on each Unhealthy check
- BR-023-10: ConsecutiveFailures reset to 0 on Healthy check
- BR-023-11: Alert threshold: 3 consecutive failures (configurable)
- BR-023-12: StatusChangedAt updated only when status transitions between states
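A small state sketch of the failure tracking rules, in Python for illustration. The class and method names are hypothetical, loosely mirroring the RecordSuccess, RecordFailure, and ShouldTriggerAlert names that appear in the test scenarios. The rules do not say whether a Degraded check resets the counter, so that case is left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALERT_THRESHOLD = 3  # BR-023-11 default; configurable per the rule


@dataclass
class HealthRecord:
    status: str = "Unknown"
    consecutive_failures: int = 0
    status_changed_at: Optional[datetime] = None

    def _set_status(self, new_status: str, now: datetime) -> None:
        # BR-023-12: the timestamp moves only on an actual transition.
        if new_status != self.status:
            self.status = new_status
            self.status_changed_at = now

    def record_success(self, now: datetime) -> None:
        self._set_status("Healthy", now)
        self.consecutive_failures = 0          # BR-023-10

    def record_failure(self, now: datetime) -> None:
        self._set_status("Unhealthy", now)
        self.consecutive_failures += 1         # BR-023-09

    def should_trigger_alert(self, threshold: int = ALERT_THRESHOLD) -> bool:
        return self.consecutive_failures >= threshold
```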
Statistics Rules
- BR-023-13: Statistics computed over 24-hour sliding window by default
- BR-023-14: Uptime percentage = (Healthy + Degraded) / Total × 100
- BR-023-15: Average response time excludes Unknown status and zero-response checks
- BR-023-16: History ordered by CheckedAt descending (most recent first)
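The statistics rules can be illustrated with the Python sketch below. The `Check` class and `compute_statistics` function are hypothetical. Two choices are assumptions because the rules do not cover them: Unknown checks inside the window count toward the total, and an empty window reports 0.0 for both uptime and average response time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Check:
    status: str            # "Healthy", "Degraded", "Unhealthy", or "Unknown"
    response_time_ms: int
    checked_at: datetime


def compute_statistics(checks: List[Check], now: datetime, hours: int = 24) -> dict:
    """Compute statistics over a sliding window ending at `now`."""
    cutoff = now - timedelta(hours=hours)                       # BR-023-13
    window = sorted((c for c in checks if c.checked_at >= cutoff),
                    key=lambda c: c.checked_at, reverse=True)   # BR-023-16
    total = len(window)
    healthy = sum(1 for c in window if c.status == "Healthy")
    degraded = sum(1 for c in window if c.status == "Degraded")
    unhealthy = sum(1 for c in window if c.status == "Unhealthy")
    # BR-023-15: skip Unknown checks and zero response times.
    timed = [c.response_time_ms for c in window
             if c.status != "Unknown" and c.response_time_ms > 0]
    return {
        "total_checks": total,
        "healthy_count": healthy,
        "degraded_count": degraded,
        "unhealthy_count": unhealthy,
        # BR-023-14; assumption: 0.0 when the window is empty.
        "uptime_percentage": (healthy + degraded) / total * 100 if total else 0.0,
        "average_response_time_ms": sum(timed) / len(timed) if timed else 0.0,
        "latest_status": window[0].status if window else "Unknown",
        "latest_check_time": window[0].checked_at if window else None,
    }
```

For example, a window containing one Healthy, one Degraded, one Unhealthy, and one Unknown check yields an uptime of 50%, since two of the four checks count as up.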
Data Rules
- BR-023-17: Each health check creates or updates the latest record for that application
- BR-023-18: Health metadata stored as optional JSON for endpoint-specific data
- BR-023-19: URL construction: BaseUrl (trimmed trailing /) + "/" + HealthCheckEndpoint (trimmed leading /)
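The URL construction in BR-023-19 is easy to get wrong when either part already carries a slash, so a short sketch may help. The function name `build_health_url` and the example hosts are hypothetical; the trimming behavior follows the rule as written.

```python
def build_health_url(base_url: str, health_check_endpoint: str) -> str:
    """Join BaseUrl and HealthCheckEndpoint with exactly one slash.

    Trailing slashes are trimmed from the base URL and leading slashes
    from the endpoint before joining (BR-023-19).
    """
    return base_url.rstrip("/") + "/" + health_check_endpoint.lstrip("/")


# Each pair below produces the same result despite differing slashes.
print(build_health_url("https://app.example.com/", "/health"))
print(build_health_url("https://app.example.com", "health"))
```

Both calls print `https://app.example.com/health`.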
Data Model
ApplicationHealthCheck Entity
| Field | Type | Description |
|---|---|---|
| Id | Guid (PK) | Unique identifier |
| ApplicationId | Guid | Application being monitored |
| Status | HealthStatus | Healthy, Degraded, Unhealthy, Unknown |
| ResponseTimeMs | int | Response time in milliseconds |
| HttpStatusCode | int? | HTTP status code from endpoint |
| ErrorMessage | string? | Failure description |
| ConsecutiveFailures | int | Count of consecutive failures |
| CheckedAt | DateTimeOffset | When check occurred (UTC) |
| StatusChangedAt | DateTimeOffset? | When status last transitioned |
| HealthMetadata | string? | Additional JSON metadata |
HealthStatus Enum
| Value | Int | Description |
|---|---|---|
| Unknown | 0 | Not yet checked or no endpoint configured |
| Healthy | 1 | Responding normally |
| Degraded | 2 | Responding but slow (≥ 2000ms) |
| Unhealthy | 3 | Not responding or returning errors |
Supporting Classes
- HealthStatistics: ApplicationId, TotalChecks, HealthyCount, DegradedCount, UnhealthyCount, UptimePercentage, AverageResponseTimeMs, LatestStatus, LatestCheckTime
- ApplicationHealthOverview: ApplicationId, ApplicationName, DisplayName, Status, LastChecked, ResponseTimeMs, ConsecutiveFailures
API Endpoints
| Method | Route | Description |
|---|---|---|
| GET | `/applications/health/` | Health dashboard (all applications) |
| GET | `/applications/health/{applicationId}` | Detailed health stats and history |
| POST | `/applications/health/{applicationId}/check` | Trigger manual health check |
| GET | `/applications/health/api/{applicationId}` | JSON API for health status |
Test Scenarios
| ID | Scenario | Expected Result |
|---|---|---|
| TS-023-01 | RecordSuccess sets Healthy status | Status = Healthy, ConsecutiveFailures = 0 |
| TS-023-02 | RecordDegraded sets Degraded status | Status = Degraded, ErrorMessage set |
| TS-023-03 | RecordFailure increments ConsecutiveFailures | ConsecutiveFailures++, Status = Unhealthy |
| TS-023-04 | RecordSuccess after failures resets counter | ConsecutiveFailures = 0 |
| TS-023-05 | ShouldTriggerAlert with 3+ failures | Returns true |
| TS-023-06 | ShouldTriggerAlert with < 3 failures | Returns false |
| TS-023-07 | StatusChangedAt updates on state transition | Updated when status changes |
| TS-023-08 | StatusChangedAt unchanged on same status | Not updated when status same |
| TS-023-09 | New ApplicationHealthCheck defaults | Status = Unknown, ConsecutiveFailures = 0 |
| TS-023-10 | Health dashboard loads with no applications | Empty list returned |
| TS-023-11 | Manual health check for missing application | InvalidOperationException thrown |
Dependencies
- IHttpClientFactory (for health endpoint HTTP calls)
- IRegisteredApplicationRepository (for application lookups)
- IdentityDbContext (for health check persistence)
- HealthCheckBackgroundService (for scheduled monitoring)