ASP.NET Core Health Checks in Practice: More Than a /health Endpoint

张开发
2026/6/7 15:42:33 · 15 min read
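Many .NET developers rely on a bare /health endpoint to decide whether a service is alive. This is a common mistake. A plain 200 OK only proves that the process is running and the route can be reached; it says nothing about whether the core business path works. In real production incidents the database connection is down, Redis is offline, or the disk is full, yet the basic health check stays green. The result is a silent failure: core requests all fail, monitoring raises no alert, and the container orchestrator takes no action. Avoiding this requires fine-grained health checks that reflect the dependencies the business actually needs.

Where the basic setup falls short

Projects that ship quickly usually start with one of two minimal setups. Both look usable, but they only provide a false safety net.

The first is a hard-coded endpoint with no check logic at all:

    // Program.cs: placeholder only, no dependency is verified
    app.MapGet("/health", () => "Healthy");

The second uses the built-in middleware but still verifies no external dependency:

    builder.Services.AddHealthChecks();
    app.MapHealthChecks("/health");

Both have the same flaw: they only check that the process is alive and skip the database, the cache, third-party APIs, and every other core dependency, so they cannot report whether the service is really usable.

How ASP.NET Core health checks work

The built-in health check middleware lives in the Microsoft.AspNetCore.Diagnostics.HealthChecks package, which ships with the framework and needs no extra reference. The design is an extension point: you implement IHealthCheck and write whatever check logic a given scenario needs.

Every check implements the same interface, which has a single asynchronous method:

    public interface IHealthCheck
    {
        Task<HealthCheckResult> CheckHealthAsync(
            HealthCheckContext context,
            CancellationToken cancellationToken = default);
    }

The framework defines three result statuses, which map well onto production failure levels:

Healthy: all dependencies are ready and the service can take full traffic.
Degraded: a non-core dependency is failing; core business is unaffected, with a small loss of performance.
Unhealthy: a core path is broken and the service cannot serve requests properly.

Aggregation rule: the overall status is the worst result among all checks. If any check reports Unhealthy, the endpoint returns 503, which monitoring alerts and container scheduling policies can then act on.

Wiring up core business dependencies

Production stability depends on the availability of the underlying dependencies. The configurations below follow common production practice and can be reused with little adaptation.

Database connectivity

The database is the foundation of most API services, so verifying connectivity comes first. Install the health check NuGet package for the database you use:

    dotnet add package AspNetCore.HealthChecks.SqlServer
    dotnet add package AspNetCore.HealthChecks.NpgSql
    dotnet add package AspNetCore.HealthChecks.MySql

Several databases can be registered in one chain. The query is lightweight, so the check does not slow the endpoint down:

    builder.Services.AddHealthChecks()
        .AddSqlServer(
            connectionString: builder.Configuration.GetConnectionString("DefaultConnection")!,
            healthQuery: "SELECT 1",
            name: "sql-server",
            failureStatus: HealthStatus.Unhealthy,
            tags: ["database", "sql"])
        .AddNpgSql(
            connectionString: builder.Configuration.GetConnectionString("Postgres")!,
            name: "postgresql",
            tags: ["database", "postgres"]);

Tip: SELECT 1 is the usual choice because it is cheap and works everywhere. If you also need to verify read permissions, query a single small table. Never put complex aggregations or slow joins into a health check.

Redis: report cache failures as Degraded

A cache outage does not block core business, so it is reported as Degraded. This avoids needless container restarts.

    dotnet add package AspNetCore.HealthChecks.Redis

    builder.Services.AddHealthChecks()
        .AddRedis(
            redisConnectionString: builder.Configuration.GetConnectionString("Redis")!,
            name: "redis",
            failureStatus: HealthStatus.Degraded,
            tags: ["cache"]);

External third-party HTTP APIs

Outbound calls to payment, push, or third-party compute APIs are also reported as Degraded, so that one failing provider does not take the whole service down. The example points at a public payment gateway; if that address does not work for you, replace it with the address of your own internal service.

    dotnet add package AspNetCore.HealthChecks.Uris

    builder.Services.AddHealthChecks()
        .AddUrlGroup(
            uri: new Uri("https://api.stripe.com/v1/"),
            name: "external-payment-api",
            failureStatus: HealthStatus.Degraded,
            tags: ["external"]);

Custom business health checks

The generic packages cannot cover in-house middleware, private service gateways, and similar cases. For those, implement the interface yourself. The example checks both repository connectivity and the order queue backlog, so it can warn of congestion in the order service before it fails:

    using Microsoft.Extensions.Diagnostics.HealthChecks;

    public class OrderServiceHealthCheck : IHealthCheck
    {
        private readonly IOrderRepository _orderRepository;

        public OrderServiceHealthCheck(IOrderRepository orderRepository)
            => _orderRepository = orderRepository;

        public async Task<HealthCheckResult> CheckHealthAsync(
            HealthCheckContext context,
            CancellationToken cancellationToken = default)
        {
            try
            {
                if (!await _orderRepository.CanConnectAsync(cancellationToken))
                    return HealthCheckResult.Unhealthy("Order repository is unreachable");

                int pending = await _orderRepository.GetPendingCountAsync(cancellationToken);
                if (pending > 10000)
                    return HealthCheckResult.Degraded($"Order queue backlog over limit: {pending} pending");

                // The first parameter of Healthy() is the description, so data must be passed by name
                return HealthCheckResult.Healthy(
                    data: new Dictionary<string, object> { ["pending"] = pending });
            }
            catch (Exception ex)
            {
                return HealthCheckResult.Unhealthy("Order service health check failed", ex);
            }
        }
    }

Registration is a single call:

    builder.Services.AddHealthChecks()
        .AddCheck<OrderServiceHealthCheck>("order-service", HealthStatus.Unhealthy, ["orders"]);

Disk space

A full disk is a frequent and easily overlooked production failure: log writes fail, file I/O throws, and the process can hang. A disk space check gives early warning so that capacity can be added before an outage.

    public class DiskSpaceHealthCheck : IHealthCheck
    {
        private readonly long _minFreeBytes;

        // 1024L forces 64-bit arithmetic so large thresholds do not overflow int
        public DiskSpaceHealthCheck(int minMb = 500)
            => _minFreeBytes = minMb * 1024L * 1024L;

        public Task<HealthCheckResult> CheckHealthAsync(
            HealthCheckContext context, CancellationToken ct = default)
        {
            // "/" is the root mount on Linux; adjust the name on Windows
            var drive = DriveInfo.GetDrives().FirstOrDefault(d => d.IsReady && d.Name == "/");
            if (drive is null)
                return Task.FromResult(HealthCheckResult.Unhealthy("System root disk not found"));

            var free = drive.AvailableFreeSpace;
            var data = new Dictionary<string, object> { ["free_mb"] = free / 1024 / 1024 };

            // data is passed by name because the second positional parameter is the exception
            if (free < _minFreeBytes)
                return Task.FromResult(HealthCheckResult.Unhealthy("Disk space critically low", data: data));
            if (free < _minFreeBytes * 2)
                return Task.FromResult(HealthCheckResult.Degraded("Disk space close to the warning threshold", data: data));
            return Task.FromResult(HealthCheckResult.Healthy(data: data));
        }
    }

Tags and Kubernetes probes

The three Kubernetes probes have distinct jobs and should not share one endpoint; doing so can cause restart loops or traffic being removed by mistake. Use tags to split the checks into separate endpoints that match each probe.

    Probe type | Purpose                                         | Effect of failure
    Liveness   | Checks that the process is still running        | The container is restarted
    Readiness  | Checks that the full path can accept traffic    | The pod is temporarily removed from the load-balancer pool
    Startup    | Gives slow-starting services time to initialize | Other probes are held back until it passes

Step one: give every check a tag that identifies its group. Step two: expose two endpoints so that the scheduling logic for each probe stays isolated.

    // 1. Tag checks by purpose
    builder.Services.AddHealthChecks()
        .AddSqlServer(/* configuration omitted */ tags: ["ready"])
        .AddRedis(/* configuration omitted */ tags: ["ready"])
        // AddCheck<T> requires a name; "disk-space" is an illustrative choice
        .AddCheck<DiskSpaceHealthCheck>("disk-space", tags: ["live", "ready"]);

    // 2. Liveness: basic liveness only, no business dependencies
    app.MapHealthChecks("/health/live", new HealthCheckOptions
    {
        Predicate = x => x.Tags.Contains("live")
    });

    // 3. Readiness: all business dependencies, so traffic only reaches a usable instance
    app.MapHealthChecks("/health/ready", new HealthCheckOptions
    {
        Predicate = x => x.Tags.Contains("ready")
    });

Finally, point the probes in the cluster YAML at these paths, and recovery scheduling runs automatically without manual intervention.

Structured JSON responses for faster troubleshooting

The default response is plain text containing only the overall status, which is of little use when diagnosing a problem. A custom JSON response that carries the duration, description, and tags of each check can be collected by dashboards and searched in logs.

    using System.Text.Json;
    using Microsoft.AspNetCore.Diagnostics.HealthChecks;

    app.MapHealthChecks("/health", new HealthCheckOptions
    {
        ResponseWriter = async (ctx, report) =>
        {
            ctx.Response.ContentType = "application/json";
            var res = new
            {
                status = report.Status.ToString(),
                duration = report.TotalDuration.TotalMilliseconds,
                checks = report.Entries.Select(e => new
                {
                    name = e.Key,
                    status = e.Value.Status.ToString(),
                    desc = e.Value.Description,
                    tags = e.Value.Tags,
                    cost = e.Value.Duration.TotalMilliseconds
                })
            };
            await ctx.Response.WriteAsync(
                JsonSerializer.Serialize(res, new JsonSerializerOptions { WriteIndented = true }));
        }
    });

Per-check timeouts

External APIs and cross-datacenter links are prone to slow responses, and one check that hangs can stall the whole health endpoint. A linked cancellation token with a time limit keeps a single stuck check from blocking the rest.

    // Core pattern: force a 3-second timeout on a single external check
    using var timeout = CancellationTokenSource.CreateLinkedTokenSource(ct);
    timeout.CancelAfter(TimeSpan.FromSeconds(3));
    // Pass timeout.Token to the outgoing call; when the timer fires, the call is cancelled instead of blocking

Common pitfalls and how to avoid them

Configuring health checks is not the same as having them work in production. These are the pitfalls seen most often.

Pitfall 1: exposing detailed check output to the public internet. Stack traces and connection strings can leak, which is a serious vulnerability. Fix: serve the endpoint on the internal network only and require a secret key in a request header, so that external callers are blocked.

Pitfall 2: running heavy check logic at high frequency. Probes that poll every second combined with complex SQL can overload the database. Fix: cache the result locally and reuse it for 30 seconds, so that the expensive check runs rarely.

Pitfall 3: sharing one endpoint between liveness and readiness probes. A brief dependency glitch then triggers restart loops that affect many instances. Fix: always split them into two endpoints, each with its own job.

Pitfall 4: checking the process but not the real dependencies. Such a check is decorative and cannot catch silent failures. Fix: review past incidents and cover every weak dependency they exposed.

Best-practice checklist

A short list a team can adopt as a shared standard:

Use clear, readable names for checks.
Report non-core dependencies as Degraded rather than Unhealthy.
Attach business metrics to check results so that problems can be traced.
Simulate failures in the staging environment.
Register check classes with a lifetime that suits their dependencies.

A lightweight UI dashboard

There is no need to build a dashboard yourself. An open-source UI can be integrated with two lines of code and shows check trends and the status of each node. In test environments, in-memory storage is enough for a quick setup; in production, switch to a persistent store so that the operational history is kept.

Conclusion

A bare 200 endpoint only shows a heartbeat; fine-grained health checks are what protect production. About an hour of standard configuration can cut the time spent diagnosing a night-time incident from hours to minutes. The points to remember: split the endpoints for the two probes, cover every core dependency, emit structured output that can be traced, and add a dashboard as a final safety net.

Note: parts of this article were generated by AI.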
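One way to apply the advice of caching check results for roughly 30 seconds is a small decorator around an existing check. The sketch below is only an illustration and not part of ASP.NET Core: CachedHealthCheck and its ttl parameter are hypothetical names, and it assumes the wrapper is registered as one shared instance so that the cached result survives between probe requests.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical decorator: reuses the inner check's last result until ttl expires.
public sealed class CachedHealthCheck : IHealthCheck
{
    private readonly IHealthCheck _inner;
    private readonly TimeSpan _ttl;
    private readonly SemaphoreSlim _gate = new(1, 1);   // one refresh at a time
    private HealthCheckResult _last;
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;

    public CachedHealthCheck(IHealthCheck inner, TimeSpan ttl)
    {
        _inner = inner;
        _ttl = ttl;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        await _gate.WaitAsync(cancellationToken);
        try
        {
            // Still fresh: return the stored result without touching the dependency
            if (DateTimeOffset.UtcNow < _expiresAt)
                return _last;

            // Expired or first call: run the real check and remember the result
            _last = await _inner.CheckHealthAsync(context, cancellationToken);
            _expiresAt = DateTimeOffset.UtcNow + _ttl;
            return _last;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Because the cache lives in the instance, the wrapper should be registered through an overload that keeps one instance alive, for example AddCheck("orders-cached", new CachedHealthCheck(inner, TimeSpan.FromSeconds(30))), where inner stands for whichever existing check you want to throttle. A factory that builds a new wrapper on every run would start with an empty cache each time.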
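The linked-token timeout pattern for a single check can also be packaged as a reusable wrapper. The following is a minimal sketch under stated assumptions: TimeoutHealthCheck is a hypothetical name, the wrapped check must honor the cancellation token it receives for the timeout to take effect, and on timeout the wrapper reports the failure status configured on the registration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical decorator: cancels the inner check after a fixed time limit.
public sealed class TimeoutHealthCheck : IHealthCheck
{
    private readonly IHealthCheck _inner;
    private readonly TimeSpan _timeout;

    public TimeoutHealthCheck(IHealthCheck inner, TimeSpan timeout)
    {
        _inner = inner;
        _timeout = timeout;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Linked source: fires on the caller's cancellation OR on our own timer
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(_timeout);

        try
        {
            return await _inner.CheckHealthAsync(context, timeoutCts.Token);
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // Only the timer fired; the caller did not cancel, so report a failed check
            return new HealthCheckResult(
                context.Registration.FailureStatus,
                $"Check timed out after {_timeout.TotalSeconds:0.#} s");
        }
    }
}
```

If the caller itself cancels, the exception is allowed to propagate, so a shutdown is not misreported as a dependency failure. HealthCheckRegistration also exposes a Timeout property, which may be enough when no custom result message is needed.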
