Guardrails-AI: An Input/Output Safety Framework for LLMs

GitHub: guardrails-ai/guardrails
Stars: 7,000+ | License: Apache-2.0 | 3,200+ Commits
Website: guardrailsai.com | Hub: guardrailsai.com/hub

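Project Overview

Guardrails-AI is a Python framework that adds safety guardrails to LLM applications. Its core idea is "traffic rules for LLMs": just as highway guardrails keep vehicles from leaving the road, Guardrails-AI places composable risk-detection rules (called Validators) on both the input and output sides of an LLM, intercepting risky input before it reaches the model and filtering unsafe output before it is returned to the user.

What sets the project apart is the Guardrails Hub ecosystem, an open marketplace of 70+ prebuilt validators organized into 8 risk categories, including brand risk, abusive language detection, factuality verification, format validation, code injection protection, data leakage detection, and jailbreak defense. Each validator is a standalone Python package that is installed on demand with guardrails hub install, and custom validators can be submitted to the Hub.

The framework's other core feature is Pydantic-based structured generation: with Guard.for_pydantic(), developers define the expected output schema as a Pydantic model, and Guardrails-AI injects the schema into the prompt and checks that the LLM output conforms to it. Guardrails-AI can also be deployed as a standalone Flask REST service (with Docker + Gunicorn for production) that exposes API endpoints compatible with the OpenAI SDK.

As of June 2026 (v0.10.2), Guardrails-AI has accumulated more than 3,200 commits, 68 releases, and over 7,000 GitHub stars.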

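Feature Overview

Guardrails Hub: 70+ Validators

Guardrails Hub is the framework's core ecosystem, offering composable validators organized by risk category:

Risk category	Example validators	Description
Brand Risk	CompetitorCheck, BiasCheck, BanList, DetectJailbreak	Prevents competitor mentions, detects bias, blocks jailbreak attacks
Etiquette	ToxicLanguage, ProfanityFree, RestrictToTopic, PolitenessCheck	Toxic language filtering, profanity detection, topic restriction
Factuality	ProvenanceLLM, GroundedAIHallucination, SaliencyCheck	Hallucination detection, provenance verification, factual consistency
Formatting	RegexMatch, ValidJSON, ValidURL, ValidAddress	Regex matching; JSON, URL, and address format validation
Code Exploits	EndpointIsReachable, HasUrl, WebSanitization	Endpoint reachability checks, URL risk detection
Data Leakage	DetectPII, GuardrailsPII	Detects personally identifiable information such as ID numbers, phone numbers, and email addresses
Invalid Code	ValidPython, ValidSQL	Python/SQL syntax validation
Jailbreaking	PromptInjectionDetector, QARelevanceLLMEval	Prompt injection detection, QA relevance verification

Install validators: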

guardrails hub install hub://guardrails/regex_match
guardrails hub install hub://guardrails/toxic_language
guardrails hub install hub://guardrails/competitor_check
guardrails hub install hub://guardrails/detect_pii

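Input and Output Guards

Guard is the framework's core abstraction; multiple validators are chained together through Guard().use(). Each validator is configured with an on-fail policy (OnFailAction), and this article covers three of them:

OnFailAction.EXCEPTION: raises an exception when a risk is detected, blocking the request
OnFailAction.FIX: attempts an automatic programmatic fix (for example, applying a corrected value supplied by the validator)
OnFailAction.REASK: asks the LLM to answer again and re-validates the new answer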
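
To make the difference between the three policies concrete, here is a minimal sketch in plain Python. It does not use the Guardrails library: Policy, CheckResult, check_max_length, and apply_policy are hypothetical names invented for illustration, and the real framework's internals may well differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):
    """Hypothetical stand-in for the three OnFailAction values discussed here."""
    EXCEPTION = "exception"
    FIX = "fix"
    REASK = "reask"


@dataclass
class CheckResult:
    passed: bool
    error: str = ""
    fix_value: Optional[str] = None  # corrected value the check can offer, if any


def check_max_length(text: str, limit: int = 10) -> CheckResult:
    """Toy check: fail when text exceeds `limit`, offering a truncated fix."""
    if len(text) <= limit:
        return CheckResult(passed=True)
    return CheckResult(False, f"text longer than {limit} characters", text[:limit])


def apply_policy(
    text: str,
    policy: Policy,
    regenerate: Optional[Callable[[str], str]] = None,
    max_reasks: int = 1,
) -> str:
    """Run the toy check and handle a failure according to `policy`."""
    result = check_max_length(text)
    if result.passed:
        return text
    if policy is Policy.EXCEPTION:
        raise ValueError(result.error)  # block the request
    if policy is Policy.FIX:
        return result.fix_value  # programmatic repair, no second generation
    # REASK: hand the error back to the (hypothetical) generator and re-validate
    if regenerate is None:
        raise ValueError("REASK needs a regenerate callback")
    for _ in range(max_reasks):
        text = regenerate(result.error)
        result = check_max_length(text)
        if result.passed:
            return text
    raise ValueError(f"still failing after {max_reasks} reask(s): {result.error}")
```

In this sketch FIX never calls the generator again, while REASK does; that is the practical cost difference to weigh when choosing a policy per validator.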

Basic input Guard, validating a phone number format with a regular expression:

from guardrails import Guard, OnFailAction
from guardrails.hub import RegexMatch

guard = Guard().use(
    RegexMatch,
    regex="\\(?\\d{3}\\)?-? *\\d{3}-? *-?\\d{4}",
    on_fail=OnFailAction.EXCEPTION
)

guard.validate("123-456-7890")  # passes
# guard.validate("abc-def-ghij")  # raises an exception

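Combining multiple validators to check for competitor mentions and toxic language at the same time:

from guardrails import Guard, OnFailAction
from guardrails.hub import CompetitorCheck, ToxicLanguage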

guard = Guard().use(
    CompetitorCheck(
        ["Apple", "Microsoft", "Google"],
        on_fail=OnFailAction.EXCEPTION
    ),
    ToxicLanguage(
        threshold=0.5,
        validation_method="sentence",
        on_fail=OnFailAction.EXCEPTION
    )
)

# Passes: no competitor mentioned and no toxic language
guard.validate("An apple a day keeps a doctor away.")

# Both validators fail: mentions Apple and contains toxic language
try:
    guard.validate("Shut the hell up! Apple just released a new iPhone.")
except Exception as e:
    print(e)
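
The composition idea can be sketched without the library. The snippet below chains two toy checks and collects every failure message. The names no_competitors, no_banned_words, and run_checks are hypothetical, and the matching logic (whole-word regex, word blocklist) is a deliberately crude stand-in for the Hub validators. Whether the real framework reports all failures or stops at the first one is not something this sketch asserts.

```python
import re
from typing import Callable, Iterable, List, Optional

# A check returns an error message on failure, or None when the text passes.
Check = Callable[[str], Optional[str]]


def no_competitors(names: Iterable[str]) -> Check:
    """Crude stand-in for competitor detection: case-sensitive whole-word match."""
    patterns = [(n, re.compile(rf"\b{re.escape(n)}\b")) for n in names]

    def check(text: str) -> Optional[str]:
        found = [name for name, pattern in patterns if pattern.search(text)]
        return f"competitor mentioned: {', '.join(found)}" if found else None

    return check


def no_banned_words(banned: Iterable[str]) -> Check:
    """Crude stand-in for toxicity detection: a lowercase word blocklist."""
    banned_set = {w.lower() for w in banned}

    def check(text: str) -> Optional[str]:
        words = set(re.findall(r"[a-z']+", text.lower()))
        hits = sorted(words & banned_set)
        return f"banned words: {', '.join(hits)}" if hits else None

    return check


def run_checks(text: str, checks: List[Check]) -> List[str]:
    """Run every check and collect all error messages (empty list means pass)."""
    errors = []
    for check in checks:
        error = check(text)
        if error is not None:
            errors.append(error)
    return errors


checks = [no_competitors(["Apple", "Microsoft", "Google"]), no_banned_words(["hell"])]
print(run_checks("An apple a day keeps a doctor away.", checks))
print(run_checks("Shut the hell up! Apple just released a new iPhone.", checks))
```

The case-sensitive match is what lets the lowercase fruit "apple" through here; a production validator would need something more robust, such as entity recognition.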

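Pydantic Structured Generation

Another core capability of Guardrails-AI is driving the LLM to produce structured output that conforms to a predefined schema. Developers describe the expected JSON structure with a Pydantic model, and the framework embeds the schema in the prompt, then parses and validates what the LLM returns: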

from pydantic import BaseModel, Field
from guardrails import Guard
import openai

class Pet(BaseModel):
    pet_type: str = Field(description="Species of pet")
    name: str = Field(description="a unique pet name")

prompt = """
    What kind of pet should I get and what should I name it?
    ${gr.complete_json_suffix_v2}
"""
guard = Guard.for_pydantic(output_class=Pet, prompt=prompt)

raw_output, validated_output, *rest = guard(
    llm_api=openai.completions.create,
    engine="gpt-3.5-turbo-instruct"
)

print(validated_output)
# Example output (values vary by run): {'pet_type': 'dog', 'name': 'Buddy'}

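${gr.complete_json_suffix_v2} is a special placeholder that is replaced with JSON formatting instructions. The result of the guard() call unpacks into the raw LLM output, the validated structured output, and any remaining metadata fields (captured here by *rest).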
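
As a rough illustration of that round trip (placeholder substitution, then parsing and checking the reply), here is a standard-library sketch. It uses a dataclass instead of Pydantic, and the instruction wording produced by json_suffix is invented; the text Guardrails actually substitutes for the placeholder is not reproduced here. All function names are hypothetical.

```python
import json
from dataclasses import dataclass, fields

PLACEHOLDER = "${gr.complete_json_suffix_v2}"


@dataclass
class Pet:
    pet_type: str
    name: str


def json_suffix(model) -> str:
    """Build a made-up formatting instruction listing the model's fields."""
    keys = ", ".join(f'"{f.name}" ({f.type.__name__})' for f in fields(model))
    return f"Return only a JSON object with these keys: {keys}."


def render_prompt(template: str, model) -> str:
    """Swap the placeholder for the generated instruction."""
    return template.replace(PLACEHOLDER, json_suffix(model))


def parse_output(raw: str, model):
    """Parse the raw LLM text as JSON and check keys and types against the model."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(model)}
    missing = sorted(expected.keys() - data.keys())
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for name, expected_type in expected.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"{name} should be {expected_type.__name__}")
    return model(**{name: data[name] for name in expected})
```

For example, parse_output('{"pet_type": "dog", "name": "Buddy"}', Pet) yields a Pet instance, while a reply that omits "name" raises ValueError, which is the point at which an on-fail policy would take over.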

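Standalone Service Deployment

Guardrails-AI can be deployed as a standalone Flask REST service that exposes OpenAI API-compatible endpoints:

# Create the guard configuration
guardrails create --validators=hub://guardrails/two_words --guard-name=two-word-guard

# Start the server
guardrails start --config=./config.py

Then call it transparently through the OpenAI SDK: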

import openai
openai.base_url = "http://localhost:8000/guards/two-word-guard/openai/v1/"

completion = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "tell me about an apple with 3 words exactly"}],
)

For production, a Docker + Gunicorn deployment is recommended; the Guard composition and validator parameters are managed in config.py.
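
To show what "transparent" means for the SDK call above, the sketch below builds (but does not send) the HTTP request an OpenAI-style client would issue against the guard's base URL. The helper names are hypothetical, and the chat/completions suffix is an assumption based on the path the OpenAI SDK normally appends to base_url, not something confirmed from the Guardrails server documentation.

```python
import json
from urllib.request import Request


def guard_base_url(guard_name: str, host: str = "localhost", port: int = 8000) -> str:
    """Base URL pattern used in the example above, parameterized by guard name."""
    return f"http://{host}:{port}/guards/{guard_name}/openai/v1/"


def build_chat_request(base_url: str, model: str, user_message: str) -> Request:
    """Assemble an OpenAI-style chat completion request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return Request(
        url=base_url + "chat/completions",  # assumed route, see lead-in
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    guard_base_url("two-word-guard"),
    "gpt-4o-mini",
    "tell me about an apple with 3 words exactly",
)
print(request.get_method(), request.full_url)
```

Because only the base URL changes, existing client code keeps the same request shape; the guard name embedded in the path is what selects which validators apply.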

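Use Cases

Scenario	Description
Customer service bot protection	Detects competitor mentions, negative brand statements, and toxic language in support conversations
Content moderation pipelines	Checks LLM-generated content on UGC platforms for toxicity, bias, and NSFW material
Financial compliance	Ensures that an AI financial advisor's output contains no insider information, false promises, or misleading statements
Healthcare applications	Detects PII leakage in output so that private patient information is not disclosed
Education	Restricts the LLM to questions within designated subjects and blocks jailbreak attempts
API security gateway	Deploys Guardrails as a gateway in front of an LLM API to centralize input and output validation
Structured data extraction	Uses Pydantic constraints to ensure that information the LLM extracts from unstructured text is correctly formatted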

Quick Start

Installation

pip install guardrails-ai

Configure the CLI

guardrails configure

First Guard: Detecting PII Leakage

# Install the PII detection validator
guardrails hub install hub://guardrails/detect_pii

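from guardrails import Guard, OnFailAction
from guardrails.hub import DetectPII

guard = Guard().use(
    DetectPII,
    # Entity types to scan for; DetectPII needs them here or in call metadata
    pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN"],
    on_fail=OnFailAction.EXCEPTION
)

# Output containing an ID number: meant to fail, provided the configured
# entity recognizers cover this ID format
try:
    guard.validate("The user's ID number is 110101199001011234")
except Exception as e:
    print(e)  # PII leakage detected

# Output without PII: passes
guard.validate("The user is from Beijing and was born in 1990.")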

First Guard: Structured Output

from pydantic import BaseModel, Field
from guardrails import Guard
import openai

class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: float = Field(description="Rating between 0 and 10")
    summary: str = Field(description="One-sentence plot summary")
    sentiment: str = Field(description="Sentiment: positive, neutral, or negative")

prompt = """
    Review the following movie: ${movie_description}
    ${gr.complete_json_suffix_v2}
"""
guard = Guard.for_pydantic(output_class=MovieReview, prompt=prompt)

raw, validated, *rest = guard(
    llm_api=openai.completions.create,
    engine="gpt-3.5-turbo-instruct",
    prompt_params={"movie_description": "A futuristic tale where AI learns to love humanity."}
)

print(validated)
# Example output (values vary by run): {'title': 'Wall-E', 'rating': 8.5, 'summary': '...', 'sentiment': 'positive'}
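
One caveat about the model above: the 0-10 range and the three sentiment values appear only in the field descriptions, which are hints for the LLM rather than enforced constraints. The sketch below shows one way such limits could be enforced after parsing, using a plain dataclass. CheckedMovieReview and ALLOWED_SENTIMENTS are hypothetical names, and this is not how Guardrails itself performs the check.

```python
from dataclasses import dataclass

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


@dataclass
class CheckedMovieReview:
    title: str
    rating: float
    summary: str
    sentiment: str

    def __post_init__(self):
        # Enforce the limits that the original model only describes in prose.
        if not 0 <= self.rating <= 10:
            raise ValueError(f"rating must be between 0 and 10, got {self.rating}")
        if self.sentiment not in ALLOWED_SENTIMENTS:
            raise ValueError(f"unexpected sentiment: {self.sentiment!r}")


review = CheckedMovieReview("Wall-E", 8.5, "A robot learns to love.", "positive")
print(review.rating)
```

With Pydantic itself, the same intent is usually expressed declaratively, for example Field(ge=0, le=10) for the range and a Literal type for the sentiment.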

Source Code Architecture

Guardrails-AI is written almost entirely in Python (99.7%) and manages dependencies with Poetry:

guardrails/
├── guardrails/               # Core package
│   ├── guard.py              # Guard core class (.use(), .validate(), for_pydantic())
│   ├── validators/           # Validator abstractions and built-in validators
│   ├── run/                  # Runner engine (async execution, OnFailAction handling)
│   ├── schema/               # JSON Schema processing and validation
│   ├── prompt/               # Prompt template engine
│   ├── llm_providers/        # LLM provider adapters
│   ├── cli/                  # CLI commands (configure, create, start, hub)
│   ├── server/               # Flask REST server
│   └── hub/                  # Hub installer and registry
├── docs/                     # Documentation
├── tests/                    # Test suite
├── server_ci/                # Server CI configuration
├── Makefile                  # Development commands
├── pyproject.toml            # Project configuration
└── README.md

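Core design patterns:

Pluggable validator architecture: each validator is a standalone Python package, installed through the namespace-package mechanism under the guardrails/hub directory. A validator only needs to implement a validate() method that receives the LLM output and returns a ValidationResult (pass/fail plus a detailed error message). The 70+ validators on the Hub all follow this protocol.
Three-stage processing pipeline:
- Input Guard: validators inspect the user input before the LLM call and block risky content
- LLM Generation: the ${gr.complete_json_suffix_v2} placeholder injects instructions into the prompt that steer the LLM toward output conforming to the schema
- Output Guard: validators inspect the raw LLM output and, according to the OnFailAction policy, raise an exception, apply an automatic fix, or ask the LLM to regenerate
OnFailAction strategy pattern: EXCEPTION (block), FIX (programmatic repair, such as correcting JSON syntax errors), and REASK (have the LLM regenerate based on the error message). The policy is configured per validator, which allows fine-grained error handling.
Pydantic-to-prompt conversion chain: Pydantic model -> JSON Schema -> XML/JSON prompt instruction -> LLM output -> parsing + validation -> validated output matching the Pydantic model. Guard.for_pydantic() wraps this flow in a single method call.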
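
The validator protocol and the three-stage pipeline described above can be sketched together in a few lines. This is an illustrative model only: SketchResult, SketchValidator, BlockPhrase, GuardError, and guarded_call are hypothetical names, the llm argument is any callable that maps a prompt to text, and the real Runner's control flow is not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class SketchResult:
    """Hypothetical pass/fail result, loosely mirroring the ValidationResult idea."""
    passed: bool
    error: str = ""


class SketchValidator(Protocol):
    def validate(self, value: str) -> SketchResult: ...


class BlockPhrase:
    """Toy validator: fails when a given phrase appears (case-insensitive)."""

    def __init__(self, phrase: str):
        self.phrase = phrase.lower()

    def validate(self, value: str) -> SketchResult:
        if self.phrase in value.lower():
            return SketchResult(False, f"blocked phrase: {self.phrase!r}")
        return SketchResult(True)


class GuardError(Exception):
    """Raised when any validator in a stage fails."""


def run_stage(stage: str, value: str, validators: List[SketchValidator]) -> None:
    for validator in validators:
        result = validator.validate(value)
        if not result.passed:
            raise GuardError(f"{stage} guard: {result.error}")


def guarded_call(
    user_input: str,
    llm: Callable[[str], str],
    input_validators: List[SketchValidator],
    output_validators: List[SketchValidator],
) -> str:
    run_stage("input", user_input, input_validators)    # stage 1: input guard
    raw_output = llm(user_input)                        # stage 2: LLM generation
    run_stage("output", raw_output, output_validators)  # stage 3: output guard
    return raw_output
```

In this sketch an input failure raises before llm is ever called, which is the property that makes input guards useful for blocking risky prompts before they incur a model call.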

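Hands-On Demos

The following walks through two complete scenarios: a customer service safety Guard that combines multiple validators, and a standalone safety gateway deployment.

Demo 1: Triple Safety Guardrails for an E-commerce Customer Service Bot

Step 1: Install the required validators

guardrails hub install hub://guardrails/competitor_check
guardrails hub install hub://guardrails/toxic_language
guardrails hub install hub://guardrails/detect_pii

Step 2: Write the Guard configuration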

# customer_service_guard.py
from guardrails import Guard, OnFailAction
from guardrails.hub import CompetitorCheck, ToxicLanguage, DetectPII

guard = Guard().use(
    DetectPII(
        on_fail=OnFailAction.EXCEPTION,
        pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN"]
    ),
    ToxicLanguage(
        threshold=0.5,
        validation_method="sentence",
        on_fail=OnFailAction.REASK
    ),
    CompetitorCheck(
        ["AliExpress", "Shopee", "Pinduoduo"],
        on_fail=OnFailAction.EXCEPTION
    )
)

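# Test 1: normal customer service reply, all validators pass
result = guard.validate(
    "Thank you for choosing our platform! Your order is expected to arrive in 3-5 business days."
)
print("Test 1 passed:", result.validation_passed)

# Test 2: competitor mention, CompetitorCheck fails
try:
    guard.validate("Your prices are much higher than Shopee!")
except Exception as e:
    print("Test 2 blocked:", e)

# Test 3: phone number as PII, DetectPII fails
try:
    guard.validate("My phone number is 138-1234-5678, please call me back.")
except Exception as e:
    print("Test 3 blocked:", e)

Step 3: Run the tests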

python customer_service_guard.py

Demo 2: Deploying as a Safety Gateway Service

Step 1: Create the Guard configuration

guardrails create --validators=hub://guardrails/toxic_language,hub://guardrails/detect_pii --guard-name=safety-gateway

Step 2: Write config.py

# config.py
from guardrails import Guard
from guardrails.hub import ToxicLanguage, DetectPII

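# The guard name is what appears in the /guards/<name>/ URL path
guard = Guard(name="safety-gateway").use(
    # pii_entities lists the entity types to scan for; DetectPII needs it set
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="exception"),
    ToxicLanguage(threshold=0.5, validation_method="sentence", on_fail="exception")
)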

Step 3: Start the server

guardrails start --config=./config.py --port=8000

Step 4: Call it through the OpenAI SDK

import openai
openai.base_url = "http://localhost:8000/guards/safety-gateway/openai/v1/"
openai.api_key = "dummy"  # the local server does not need a real key

response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a short poem about friendship."}
    ]
)
print(response.choices[0].message.content)

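Requests that fail input validation are blocked at the gateway before they reach the LLM, and responses that fail output validation are blocked before they reach the caller.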

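Dimension	Guardrails-AI	NVIDIA NeMo Guardrails	Llama Guard	Promptfoo Red Team
Core idea	Composable validator marketplace + structured output	Conversational safety rails based on the Colang language	Meta's open content-safety classifier	Prompt security scanning and red-team testing
Validators/rules	70+ (Hub marketplace, still growing)	User-built dialogue flows	13 safety classification labels	Prebuilt red-team plugins + custom
Input guard	Yes (validators check before the LLM call)	Yes (user input preprocessing)	Yes (prompt safety classification)	Yes (red-team attack testing)
Output guard	Yes (validators + OnFailAction)	Yes (bot output verification)	Yes (response safety classification)	No (evaluation framework, not online protection)
Structured output	Native Pydantic integration, automated prompting	Not supported	Not supported	Not supported
Validator extension	`guardrails hub install` + custom submissions	Custom Colang scripts	N/A (fixed classifier)	Custom red-team plugins
Deployment mode	Python SDK + Flask REST service	Colang runtime server	Python SDK (model inference)	CLI + local web UI
Failure handling	EXCEPTION / FIX / REASK policies	Predefined safe responses override the output	Returns a safety score	N/A
Online/offline	Validators run locally; Hub installation needs network access	Fully local	Model must be loaded locally	Fully local
OpenAI compatible	Yes (transparent REST API proxy)	Partial	No	No
Installation	`pip install guardrails-ai`	`pip install nemoguardrails`	N/A (distributed as model weights)	`npm install promptfoo`
Learning curve	Low (Python ecosystem; minimal for those familiar with Pydantic)	High (requires learning the Colang DSL)	Medium	Low (YAML configuration)
License	Apache 2.0	Apache 2.0	Llama 3.3 Community License	MIT
GitHub Stars	7,000+	14,400+	N/A (Meta repository)	22,200+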

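References

GitHub repository: https://github.com/guardrails-ai/guardrails
Official documentation: https://docs.guardrailsai.com
Guardrails Hub: https://guardrailsai.com/hub
Hub validator list: https://guardrailsai.com/hub
Structured generation guide: https://docs.guardrailsai.com/concepts/structured_data
Server deployment guide: https://docs.guardrailsai.com/concepts/server
Guardrails Index benchmark: https://guardrailsai.com/index (benchmark rankings of 24 guardrails across 6 categories)
Related reading: this article, "SKILL-DeepEval", and "SKILL-lm-evaluation-harness" respectively cover runtime safety protection, LLM application quality evaluation, and standardized benchmarking, three core dimensions of evaluation and safety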