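Published by OpenAI on August 6, 2024

We are introducing Structured Outputs in the API: model outputs now reliably adhere to developer-supplied JSON Schemas.

Last year at DevDay, we introduced JSON mode to help developers build more reliable applications. While JSON mode improves the reliability of generating valid JSON, it does not guarantee that the model's response will conform to a particular schema. Today we are introducing Structured Outputs in the API, a feature designed to ensure that model-generated outputs exactly match the JSON Schemas provided by developers.

Generating structured data is one of the core use cases for AI in today's applications. Developers use the OpenAI API to build powerful assistants that fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow large language models (LLMs) to take actions. Developers have typically relied on open source tooling, prompt engineering, and repeated requests to ensure that model outputs interoperate with the formats their systems expect. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training the models to better understand complicated schemas.

On our evals of complex JSON Schema following, gpt-4o-2024-08-06 with Structured Outputs scores a perfect 100%. In comparison, gpt-4-0613 scores less than 40%.

With Structured Outputs, gpt-4o-2024-08-06 achieves 100% reliability in our evals, perfectly matching the output schemas.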

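How to use Structured Outputs

We offer Structured Outputs in two forms in the API:

  1. Function calling: Structured Outputs via tools is enabled by setting strict: true within your function definition. This feature works with all models that support tools, including gpt-4-0613, gpt-3.5-turbo-0613, and later models. When enabled, model outputs will exactly match the supplied tool definition.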
Request
POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
    },
    {
      "role": "user",
      "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query",
        "description": "Execute a query.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "table_name": {
              "type": "string",
              "enum": ["orders"]
            },
            "columns": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "id",
                  "status",
                  "expected_delivery_date",
                  "delivered_at",
                  "shipped_at",
                  "ordered_at",
                  "canceled_at"
                ]
              }
            },
            "conditions": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "column": {
                    "type": "string"
                  },
                  "operator": {
                    "type": "string",
                    "enum": ["=", ">", "<", ">=", "<=", "!="]
                  },
                  "value": {
                    "anyOf": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "number"
                      },
                      {
                        "type": "object",
                        "properties": {
                          "column_name": {
                            "type": "string"
                          }
                        },
                        "required": ["column_name"],
                        "additionalProperties": false
                      }
                    ]
                  }
                },
                "required": ["column", "operator", "value"],
                "additionalProperties": false
              }
            },
            "order_by": {
              "type": "string",
              "enum": ["asc", "desc"]
            }
          },
          "required": ["table_name", "columns", "conditions", "order_by"],
          "additionalProperties": false
        }
      }
    }
  ]
}
Output JSON
{
  "table_name": "orders",
  "columns": ["id", "status", "expected_delivery_date", "delivered_at"],
  "conditions": [
    {
      "column": "status",
      "operator": "=",
      "value": "fulfilled"
    },
    {
      "column": "ordered_at",
      "operator": ">=",
      "value": "2023-05-01"
    },
    {
      "column": "ordered_at",
      "operator": "<",
      "value": "2023-06-01"
    },
    {
      "column": "delivered_at",
      "operator": ">",
      "value": {
        "column_name": "expected_delivery_date"
      }
    }
  ],
  "order_by": "asc"
}
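  2. A new option for the response_format parameter: developers can now supply a JSON Schema via json_schema, a new option for the response_format parameter. This is useful when the model is not calling a tool but is instead responding to the user in a structured way. This feature works with our newest GPT-4o models: gpt-4o-2024-08-06 and gpt-4o-mini-2024-07-18. When a response_format is supplied with strict: true, model outputs will exactly match the supplied schema.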
Request
POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful math tutor."
    },
    {
      "role": "user",
      "content": "solve 8x + 31 = 2"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "math_response",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "steps": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "explanation": {
                  "type": "string"
                },
                "output": {
                  "type": "string"
                }
              },
              "required": ["explanation", "output"],
              "additionalProperties": false
            }
          },
          "final_answer": {
            "type": "string"
          }
        },
        "required": ["steps", "final_answer"],
        "additionalProperties": false
      }
    }
  }
}
Output JSON
{
  "steps": [
    {
      "explanation": "Subtract 31 from both sides to isolate the term with x.",
      "output": "8x + 31 - 31 = 2 - 31"
    },
    {
      "explanation": "This simplifies to 8x = -29.",
      "output": "8x = -29"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -29 / 8"
    }
  ],
  "final_answer": "x = -29 / 8"
}
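
The two forms above return their JSON in different places in a raw Chat Completions response: tool calls carry it as a JSON string in function.arguments, while response_format puts it in the message content. The following is a minimal sketch of reading either one, assuming plain dict responses; the helper name extract_structured_payload and both sample messages are hypothetical and hand-written for illustration, not real API output.

```python
import json
from typing import Any


def extract_structured_payload(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Return every structured JSON object carried by one assistant message."""
    payloads = []
    # Function calling: each tool call carries its arguments as a JSON string.
    for call in message.get("tool_calls") or []:
        payloads.append(json.loads(call["function"]["arguments"]))
    # response_format: the JSON document is the message content itself.
    content = message.get("content")
    if not payloads and content:
        payloads.append(json.loads(content))
    return payloads


# Hand-written messages for illustration only (not real API output).
tool_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_example",  # hypothetical id
            "type": "function",
            "function": {
                "name": "query",
                "arguments": '{"table_name": "orders", "order_by": "asc"}',
            },
        }
    ],
}
format_message = {
    "role": "assistant",
    "content": '{"steps": [], "final_answer": "x = -29 / 8"}',
}

print(extract_structured_payload(tool_message))
print(extract_structured_payload(format_message))
```

The SDK helpers shown later in this post do this parsing for you; the sketch only makes the wire format visible.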

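Safe Structured Outputs

Safety is a top priority for OpenAI. The new Structured Outputs functionality abides by our existing safety policies and still allows the model to refuse an unsafe request. To make development simpler, API responses now include a refusal string value, which lets developers programmatically detect whether the model generated a refusal instead of output matching the schema. When the response does not include a refusal and the model's response has not been prematurely interrupted (as indicated by finish_reason), the model will reliably produce valid JSON matching the supplied schema.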

{
  "id": "chatcmpl-9nYAG9LPNonX8DAyrkwYfemr3C8HC",
  "object": "chat.completion",
  "created": 1721596428,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "refusal": "I'm sorry, I cannot assist with that request."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 81,
    "completion_tokens": 11,
    "total_tokens": 92
  },
  "system_fingerprint": "fp_3407719c7f"
}
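
The check described above can be sketched as a small function over one entry of choices. This is a minimal sketch for response_format requests, assuming plain dict responses. The function name read_choice and the status labels are hypothetical, and treating "length" and "content_filter" as the interrupted cases is an assumption about which finish_reason values signal truncation.

```python
import json
from typing import Any

# finish_reason values treated here as "cut off before the JSON was complete".
INTERRUPTED = {"length", "content_filter"}


def read_choice(choice: dict[str, Any]) -> tuple[str, Any]:
    """Classify one entry of `choices` as refusal, incomplete, or ok."""
    message = choice["message"]
    if message.get("refusal"):
        # The model declined; there is no schema-conforming JSON to parse.
        return "refusal", message["refusal"]
    if choice.get("finish_reason") in INTERRUPTED:
        # Generation stopped early, so the content may be partial JSON.
        return "incomplete", choice["finish_reason"]
    return "ok", json.loads(message["content"])


# The refusal choice from the response shown above.
refusal_choice = {
    "index": 0,
    "message": {
        "role": "assistant",
        "refusal": "I'm sorry, I cannot assist with that request.",
    },
    "logprobs": None,
    "finish_reason": "stop",
}

status, detail = read_choice(refusal_choice)
print(status, detail)
```

Checking the refusal first matters: a refusal also ends with finish_reason "stop", so finish_reason alone cannot distinguish it from a normal completion.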

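Native SDK support

Our Python and Node SDKs now natively support Structured Outputs. Supplying a schema for tools or as a response format is as easy as supplying a Pydantic or Zod object. The SDKs convert the data type to a supported JSON Schema, deserialize the JSON response into the typed data structure automatically, and parse refusals if they arise.

The following examples show native Structured Outputs support with function calling: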

Python
from enum import Enum
from typing import Union

from pydantic import BaseModel

import openai
from openai import OpenAI


class Table(str, Enum):
    orders = "orders"
    customers = "customers"
    products = "products"


class Column(str, Enum):
    id = "id"
    status = "status"
    expected_delivery_date = "expected_delivery_date"
    delivered_at = "delivered_at"
    shipped_at = "shipped_at"
    ordered_at = "ordered_at"
    canceled_at = "canceled_at"


class Operator(str, Enum):
    eq = "="
    gt = ">"
    lt = "<"
    le = "<="
    ge = ">="
    ne = "!="


class OrderBy(str, Enum):
    asc = "asc"
    desc = "desc"


class DynamicValue(BaseModel):
    column_name: str


class Condition(BaseModel):
    column: str
    operator: Operator
    value: Union[str, int, DynamicValue]


class Query(BaseModel):
    table_name: Table
    columns: list[Column]
    conditions: list[Condition]
    order_by: OrderBy


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function.",
        },
        {
            "role": "user",
            "content": "look up all my orders in may of last year that were fulfilled but not delivered on time",
        },
    ],
    tools=[
        openai.pydantic_function_tool(Query),
    ],
)

print(completion.choices[0].message.tool_calls[0].function.parsed_arguments)
Node
import OpenAI from 'openai';
import z from 'zod';
import { zodFunction } from 'openai/helpers/zod';

const Table = z.enum(['orders', 'customers', 'products']);
const Column = z.enum([
    'id',
    'status',
    'expected_delivery_date',
    'delivered_at',
    'shipped_at',
    'ordered_at',
    'canceled_at',
]);
const Operator = z.enum(['=', '>', '<', '<=', '>=', '!=']);
const OrderBy = z.enum(['asc', 'desc']);

const DynamicValue = z.object({
    column_name: z.string(),
});

const Condition = z.object({
    column: z.string(),
    operator: Operator,
    value: z.union([z.string(), z.number(), DynamicValue]),
});

const QueryArgs = z.object({
    table_name: Table,
    columns: z.array(Column),
    conditions: z.array(Condition),
    order_by: OrderBy,
});

const client = new OpenAI();

const completion = await client.beta.chat.completions.parse({
    model: 'gpt-4o-2024-08-06',
    messages: [
        { role: 'system', content: 'You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function.' },
        { role: 'user', content: 'look up all my orders in may of last year that were fulfilled but not delivered on time' }
    ],
    tools: [zodFunction({ name: 'query', parameters: QueryArgs })],
});
console.log(completion.choices[0].message.tool_calls[0].function.parsed_arguments);

Native Structured Outputs support is also available for response_format.

Python
from pydantic import BaseModel

from openai import OpenAI


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "solve 8x + 31 = 2"},
    ],
    response_format=MathResponse,
)

message = completion.choices[0].message
if message.parsed:
    print(message.parsed.steps)
    print(message.parsed.final_answer)
else:
    print(message.refusal)
Node
import OpenAI from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';


const Step = z.object({
    explanation: z.string(),
    output: z.string(),
})

const MathResponse = z.object({
    steps: z.array(Step),
    final_answer: z.string(),
})


const client = new OpenAI();

const completion = await client.beta.chat.completions.parse({
    model: 'gpt-4o-2024-08-06',
    messages: [
        {
            "role": "system",
            "content": "You are a helpful math tutor. Only use the schema for math responses.",
        },
        { "role": "user", "content": "solve 8x + 3 = 21" },
    ],
    response_format: zodResponseFormat(MathResponse, 'mathResponse'),
});

const message = completion.choices[0]?.message;
if (message?.parsed) {
    console.log(message.parsed.steps);
    console.log(message.parsed.final_answer);
} else {
    console.log(message?.refusal);
}

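Additional use cases

Developers frequently use OpenAI's models to generate structured data for a variety of use cases. Some additional examples follow.

Dynamically generating user interfaces based on the user's intent

Developers can use Structured Outputs to generate code or UIs. For example, all of the following examples use the same response_format and generate different UIs depending on the user input.

System

You are a user interface assistant. Your job is to help users visualize their website and app ideas.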

Response format

{
  "type": "json_schema",
  "json_schema": {
    "name": "ui",
    "description": "Dynamically generated UI",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "type": {
          "type": "string",
          "description": "The type of the UI component",
          "enum": ["div", "button", "header", "section", "field", "form"]
        },
        "label": {
          "type": "string",
          "description": "The label of the UI component, used for buttons or form fields"
        },
        "children": {
          "type": "array",
          "description": "Nested UI components",
          "items": {
            "$ref": "#"
          }
        },
        "attributes": {
          "type": "array",
          "description": "Arbitrary attributes for the UI component, suitable for any element",
          "items": {
            "type": "object",
            "properties": {
              "name": {
                "type": "string",
                "description": "The name of the attribute, for example onClick or className"
              },
              "value": {
                "type": "string",
                "description": "The value of the attribute"
              }
            },
            "additionalProperties": false,
            "required": ["name", "value"]
          }
        }
      },
      "required": ["type", "label", "children", "attributes"],
      "additionalProperties": false
    }
  }
}

Assistant

Landing page for a gardener
{
  "type": "div",
  "label": "",
  "children": [
    {
      "type": "header",
      "label": "",
      "children": [
        {
          "type": "div",
          "label": "Green Thumb Gardening",
          "children": [],
          "attributes": [{ "name": "className", "value": "site-title" }]
        },
        {
          "type": "div",
          "label": "Bringing Life to Your Garden",
          "children": [],
          "attributes": [{ "name": "className", "value": "site-tagline" }]
        }
      ],
      "attributes": [{ "name": "className", "value": "header" }]
    },
    {
      "type": "section",
      "label": "",
      "children": [
        {
          "type": "div",
          "label": "",
          "children": [
            {
              "type": "div",
              "label": "About Us",
              "children": [
                {
                  "type": "div",
                  "label": "At Green Thumb Gardening, we specialize in transforming your outdoor spaces into beautiful, thriving gardens. Our team has decades of experience in horticulture and landscape design.",
                  "children": [],
                  "attributes": [
                    { "name": "className", "value": "about-description" }
                  ]
                }
              ],
              "attributes": [{ "name": "className", "value": "about-section" }]
            }
          ],
          "attributes": [{ "name": "className", "value": "content" }]
        }
      ],
      "attributes": [{ "name": "className", "value": "about-container" }]
    },
    {
      "type": "section",
      "label": "",
      "children": [
        {
          "type": "div",
          "label": "",
          "children": [
            {
              "type": "div",
              "label": "Our Services",
              "children": [
                {
                  "type": "div",
                  "label": "Garden Design",
                  "children": [],
                  "attributes": [
                    { "name": "className", "value": "service-item" }
                  ]
                },
                {
                  "type": "div",
                  "label": "Plant Care & Maintenance",
                  "children": [],
                  "attributes": [
                    { "name": "className", "value": "service-item" }
                  ]
                },
                {
                  "type": "div",
                  "label": "Seasonal Cleanup",
                  "children": [],
                  "attributes": [
                    { "name": "className", "value": "service-item" }
                  ]
                },
                {
                  "type": "div",
                  "label": "Custom Landscaping",
                  "children": [],
                  "attributes": [
                    { "name": "className", "value": "service-item" }
                  ]
                }
              ],
              "attributes": [{ "name": "className", "value": "services-list" }]
            }
          ],
          "attributes": [{ "name": "className", "value": "content" }]
        }
      ],
      "attributes": [{ "name": "className", "value": "services-container" }]
    }
  ],
  "attributes": [{ "name": "className", "value": "landing-page" }]
}

Sign up screen for an app
{
  "type": "form",
  "label": "Sign Up Form",
  "children": [
    {
      "type": "header",
      "label": "Sign Up for MovieReviews",
      "children": [],
      "attributes": [{ "name": "className", "value": "signup-header" }]
    },
    {
      "type": "field",
      "label": "Username",
      "attributes": [
        { "name": "type", "value": "text" },
        { "name": "placeholder", "value": "Enter your username" },
        { "name": "name", "value": "username" },
        { "name": "required", "value": "true" },
        { "name": "className", "value": "input-text" }
      ],
      "children": []
    },
    {
      "type": "field",
      "label": "Email",
      "attributes": [
        { "name": "type", "value": "email" },
        { "name": "placeholder", "value": "Enter your email" },
        { "name": "name", "value": "email" },
        { "name": "required", "value": "true" },
        { "name": "className", "value": "input-email" }
      ],
      "children": []
    },
    {
      "type": "field",
      "label": "Password",
      "attributes": [
        { "name": "type", "value": "password" },
        { "name": "placeholder", "value": "Create a password" },
        { "name": "name", "value": "password" },
        { "name": "required", "value": "true" },
        { "name": "className", "value": "input-password" }
      ],
      "children": []
    },
    {
      "type": "field",
      "label": "Confirm Password",
      "attributes": [
        { "name": "type", "value": "password" },
        { "name": "placeholder", "value": "Confirm your password" },
        { "name": "name", "value": "confirm_password" },
        { "name": "required", "value": "true" },
        { "name": "className", "value": "input-password-confirm" }
      ],
      "children": []
    },
    {
      "type": "button",
      "label": "Sign Up",
      "attributes": [
        { "name": "type", "value": "submit" },
        { "name": "className", "value": "submit-button" }
      ],
      "children": []
    }
  ],
  "attributes": [
    { "name": "action", "value": "/signup" },
    { "name": "method", "value": "POST" },
    { "name": "className", "value": "signup-form" }
  ]
}

Stock price widget
{
  "type": "div",
  "label": "",
  "children": [
    {
      "type": "header",
      "label": "Stock App",
      "children": [],
      "attributes": [
        {
          "name": "style",
          "value": "font-size: 24px; font-weight: bold; text-align: center; margin-bottom: 20px;"
        }
      ]
    },
    {
      "type": "div",
      "label": "",
      "children": [
        {
          "type": "div",
          "label": "Company Name: XYZ Corp",
          "children": [],
          "attributes": [
            {
              "name": "style",
              "value": "font-size: 20px; font-weight: bold; margin-bottom: 10px;"
            }
          ]
        },
        {
          "type": "div",
          "label": "Chart Area",
          "children": [],
          "attributes": [
            {
              "name": "style",
              "value": "width: 100%; height: 300px; border: 1px solid #ccc; margin-bottom: 20px;"
            }
          ]
        }
      ],
      "attributes": [
        {
          "name": "style",
          "value": "padding: 20px; border: 1px solid #ddd; margin-bottom: 20px;"
        }
      ]
    },
    {
      "type": "div",
      "label": "",
      "children": [
        {
          "type": "button",
          "label": "Refresh",
          "children": [],
          "attributes": [
            {
              "name": "style",
              "value": "margin-right: 10px; padding: 10px 20px; background-color: #4CAF50; color: white; border: none; font-size: 16px;"
            },
            { "name": "type", "value": "button" }
          ]
        },
        {
          "type": "button",
          "label": "Change Time Scale",
          "children": [],
          "attributes": [
            {
              "name": "style",
              "value": "margin-right: 10px; padding: 10px 20px; background-color: #2196F3; color: white; border: none; font-size: 16px;"
            },
            { "name": "type", "value": "button" }
          ]
        },
        {
          "type": "button",
          "label": "Export Data",
          "children": [],
          "attributes": [
            {
              "name": "style",
              "value": "padding: 10px 20px; background-color: #f44336; color: white; border: none; font-size: 16px;"
            },
            { "name": "type", "value": "button" }
          ]
        }
      ],
      "attributes": [{ "name": "style", "value": "text-align: center;" }]
    }
  ],
  "attributes": [
    {
      "name": "style",
      "value": "max-width: 600px; margin: 0 auto; font-family: Arial, sans-serif;"
    }
  ]
}

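Separating a final answer from supporting reasoning or additional commentary

Giving the model a separate field for its reasoning steps can improve the final quality of the response.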

Request
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "9.11 and 9.9 -- which is bigger?"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "reasoning_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "reasoning_steps": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "The reasoning steps leading to the final conclusion."
          },
          "answer": {
            "type": "string",
            "description": "The final answer, taking into account the reasoning steps."
          }
        },
        "required": ["reasoning_steps", "answer"],
        "additionalProperties": false
      }
    }
  }
}
Structured Output
{
  "reasoning_steps": [
    "First step is to compare the numbers 9.11 and 9.9.",
    "Both numbers have the same whole number part, which is 9.",
    "To compare the decimal parts, convert them to the same number of decimal places.",
    "9.11 has two decimal places: it is 9.11.",
    "9.9 has one decimal place: it can be rewritten as 9.90.",
    "Now, compare 9.11 and 9.90 by looking at the decimal parts.",
    "Compare 11 with 90.",
    "90 is greater than 11, so 9.90 is greater than 9.11."
  ],
  "answer": "9.9 is bigger than 9.11."
}

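Extracting structured data from unstructured data

For example, the model can be instructed to extract action items, due dates, and owners from meeting notes.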

Request
POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "Extract action items, due dates, and owners from meeting notes."
    },
    {
      "role": "user",
      "content": "...meeting notes go here..."
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "action_items",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "action_items": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "description": {
                  "type": "string",
                  "description": "Description of the action item."
                },
                "due_date": {
                  "type": ["string", "null"],
                  "description": "Due date for the action item, can be null if not specified."
                },
                "owner": {
                  "type": ["string", "null"],
                  "description": "Owner responsible for the action item, can be null if not specified."
                }
              },
              "required": ["description", "due_date", "owner"],
              "additionalProperties": false
            },
            "description": "List of action items from the meeting."
          }
        },
        "required": ["action_items"],
        "additionalProperties": false
      }
    }
  }
}
Structured Output
{
  "action_items": [
    {
      "description": "Collaborate on optimizing the path planning algorithm",
      "due_date": "2024-06-30",
      "owner": "Jason Li"
    },
    {
      "description": "Reach out to industry partners for additional datasets",
      "due_date": "2024-06-25",
      "owner": "Aisha Patel"
    },
    {
      "description": "Explore alternative LIDAR sensor configurations and report findings",
      "due_date": "2024-06-27",
      "owner": "Kevin Nguyen"
    },
    {
      "description": "Schedule extended stress tests for the integrated navigation system",
      "due_date": "2024-06-28",
      "owner": "Emily Chen"
    },
    {
      "description": "Retest the system after bug fixes and update the team",
      "due_date": "2024-07-01",
      "owner": "David Park"
    }
  ]
}

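Under the hood

We took a two-part approach to improving the reliability of model outputs that match JSON Schema. First, we trained our newest model, gpt-4o-2024-08-06, to better understand complicated schemas and to produce outputs that match them as closely as possible. However, model behavior is inherently non-deterministic: despite this model's improved performance (93% on our benchmark), it still did not meet the reliability developers need to build robust applications. So we also took a deterministic, engineering-based approach that constrains the model's outputs to achieve 100% reliability.

Constrained decoding

Our approach is based on a technique known as constrained sampling or constrained decoding. By default, models are entirely unconstrained when producing outputs and can select any token from the vocabulary as the next output. This flexibility is what allows models to make mistakes; for example, they are free to sample a curly brace token at any time, even when that would not produce valid JSON. To force valid outputs, we restrict the model's choices to the tokens that are valid according to the supplied schema, rather than all available tokens.

Implementing this constraint in practice is challenging, because the set of valid tokens changes throughout the model's output. Suppose we have the following schema: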

{
  "type": "object",
  "properties": {
    "value": { "type": "number" }
  },
  "required": ["value"],
  "additionalProperties": false
}

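At the beginning of the output, valid tokens include things like {, {", {\n, and so on. However, once the model has already sampled {"val, then { is no longer a valid token. We therefore need dynamic constrained decoding: determining which tokens are valid after each token is generated, rather than fixing them upfront at the beginning of the response.

To do this, we convert the supplied JSON Schema into a context-free grammar (CFG). A grammar is a set of rules that defines a language, and a context-free grammar is a grammar that conforms to specific rules. You can think of JSON and JSON Schema as languages with rules that define what is valid. Just as a sentence with no verb is not valid in English, a trailing comma is not valid in JSON.

Thus, for each JSON Schema, we compute a grammar that represents that schema and preprocess its components so they are quickly accessible during model sampling. This is why the first request with a new schema incurs a latency penalty: we must preprocess the schema to generate an artifact that can be used efficiently during sampling.

While sampling, after every token, our inference engine determines which tokens are valid next, based on the previously generated tokens and the rules in the grammar. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because the schema has been preprocessed, we can use a cached data structure to do this efficiently, with minimal latency overhead.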
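
The masking step can be illustrated with a toy example. This is a sketch under loud assumptions, not OpenAI's implementation: the vocabulary is a hypothetical handful of string tokens, and a hand-written prefix check for outputs of the form {"value":<digits>} stands in for the grammar compiled from the schema. It ignores whitespace, signs, decimals, and JSON's leading-zero rule.

```python
import math

HEAD = '{"value":'  # fixed opening of every valid output in this toy language


def is_valid_prefix(text: str) -> bool:
    """True if `text` can still be extended to HEAD + digits + '}'."""
    if len(text) <= len(HEAD):
        return HEAD.startswith(text)
    if not text.startswith(HEAD):
        return False
    rest = text[len(HEAD):]
    digits = rest[:-1] if rest.endswith("}") else rest
    # At least one digit is required, and nothing may follow the closing brace.
    return digits != "" and all(ch in "0123456789" for ch in digits)


# Hypothetical toy vocabulary; a real tokenizer has many thousands of tokens.
VOCAB = ["{", '{"', '"value"', "val", "ue", '"', ":", "1", "42", "}", "[", "hello"]


def valid_token_ids(generated: str) -> list[int]:
    """Recompute the valid next tokens from everything generated so far."""
    return [i for i, tok in enumerate(VOCAB) if is_valid_prefix(generated + tok)]


def masked_probabilities(logits: list[float], generated: str) -> list[float]:
    """Softmax over the logits after setting invalid tokens to -inf."""
    allowed = set(valid_token_ids(generated))
    if not allowed:
        raise ValueError("no valid continuation: the output is already complete")
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)
    weights = [math.exp(x - peak) for x in masked]  # exp(-inf) is exactly 0.0
    total = sum(weights)
    return [w / total for w in weights]


for prefix in ["", '{"val', '{"value":', '{"value":4']:
    print(repr(prefix), [VOCAB[i] for i in valid_token_ids(prefix)])
```

The valid set is recomputed after every token, which is the "dynamic" part described above. The toy language here is simple enough for a regular expression; the recursive schemas discussed in the next section are where a context-free grammar becomes necessary.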

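Alternate approaches

Alternate approaches to this problem often use finite state machines (FSMs) or regular expressions (generally implemented with FSMs) for constrained decoding. These work similarly in that they dynamically update which tokens are valid after each token is produced, but they differ from the CFG approach in some key ways. Notably, CFGs can express a broader class of languages than FSMs. In practice, this does not matter for very simple schemas like the value schema shown above. However, we find that the difference is meaningful for more complex schemas that involve nested or recursive data structures. For example, FSMs cannot generally express recursive types, which means FSM-based approaches may struggle to match brackets in deeply nested JSON. The following is a sample recursive schema that is supported on the OpenAI API with Structured Outputs but could not be expressed with an FSM.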

{
  "name": "ui",
  "description": "Dynamically generated UI",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "type": {
        "type": "string",
        "description": "The type of the UI component",
        "enum": ["div", "button", "header", "section", "field", "form"]
      },
      "label": {
        "type": "string",
        "description": "The label of the UI component, used for buttons or form fields"
      },
      "children": {
        "type": "array",
        "description": "Nested UI components",
        "items": {
          "$ref": "#"
        }
      },
      "attributes": {
        "type": "array",
        "description": "Arbitrary attributes for the UI component, suitable for any element",
        "items": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "description": "The name of the attribute, for example onClick or className"
            },
            "value": {
              "type": "string",
              "description": "The value of the attribute"
            }
          },
          "additionalProperties": false,
          "required": ["name", "value"]
        }
      }
    },
    "required": ["type", "label", "children", "attributes"],
    "additionalProperties": false
  }
}

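Note that each UI element can have an arbitrary number of children, which reference the root schema recursively. This flexibility is something the CFG approach affords.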
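
To make the recursion concrete, here is a minimal sketch of a hand-written checker for the ui schema. It is not part of any SDK: the names is_valid_ui and depth are hypothetical, and it assumes name and value are both required on each attribute, as in the response_format version of this schema earlier in the post. The self-call on each child plays the role of "$ref": "#".

```python
from typing import Any

UI_TYPES = {"div", "button", "header", "section", "field", "form"}
REQUIRED_KEYS = {"type", "label", "children", "attributes"}


def is_valid_ui(node: Any) -> bool:
    """Check a value against the recursive `ui` schema."""
    # Exactly the required keys: nothing missing, nothing additional.
    if not isinstance(node, dict) or set(node) != REQUIRED_KEYS:
        return False
    if node["type"] not in UI_TYPES or not isinstance(node["label"], str):
        return False
    if not isinstance(node["attributes"], list) or not isinstance(node["children"], list):
        return False
    for attr in node["attributes"]:
        if not isinstance(attr, dict) or set(attr) != {"name", "value"}:
            return False
        if not all(isinstance(attr[key], str) for key in ("name", "value")):
            return False
    # "$ref": "#" means every child must itself satisfy the whole schema.
    return all(is_valid_ui(child) for child in node["children"])


def depth(node: dict[str, Any]) -> int:
    """Nesting depth of a UI tree; the schema places no upper bound on it."""
    return 1 + max((depth(child) for child in node["children"]), default=0)


leaf = {
    "type": "button",
    "label": "OK",
    "children": [],
    "attributes": [{"name": "type", "value": "submit"}],
}
tree = {
    "type": "form",
    "label": "",
    "children": [{"type": "div", "label": "", "children": [leaf], "attributes": []}],
    "attributes": [],
}
print(is_valid_ui(tree), depth(tree))
```

Because the depth is unbounded, a checker needs a call stack (or an equivalent) to remember how many objects are still open, which is the capability a finite state machine lacks.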

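Limitations and restrictions

There are a few limitations to keep in mind when using Structured Outputs:

  • Structured Outputs supports only a subset of JSON Schema, detailed in our docs.
  • The first API response with a new schema incurs additional latency, but subsequent responses are fast with no latency penalty. This is because during the first request we process the schema and generate cacheable artifacts for later reuse. Typical schemas take under 10 seconds to process, but more complex schemas may take up to a minute.
  • The model can fail to follow the schema if it chooses to refuse an unsafe request. If it refuses, the returned message carries the refusal field to indicate this.
  • The model can fail to follow the schema if generation hits max_tokens or another stop condition before the output is complete.
  • Structured Outputs does not prevent all kinds of model mistakes. For example, the model may still make mistakes within the values of the JSON object (such as getting a step wrong in a mathematical equation). If developers find mistakes, we recommend providing more examples in the system instructions or splitting the task into simpler subtasks.
  • Structured Outputs is not compatible with parallel function calls. When a parallel function call is generated, it may not match the supplied schemas. Set parallel_tool_calls: false to disable parallel function calling.
  • JSON Schemas supplied with Structured Outputs are not eligible for Zero Data Retention (ZDR).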
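
Since only a subset of JSON Schema is accepted, a local check before sending a request can catch some problems early. The sketch below checks two rules only: every object sets additionalProperties to false, and every property is listed in required. Both rules are assumptions drawn from the Structured Outputs documentation and from the schemas in this post, not an exhaustive statement of the supported subset. The function name strict_mode_problems is hypothetical.

```python
from typing import Any


def strict_mode_problems(schema: Any, path: str = "#") -> list[str]:
    """Collect violations of two assumed strict-mode rules, with their paths."""
    problems: list[str] = []
    if not isinstance(schema, dict):
        return problems
    declared = schema.get("type")
    is_object = declared == "object" or (isinstance(declared, list) and "object" in declared)
    if is_object:
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_mode_problems(sub, f"{path}/properties/{name}"))
    # Descend into the other places where nested schemas can appear.
    if isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_problems(schema["items"], f"{path}/items"))
    for index, sub in enumerate(schema.get("anyOf", [])):
        problems.extend(strict_mode_problems(sub, f"{path}/anyOf/{index}"))
    for name, sub in schema.get("$defs", {}).items():
        problems.extend(strict_mode_problems(sub, f"{path}/$defs/{name}"))
    return problems


value_schema = {
    "type": "object",
    "properties": {"value": {"type": "number"}},
    "required": ["value"],
    "additionalProperties": False,
}
print(strict_mode_problems(value_schema))
```

An empty list means only that these two checks passed; the API remains the authority on whether a schema is accepted.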

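Availability

Structured Outputs is generally available in the API today.

Structured Outputs with function calling is available on all models that support function calling in the API. This includes our newest gpt-4o and gpt-4o-mini models, all models from gpt-4-0613 and gpt-3.5-turbo-0613 onward, and any fine-tuned models that support function calling. This functionality is available on the Chat Completions API, Assistants API, and Batch API. Structured Outputs with function calling is also compatible with vision inputs.

Structured Outputs with response formats is available on gpt-4o-mini and gpt-4o-2024-08-06 and on any fine-tuned models based on them. This functionality is available on the Chat Completions API, Assistants API, and Batch API. Structured Outputs with response formats is also compatible with vision inputs.

By switching to gpt-4o-2024-08-06, developers save 50% on inputs ($2.50 per 1M input tokens) and 33% on outputs ($10.00 per 1M output tokens) compared to gpt-4o-2024-05-13.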
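
As a quick illustration of the arithmetic, the sketch below computes the cost of a request at the prices quoted above and backs out the per-token prices that the stated savings imply for gpt-4o-2024-05-13. The implied previous prices are derived from the percentages, not quoted from this post, and the 33% figure is treated as a rounded one third; both function names are hypothetical.

```python
NEW_INPUT_PRICE = 2.50    # USD per 1M input tokens (from the post)
NEW_OUTPUT_PRICE = 10.00  # USD per 1M output tokens (from the post)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on gpt-4o-2024-08-06 at the quoted prices."""
    total = input_tokens * NEW_INPUT_PRICE + output_tokens * NEW_OUTPUT_PRICE
    return total / 1_000_000


def implied_previous_price(new_price: float, saving_fraction: float) -> float:
    """Price that would yield `new_price` after the given fractional saving."""
    return new_price / (1 - saving_fraction)


# Token counts taken from the usage block of the refusal example in this post.
print(request_cost(81, 11))
print(implied_previous_price(NEW_INPUT_PRICE, 0.5))
print(implied_previous_price(NEW_OUTPUT_PRICE, 1 / 3))
```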

To start using Structured Outputs, check out our docs.

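Acknowledgements

Structured Outputs takes inspiration from excellent work in the open source community, in particular the outlines, jsonformer, instructor, guidance, and lark libraries.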