2026-03-05

11.3 Image Processing and Transmission

Multimodal Type Definitions

In src/types/multimodal.ts, we define the complete multimodal message type system:

Basic Types

// src/types/multimodal.ts

/**
 * Image media type
 */
export type ImageMediaType =
  | "image/png"
  | "image/jpeg"
  | "image/gif"
  | "image/webp";

/**
 * Image content
 */
export interface ImageContent {
  type: "image";
  mediaType: ImageMediaType;
  data: string; // base64-encoded image data
}

/**
 * Text content
 */
export interface TextContent {
  type: "text";
  text: string;
}

/**
 * Union type for message content
 */
export type MessageContent = TextContent | ImageContent;

/**
 * Multimodal message
 */
export interface MultimodalMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: MessageContent | MessageContent[];
  toolCallId?: string;
  name?: string;
}
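
Because content may be either a single item or an array, code that consumes a message has to handle both shapes. The following is a minimal sketch of one way to normalize it. The helpers toContentArray and isImageContent are hypothetical names that do not appear in the original module, and the types are repeated only so the snippet stands alone.

```typescript
type ImageMediaType = "image/png" | "image/jpeg" | "image/gif" | "image/webp";

interface ImageContent {
  type: "image";
  mediaType: ImageMediaType;
  data: string;
}

interface TextContent {
  type: "text";
  text: string;
}

type MessageContent = TextContent | ImageContent;

// Hypothetical helper: always return an array, whatever shape was passed in.
function toContentArray(
  content: MessageContent | MessageContent[]
): MessageContent[] {
  return Array.isArray(content) ? content : [content];
}

// Hypothetical type guard: narrows a content item to ImageContent.
function isImageContent(item: MessageContent): item is ImageContent {
  return item.type === "image";
}

const parts = toContentArray({ type: "text", text: "Hello" });
const imageParts = parts.filter(isImageContent);
console.log(parts.length, imageParts.length);
```

With a helper like this, downstream code can iterate over content without branching on Array.isArray each time.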

Message Format Examples

Text-only message:

{
  role: "user",
  content: {
    type: "text",
    text: "Hello"
  }
}

Image-only message:

{
  role: "user",
  content: {
    type: "image",
    mediaType: "image/png",
    data: "iVBORw0KGgo..."
  }
}

Mixed text and image message:

{
  role: "user",
  content: [
    { type: "text", text: "Describe this image" },
    {
      type: "image",
      mediaType: "image/jpeg",
      data: "/9j/4AAQSkZJRg..."
    }
  ]
}

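Supported Image Formats

We support four mainstream image formats:

Format	MIME type	Characteristics	Typical use
PNG	image/png	Lossless compression, supports transparency	Screenshots, UI designs
JPEG	image/jpeg	Lossy compression, small files	Photos, complex images
GIF	image/gif	Supports animation	Animated images, simple icons
WebP	image/webp	Modern format, high compression ratio	Web images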

Format Detection

The format is detected automatically from the file extension:

// src/types/multimodal.ts

/**
 * Read an image from a file path and convert it to base64
 */
export async function readImageAsBase64(
  filePath: string
): Promise<ImageContent> {
  const fs = await import("fs/promises");
  const path = await import("path");

  // Read the file
  const buffer = await fs.readFile(filePath);

  // Determine the media type from the file extension
  const ext = path.extname(filePath).toLowerCase();
  const mediaTypeMap: Record<string, ImageMediaType> = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
  };

  // Note: unrecognized extensions fall back to image/png
  const mediaType = mediaTypeMap[ext] || "image/png";

  // Convert to base64
  const base64 = buffer.toString("base64");

  return {
    type: "image",
    mediaType,
    data: base64,
  };
}
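
An extension only reflects how the file was named, so a renamed file can end up with the wrong media type. As a complement, the leading bytes of the file can be inspected. The sketch below is not part of the original module: sniffImageMediaType is a hypothetical helper, and it checks only the well-known signatures of the four supported formats. A Node.js Buffer is a Uint8Array, so the result of fs.readFile could be passed in directly.

```typescript
type ImageMediaType = "image/png" | "image/jpeg" | "image/gif" | "image/webp";

// Hypothetical helper: guess the media type from the file's leading bytes.
function sniffImageMediaType(bytes: Uint8Array): ImageMediaType | null {
  // True when `bytes` contains `signature` starting at `offset`.
  const hasSignature = (signature: number[], offset = 0): boolean =>
    bytes.length >= offset + signature.length &&
    signature.every((byte, i) => bytes[offset + i] === byte);

  // PNG: 89 50 4E 47 0D 0A 1A 0A
  if (hasSignature([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) {
    return "image/png";
  }
  // JPEG: FF D8 FF
  if (hasSignature([0xff, 0xd8, 0xff])) {
    return "image/jpeg";
  }
  // GIF: ASCII "GIF8" (covers both GIF87a and GIF89a)
  if (hasSignature([0x47, 0x49, 0x46, 0x38])) {
    return "image/gif";
  }
  // WebP: ASCII "RIFF" at offset 0 and "WEBP" at offset 8
  if (
    hasSignature([0x52, 0x49, 0x46, 0x46]) &&
    hasSignature([0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return "image/webp";
  }
  return null;
}
```

A caller could prefer the sniffed type when it is non-null and use the extension-based type only as a fallback.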

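Base64 Encoding

Why Base64

Advantages:

Text format, easy to carry inside JSON
No separate file transfer needed
Good cross-platform compatibility
WebSocket-friendly

Disadvantages:

Payload grows by about 33% (every 3 bytes become 4 characters)
Encoding and decoding add overhead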
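
The size overhead follows directly from how Base64 works: each group of 3 input bytes becomes 4 output characters, and the last group is padded. The arithmetic can be sketched as follows. The function name base64EncodedLength is hypothetical, and the 10 MB figure is used purely as a sample input.

```typescript
// Length of the padded Base64 string produced from `byteLength` raw bytes.
function base64EncodedLength(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const sampleSize = 10 * 1024 * 1024; // 10 MB of raw image data
const encodedSize = base64EncodedLength(sampleSize);

console.log(encodedSize); // 13981016
console.log((encodedSize / sampleSize).toFixed(3)); // "1.333"
```

In other words, a 10 MB file turns into roughly 13.3 MB of text before any JSON framing is added.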

Base64 Encoding Implementation

// Node.js Buffer has built-in base64 encoding
const fs = await import("fs/promises");

// Read the file into a Buffer (filePath is the path to the image file)
const buffer = await fs.readFile(filePath);

// Convert to a base64 string
const base64 = buffer.toString("base64");

Data URL Format

The full Data URL format:

data:<mediaType>;base64,<base64-data>

Example:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA...

Utility function:

/**
 * Create image content from a base64 string
 */
export function base64ToImageContent(
  base64: string,
  mediaType: ImageMediaType = "image/png"
): ImageContent {
  // Strip a data URL prefix if present. The prefix is only removed, not
  // parsed: the mediaType argument is used as-is.
  const cleanBase64 = base64.replace(
    /^data:image\/[a-z]+;base64,/,
    ""
  );

  return {
    type: "image",
    mediaType,
    data: cleanBase64,
  };
}
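
A Data URL already names its media type, so that value can be read from the prefix instead of being supplied separately. The following is a minimal sketch under that assumption. dataUrlToImageContent and SUPPORTED_TYPES are hypothetical names not present in the original module, and the function returns null rather than guessing when the input is not a Data URL for a supported image type.

```typescript
type ImageMediaType = "image/png" | "image/jpeg" | "image/gif" | "image/webp";

interface ImageContent {
  type: "image";
  mediaType: ImageMediaType;
  data: string;
}

const SUPPORTED_TYPES: readonly ImageMediaType[] = [
  "image/png",
  "image/jpeg",
  "image/gif",
  "image/webp",
];

// Hypothetical helper: parse "data:<mediaType>;base64,<data>" into ImageContent.
function dataUrlToImageContent(dataUrl: string): ImageContent | null {
  const match = /^data:([a-z]+\/[a-z0-9.+-]+);base64,(.+)$/i.exec(dataUrl);
  if (!match) {
    return null; // not a base64 Data URL
  }

  // Accept only the media types the message format supports.
  const declared = match[1].toLowerCase();
  const mediaType = SUPPORTED_TYPES.find((t) => t === declared);
  if (!mediaType) {
    return null;
  }

  return { type: "image", mediaType, data: match[2] };
}

const parsed = dataUrlToImageContent("data:image/jpeg;base64,/9j/4AAQ");
console.log(parsed?.mediaType); // "image/jpeg"
```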

File Size Limit

To avoid transmitting oversized files, we enforce a 10 MB file size limit:

// src/client/cli.ts

const MAX_IMAGE_SIZE = 10 * 1024 * 1024; // 10MB

async function loadImageFile(filePath: string): Promise<ImageContent> {
  const fs = await import("fs/promises");

  // Check the file size
  const stats = await fs.stat(filePath);
  if (stats.size > MAX_IMAGE_SIZE) {
    throw new Error(
      `Image file too large (${(stats.size / 1024 / 1024).toFixed(2)}MB), ` +
      `maximum supported size is ${MAX_IMAGE_SIZE / 1024 / 1024}MB`
    );
  }

  // Read and convert
  return await readImageAsBase64(filePath);
}

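Reasons for the Size Limit

Transfer efficiency: large files take longer to transmit
Memory limits: Base64 encoding increases memory usage
API limits: many LLM APIs cap the size of an image
Cost control: larger images consume more tokens

Image Validation

Format Validation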

/**
 * Validate image content
 */
export function validateImageContent(content: ImageContent): boolean {
  // Check the type
  if (content.type !== "image") {
    return false;
  }

  // Check the media type
  const validTypes: ImageMediaType[] = [
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/webp",
  ];
  if (!validTypes.includes(content.mediaType)) {
    return false;
  }

  // Check that base64 data is present
  if (!content.data || typeof content.data !== "string") {
    return false;
  }

  // Check the base64 format: valid alphabet, at most two padding
  // characters, and a total length that is a multiple of 4
  const base64Regex = /^[A-Za-z0-9+/]+={0,2}$/;
  if (content.data.length % 4 !== 0 || !base64Regex.test(content.data)) {
    return false;
  }

  return true;
}
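
validateImageContent takes a value that is already typed as ImageContent. Data arriving over the network is untyped JSON, so the receiving side may want a check that starts from unknown. The sketch below shows one way to do that. parseImageContent and SUPPORTED_TYPES are hypothetical names, and the sketch assumes the payload has already been passed through JSON.parse.

```typescript
type ImageMediaType = "image/png" | "image/jpeg" | "image/gif" | "image/webp";

interface ImageContent {
  type: "image";
  mediaType: ImageMediaType;
  data: string;
}

const SUPPORTED_TYPES: readonly ImageMediaType[] = [
  "image/png",
  "image/jpeg",
  "image/gif",
  "image/webp",
];

// Hypothetical helper: turn an untrusted value into ImageContent, or null.
function parseImageContent(value: unknown): ImageContent | null {
  if (typeof value !== "object" || value === null) {
    return null;
  }
  const record = value as Record<string, unknown>;

  if (record.type !== "image") {
    return null;
  }

  // Only accept one of the four supported media types.
  const mediaType = SUPPORTED_TYPES.find((t) => t === record.mediaType);
  if (!mediaType) {
    return null;
  }

  // Padded base64 is non-empty, a multiple of 4 long, and uses a fixed alphabet.
  const data = record.data;
  if (typeof data !== "string" || data.length === 0 || data.length % 4 !== 0) {
    return null;
  }
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(data)) {
    return null;
  }

  // Build a fresh object so extra fields from the wire are dropped.
  return { type: "image", mediaType, data };
}
```

Returning a freshly built object, rather than the input itself, keeps unexpected extra properties out of the rest of the pipeline.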

Complete Loading Flow

async function loadAndValidateImage(
  filePath: string
): Promise<ImageContent> {
  // 1. Check that the file exists
  const fs = await import("fs/promises");
  try {
    await fs.access(filePath);
  } catch {
    throw new Error(`File not found: ${filePath}`);
  }

  // 2. Check the file size
  const stats = await fs.stat(filePath);
  if (stats.size > MAX_IMAGE_SIZE) {
    throw new Error(`File too large: ${stats.size} bytes`);
  }

  // 3. Read and convert
  const imageContent = await readImageAsBase64(filePath);

  // 4. Validate the format
  if (!validateImageContent(imageContent)) {
    throw new Error("Invalid image format");
  }

  return imageContent;
}

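Image Processing Flow

CLI-Side Flow

User input: image: /path/to/image.png
   ↓
Parse the command and extract the path
   ↓
Check that the file exists
   ↓
Check the file size (≤10MB)
   ↓
Read the file into a Buffer
   ↓
Determine the media type from the extension
   ↓
Convert to Base64
   ↓
Create an ImageContent object
   ↓
Wrap it in a MultimodalMessage
   ↓
Send to the server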
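
The first and last local steps of this flow, parsing the command and wrapping the result, can be sketched as below. The original does not show how the CLI parses its input, so parseImageCommand and wrapImageMessage are hypothetical helpers, and the exact "image: <path>" syntax is assumed from the sample input above.

```typescript
type ImageMediaType = "image/png" | "image/jpeg" | "image/gif" | "image/webp";

interface ImageContent {
  type: "image";
  mediaType: ImageMediaType;
  data: string;
}

interface TextContent {
  type: "text";
  text: string;
}

type MessageContent = TextContent | ImageContent;

interface MultimodalMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: MessageContent | MessageContent[];
}

// Hypothetical helper: extract the path from "image: /path/to/file".
// Returns null when the input is not an image command.
function parseImageCommand(input: string): string | null {
  const match = /^image:\s*(.+)$/.exec(input.trim());
  return match ? match[1].trim() : null;
}

// Hypothetical helper: wrap an image, and optionally a prompt, as a user message.
function wrapImageMessage(
  image: ImageContent,
  prompt?: string
): MultimodalMessage {
  const content: MessageContent[] = prompt
    ? [{ type: "text", text: prompt }, image]
    : [image];
  return { role: "user", content };
}

console.log(parseImageCommand("image: /path/to/image.png")); // "/path/to/image.png"
```

The steps in between (existence check, size check, reading, encoding) are the ones loadAndValidateImage already covers.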

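Web-Side Flow

User clicks the upload button
   ↓
Select a file
   ↓
Read via the browser File API
   ↓
Validate the file type (image/*)
   ↓
Validate the file size (≤10MB)
   ↓
FileReader.readAsDataURL()
   ↓
Obtain the Data URL
   ↓
Parse into ImageContent
   ↓
Show a preview
   ↓
User clicks send
   ↓
Send over WebSocket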
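
The validation and reading steps of the browser flow might look like the sketch below. The original does not include the Web client code, so checkFile and readFileAsImageContent are hypothetical. The sketch also narrows the image/* check to the four supported media types, which is an assumption made here rather than something the flow states. The preview and WebSocket steps are left out.

```typescript
type ImageMediaType = "image/png" | "image/jpeg" | "image/gif" | "image/webp";

interface ImageContent {
  type: "image";
  mediaType: ImageMediaType;
  data: string;
}

const MAX_IMAGE_SIZE = 10 * 1024 * 1024; // 10MB
const SUPPORTED_TYPES: readonly ImageMediaType[] = [
  "image/png",
  "image/jpeg",
  "image/gif",
  "image/webp",
];

// Hypothetical helper: return an error message, or null when the file is acceptable.
function checkFile(file: { type: string; size: number }): string | null {
  if (!file.type.startsWith("image/")) {
    return "not an image file";
  }
  if (!SUPPORTED_TYPES.some((t) => t === file.type)) {
    return `unsupported image type: ${file.type}`;
  }
  if (file.size > MAX_IMAGE_SIZE) {
    return "image exceeds the 10MB limit";
  }
  return null;
}

// Hypothetical helper (browser only): read a File through FileReader.readAsDataURL().
function readFileAsImageContent(file: File): Promise<ImageContent> {
  return new Promise((resolve, reject) => {
    const problem = checkFile(file);
    const mediaType = SUPPORTED_TYPES.find((t) => t === file.type);
    if (problem !== null || mediaType === undefined) {
      reject(new Error(problem ?? "unsupported image type"));
      return;
    }

    const reader = new FileReader();
    reader.onerror = () => reject(reader.error ?? new Error("read failed"));
    reader.onload = () => {
      const result = reader.result;
      if (typeof result !== "string") {
        reject(new Error("unexpected FileReader result"));
        return;
      }
      // A Data URL is "data:<mediaType>;base64,<data>": keep what follows the comma.
      const data = result.slice(result.indexOf(",") + 1);
      resolve({ type: "image", mediaType, data });
    };
    reader.readAsDataURL(file);
  });
}
```

Running checkFile before the read means an oversized or unsupported file is rejected without loading it into memory.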

Utility Functions

Checking Whether a Message Contains an Image

/**
 * Check whether a message contains an image
 */
export function hasImage(message: MultimodalMessage): boolean {
  const content = message.content;

  if (Array.isArray(content)) {
    return content.some((item) => item.type === "image");
  }

  return content.type === "image";
}

Counting Images

/**
 * Get the number of images in a message
 */
export function getImageCount(message: MultimodalMessage): number {
  const content = message.content;

  if (Array.isArray(content)) {
    return content.filter((item) => item.type === "image").length;
  }

  return content.type === "image" ? 1 : 0;
}

Extracting All Images

/**
 * Extract all images from a message
 */
export function extractImages(
  message: MultimodalMessage
): ImageContent[] {
  const content = message.content;

  if (Array.isArray(content)) {
    // The type predicate narrows the result, so no cast is needed
    return content.filter(
      (item): item is ImageContent => item.type === "image"
    );
  }

  // The discriminant check narrows content to ImageContent
  return content.type === "image" ? [content] : [];
}

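Summary

This section covered the key pieces of image processing and transmission:

Multimodal type definitions
Support for four image formats
Base64-encoded transmission
File size limits and validation
The complete image processing flow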

Architect