2026-03-08

11.6 CLI Image Command

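Command Design

The CLI client sends images with a dedicated image: command:

> image: /path/to/image.png

The syntax is short and needs no extra flags, so it is easy to remember and use.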

Command Format

image: <file path>

Examples:

# Absolute path
> image: /Users/zhenl/Pictures/screenshot.png

# Relative path (resolved against the current working directory)
> image: ./images/chart.jpg

# Home directory
> image: ~/Downloads/photo.png

Implementation Walkthrough

Command Detection

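// src/client/cli.ts

/**
 * Check whether the input is an image command
 */
function isImageCommand(input: string): boolean {
  return input.trim().startsWith("image:");
}

/**
 * Extract the file path from an image command.
 * Everything after "image:" is kept, so paths containing spaces work.
 */
function extractImagePath(input: string): string {
  const match = input.match(/^\s*image:\s*(.*)$/);
  const filePath = match?.[1].trim();
  // Reject "image:" with no path (or only whitespace) after it
  if (!filePath) {
    throw new Error("Invalid image command. Usage: image: <file path>");
  }
  return filePath;
}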
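The regex keeps everything after image:, so a path containing spaces works without quotes. Paths pasted or dragged into a terminal, however, often arrive wrapped in quotes or with backslash-escaped spaces. Below is a minimal sketch of a cleanup step that could run on the extracted path before expandPath; normalizeDroppedPath is a hypothetical helper, and it assumes POSIX paths, since on Windows the backslash is the path separator and must not be unescaped.

```typescript
// Hypothetical helper (not part of the original CLI): undo the quoting or
// escaping a terminal may add when a file path is pasted or dragged in.
// Assumes POSIX paths; on Windows "\" is a separator, not an escape.
function normalizeDroppedPath(raw: string): string {
  const trimmed = raw.trim();

  // Strip one pair of matching surrounding quotes: "a b.png" or 'a b.png'
  const quoted = /^(["'])(.*)\1$/.exec(trimmed);
  if (quoted) {
    return quoted[2];
  }

  // Undo backslash escapes such as "my\ photo.png" -> "my photo.png"
  return trimmed.replace(/\\(.)/g, "$1");
}

console.log(normalizeDroppedPath("'/tmp/my photo.png'"));
```

In handleImageCommand this would slot in as expandPath(normalizeDroppedPath(filePath)).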

Path Expansion

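// src/client/cli.ts

import * as path from "path";
import * as os from "os";

/**
 * Expand a user-supplied path to an absolute path
 */
function expandPath(filePath: string): string {
  // "~" or "~/..." means the current user's home directory.
  // ("~otheruser/..." is not supported and is treated as a relative path.)
  if (filePath === "~" || filePath.startsWith("~/")) {
    return path.join(os.homedir(), filePath.slice(1));
  }

  // Relative paths are resolved against process.cwd();
  // absolute paths come back normalized
  return path.resolve(filePath);
}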
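Because expandPath depends on process.cwd() and os.homedir(), its output varies by machine. To see how the three example paths from the command format resolve, here is a pure variant that takes both directories as parameters. expandPathFrom is a hypothetical name used only for this illustration, and it assumes POSIX path semantics (path.posix). Like the rule above, it expands only ~ and ~/..., because ~name conventionally refers to another user's home directory.

```typescript
import * as path from "path";

// Hypothetical pure variant of expandPath: cwd and home are passed in,
// so the result is deterministic. POSIX path semantics are assumed.
function expandPathFrom(filePath: string, cwd: string, home: string): string {
  // Only "~" and "~/..." mean the current user's home directory;
  // "~alice/..." is treated as an ordinary relative path.
  if (filePath === "~" || filePath.startsWith("~/")) {
    return path.posix.join(home, filePath.slice(1));
  }
  // Absolute paths are kept; relative paths are resolved against cwd
  return path.posix.resolve(cwd, filePath);
}

const cwd = "/work/project";
const home = "/home/zhenl";
for (const p of ["/Users/zhenl/Pictures/screenshot.png", "./images/chart.jpg", "~/Downloads/photo.png"]) {
  console.log(`${p} -> ${expandPathFrom(p, cwd, home)}`);
}
```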

Image Loading Flow

Full Implementation

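// src/client/cli.ts

import * as fs from "fs/promises";
import { readImageAsBase64 } from "../types/multimodal.js";
import type { ImageContent, MultimodalMessage } from "../types/multimodal.js";

const MAX_IMAGE_SIZE = 10 * 1024 * 1024; // 10MB

/**
 * Handle an image command and build a multimodal message
 */
async function handleImageCommand(
  input: string
): Promise<MultimodalMessage> {
  // 1. Extract the path
  const filePath = extractImagePath(input);

  // 2. Expand it to an absolute path
  const expandedPath = expandPath(filePath);

  // 3. Check that the file exists (one stat call replaces access + stat)
  const stats = await fs.stat(expandedPath).catch((error: NodeJS.ErrnoException) => {
    if (error.code === "ENOENT") {
      throw new Error(`File not found: ${expandedPath}`);
    }
    throw new Error(`Cannot read file: ${error.message}`);
  });
  if (!stats.isFile()) {
    throw new Error(`Not a file: ${expandedPath}`);
  }

  // 4. Enforce the size limit
  if (stats.size > MAX_IMAGE_SIZE) {
    const sizeMB = (stats.size / 1024 / 1024).toFixed(2);
    const maxMB = (MAX_IMAGE_SIZE / 1024 / 1024).toFixed(0);
    throw new Error(
      `Image file too large (${sizeMB}MB); the maximum is ${maxMB}MB`
    );
  }

  // 5. Read the file and encode it as base64
  //    (readImageAsBase64 from section 11.3 returns an ImageContent,
  //    including its mediaType)
  const imageContent: ImageContent = await readImageAsBase64(expandedPath);

  // 6. Build the multimodal message
  return {
    role: "user",
    content: imageContent,
  };
}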
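Step 5 re-encodes the file as base64 before it is sent, and base64 is larger than the raw bytes: every 3 input bytes become 4 output characters. The arithmetic below shows what the 10MB cap means for the payload size. base64Length is a helper introduced only for this illustration, and the count covers the base64 string alone, not any JSON or WebSocket framing around it.

```typescript
// Base64 maps every 3 input bytes to 4 output characters, padding the last
// group with "=", so n bytes always encode to 4 * ceil(n / 3) characters.
function base64Length(bytes: number): number {
  return 4 * Math.ceil(bytes / 3);
}

// Same limit as handleImageCommand
const MAX_IMAGE_SIZE = 10 * 1024 * 1024; // 10,485,760 bytes

const maxEncodedChars = base64Length(MAX_IMAGE_SIZE);
const maxEncodedMB = (maxEncodedChars / 1024 / 1024).toFixed(2);

// Prints: 10485760 bytes -> 13981016 base64 characters (~13.33MB)
console.log(`${MAX_IMAGE_SIZE} bytes -> ${maxEncodedChars} base64 characters (~${maxEncodedMB}MB)`);
```

So a file that just passes the check still produces roughly 13.3MB of text, which is worth keeping in mind when sizing any message limits on the server side.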

Error Handling

// src/client/cli.ts

/**
 * Handle an image command without throwing:
 * success is passed to onImage, failure to onError
 */
async function safeHandleImageCommand(
  input: string,
  onImage: (message: MultimodalMessage) => void,
  onError: (error: string) => void
): Promise<void> {
  try {
    const message = await handleImageCommand(input);
    onImage(message);
  } catch (error) {
    const errorMessage = error instanceof Error
      ? error.message
      : "Unknown error";
    onError(errorMessage);
  }
}
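safeHandleImageCommand passes error.message straight to the user. If an error from Node.js escapes unconverted (for example from readImageAsBase64), the raw text looks like "ENOENT: no such file or directory, stat '...'". A minimal sketch of mapping the code property that Node.js sets on file-system errors to friendlier messages; describeFsError is a hypothetical helper, and the message wording simply follows the error scenarios in this section.

```typescript
// Hypothetical helper, not part of the original client: map the `code`
// property that Node.js sets on file-system errors to a short CLI message.
function describeFsError(error: unknown, filePath: string): string {
  const code =
    typeof error === "object" && error !== null && "code" in error
      ? (error as { code?: unknown }).code
      : undefined;

  switch (code) {
    case "ENOENT": // no such file or directory
      return `File not found: ${filePath}`;
    case "EACCES": // permission denied
    case "EPERM": // operation not permitted
      return `Cannot read file: Permission denied (${filePath})`;
    case "EISDIR": // the path is a directory
      return `Not a file: ${filePath}`;
    default:
      // Anything else: fall back to the original message
      return error instanceof Error ? error.message : "Unknown error";
  }
}

console.log(describeFsError({ code: "ENOENT" }, "/path/to/nonexistent.png"));
```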

CLI Main Loop Integration

Updated Input Handling

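// src/client/cli.ts

import * as readline from "readline";

export class CLIClient {
  // Assigned in start(); "!" tells TypeScript it is set before use
  private rl!: readline.Interface;
  private pendingImage: MultimodalMessage | null = null;

  async start() {
    this.rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });

    // Show the initial prompt
    this.setPrompt();

    // Handle each line of input
    this.rl.on("line", async (input) => {
      try {
        await this.handleInput(input.trim());
      } catch (error) {
        // Keep the loop alive if sending fails
        console.error(`✗ ${error instanceof Error ? error.message : error}`);
      }
      this.setPrompt();
    });
  }

  /**
   * Handle one line of user input
   */
  private async handleInput(input: string): Promise<void> {
    // Empty line: send a pending image on its own, otherwise do nothing
    if (!input) {
      if (this.pendingImage) {
        const message = this.pendingImage;
        this.pendingImage = null;
        await this.sendMessage(message);
      }
      return;
    }

    // Special commands
    if (input === "/exit" || input === "/quit") {
      this.shutdown();
      return;
    }

    if (input === "/cancel") {
      if (this.pendingImage) {
        this.pendingImage = null;
        console.log("Image discarded");
      }
      return;
    }

    // Image command
    if (isImageCommand(input)) {
      await safeHandleImageCommand(
        input,
        (message) => {
          this.pendingImage = message;
          console.log("✓ Image loaded. Add a description, or press Enter to send it");
        },
        (error) => {
          console.error(`✗ ${error}`);
        }
      );
      return;
    }

    // Build the message
    let message: MultimodalMessage;

    if (this.pendingImage) {
      // Image + text (the pending message's content is a single ImageContent)
      message = {
        role: "user",
        content: [
          this.pendingImage.content,
          { type: "text", text: input },
        ],
      };
      this.pendingImage = null;
    } else {
      // Text only
      message = {
        role: "user",
        content: { type: "text", text: input },
      };
    }

    // Send to the server
    await this.sendMessage(message);
  }

  /**
   * Set the prompt: 🖼️ indicates an image waiting to be sent
   */
  private setPrompt(): void {
    if (this.pendingImage) {
      this.rl.setPrompt("🖼️  > ");
    } else {
      this.rl.setPrompt("> ");
    }
    this.rl.prompt();
  }

  // sendMessage() and shutdown() are not shown in this section
}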
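The branching in handleInput can also be modeled as a pure function, which makes the pending-image flow easy to unit-test without a terminal. The sketch below is hypothetical: decideAction and InputAction are not part of the original client, and it assumes the behavior the usage examples describe, namely that an empty line sends a pending image and /cancel discards it.

```typescript
// Hypothetical model of CLIClient.handleInput's decisions (not from the
// original client). It only classifies input; loading, sending, and
// printing stay in the real client.
type InputAction =
  | { kind: "ignore" }
  | { kind: "exit" }
  | { kind: "load-image"; command: string }
  | { kind: "cancel-image" }
  | { kind: "send-image" }
  | { kind: "send-image-with-text"; text: string }
  | { kind: "send-text"; text: string };

function decideAction(rawInput: string, hasPendingImage: boolean): InputAction {
  const input = rawInput.trim();

  if (input === "/exit" || input === "/quit") return { kind: "exit" };
  if (input.startsWith("image:")) return { kind: "load-image", command: input };

  if (hasPendingImage) {
    if (input === "") return { kind: "send-image" }; // Enter alone sends the image
    if (input === "/cancel") return { kind: "cancel-image" }; // discard the image
    return { kind: "send-image-with-text", text: input }; // image + description
  }

  // No pending image: empty lines and a stray /cancel do nothing
  if (input === "" || input === "/cancel") return { kind: "ignore" };
  return { kind: "send-text", text: input };
}

console.log(decideAction("", true).kind);
```

One design consequence worth noting: an image: command while an image is already pending yields load-image again, so the new image replaces the old one rather than stacking.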

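Usage Examples

Scenario 1: Sending an image on its own

> image: ./screenshot.png
✓ Image loaded. Add a description, or press Enter to send it
🖼️  >
[press Enter]
Agent: I can see this is a screenshot of a code editor...

Scenario 2: Sending an image with text

> image: ~/Downloads/chart.png
✓ Image loaded. Add a description, or press Enter to send it
🖼️  > Please explain this chart
Agent: This line chart shows the sales trend for 2024...

Scenario 3: Cancelling an image

> image: ./photo.jpg
✓ Image loaded. Add a description, or press Enter to send it
🖼️  > /cancel
Image discarded
>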
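The scenarios differ mainly in the shape of the message that ends up being sent: image alone, image plus text, or plain text. A minimal sketch of that mapping follows. The interfaces are assumptions modeled on the fields used in this section (type/text for text blocks, mediaType for images, plus an assumed data field); the real definitions live in section 11.3, and buildUserMessage is a hypothetical helper.

```typescript
// Assumed shapes, modeled on the fields used in this section; the real
// definitions (section 11.3) may differ.
interface TextContent { type: "text"; text: string; }
interface ImageContent { type: "image"; mediaType: string; data: string; }
type ContentBlock = TextContent | ImageContent;
interface MultimodalMessage { role: "user" | "assistant"; content: ContentBlock | ContentBlock[]; }

// Hypothetical helper: build the user message for each scenario,
// or return null when there is nothing to send.
function buildUserMessage(image: ImageContent | null, text: string): MultimodalMessage | null {
  const trimmed = text.trim();
  if (image && trimmed) {
    // Scenario 2: image first, then the user's question
    return { role: "user", content: [image, { type: "text", text: trimmed }] };
  }
  if (image) {
    // Scenario 1: Enter pressed with no text, so send the image alone
    return { role: "user", content: image };
  }
  if (trimmed) {
    // Plain text message
    return { role: "user", content: { type: "text", text: trimmed } };
  }
  return null;
}

const demoImage: ImageContent = { type: "image", mediaType: "image/png", data: "iVBORw0KGgo=" };
console.log(JSON.stringify(buildUserMessage(demoImage, "Please explain this chart")));
```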

Interaction Enhancements

Progress Messages

/**
 * Load an image while printing progress messages
 */
async function loadImageWithProgress(
  filePath: string
): Promise<ImageContent> {
  console.log(`Loading image: ${filePath}`);

  const expandedPath = expandPath(filePath);

  // Inspect the file
  const stats = await fs.stat(expandedPath);
  const sizeMB = (stats.size / 1024 / 1024).toFixed(2);
  console.log(`  File size: ${sizeMB}MB`);

  // Read and encode the file
  const imageContent = await readImageAsBase64(expandedPath);

  console.log(`  Format: ${imageContent.mediaType}`);
  console.log("✓ Image loaded");

  return imageContent;
}
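loadImageWithProgress prints messages before and after the read, but nothing during it. For files near the 10MB limit, a percentage indicator can be driven by reading in chunks. The sketch below is one possible approach, not the original implementation: collectWithProgress and readFileWithProgress are hypothetical helpers, and the resulting Buffer would still need to be base64-encoded (for example with buffer.toString("base64")) instead of calling readImageAsBase64.

```typescript
import { createReadStream } from "fs";
import { stat } from "fs/promises";

// Collect chunks from any async iterable of Buffers, reporting progress
// as a whole-number percentage of totalBytes after each chunk.
async function collectWithProgress(
  chunks: AsyncIterable<Buffer>,
  totalBytes: number,
  onProgress: (percent: number) => void
): Promise<Buffer> {
  const parts: Buffer[] = [];
  let loaded = 0;
  for await (const chunk of chunks) {
    parts.push(chunk);
    loaded += chunk.length;
    onProgress(totalBytes === 0 ? 100 : Math.min(100, Math.floor((loaded / totalBytes) * 100)));
  }
  return Buffer.concat(parts);
}

// Hypothetical file reader built on it: stat for the size, then stream the file
async function readFileWithProgress(
  filePath: string,
  onProgress: (percent: number) => void
): Promise<Buffer> {
  const { size } = await stat(filePath);
  return collectWithProgress(createReadStream(filePath), size, onProgress);
}
```

Taking an AsyncIterable rather than a path keeps the progress arithmetic testable with in-memory chunks; in the CLI, onProgress could rewrite a single status line using "\r".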

Colored Output

// Color status messages with ANSI escape codes

console.log("\x1b[32m%s\x1b[0m", "✓ Image loaded");  // green
console.log("\x1b[31m%s\x1b[0m", "✗ Failed to load");  // red
console.log("\x1b[33m%s\x1b[0m", "⚠ File too large");  // yellow
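Hard-coded escape codes end up as literal garbage when output is redirected to a file or pipe. A small sketch of a guard, assuming colors are wanted only for an interactive terminal and that the informal NO_COLOR convention (a non-empty NO_COLOR environment variable disables color) should be honored; colorize and shouldUseColor are hypothetical helpers.

```typescript
// ANSI SGR color codes used above: 32 = green, 31 = red, 33 = yellow; 0 resets
const ANSI = { green: 32, red: 31, yellow: 33 } as const;
type Color = keyof typeof ANSI;

// Hypothetical helper: wrap text in a color code only when enabled
function colorize(text: string, color: Color, enabled: boolean): string {
  return enabled ? `\x1b[${ANSI[color]}m${text}\x1b[0m` : text;
}

// Color only for an interactive terminal, and respect a non-empty NO_COLOR
function shouldUseColor(): boolean {
  return Boolean(process.stdout.isTTY) && !process.env.NO_COLOR;
}

const useColor = shouldUseColor();
console.log(colorize("✓ Image loaded", "green", useColor));
console.log(colorize("⚠ File too large", "yellow", useColor));
```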

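Autocompletion (Optional)

readline accepts a completer function and calls it when the user presses Tab. A simple path completer for the image: command: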

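// src/client/cli.ts

import { readdirSync } from "fs";

/**
 * Tab completion for the image: command.
 * readline calls this synchronously with the current line and expects
 * [matching completions, the substring being completed].
 */
function completer(line: string): [string[], string] {
  if (!line.startsWith("image:")) {
    return [[], line];
  }

  const prefix = line.slice("image:".length).trim();
  // "images/" lists inside images/; "images/ch" lists images/ filtered by "ch"
  const dir = prefix.endsWith("/") ? prefix : path.dirname(prefix);
  const base = prefix.endsWith("/") ? "" : path.basename(prefix);

  let hits: string[] = [];
  try {
    // expandPath resolves "~"; completions keep the path as the user typed it
    hits = readdirSync(expandPath(dir), { withFileTypes: true })
      .filter((entry) => entry.name.startsWith(base))
      // Append the rest of each matching name to the typed line; mark directories
      .map((entry) => line + entry.name.slice(base.length) + (entry.isDirectory() ? "/" : ""));
  } catch {
    // Missing or unreadable directory: offer no completions
  }

  return [hits, line];
}

// Register the completer when creating the interface in CLIClient.start()
this.rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
  completer,
});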

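Command History

readline keeps entered lines in an in-memory history, so an image: command can be recalled with the Up arrow. The session below additionally assumes history and !n commands, which are not implemented in the code above: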

> history
1  image: ./screenshot.png
2  Describe this
3  image: ~/Downloads/chart.png
4  Analyze the data

> !1
> image: ./screenshot.png
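A minimal sketch of how the history and !n commands in this session could be supported. CommandHistory is a hypothetical class, not part of the original client; it numbers entries from 1 and does not record the history commands themselves, which matches the numbering shown above.

```typescript
// Hypothetical history store for the "history" and "!n" commands.
// Entries are numbered from 1 in the order they were entered.
class CommandHistory {
  private entries: string[] = [];

  add(line: string): void {
    // Skip empty lines and the history commands themselves
    if (line && line !== "history" && !/^!\d+$/.test(line)) {
      this.entries.push(line);
    }
  }

  // Lines in the "1  image: ./screenshot.png" format of the sample session
  list(): string[] {
    return this.entries.map((entry, i) => `${i + 1}  ${entry}`);
  }

  // Expand "!n" to entry n. Other input is returned unchanged;
  // an out-of-range reference returns null.
  expand(line: string): string | null {
    const match = /^!(\d+)$/.exec(line);
    if (!match) return line;
    return this.entries[Number(match[1]) - 1] ?? null;
  }
}

const history = new CommandHistory();
history.add("image: ./screenshot.png");
history.add("Describe this");
console.log(history.list().join("\n"));
```

In handleInput, the expanded line would be echoed back and then processed as if the user had typed it, which is what the "> image: ./screenshot.png" line after "> !1" shows.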

Error Scenarios

File not found

> image: ./nonexistent.png
✗ File not found: /path/to/nonexistent.png

File too large

> image: ./large-photo.tif
✗ Image file too large (15.23MB); the maximum is 10MB

Unsupported format

> image: ./document.pdf
✗ Unsupported file format: .pdf (supported: png, jpg, gif, webp)

Read failure

> image: /protected/photo.png
✗ Cannot read file: Permission denied
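The unsupported-format error is not raised by handleImageCommand itself, so presumably it comes from readImageAsBase64 (section 11.3). As an illustration only, an extension-based check producing that message might look like the sketch below. mediaTypeForPath is a hypothetical helper, the extension-to-MIME table is an assumption built from the four formats the message names (with .jpeg added as an alias of .jpg), and an extension is only a hint about the actual file contents.

```typescript
import * as path from "path";

// Assumed mapping: the four formats named in the error message and their
// standard MIME types; ".jpeg" is accepted as an alias of ".jpg".
const SUPPORTED_TYPES = new Map<string, string>([
  [".png", "image/png"],
  [".jpg", "image/jpeg"],
  [".jpeg", "image/jpeg"],
  [".gif", "image/gif"],
  [".webp", "image/webp"],
]);

// Hypothetical check: return the media type, or throw the CLI error message
function mediaTypeForPath(filePath: string): string {
  const ext = path.extname(filePath).toLowerCase();
  const mediaType = SUPPORTED_TYPES.get(ext);
  if (!mediaType) {
    throw new Error(
      `Unsupported file format: ${ext || "(none)"} (supported: png, jpg, gif, webp)`
    );
  }
  return mediaType;
}

console.log(mediaTypeForPath("./images/chart.jpg"));
```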

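Summary

This section implemented the CLI image command end to end:

The image: command format and parsing
File path expansion and validation
The image loading and conversion flow
Integration with the CLI main loop
Interaction enhancements
Error handling and error scenarios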

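Chapter 11 Summary

This chapter covered the implementation of a multimodal AI agent:

11.1 Overview of Multimodal AI

What multimodal AI is and its advantages
Why vision matters for agents
An overview of mainstream vision models

11.2 Vision Model Integration

Vision model support in LLMClient
Automatic model selection logic
Message format conversion

11.3 Image Processing and Transfer

Multimodal type definitions
Supported image formats
Base64-encoded transfer
File size limits

11.4 Multimodal Message Protocol

WebSocket protocol extensions
Message validation
Server-side handling

11.5 Web UI Image Features

Image upload and preview
File validation
Sending multimodal messages

11.6 CLI Image Command

Implementing the image: command
Path handling
CLI integration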

After working through this chapter, you should be able to add vision capabilities to an AI agent so that it can understand and analyze image content.

Next chapter: Chapter 12: Web Search

Architect