2026-06-26 06:13:33

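Frontend + AI Advanced, Day 6: Image Annotation and AI Visual Analysis (Full Version)

Frontend + AI Advanced Learning Path | Weeks 1-2: Streaming Experience Optimization

Day 6: Image Annotation and AI Visual Analysis

Study date: December 30, 2025 (Tuesday)

Keywords: Canvas annotation, bounding boxes, lasso selection, multimodal AI, LLaVA, Base64, visual understanding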

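📁 Project file structure

day06-image-annotation/
├── src/
│   ├── components/
│   │   ├── ImageUpload.jsx      # Upload component reused from Day 5
│   │   └── ImageAnnotator.jsx   # New: Canvas image annotation component
│   ├── lib/
│   │   └── visualAIClient.js    # Ollama LLaVA streaming client (mocked)
│   └── App.jsx                  # Main app integration (annotation + question)
├── package.json                 # Needs a proxy setting if you connect to a real Ollama
└── public/

💡 Today's focus: frontend annotation plus visual AI semantic understanding, laying the groundwork for a closed multimodal interaction loop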

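🎯 Today's learning goals

Implement rectangle selection (bounding box) and free-form selection (lasso) on a Canvas

Support adding text labels on the image

Send the annotated-region information together with the image to a multimodal AI such as LLaVA

Build the complete "upload → annotate → ask → AI visual answer" flow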

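💡 Why annotate images on the frontend?

Users want to ask more than "what is this image?". They want to ask:

"What does the button in the red box do?"

"Why is the circled part throwing an error?"

"What font does this design mockup use?"

✅ Frontend annotation = precise visual context → more accurate AI answers

📌 Annotation data (coordinates, regions, labels) must be sent to the AI together with the image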

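📚 Core tech stack

Feature                          Technology
Image drawing and interaction    Canvas 2D API with mouse events
Annotation data structure        { type: 'rect' | 'lasso', … }
Image to Base64                  canvas.toDataURL('image/jpeg', 0.8) (mocked on Day 6, real on Day 7)
Multimodal AI input              Image (Base64) + text prompt (including annotation descriptions)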
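As a concrete reference for the annotation data structure, the sketch below mirrors the object that the ImageAnnotator component in step 3 creates (id, type, points, label). The helper names createAnnotation and isValidAnnotation are hypothetical and not part of the tutorial code, and the English fallback label is an assumption.

```javascript
// Hypothetical helpers; the field names follow the object built in ImageAnnotator.jsx.
const ANNOTATION_TYPES = ['rect', 'lasso'];

function createAnnotation(type, points, label) {
  return {
    id: Date.now(),                                   // same id strategy as the component
    type,                                             // 'rect' | 'lasso'
    points: points.map((p) => ({ x: p.x, y: p.y })),  // canvas-space coordinates
    label: label || 'Unnamed',                        // assumed fallback wording
  };
}

function isValidAnnotation(ann) {
  if (!ann || !ANNOTATION_TYPES.includes(ann.type)) return false;
  if (!Array.isArray(ann.points)) return false;
  const pointsOk = ann.points.every(
    (p) => Number.isFinite(p.x) && Number.isFinite(p.y)
  );
  // A rectangle is stored as exactly two corner points; a lasso needs at least two.
  const countOk =
    ann.type === 'rect' ? ann.points.length === 2 : ann.points.length > 1;
  return pointsOk && countOk;
}
```

Checking annotations before they are added to a prompt or exported keeps a stray click (a lasso with a single point, for example) from reaching the AI request.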

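⚠️ Note: Ollama's LLaVA model accepts Base64 image input (run ollama run llava first)

🔧 Hands-on: build an annotatable image analysis component

Step 1: Create the project and install dependencies

npx create-react-app day06-image-annotation

cd day06-image-annotation

# No new npm packages are needed today; everything uses the native Canvas API

💡 To connect to a real Ollama later, you can add @microsoft/fetch-event-source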

Step 2: Reuse the upload component from Day 5

// src/components/ImageUpload.jsx
import { useState, useRef, useCallback, useEffect } from 'react';

const ImageUpload = ({ onFileSelect }) => {
  const [previewUrl, setPreviewUrl] = useState(null);
  const [isDragging, setIsDragging] = useState(false);
  const fileInputRef = useRef(null);

  // Release the previous object URL when the preview changes or the component unmounts
  useEffect(() => {
    return () => {
      if (previewUrl) URL.revokeObjectURL(previewUrl);
    };
  }, [previewUrl]);

  const handleFile = useCallback((file) => {
    if (!file || !file.type.startsWith('image/')) {
      alert('Please upload an image file (PNG/JPG/GIF)');
      return;
    }
    const url = URL.createObjectURL(file);
    setPreviewUrl(url);
    onFileSelect?.(file);
  }, [onFileSelect]);

  const handleSelectClick = () => fileInputRef.current?.click();
  const handleFileChange = (e) => handleFile(e.target.files?.[0]);
  const handleDragOver = (e) => { e.preventDefault(); setIsDragging(true); };
  const handleDragLeave = () => setIsDragging(false);
  const handleDrop = (e) => { e.preventDefault(); setIsDragging(false); handleFile(e.dataTransfer.files?.[0]); };

  // The element tags below are reconstructed around the surviving props (assumed markup)
  return (
    <div
      style={{
        padding: '20px',
        border: '2px dashed #ccc',
        borderRadius: '8px',
        textAlign: 'center',
        backgroundColor: isDragging ? '#f0f9ff' : '#fafafa',
        cursor: 'pointer',
      }}
      onDragOver={handleDragOver}
      onDragLeave={handleDragLeave}
      onDrop={handleDrop}
      onClick={handleSelectClick}
      tabIndex={0}
    >
      {/* Hidden file input, opened by clicking the drop zone */}
      <input
        type="file"
        accept="image/*"
        ref={fileInputRef}
        onChange={handleFileChange}
        onClick={(e) => e.stopPropagation()}
        style={{ display: 'none' }}
      />
      {previewUrl ? (
        <img src={previewUrl} alt="Preview" style={{ maxWidth: '100%' }} />
      ) : (
        <p>📸 Drag an image here, or click to choose one</p>
      )}
    </div>
  );
};

export default ImageUpload;

Step 3: Create the image annotation component

// src/components/ImageAnnotator.jsx
import { useState, useRef, useEffect } from 'react';

const ImageAnnotator = ({ imageFile, onAnnotated }) => {
  const canvasRef = useRef(null);
  const [isDrawing, setIsDrawing] = useState(false);
  const [annotations, setAnnotations] = useState([]);
  const [mode, setMode] = useState('rect'); // 'rect' | 'lasso'
  const [tempPoints, setTempPoints] = useState([]);
  const imgRef = useRef(null);

  // Load the image into memory
  useEffect(() => {
    if (!imageFile) return;
    const url = URL.createObjectURL(imageFile);
    const img = new Image();
    img.onload = () => {
      imgRef.current = img;
      drawImageAndAnnotations(img, []);
    };
    img.src = url;
    return () => URL.revokeObjectURL(url);
  }, [imageFile]);

  // Draw the image (scaled down to at most 800px wide) and then every annotation on top
  const drawImageAndAnnotations = (img, anns) => {
    const canvas = canvasRef.current;
    if (!canvas || !img) return;
    const ctx = canvas.getContext('2d');
    const maxWidth = 800;
    const scale = Math.min(maxWidth / img.width, 1);
    const w = img.width * scale;
    const h = img.height * scale;
    canvas.width = w;
    canvas.height = h;
    ctx.clearRect(0, 0, w, h);
    ctx.drawImage(img, 0, 0, w, h);

    // Draw the existing annotations
    anns.forEach((ann) => {
      ctx.strokeStyle = ann.type === 'rect' ? '#1890ff' : '#f5222d';
      ctx.lineWidth = 2;
      ctx.beginPath();
      if (ann.type === 'rect' && ann.points.length === 2) {
        const [start, end] = ann.points;
        ctx.rect(start.x, start.y, end.x - start.x, end.y - start.y);
      } else if (ann.type === 'lasso' && ann.points.length > 1) {
        ctx.moveTo(ann.points[0].x, ann.points[0].y);
        ann.points.slice(1).forEach((p) => ctx.lineTo(p.x, p.y));
        ctx.closePath();
      }
      ctx.stroke();

      // Draw the label
      if (ann.label) {
        ctx.fillStyle = '#1890ff';
        ctx.font = '14px sans-serif';
        ctx.fillText(ann.label, ann.points[0].x + 5, ann.points[0].y - 5);
      }
    });
  };

  // Convert a mouse event to canvas coordinates, accounting for CSS scaling
  const getMousePos = (e) => {
    const canvas = canvasRef.current;
    const rect = canvas.getBoundingClientRect();
    const scaleX = canvas.width / rect.width;
    const scaleY = canvas.height / rect.height;
    return {
      x: (e.clientX - rect.left) * scaleX,
      y: (e.clientY - rect.top) * scaleY,
    };
  };

  const handleMouseDown = (e) => {
    if (!imgRef.current) return;
    setIsDrawing(true);
    const pos = getMousePos(e);
    if (mode === 'rect') {
      setTempPoints([pos, pos]);
    } else if (mode === 'lasso') {
      setTempPoints([pos]);
    }
  };

  const handleMouseMove = (e) => {
    if (!isDrawing || !imgRef.current) return;
    const pos = getMousePos(e);
    if (mode === 'rect') {
      setTempPoints(([start]) => [start, pos]);
    } else if (mode === 'lasso') {
      setTempPoints((prev) => [...prev, pos]);
    }
  };

  const handleMouseUp = () => {
    if (!isDrawing) return;
    setIsDrawing(false);
    const label = prompt('Enter a label for this annotation (e.g. "error dialog"):', '');
    if (tempPoints.length > 0) {
      const newAnn = {
        id: Date.now(),
        type: mode,
        points: [...tempPoints],
        label: label || 'Unnamed',
      };
      const updated = [...annotations, newAnn];
      setAnnotations(updated);
      drawImageAndAnnotations(imgRef.current, updated);
      setTempPoints([]);
      onAnnotated?.(updated);
    }
  };

  // Redraw the in-progress shape while the user is dragging
  useEffect(() => {
    if (isDrawing && imgRef.current) {
      const allAnns = [...annotations, { type: mode, points: tempPoints }];
      drawImageAndAnnotations(imgRef.current, allAnns);
    }
  }, [tempPoints, isDrawing]);

  // The element tags below are reconstructed around the surviving props (assumed markup)
  return (
    <div>
      <div>
        <button
          onClick={() => setMode('rect')}
          style={{
            marginRight: '8px',
            backgroundColor: mode === 'rect' ? '#1890ff' : '#f0f0f0',
            color: mode === 'rect' ? 'white' : 'black',
            border: '1px solid #ccc',
            padding: '4px 8px',
            borderRadius: '4px',
          }}
        >
          🖱️ Rectangle select
        </button>
        <button
          onClick={() => setMode('lasso')}
          style={{
            backgroundColor: mode === 'lasso' ? '#f5222d' : '#f0f0f0',
            color: mode === 'lasso' ? 'white' : 'black',
            border: '1px solid #ccc',
            padding: '4px 8px',
            borderRadius: '4px',
          }}
        >
          ✏️ Lasso select
        </button>
      </div>
      <canvas
        ref={canvasRef}
        onMouseDown={handleMouseDown}
        onMouseMove={handleMouseMove}
        onMouseUp={handleMouseUp}
        onMouseLeave={handleMouseUp}
        style={{
          border: '1px solid #ddd',
          borderRadius: '4px',
          cursor: mode === 'rect' ? 'crosshair' : 'cell',
          maxWidth: '100%',
          backgroundColor: '#fafafa',
        }}
      />
      {annotations.length > 0 && (
        <div>
          <strong>Annotated regions ({annotations.length}):</strong>
          <ul>
            {annotations.map((ann) => (
              <li key={ann.id}>
                {ann.type === 'rect' ? 'Rectangle' : 'Lasso'}: “{ann.label}”
              </li>
            ))}
          </ul>
        </div>
      )}
    </div>
  );
};

export default ImageAnnotator;
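One detail the drag handlers leave open is direction: dragging up or to the left produces an end point smaller than the start point, so the stored width and height would be negative. Canvas draws such a box without complaint, but code that consumes the annotation later usually expects a top-left origin and a positive size. A minimal sketch, assuming the { x, y } point objects used by the component; toBoundingBox and isTooSmall are hypothetical helper names, and the 4-pixel threshold is an arbitrary assumption.

```javascript
// Hypothetical helper: convert any list of { x, y } points into an
// axis-aligned bounding box with a top-left origin and non-negative size.
function toBoundingBox(points) {
  if (!points || points.length === 0) return null;
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return {
    x: minX,
    y: minY,
    width: Math.max(...xs) - minX,
    height: Math.max(...ys) - minY,
  };
}

// Hypothetical guard: treat a tiny box (for example an accidental click) as empty.
function isTooSmall(box, minSize = 4) {
  return !box || box.width < minSize || box.height < minSize;
}
```

The same helper works for both modes, because a rectangle is stored as two corner points and a lasso as a longer list of points. Calling isTooSmall before prompting for a label would let the component skip accidental clicks.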

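Step 4: Create the visual AI client (mock streaming)

// src/lib/visualAIClient.js

/**
 * Mock LLaVA visual analysis (for the real version, see the Day 6 extension).
 * Returns a canned response to demonstrate the flow.
 * The prompt argument is accepted but ignored by the mock.
 */
export const streamVisualAnalysis = ({ prompt, onToken, onComplete }) =>
  // Resolve when streaming finishes so that callers can await completion
  new Promise((resolve) => {
    const mockResponse = [
      'Based on your annotations, here is my analysis:',
      '',
      '- The image contains a screenshot of a user interface',
      '- The **"error dialog"** region shows a red warning icon and the text "Network connection failed"',
      '- The **"submit button"** is a blue rectangular button with white "Submit" text',
      '',
      'Suggestion: check your network settings or retry the operation.',
    ].join('\n');

    let index = 0;
    // Emit one character every 30 ms to imitate token streaming
    const interval = setInterval(() => {
      if (index < mockResponse.length) {
        onToken(mockResponse[index]);
        index++;
      } else {
        clearInterval(interval);
        onComplete();
        resolve();
      }
    }, 30);
  });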
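The mock walks the string one UTF-16 code unit at a time. That is fine for the canned text, but any character outside the Basic Multilingual Plane (most emoji, for example) would be emitted as two broken halves. A small sketch of a safer split by code point, which also returns a cancel function; splitIntoTokens and streamTokens are hypothetical names, not part of the tutorial code.

```javascript
// Hypothetical helper: split text by Unicode code point instead of UTF-16 code unit.
// Array.from uses the string iterator, which keeps surrogate pairs together.
function splitIntoTokens(text) {
  return Array.from(text);
}

// Emit tokens on a timer; returns a function that cancels the stream early.
function streamTokens(text, onToken, onComplete, delayMs = 30) {
  const tokens = splitIntoTokens(text);
  let index = 0;
  const timer = setInterval(() => {
    if (index < tokens.length) {
      onToken(tokens[index]);
      index += 1;
    } else {
      clearInterval(timer);
      onComplete();
    }
  }, delayMs);
  return () => clearInterval(timer);
}
```

Returning a cancel function gives the caller a way to stop the timer if the component unmounts while a response is still streaming.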

Step 5: Integrate the complete flow in App

// src/App.jsx
import { useState } from 'react';
import ImageUpload from './components/ImageUpload';
import ImageAnnotator from './components/ImageAnnotator';
import { streamVisualAnalysis } from './lib/visualAIClient';

function App() {
  const [imageFile, setImageFile] = useState(null);
  const [annotations, setAnnotations] = useState([]);
  const [aiResponse, setAiResponse] = useState('');
  const [isAnalyzing, setIsAnalyzing] = useState(false);
  const [userQuestion, setUserQuestion] = useState('');

  const handleAnnotated = (anns) => {
    setAnnotations(anns);
  };

  const sendToVisualAI = async () => {
    if (!imageFile || !userQuestion.trim()) {
      alert('Please upload an image and enter a question first');
      return;
    }

    // Build the prompt, including the annotations
    let fullPrompt = userQuestion;
    if (annotations.length > 0) {
      fullPrompt += '\n\nThe user specifically annotated the following regions:\n';
      annotations.forEach((ann, i) => {
        fullPrompt += `${i + 1}. ${ann.label} (${ann.type === 'rect' ? 'rectangular region' : 'lasso region'})\n`;
      });
    }

    setAiResponse('');
    setIsAnalyzing(true);
    await streamVisualAnalysis({
      prompt: fullPrompt,
      onToken: (token) => {
        setAiResponse((prev) => prev + token);
      },
      onComplete: () => {
        setIsAnalyzing(false);
      },
    });
  };

  // The element tags below are reconstructed around the surviving props (assumed markup)
  return (
    <div>
      <h1>Multimodal analysis: upload + annotate + AI visual understanding</h1>
      <p>Supports rectangle and lasso selection for precise questions</p>
      {!imageFile ? (
        <ImageUpload onFileSelect={setImageFile} />
      ) : (
        <>
          <ImageAnnotator imageFile={imageFile} onAnnotated={handleAnnotated} />
          <div>
            <input
              type="text"
              value={userQuestion}
              onChange={(e) => setUserQuestion(e.target.value)}
              placeholder="Enter your question (e.g. What is in the red box?)"
              style={{
                width: '100%',
                padding: '10px',
                fontSize: '16px',
                borderRadius: '4px',
                border: '1px solid #ccc',
                marginBottom: '10px',
              }}
            />
            <button
              onClick={sendToVisualAI}
              disabled={isAnalyzing}
              style={{
                padding: '10px 20px',
                backgroundColor: isAnalyzing ? '#b0b0b0' : '#52c41a',
                color: 'white',
                border: 'none',
                borderRadius: '4px',
                fontSize: '16px',
                cursor: isAnalyzing ? 'not-allowed' : 'pointer',
              }}
            >
              {isAnalyzing ? '🤖 Analyzing...' : 'Send to vision AI'}
            </button>
          </div>
          {aiResponse && (
            <div
              style={{
                marginTop: '16px',
                padding: '16px',
                backgroundColor: '#f9f9f9',
                borderRadius: '8px',
                whiteSpace: 'pre-wrap',
                lineHeight: 1.6,
                border: '1px solid #eee',
              }}
            >
              {aiResponse}
            </div>
          )}
        </>
      )}
    </div>
  );
}

export default App;
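The prompt built in sendToVisualAI lists only labels and shape types, while the stated goal is to pass coordinates and regions to the AI as well. One possible extension is sketched below, assuming that the points are in canvas pixels and that the canvas size is known. describeAnnotations and buildPrompt are hypothetical helpers, and how well a given model uses textual coordinates is something to verify by experiment.

```javascript
// Hypothetical helper: describe each annotation with a bounding box expressed
// as percentages of the canvas size, so the text does not depend on pixel scale.
function describeAnnotations(annotations, canvasWidth, canvasHeight) {
  const pct = (value, total) => Math.round((value / total) * 100);
  return annotations
    .map((ann, i) => {
      const xs = ann.points.map((p) => p.x);
      const ys = ann.points.map((p) => p.y);
      const left = pct(Math.min(...xs), canvasWidth);
      const top = pct(Math.min(...ys), canvasHeight);
      const right = pct(Math.max(...xs), canvasWidth);
      const bottom = pct(Math.max(...ys), canvasHeight);
      const shape = ann.type === 'rect' ? 'rectangle' : 'lasso';
      return `${i + 1}. ${ann.label} (${shape}, left ${left}%, top ${top}%, right ${right}%, bottom ${bottom}%)`;
    })
    .join('\n');
}

// Hypothetical helper: append the region descriptions to the user's question.
function buildPrompt(question, annotations, canvasWidth, canvasHeight) {
  if (annotations.length === 0) return question;
  const details = describeAnnotations(annotations, canvasWidth, canvasHeight);
  return `${question}\n\nThe user annotated the following regions:\n${details}`;
}
```

Percentages are used instead of raw pixels because the canvas is scaled to at most 800 pixels wide, so pixel values would not match the original image.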

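✅ Verifying the result

✅ Upload an image → it is drawn on the Canvas

✅ Switch to "Rectangle select" → drag to draw a box → enter a label → the annotation is saved

✅ Switch to "Lasso select" → draw any shape with the mouse → enter a label

✅ The annotation list updates in real time

✅ Enter a question (e.g. "What is in the red box?") → click "Send to vision AI" → watch the streamed answer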

🤔 Reflection and extensions

Real LLaVA calls: how do you send a Base64 image to Ollama?

// Ollama LLaVA request body
{
  model: "llava",
  prompt: "What's in this image?",
  images: [""] // Base64-encoded image string(s) go here
}
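A hedged sketch of how this request might be assembled in the browser. It assumes Ollama is listening on its default local address (http://localhost:11434) and that the images array expects raw Base64 without the data:image/...;base64, prefix that canvas.toDataURL produces. The helper names are hypothetical, and fetchImpl is injectable only so that the function can be exercised without a running server. Calling Ollama directly from a page may also require the proxy setting mentioned in the file structure, or a CORS configuration on the Ollama side.

```javascript
// Assumed default address of a local Ollama server; adjust or proxy as needed.
const OLLAMA_GENERATE_URL = 'http://localhost:11434/api/generate';

// Strip the "data:<mime>;base64," prefix that canvas.toDataURL() adds.
function dataUrlToBase64(dataUrl) {
  const commaIndex = dataUrl.indexOf(',');
  return commaIndex === -1 ? dataUrl : dataUrl.slice(commaIndex + 1);
}

// Hypothetical helper: build the JSON body for a LLaVA request.
function buildLlavaRequest(prompt, imageDataUrl, stream = true) {
  return {
    model: 'llava',
    prompt,
    images: [dataUrlToBase64(imageDataUrl)],
    stream,
  };
}

// Send a non-streaming request and return the answer text.
async function requestVisualAnalysis(prompt, imageDataUrl, fetchImpl = fetch) {
  const response = await fetchImpl(OLLAMA_GENERATE_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildLlavaRequest(prompt, imageDataUrl, false)),
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.response; // with stream: false the full answer arrives in one object
}
```

In the component, the data URL would come from canvasRef.current.toDataURL('image/jpeg', 0.8), which also captures the drawn annotations because they are painted on the same canvas.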

Performance: is converting large images to Base64 slow?

→ Compress the image first (scale it down on a Canvas, then call toDataURL)
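The compression idea can be sketched as follows. The size calculation is a pure function; the export step runs only in a browser. The function names, the 1024-pixel limit, and the 0.8 quality value are assumptions for illustration, not recommendations tied to any particular model.

```javascript
// Compute a target size that fits inside maxSide x maxSide without upscaling.
function fitWithin(width, height, maxSide) {
  const scale = Math.min(maxSide / width, maxSide / height, 1);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

// Browser-only: draw the image onto a smaller canvas and export it as JPEG.
// `img` is a loaded HTMLImageElement.
function compressToDataUrl(img, maxSide = 1024, quality = 0.8) {
  const { width, height } = fitWithin(img.naturalWidth, img.naturalHeight, maxSide);
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  canvas.getContext('2d').drawImage(img, 0, 0, width, height);
  return canvas.toDataURL('image/jpeg', quality);
}
```

Shrinking the pixel dimensions usually saves more than lowering the JPEG quality, since the encoded size grows roughly with the pixel count.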

Annotation export: can annotations be exported in COCO or YOLO format?

→ This requires coordinate normalization and format conversion
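A minimal sketch of the two conversions, covering bounding boxes only. COCO stores a box as [x, y, width, height] in pixels from the top-left corner, while a YOLO label line is "class x_center y_center width height" with every value normalized to the 0..1 range. The function names and the numeric classId argument are hypothetical; mapping text labels to class ids is left out.

```javascript
// Bounding box of a list of { x, y } points, in the same units as the input.
function boundsOf(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

// COCO bbox: [x, y, width, height] in pixels.
function toCocoBbox(points) {
  const b = boundsOf(points);
  return [b.x, b.y, b.width, b.height];
}

// YOLO label line: class id followed by normalized center and size.
function toYoloLine(points, classId, imageWidth, imageHeight) {
  const b = boundsOf(points);
  const values = [
    (b.x + b.width / 2) / imageWidth,
    (b.y + b.height / 2) / imageHeight,
    b.width / imageWidth,
    b.height / imageHeight,
  ];
  return [classId, ...values.map((v) => v.toFixed(6))].join(' ');
}
```

Because the annotator stores points in canvas pixels, pass the canvas width and height to toYoloLine. The normalized values are then valid for the original image too, whereas COCO pixel values would first need to be multiplied by the original-to-canvas size ratio.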

💡 Extension tip: replace the mock in visualAIClient.js with a real fetchEventSource call (see Day 4) to connect to a local Ollama LLaVA.
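One caveat worth checking before wiring this up: Ollama's native /api/generate endpoint streams newline-delimited JSON (one object per line, with response and done fields) rather than Server-Sent Events, so an SSE client may not be able to read it directly. If you read the stream with plain fetch instead, chunk boundaries will not line up with line boundaries, and a small buffer is needed. A minimal sketch under those assumptions; createNdjsonParser and readOllamaStream are hypothetical helpers.

```javascript
// Hypothetical helper: feed it decoded text chunks; it calls onObject for each
// complete JSON line and keeps any partial line buffered for the next chunk.
function createNdjsonParser(onObject) {
  let buffer = '';
  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last item is an incomplete line (or '')
      for (const line of lines) {
        if (line.trim()) onObject(JSON.parse(line));
      }
    },
    flush() {
      if (buffer.trim()) onObject(JSON.parse(buffer));
      buffer = '';
    },
  };
}

// Browser usage sketch: forward each streamed text fragment to onToken.
async function readOllamaStream(response, onToken) {
  const parser = createNdjsonParser((obj) => {
    if (obj.response) onToken(obj.response);
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    parser.push(decoder.decode(value, { stream: true }));
  }
  parser.flush();
}
```

readOllamaStream takes the same onToken callback shape as the mock client, so it could slot into streamVisualAnalysis with few changes to App.jsx.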

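📅 Coming up tomorrow

Day 7: Batch upload and progress management

Support multi-file drag-and-drop and selection

Show an upload progress bar (mocked or real)

Build a "batch image analysis" workflow

✍️ Summary

Today we gave users the ability to point at exactly what they want to ask about. With frontend annotation, the AI no longer guesses blindly and instead focuses on the region the user cares about. A multimodal experience that combines vision, language, and interaction is taking shape.

💬 Practice tip: a real LLaVA call requires running ollama run llava first, and the Base64 image must stay within the model's input limits. Feel free to share your annotation interaction designs!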
