Files

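Pass images, PDFs, audio, video, and text files to your agents for multimodal processing.

Overview

CrewAI supports native multimodal file input, so you can pass images, PDFs, audio, video, and text files directly to agents. Files are formatted automatically to match each LLM provider's API requirements.

File support requires the optional crewai-files package. Install it with: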

bash
uv add 'crewai[file-processing]'

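The file-processing API is currently in early access.

File Types

CrewAI supports five specific file types, plus a generic File class that detects the type automatically: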

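Type	Class	Use cases
Image	`ImageFile`	Photos, screenshots, diagrams, charts
PDF	`PDFFile`	Documents, reports, papers
Audio	`AudioFile`	Voice recordings, podcasts, meeting recordings
Video	`VideoFile`	Screen recordings, demo videos
Text	`TextFile`	Code files, logs, data files
Generic	`File`	Type detected automatically from the content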

from crewai_files import File, ImageFile, PDFFile, AudioFile, VideoFile, TextFile
image = ImageFile(source="screenshot.png")
pdf = PDFFile(source="report.pdf")
audio = AudioFile(source="meeting.mp3")
video = VideoFile(source="demo.mp4")
text = TextFile(source="data.csv")
file = File(source="document.pdf")
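To make the mapping between file kinds and classes concrete, here is a minimal sketch that guesses a class name from a filename. It is not how the generic File class works: the docs say File detects the type from the content, whereas this sketch only inspects the file extension through the standard-library mimetypes module. The helper name guess_file_class and the mapping table are hypothetical.

```python
import mimetypes

# Hypothetical mapping from the MIME top-level type to the class names
# listed in the table above. Not part of crewai-files.
_PREFIX_TO_CLASS = {
    "image": "ImageFile",
    "audio": "AudioFile",
    "video": "VideoFile",
    "text": "TextFile",
}


def guess_file_class(filename: str) -> str:
    """Return the name of the file class that likely fits `filename`.

    Extension-based only; falls back to the generic "File" when unsure.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "File"  # unknown extension: let the generic class decide
    if mime == "application/pdf":
        return "PDFFile"
    return _PREFIX_TO_CLASS.get(mime.split("/")[0], "File")
```

A helper like this can be handy for choosing a specific class when building input_files from a directory listing, with File as the fallback.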

File Sources

The source parameter accepts several input types and automatically detects how to handle each one:

From a path

from crewai_files import ImageFile
image = ImageFile(source="./images/chart.png")

From a URL

from crewai_files import ImageFile
image = ImageFile(source="https://example.com/image.png")

From bytes

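from crewai_files import ImageFile, FileBytes
# download_image_from_api() stands in for your own code that returns raw bytes
image_bytes = download_image_from_api()
# Wrap the bytes in FileBytes to attach a filename
image = ImageFile(source=FileBytes(data=image_bytes, filename="downloaded.png"))
# Or pass the raw bytes directly
image = ImageFile(source=image_bytes)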

Using Files

Files can be passed at several levels; more specific levels take precedence.

With Crews

Pass files when kicking off a crew:

from crewai import Crew
from crewai_files import ImageFile, PDFFile
crew = Crew(agents=[analyst], tasks=[analysis_task])
result = crew.kickoff(
    inputs={"topic": "Q4 Sales"},
    input_files={
        "chart": ImageFile(source="sales_chart.png"),
        "report": PDFFile(source="quarterly_report.pdf"),
    }
)

With Tasks

Attach files to a specific task:

from crewai import Task
from crewai_files import ImageFile
task = Task(
    description="Analyze the sales chart and identify trends in {chart}",
    expected_output="A summary of key trends",
    input_files={
        "chart": ImageFile(source="sales_chart.png"),
    }
)

With Flows

Pass files to a flow; they are inherited automatically by its crews:

from crewai.flow.flow import Flow, start
from crewai_files import ImageFile
class AnalysisFlow(Flow):
    @start()
    def analyze(self):
        return self.analysis_crew.kickoff()
flow = AnalysisFlow()
result = flow.kickoff(
    input_files={"image": ImageFile(source="data.png")}
)

With Standalone Agents

Pass files directly to the agent's kickoff:

from crewai import Agent
from crewai_files import ImageFile
agent = Agent(
    role="Image Analyst",
    goal="Analyze images",
    backstory="Expert at visual analysis",
    llm="gpt-4o",
)
result = agent.kickoff(
    messages="What's in this image?",
    input_files={"photo": ImageFile(source="photo.jpg")},
)

File Precedence

When files are passed at several levels, a more specific level overrides files with the same name from broader levels:

Flow input_files < Crew input_files < Task input_files

For example, if both the Flow and the Task define a file named "chart", the Task's version is used.
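The precedence rule can be pictured as a layered dictionary merge. The sketch below only illustrates the rule and makes no claim about CrewAI's internals; resolve_input_files is a hypothetical helper, and plain strings stand in for file objects.

```python
def resolve_input_files(flow_files=None, crew_files=None, task_files=None):
    """Merge file mappings so that more specific levels win.

    Order follows the rule above: Flow < Crew < Task.
    """
    merged = {}
    for level in (flow_files, crew_files, task_files):
        # Later (more specific) levels overwrite same-named entries.
        merged.update(level or {})
    return merged


resolved = resolve_input_files(
    flow_files={"chart": "flow_chart.png", "logo": "logo.png"},
    crew_files={"report": "crew_report.pdf"},
    task_files={"chart": "task_chart.png"},
)
print(resolved["chart"])  # prints: task_chart.png
```

Files that are defined at only one level, such as "logo" and "report" here, pass through unchanged.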

Provider Support

Different providers support different file types. CrewAI automatically formats files to match each provider's API.

Provider	Image	PDF	Audio	Video	Text
OpenAI (completions API)	✓
OpenAI (responses API)	✓	✓	✓
Anthropic (claude-3.x)	✓	✓
Google Gemini (gemini-1.5, 2.0, 2.5)	✓	✓	✓	✓	✓
AWS Bedrock (claude-3)	✓	✓
Azure OpenAI (gpt-4o)	✓		✓

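Google Gemini models support all file types, including video (up to 1 hour and 2 GB). If you need to process video content, use Gemini.

If you pass a file type that the provider does not support (for example, video to OpenAI), you will receive an UnsupportedFileTypeError. Choose a provider based on the file types you need to process.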
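If you would rather catch a mismatch before kicking off a run, the support table can be transcribed into a lookup. This is a minimal sketch: the provider keys, the SUPPORTED_TYPES table, and the unsupported_files helper are hypothetical names, and the data is copied from the table above rather than queried from CrewAI.

```python
# Transcribed from the provider support table above (hypothetical keys).
SUPPORTED_TYPES = {
    "openai-completions": {"image"},
    "openai-responses": {"image", "pdf", "audio"},
    "anthropic": {"image", "pdf"},
    "gemini": {"image", "pdf", "audio", "video", "text"},
    "bedrock": {"image", "pdf"},
    "azure-openai": {"image", "audio"},
}


def unsupported_files(provider: str, files: dict) -> list:
    """Return the names of files whose kind the provider does not accept.

    `files` maps a file name to its kind, e.g. {"demo": "video"}.
    """
    supported = SUPPORTED_TYPES[provider]
    return [name for name, kind in files.items() if kind not in supported]


problems = unsupported_files("anthropic", {"chart": "image", "demo": "video"})
print(problems)  # prints: ['demo']
```

Because the table may change between releases, treat a hard-coded copy like this as a convenience check, not a source of truth.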

How Files Are Sent

CrewAI automatically chooses the best transfer method for each provider:

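Method	Description	When used
Inline Base64	File embedded directly in the request	Small files (typically < 5 MB)
File Upload API	File uploaded separately, then referenced by ID	Large files above the threshold
URL Reference	File URL passed directly to the model	The file source is already a URL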

Provider Transfer Methods

Provider	Inline Base64	File Upload API	URL References
OpenAI	✓	✓ (> 5 MB)	✓
Anthropic	✓	✓ (> 5 MB)	✓
Google Gemini	✓	✓ (> 20 MB)	✓
AWS Bedrock	✓		✓ (S3 URI)
Azure OpenAI	✓		✓

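You do not need to manage these details yourself. CrewAI picks the most efficient method based on file size and provider capabilities. For providers without a file upload API, all files are sent as inline base64.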
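The selection logic described above can be sketched as a small decision function. This is an illustration built only from the two tables in this section, not CrewAI's actual implementation; choose_transport, the provider keys, and the assumption that 1 MB means 1024 * 1024 bytes are all hypothetical.

```python
MB = 1024 * 1024  # assumption: binary megabytes

# Upload thresholds from the table above; None means no upload API.
UPLOAD_THRESHOLD = {
    "openai": 5 * MB,
    "anthropic": 5 * MB,
    "gemini": 20 * MB,
    "bedrock": None,
    "azure-openai": None,
}


def choose_transport(provider, size_bytes=0, source_url=None):
    """Pick a transfer method name for one file."""
    if source_url is not None:
        # Bedrock only accepts S3 URIs as URL references.
        if provider != "bedrock" or source_url.startswith("s3://"):
            return "url_reference"
    threshold = UPLOAD_THRESHOLD[provider]
    if threshold is not None and size_bytes > threshold:
        return "file_upload"
    # Small files, and every file on providers without an upload API.
    return "inline_base64"


print(choose_transport("openai", size_bytes=10 * MB))  # prints: file_upload
print(choose_transport("gemini", size_bytes=10 * MB))  # prints: inline_base64
```

Note how the same 10 MB file takes a different path on OpenAI and Gemini because their upload thresholds differ.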

File Handling Modes

When a file exceeds a provider's limits, you can control how it is handled:

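from crewai_files import ImageFile, PDFFile
# Comments describe what each mode name suggests; confirm against the crewai-files reference.
# strict: fail with an error when the file exceeds a limit
image = ImageFile(source="large.png", mode="strict")
# auto: adjust the file automatically to fit the limit
image = ImageFile(source="large.png", mode="auto")
# warn: log a warning and attempt to send the file anyway
image = ImageFile(source="large.png", mode="warn")
# chunk: split a large file into smaller pieces
pdf = PDFFile(source="large.pdf", mode="chunk")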

Provider Limits

Each provider has specific limits on file size and dimensions:

OpenAI

Images: up to 20 MB, up to 10 images per request
PDFs: up to 32 MB, up to 100 pages
Audio: up to 25 MB, up to 25 minutes

Anthropic

Images: up to 5 MB, up to 8000x8000 pixels, up to 100 images
PDFs: up to 32 MB, up to 100 pages

Google Gemini

Images: up to 100 MB
PDFs: up to 50 MB
Audio: up to 100 MB, up to 9.5 hours
Video: up to 2 GB, up to 1 hour

AWS Bedrock

Images: up to 4.5 MB, up to 8000x8000 pixels
PDFs: up to 3.75 MB, up to 100 pages
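The size limits listed above can be checked locally before a file is sent. The following is a rough sketch: SIZE_LIMITS and exceeds_limit are hypothetical names, only the size limits are covered (not page counts, pixel dimensions, or durations), and whether the providers count megabytes as 1024 * 1024 bytes is an assumption.

```python
MB = 1024 * 1024  # assumption: binary megabytes

# Size limits in bytes, transcribed from the lists above.
SIZE_LIMITS = {
    "openai": {"image": 20 * MB, "pdf": 32 * MB, "audio": 25 * MB},
    "anthropic": {"image": 5 * MB, "pdf": 32 * MB},
    "gemini": {
        "image": 100 * MB,
        "pdf": 50 * MB,
        "audio": 100 * MB,
        "video": 2048 * MB,
    },
    "bedrock": {"image": int(4.5 * MB), "pdf": int(3.75 * MB)},
}


def exceeds_limit(provider: str, kind: str, size_bytes: int) -> bool:
    """Return True when a file of `size_bytes` is over the listed limit."""
    limit = SIZE_LIMITS[provider].get(kind)
    if limit is None:
        raise ValueError(f"no size limit listed for {kind!r} on {provider!r}")
    return size_bytes > limit


print(exceeds_limit("anthropic", "image", 6 * MB))  # prints: True
print(exceeds_limit("openai", "image", 6 * MB))     # prints: False
```

For a file on disk, os.path.getsize(path) supplies size_bytes. A True result is a signal to pick a handling mode or a different provider.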

Referencing Files in Prompts

Use a file's key name in the task description to reference it:

from crewai import Task
from crewai_files import ImageFile, PDFFile
task = Task(
    description="""
    Analyze the provided materials:
    1. Review the chart in {sales_chart}
    2. Cross-reference with data in {quarterly_report}
    3. Summarize key findings
    """,
    expected_output="Analysis summary with key insights",
    input_files={
        "sales_chart": ImageFile(source="chart.png"),
        "quarterly_report": PDFFile(source="report.pdf"),
    }
)
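Since a description refers to files by key, a typo in a placeholder is easy to miss. The sketch below lists placeholders that have no matching key, using the standard-library string.Formatter to parse the description. The helper name missing_keys is hypothetical, and it assumes every placeholder should match either a file key or another known input name.

```python
from string import Formatter


def missing_keys(description: str, known_names) -> list:
    """Return placeholders in `description` that are not in `known_names`."""
    placeholders = {
        field
        for _, field, _, _ in Formatter().parse(description)
        if field  # skip literal text segments and empty "{}" fields
    }
    return sorted(placeholders - set(known_names))


description = """
Analyze the provided materials:
1. Review the chart in {sales_chart}
2. Cross-reference with data in {quarterly_report}
"""
# Only one of the two referenced files is supplied here.
print(missing_keys(description, {"sales_chart"}))  # prints: ['quarterly_report']
```

When a task also interpolates regular inputs such as {topic}, include those names in known_names so they are not reported as missing.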
