Knowledge

What Knowledge is in CrewAI and how to use it.

Overview

Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources while performing tasks. Think of it as giving your agents a reference library they can consult as they work.

Key benefits of using Knowledge:

  • Enhance agents with domain-specific information
  • Support decisions with real-world data
  • Maintain context across conversations
  • Ground responses in factual information

Quickstart Example

For file-based knowledge sources, make sure to place your files in a knowledge directory at the root of your project.
Also, use relative paths from the knowledge directory when creating the source.
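As a concrete sketch of that layout (the file name `user_preference.txt` and its contents are only illustrations, not part of CrewAI), you can create the directory and a source file with the standard library, then reference the file by its path relative to the knowledge directory:

```python
from pathlib import Path

# Create the knowledge directory at the project root and add a source file.
Path("knowledge").mkdir(exist_ok=True)
Path("knowledge/user_preference.txt").write_text(
    "User is interested in AI agent frameworks."
)

# When creating the source, pass the path relative to the knowledge directory,
# e.g.: TextFileKnowledgeSource(file_paths=["user_preference.txt"])
print(Path("knowledge/user_preference.txt").exists())  # True
```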

Vector Store (RAG) Client Configuration

CrewAI provides a provider-agnostic RAG client abstraction for working with vector stores. The default provider is ChromaDB, and Qdrant is also supported. You can switch providers via the configuration utilities.

Currently supported:

  • ChromaDB (default)
  • Qdrant
```python
from crewai.rag.config.utils import set_rag_config, get_rag_client, clear_rag_config

# ChromaDB (default)
from crewai.rag.chromadb.config import ChromaDBConfig

set_rag_config(ChromaDBConfig())
chromadb_client = get_rag_client()

# Qdrant
from crewai.rag.qdrant.config import QdrantConfig

set_rag_config(QdrantConfig())
qdrant_client = get_rag_client()

# Example operations (same API for any provider)
client = qdrant_client  # or chromadb_client
client.create_collection(collection_name="docs")
client.add_documents(
    collection_name="docs",
    documents=[{"id": "1", "content": "CrewAI enables collaborative AI agents."}],
)
results = client.search(collection_name="docs", query="collaborative agents", limit=3)

clear_rag_config()  # optional reset
```

This RAG client is independent of the Knowledge system's built-in storage. Use it when you need direct control over a vector store or want to build custom retrieval pipelines.

Basic String Knowledge Example

```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],  # Enable knowledge by adding the sources here
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
```

Web Content Knowledge Example

To run the following example, you need to install docling first: `uv add docling`
```python
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a knowledge source from web content
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="You are a master at understanding papers and their content.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[content_source],
)

result = crew.kickoff(
    inputs={"question": "What is the reward hacking paper about? Be sure to provide sources."}
)
```

Supported Knowledge Sources

CrewAI supports various types of knowledge sources out of the box:

  • Raw strings
  • Text files (.txt)
  • PDF documents
  • CSV files
  • Excel spreadsheets
  • JSON documents

Text File Knowledge Source

```python
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

text_source = TextFileKnowledgeSource(
    file_paths=["document.txt", "another.txt"]
)
```

PDF Knowledge Source

```python
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
)
```

CSV Knowledge Source

```python
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
)
```

Excel Knowledge Source

```python
from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource

excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
)
```

JSON Knowledge Source

```python
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)
```

Make sure to create the ./knowledge folder. All source files (e.g., .txt, .pdf, .xlsx, .json) should be placed in this folder for centralized management.

Agent and Crew Level Knowledge: Complete Guide

Understanding Knowledge Levels: CrewAI supports knowledge at both the agent and crew levels. This section explains exactly how each works, when each is initialized, and clears up common misconceptions about the dependencies between them.

How Knowledge Initialization Actually Works

Here is what actually happens when you use knowledge:

Agent-Level Knowledge (Independent)

```python
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent with its own knowledge - NO crew knowledge needed
specialist_knowledge = StringKnowledgeSource(
    content="Specialized technical information for this agent only"
)

specialist_agent = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Expert in specialized technical domains",
    knowledge_sources=[specialist_knowledge]  # Agent-specific knowledge
)

task = Task(
    description="Answer technical questions",
    agent=specialist_agent,
    expected_output="Technical answer"
)

# No crew-level knowledge required
crew = Crew(
    agents=[specialist_agent],
    tasks=[task]
)

result = crew.kickoff()  # Agent knowledge works independently
```

What Happens During crew.kickoff()

When you call crew.kickoff(), the following sequence occurs:

```python
# During kickoff
for agent in self.agents:
    agent.crew = self  # Agent gets reference to crew
    agent.set_knowledge(crew_embedder=self.embedder)  # Agent knowledge initialized
    agent.create_agent_executor()
```

Storage Independence

Each knowledge level uses independent storage collections:

```python
# Agent knowledge storage
agent_collection_name = agent.role  # e.g., "Technical Specialist"

# Crew knowledge storage
crew_collection_name = "crew"

# Both stored in same ChromaDB instance but different collections
# Path: ~/.local/share/CrewAI/{project}/knowledge/
# ├── crew/                    # Crew knowledge collection
# ├── Technical Specialist/    # Agent knowledge collection
# └── Another Agent Role/      # Another agent's collection
```

Complete Working Examples

Example 1: Agent-Only Knowledge

```python
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent-specific knowledge
agent_knowledge = StringKnowledgeSource(
    content="Agent-specific information that only this agent needs"
)

agent = Agent(
    role="Specialist",
    goal="Use specialized knowledge",
    backstory="Expert with specific knowledge",
    knowledge_sources=[agent_knowledge],
    embedder={  # Agent can have its own embedder
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)

task = Task(
    description="Answer using your specialized knowledge",
    agent=agent,
    expected_output="Answer based on agent knowledge"
)

# No crew knowledge needed
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()  # Works perfectly
```

Example 2: Both Agent and Crew Knowledge

```python
# Crew-wide knowledge (shared by all agents)
crew_knowledge = StringKnowledgeSource(
    content="Company policies and general information for all agents"
)

# Agent-specific knowledge
specialist_knowledge = StringKnowledgeSource(
    content="Technical specifications only the specialist needs"
)

specialist = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Technical expert",
    knowledge_sources=[specialist_knowledge]  # Agent-specific
)

generalist = Agent(
    role="General Assistant",
    goal="Provide general assistance",
    backstory="General helper"
    # No agent-specific knowledge
)

crew = Crew(
    agents=[specialist, generalist],
    tasks=[...],
    knowledge_sources=[crew_knowledge]  # Crew-wide knowledge
)

# Result:
# - specialist gets: crew_knowledge + specialist_knowledge
# - generalist gets: crew_knowledge only
```

Example 3: Multiple Agents with Different Knowledge

```python
# Different knowledge for different agents
sales_knowledge = StringKnowledgeSource(content="Sales procedures and pricing")
tech_knowledge = StringKnowledgeSource(content="Technical documentation")
support_knowledge = StringKnowledgeSource(content="Support procedures")

sales_agent = Agent(
    role="Sales Representative",
    knowledge_sources=[sales_knowledge],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)

tech_agent = Agent(
    role="Technical Expert",
    knowledge_sources=[tech_knowledge],
    embedder={"provider": "ollama", "config": {"model": "mxbai-embed-large"}}
)

support_agent = Agent(
    role="Support Specialist",
    knowledge_sources=[support_knowledge]
    # Will use crew embedder as fallback
)

crew = Crew(
    agents=[sales_agent, tech_agent, support_agent],
    tasks=[...],
    embedder={  # Fallback embedder for agents without their own
        "provider": "google-generativeai",
        "config": {"model_name": "gemini-embedding-001"}
    }
)

# Each agent gets only their specific knowledge
# Each can use different embedding providers
```
Unlike retrieval from a vector database using a tool, agents preloaded with knowledge do not need an extra retrieval persona or task.
Simply add the relevant knowledge sources your agent or crew needs to function. Knowledge sources can be added at the agent or crew level. Crew-level knowledge sources are used by all agents in the crew. Agent-level knowledge sources are used only by the specific agent that preloads them.

Knowledge Configuration

You can configure the knowledge settings for the crew or agent.

```python
from crewai.knowledge.knowledge_config import KnowledgeConfig

knowledge_config = KnowledgeConfig(results_limit=10, score_threshold=0.5)

agent = Agent(
    ...
    knowledge_config=knowledge_config
)
```

results_limit: the number of relevant documents to return. Default is 3.
score_threshold: the minimum score a document must reach to be considered relevant. Default is 0.35.
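Conceptually, the two parameters combine as a relevance filter plus a cap on the number of results. The following standard-library sketch illustrates that behavior; it is not CrewAI's actual retrieval code:

```python
def select_results(scored_docs, results_limit=3, score_threshold=0.35):
    """Keep docs scoring at or above the threshold, best first, capped at the limit."""
    relevant = [d for d in scored_docs if d[1] >= score_threshold]
    relevant.sort(key=lambda d: d[1], reverse=True)
    return relevant[:results_limit]

# Hypothetical (document, score) pairs for illustration
docs = [("a", 0.9), ("b", 0.5), ("c", 0.4), ("d", 0.36), ("e", 0.2)]
print(select_results(docs))                    # 3 best docs above the threshold
print(select_results(docs, results_limit=10))  # all 4 docs above the threshold
```

Raising `score_threshold` trades recall for precision; raising `results_limit` gives the agent more context at the cost of a longer prompt.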

Supported Knowledge Parameters

  • sources: a list of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content.
  • collection_name: the name of the collection where the knowledge is stored, used to identify different sets of knowledge. Defaults to "knowledge" if not provided.
  • storage: a custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage is created automatically.

Knowledge Storage Transparency

Understanding Knowledge Storage: CrewAI automatically stores knowledge sources in platform-specific directories, using ChromaDB as the vector store. Understanding these default locations and behaviors helps with production deployments, debugging, and storage management.

Where CrewAI Stores Knowledge Files

By default, CrewAI uses the same storage system as memory, storing knowledge in platform-specific directories:

Default Storage Locations by Platform

macOS:

```
~/Library/Application Support/CrewAI/{project_name}/
└── knowledge/                   # Knowledge ChromaDB files
    ├── chroma.sqlite3           # ChromaDB metadata
    ├── {collection_id}/         # Vector embeddings
    └── knowledge_{collection}/  # Named collections
```

Linux:

```
~/.local/share/CrewAI/{project_name}/
└── knowledge/
    ├── chroma.sqlite3
    ├── {collection_id}/
    └── knowledge_{collection}/
```

Windows:

```
C:\Users\{username}\AppData\Local\CrewAI\{project_name}\
└── knowledge\
    ├── chroma.sqlite3
    ├── {collection_id}\
    └── knowledge_{collection}\
```
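The three locations above follow the usual per-platform application-data conventions. As a rough standard-library sketch of how such a path is derived (illustrative only, not CrewAI's own code):

```python
import os
import sys

def knowledge_storage_dir(project_name: str) -> str:
    """Mirror the per-platform default locations listed above (illustrative only)."""
    if sys.platform == "darwin":                      # macOS
        base = os.path.expanduser("~/Library/Application Support")
    elif sys.platform.startswith("win"):              # Windows
        base = os.environ.get("LOCALAPPDATA", r"C:\Users\you\AppData\Local")
    else:                                             # Linux and other POSIX
        base = os.path.expanduser("~/.local/share")
    return os.path.join(base, "CrewAI", project_name, "knowledge")

print(knowledge_storage_dir("my_project"))
```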

Finding Your Knowledge Storage Location

To see exactly where CrewAI stores your knowledge files:

```python
from crewai.utilities.paths import db_storage_path
import os

# Get the knowledge storage path
knowledge_path = os.path.join(db_storage_path(), "knowledge")
print(f"Knowledge storage location: {knowledge_path}")

# List knowledge collections and files
if os.path.exists(knowledge_path):
    print("\nKnowledge storage contents:")
    for item in os.listdir(knowledge_path):
        item_path = os.path.join(knowledge_path, item)
        if os.path.isdir(item_path):
            print(f"📁 Collection: {item}/")
            # Show collection contents
            try:
                for subitem in os.listdir(item_path):
                    print(f"   └── {subitem}")
            except PermissionError:
                print("   └── (permission denied)")
        else:
            print(f"📄 {item}")
else:
    print("No knowledge storage found yet.")
```

Controlling Knowledge Storage Location

Option 1: Environment Variable (Recommended)

```python
import os
from crewai import Crew

# Set custom storage location for all CrewAI data
os.environ["CREWAI_STORAGE_DIR"] = "./my_project_storage"

# All knowledge will now be stored in ./my_project_storage/knowledge/
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...]
)
```

Option 2: Custom Knowledge Storage

```python
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create custom storage with specific embedder
custom_storage = KnowledgeStorage(
    embedder={
        "provider": "ollama",
        "config": {"model": "mxbai-embed-large"}
    },
    collection_name="my_custom_knowledge"
)

# Use with knowledge sources
knowledge_source = StringKnowledgeSource(
    content="Your knowledge content here"
)
knowledge_source.storage = custom_storage
```

Option 3: Project-Specific Knowledge Storage

```python
import os
from pathlib import Path

# Store knowledge in project directory
project_root = Path(__file__).parent
knowledge_dir = project_root / "knowledge_storage"

os.environ["CREWAI_STORAGE_DIR"] = str(knowledge_dir)

# Now all knowledge will be stored in your project directory
```

Default Embedding Provider Behavior

Default Embedding Provider: CrewAI defaults to OpenAI embeddings (text-embedding-3-small) for knowledge storage, even when you are using a different LLM provider. You can easily customize this to match your setup.

Understanding Default Behavior

```python
from crewai import Agent, Crew, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# When using Claude as your LLM...
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    llm=LLM(provider="anthropic", model="claude-3-sonnet")  # Using Claude
)

# CrewAI will still use OpenAI embeddings by default for knowledge
# This ensures consistency but may not match your LLM provider preference
knowledge_source = StringKnowledgeSource(content="Research data...")

crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source]
    # Default: Uses OpenAI embeddings even with Claude LLM
)
```

Customizing Knowledge Embedding Providers

```python
# Option 1: Use Voyage AI (recommended by Anthropic for Claude users)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "voyageai",  # Recommended for Claude users
        "config": {
            "api_key": "your-voyage-api-key",
            "model": "voyage-3"  # or "voyage-3-large" for best quality
        }
    }
)

# Option 2: Use local embeddings (no external API calls)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "mxbai-embed-large",
            "url": "http://localhost:11434/api/embeddings"
        }
    }
)

# Option 3: Agent-level embedding customization
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "google-generativeai",
        "config": {
            "model_name": "gemini-embedding-001",
            "api_key": "your-google-key"
        }
    }
)
```

Configuring Azure OpenAI Embeddings

When using Azure OpenAI embeddings:

  1. Make sure you have deployed the embedding model in the Azure platform first
  2. Then use the following configuration:
```python
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "azure",
        "config": {
            "api_key": "your-azure-api-key",
            "model": "text-embedding-ada-002",  # change to the model you are using and is deployed in Azure
            "api_base": "https://your-azure-endpoint.openai.azure.com/",
            "api_version": "2024-02-01"
        }
    }
)
```

Advanced Features

Query Rewriting

CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search through knowledge sources, the raw task prompt is automatically transformed into a more effective search query.

How Query Rewriting Works

  1. When an agent executes a task with knowledge sources available, the _get_knowledge_search_query method is triggered
  2. The agent's LLM transforms the original task prompt into an optimized search query
  3. This optimized query is then used to retrieve relevant information from the knowledge sources

Benefits of Query Rewriting

Query rewriting helps retrieve more relevant information by focusing on key concepts and removing irrelevant content. Rewritten queries are more specific and context-aware when searching vector databases.

Example

```python
# Original task prompt
task_prompt = "Answer the following questions about the user's favorite movies: What movie did John watch last week? Format your answer in JSON."

# Behind the scenes, this might be rewritten as:
rewritten_query = "What movies did John watch last week?"
```

The rewritten query is more focused on the core information need and drops the irrelevant instructions about output formatting.

This mechanism is fully automatic and requires no configuration. The agent's LLM performs the rewriting, so using a more capable LLM tends to improve the quality of the rewritten queries.

Knowledge Events

CrewAI emits events during the knowledge retrieval process that you can listen for using the event system. These events let you monitor, debug, and analyze how knowledge is retrieved and used by your agents.

Available Knowledge Events

  • KnowledgeRetrievalStartedEvent: emitted when an agent starts retrieving knowledge from sources
  • KnowledgeRetrievalCompletedEvent: emitted when knowledge retrieval is completed, including the query used and the retrieved content
  • KnowledgeQueryStartedEvent: emitted when a query to knowledge sources starts
  • KnowledgeQueryCompletedEvent: emitted when a query completes successfully
  • KnowledgeQueryFailedEvent: emitted when a query to knowledge sources fails
  • KnowledgeSearchQueryFailedEvent: emitted when a search query fails

Example: Monitoring Knowledge Retrieval

```python
from crewai.events import (
    KnowledgeRetrievalStartedEvent,
    KnowledgeRetrievalCompletedEvent,
    BaseEventListener,
)

class KnowledgeMonitorListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(KnowledgeRetrievalStartedEvent)
        def on_knowledge_retrieval_started(source, event):
            print(f"Agent '{event.agent.role}' started retrieving knowledge")

        @crewai_event_bus.on(KnowledgeRetrievalCompletedEvent)
        def on_knowledge_retrieval_completed(source, event):
            print(f"Agent '{event.agent.role}' completed knowledge retrieval")
            print(f"Query: {event.query}")
            print(f"Retrieved {len(event.retrieved_knowledge)} knowledge chunks")

# Create an instance of your listener
knowledge_monitor = KnowledgeMonitorListener()
```

For more information on using events, see the Event Listeners documentation.

Custom Knowledge Sources

CrewAI allows you to create custom knowledge sources for any type of data by extending the BaseKnowledgeSource class. Let's walk through a practical example that fetches and processes space news articles.

Space News Knowledge Source Example

```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests
from datetime import datetime
from typing import Dict, Any
from pydantic import BaseModel, Field

class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that fetches data from Space News API."""

    api_endpoint: str = Field(description="API endpoint URL")
    limit: int = Field(default=10, description="Number of articles to fetch")

    def load_content(self) -> Dict[Any, str]:
        """Fetch and format space news articles."""
        try:
            response = requests.get(
                f"{self.api_endpoint}?limit={self.limit}"
            )
            response.raise_for_status()

            data = response.json()
            articles = data.get('results', [])

            formatted_data = self.validate_content(articles)
            return {self.api_endpoint: formatted_data}
        except Exception as e:
            raise ValueError(f"Failed to fetch space news: {str(e)}")

    def validate_content(self, articles: list) -> str:
        """Format articles into readable text."""
        formatted = "Space News Articles:\n\n"
        for article in articles:
            formatted += f"""
                Title: {article['title']}
                Published: {article['published_at']}
                Summary: {article['summary']}
                News Site: {article['news_site']}
                URL: {article['url']}
                ----------------------------"""
        return formatted

    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)
        self._save_documents()

# Create knowledge source
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=10,
)

# Create specialized agent
space_analyst = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately and comprehensively",
    backstory="""You are a space industry analyst with expertise in space exploration,
    satellite technology, and space industry trends. You excel at answering questions
    about space news and providing detailed, accurate information.""",
    knowledge_sources=[recent_news],
    llm=LLM(model="gpt-4", temperature=0.0)
)

# Create task that handles user questions
analysis_task = Task(
    description="Answer this question about space news: {user_question}",
    expected_output="A detailed answer based on the recent space news articles",
    agent=space_analyst
)

# Create and run the crew
crew = Crew(
    agents=[space_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential
)

# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the latest developments in space exploration?"}
)
```

Output:

```
# Agent: Space News Analyst
## Task: Answer this question about space news: What are the latest developments in space exploration?

# Agent: Space News Analyst
## Final Answer:
The latest developments in space exploration, based on recent space news articles, include the following:

1. SpaceX has received the final regulatory approvals to proceed with the second integrated Starship/Super Heavy launch, scheduled for as soon as the morning of Nov. 17, 2023. This is a significant step in SpaceX's ambitious plans for space exploration and colonization. Source: SpaceNews

2. SpaceX has also informed the US Federal Communications Commission (FCC) that it plans to begin launching its first next-generation Starlink Gen2 satellites. This represents a major upgrade to the Starlink satellite internet service, which aims to provide high-speed internet access worldwide. Source: Teslarati

3. AI startup Synthetaic has raised $15 million in Series B funding. The company uses artificial intelligence to analyze data from space and air sensors, which could have significant applications in space exploration and satellite technology. Source: SpaceNews

4. The Space Force has formally established a unit within the U.S. Indo-Pacific Command, marking a permanent presence in the Indo-Pacific region. This could have significant implications for space security and geopolitics. Source: SpaceNews

5. Slingshot Aerospace, a space tracking and data analytics company, is expanding its network of ground-based optical telescopes to increase coverage of low Earth orbit. This could improve our ability to track and analyze objects in low Earth orbit, including satellites and space debris. Source: SpaceNews

6. The National Natural Science Foundation of China has outlined a five-year project for researchers to study the assembly of ultra-large spacecraft. This could lead to significant advancements in spacecraft technology and space exploration capabilities. Source: SpaceNews

7. The Center for AEroSpace Autonomy Research (CAESAR) at Stanford University is focusing on spacecraft autonomy. The center held a kickoff event on May 22, 2024, to highlight the industry, academia, and government collaboration it seeks to foster. This could lead to significant advancements in autonomous spacecraft technology. Source: SpaceNews
```

Debugging and Troubleshooting

Debugging Knowledge Issues

Check Agent Knowledge Initialization

```python
from crewai import Agent, Crew, Task
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

knowledge_source = StringKnowledgeSource(content="Test knowledge")

agent = Agent(
    role="Test Agent",
    goal="Test knowledge",
    backstory="Testing",
    knowledge_sources=[knowledge_source]
)

crew = Crew(agents=[agent], tasks=[Task(...)])

# Before kickoff - knowledge not initialized
print(f"Before kickoff - Agent knowledge: {getattr(agent, 'knowledge', None)}")

crew.kickoff()

# After kickoff - knowledge initialized
print(f"After kickoff - Agent knowledge: {agent.knowledge}")
print(f"Agent knowledge collection: {agent.knowledge.storage.collection_name}")
print(f"Number of sources: {len(agent.knowledge.sources)}")
```

Verify Knowledge Storage Locations

```python
import os
from crewai.utilities.paths import db_storage_path

# Check storage structure
storage_path = db_storage_path()
knowledge_path = os.path.join(storage_path, "knowledge")

if os.path.exists(knowledge_path):
    print("Knowledge collections found:")
    for collection in os.listdir(knowledge_path):
        collection_path = os.path.join(knowledge_path, collection)
        if os.path.isdir(collection_path):
            print(f"  - {collection}/")
            # Show collection contents
            for item in os.listdir(collection_path):
                print(f"    └── {item}")
```

Test Knowledge Retrieval

```python
# Test agent knowledge retrieval
if hasattr(agent, 'knowledge') and agent.knowledge:
    test_query = ["test query"]
    results = agent.knowledge.query(test_query)
    print(f"Agent knowledge results: {len(results)} documents found")

# Test crew knowledge retrieval (if exists)
if hasattr(crew, 'knowledge') and crew.knowledge:
    crew_results = crew.query_knowledge(test_query)
    print(f"Crew knowledge results: {len(crew_results)} documents found")
```

Inspect Knowledge Collections

```python
import chromadb
from crewai.utilities.paths import db_storage_path
import os

# Connect to CrewAI's knowledge ChromaDB
knowledge_path = os.path.join(db_storage_path(), "knowledge")

if os.path.exists(knowledge_path):
    client = chromadb.PersistentClient(path=knowledge_path)
    collections = client.list_collections()

    print("Knowledge Collections:")
    for collection in collections:
        print(f"  - {collection.name}: {collection.count()} documents")

        # Sample a few documents to verify content
        if collection.count() > 0:
            sample = collection.peek(limit=2)
            print(f"    Sample content: {sample['documents'][0][:100]}...")
else:
    print("No knowledge storage found")
```

Check Knowledge Processing

```python
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a test knowledge source
test_source = StringKnowledgeSource(
    content="Test knowledge content for debugging",
    chunk_size=100,   # Small chunks for testing
    chunk_overlap=20
)

# Check chunking behavior
print(f"Original content length: {len(test_source.content)}")
print(f"Chunk size: {test_source.chunk_size}")
print(f"Chunk overlap: {test_source.chunk_overlap}")

# Process and inspect chunks
test_source.add()
print(f"Number of chunks created: {len(test_source.chunks)}")
for i, chunk in enumerate(test_source.chunks[:3]):  # Show first 3 chunks
    print(f"Chunk {i+1}: {chunk[:50]}...")
```
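To build intuition for what `chunk_size` and `chunk_overlap` control, here is a rough standard-library sketch of fixed-size overlapping chunking. It is only an illustration of the idea; CrewAI's actual splitter may differ in its details:

```python
def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size characters, where each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "x" * 250
chunks = chunk_text(text, chunk_size=100, chunk_overlap=20)
print([len(c) for c in chunks])  # [100, 100, 90, 10]
```

Larger overlaps reduce the chance that a fact is cut in half at a chunk boundary, at the cost of storing (and embedding) more redundant text.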

Common Knowledge Storage Issues

"File not found" errors:

```python
# Ensure files are in the correct location
from crewai.utilities.constants import KNOWLEDGE_DIRECTORY
import os

knowledge_dir = KNOWLEDGE_DIRECTORY  # Usually "knowledge"
file_path = os.path.join(knowledge_dir, "your_file.pdf")

if not os.path.exists(file_path):
    print(f"File not found: {file_path}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Expected knowledge directory: {os.path.abspath(knowledge_dir)}")
```

"Embedding dimension mismatch" errors:

```python
# This happens when switching embedding providers
# Reset knowledge storage to clear old embeddings
crew.reset_memories(command_type='knowledge')

# Or use consistent embedding providers
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)
```

"ChromaDB permission denied" errors:

```shell
# Fix storage permissions
chmod -R 755 ~/.local/share/CrewAI/
```

Knowledge not persisting between runs:

```python
# Verify storage location consistency
import os
from crewai.utilities.paths import db_storage_path

print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Computed storage path:", db_storage_path())
print("Knowledge path:", os.path.join(db_storage_path(), "knowledge"))
```

Knowledge Reset Commands

```python
# Reset only agent-specific knowledge
crew.reset_memories(command_type='agent_knowledge')

# Reset both crew and agent knowledge
crew.reset_memories(command_type='knowledge')

# CLI commands
# crewai reset-memories --agent-knowledge  # Agent knowledge only
# crewai reset-memories --knowledge        # All knowledge
```

Clearing Knowledge

If you need to clear the knowledge stored in CrewAI, you can use the crewai reset-memories command with the --knowledge option.

```shell
crewai reset-memories --knowledge
```

This is useful when you have updated your knowledge sources and want to ensure the agents are using the most recent information.