Knowledge
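What Knowledge is in CrewAI and how to use it.
Overview
Knowledge in CrewAI is a powerful system that lets AI agents access and use external information sources while performing tasks. Think of it as giving your agents a reference library they can consult at any time while they work.
Quickstart Example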
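For file-based knowledge sources, place your files in the knowledge directory at the root of your project. When creating a source, use paths relative to that
knowledge directory.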
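As a rough illustration of the relative-path convention, the sketch below maps a path given to a file-based source onto the knowledge directory. The helper resolve_source_path and the constant KNOWLEDGE_DIR are hypothetical names, not part of CrewAI, and rejecting absolute paths is a design choice of this sketch. The second call shows why the convention matters: under this assumption, adding the knowledge/ prefix yourself doubles the directory.

```python
from pathlib import Path

# Hypothetical constant; CrewAI resolves file sources internally.
KNOWLEDGE_DIR = Path("knowledge")


def resolve_source_path(relative_path: str, knowledge_dir: Path = KNOWLEDGE_DIR) -> Path:
    """Map a path passed to a file-based source onto the knowledge directory."""
    candidate = Path(relative_path)
    if candidate.is_absolute():
        # Sketch-only rule: insist on paths relative to the knowledge directory.
        raise ValueError("Pass a path relative to the knowledge directory")
    return knowledge_dir / candidate


print(resolve_source_path("document.pdf").as_posix())            # knowledge/document.pdf
print(resolve_source_path("knowledge/document.pdf").as_posix())  # knowledge/knowledge/document.pdf
```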
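Vector Store (RAG) Client Configuration
CrewAI provides a provider-agnostic RAG client abstraction for working with vector stores. The default provider is ChromaDB, and Qdrant is also supported. You can switch providers with the configuration utilities.
Currently supported:
- ChromaDB (default)
- Qdrant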
```python
from crewai.rag.config.utils import set_rag_config, get_rag_client, clear_rag_config

# ChromaDB (default)
from crewai.rag.chromadb.config import ChromaDBConfig
set_rag_config(ChromaDBConfig())
chromadb_client = get_rag_client()

# Qdrant
from crewai.rag.qdrant.config import QdrantConfig
set_rag_config(QdrantConfig())
qdrant_client = get_rag_client()

# Example operations (same API for any provider)
client = qdrant_client  # or chromadb_client
client.create_collection(collection_name="docs")
client.add_documents(
    collection_name="docs",
    documents=[{"id": "1", "content": "CrewAI enables collaborative AI agents."}],
)
results = client.search(collection_name="docs", query="collaborative agents", limit=3)

clear_rag_config()  # optional reset
```
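This RAG client is separate from the built-in Knowledge storage. Use it when you need direct control over the vector store or are building a custom retrieval pipeline.
Basic String Knowledge Example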
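To make the idea of a provider-agnostic client concrete, here is a minimal sketch of the same create_collection / add_documents / search calling pattern written against a typing.Protocol. VectorStoreClient and InMemoryClient are hypothetical classes, not CrewAI's; the toy client ranks documents by word overlap rather than embeddings, so it only illustrates why calling code does not need to change when the provider does.

```python
from typing import Protocol


class VectorStoreClient(Protocol):
    """Hypothetical interface with the three calls used in the example above."""

    def create_collection(self, collection_name: str) -> None: ...

    def add_documents(self, collection_name: str, documents: list[dict]) -> None: ...

    def search(self, collection_name: str, query: str, limit: int = 3) -> list[dict]: ...


class InMemoryClient:
    """Toy provider: ranks documents by word overlap instead of embeddings."""

    def __init__(self) -> None:
        self._collections: dict[str, list[dict]] = {}

    def create_collection(self, collection_name: str) -> None:
        self._collections.setdefault(collection_name, [])

    def add_documents(self, collection_name: str, documents: list[dict]) -> None:
        self._collections[collection_name].extend(documents)

    def search(self, collection_name: str, query: str, limit: int = 3) -> list[dict]:
        terms = set(query.lower().split())
        scored = []
        for doc in self._collections[collection_name]:
            words = {w.strip(".,!?") for w in doc["content"].lower().split()}
            overlap = len(terms & words)
            if overlap:
                # Score = fraction of query terms found in the document.
                scored.append({**doc, "score": overlap / len(terms)})
        scored.sort(key=lambda d: d["score"], reverse=True)
        return scored[:limit]


def index_and_query(client: VectorStoreClient) -> list[dict]:
    # Depends only on the protocol, not on a concrete provider.
    client.create_collection(collection_name="docs")
    client.add_documents(
        collection_name="docs",
        documents=[{"id": "1", "content": "CrewAI enables collaborative AI agents."}],
    )
    return client.search(collection_name="docs", query="collaborative agents", limit=3)


print(index_and_query(InMemoryClient()))
```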
```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "User's name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Create an LLM with a temperature of 0 to make outputs as deterministic as possible
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],  # Enable knowledge by adding the sources here
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
```
Web Content Knowledge Example
The following example requires docling. Install it with: uv add docling
```python
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a knowledge source from web content
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)

# Create an LLM with a temperature of 0 to make outputs as deterministic as possible
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="You are a master at understanding papers and their content.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[content_source],
)

result = crew.kickoff(
    inputs={"question": "What is the reward hacking paper about? Be sure to provide sources."}
)
```
Supported Knowledge Sources
CrewAI supports several types of knowledge sources out of the box:
- PDF documents
- Excel spreadsheets
- JSON documents
Text File Knowledge Source
```python
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

text_source = TextFileKnowledgeSource(file_paths=["document.txt", "another.txt"])
```
PDF Knowledge Source
```python
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

pdf_source = PDFKnowledgeSource(file_paths=["document.pdf", "another.pdf"])
```
CSV Knowledge Source
```python
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

csv_source = CSVKnowledgeSource(file_paths=["data.csv"])
```
Excel Knowledge Source
```python
from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource

excel_source = ExcelKnowledgeSource(file_paths=["spreadsheet.xlsx"])
```
JSON Knowledge Source
```python
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

json_source = JSONKnowledgeSource(file_paths=["data.json"])
```
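Make sure to create the ./knowledge folder. All source files (e.g., .txt, .pdf, .xlsx, .json) should be placed in this folder so they are managed in one place.
Agent vs. Crew Knowledge: Complete Guide
How Knowledge Initialization Actually Works
Here is what actually happens when you use Knowledge:
Agent-Level Knowledge (Independent)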
```python
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent with its own knowledge - NO crew knowledge needed
specialist_knowledge = StringKnowledgeSource(
    content="Specialized technical information for this agent only"
)

specialist_agent = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Expert in specialized technical domains",
    knowledge_sources=[specialist_knowledge],  # Agent-specific knowledge
)

task = Task(
    description="Answer technical questions",
    agent=specialist_agent,
    expected_output="Technical answer",
)

# No crew-level knowledge required
crew = Crew(agents=[specialist_agent], tasks=[task])

result = crew.kickoff()  # Agent knowledge works independently
```
What Happens During crew.kickoff()
When you call crew.kickoff(), the following steps run in order:
```python
# During kickoff (inside Crew, so `self` is the crew)
for agent in self.agents:
    agent.crew = self  # Agent gets a reference to the crew
    agent.set_knowledge(crew_embedder=self.embedder)  # Agent knowledge initialized
    agent.create_agent_executor()
```
Storage Independence
Each knowledge level uses its own storage collection:
```python
# Agent knowledge storage
agent_collection_name = agent.role  # e.g., "Technical Specialist"

# Crew knowledge storage
crew_collection_name = "crew"

# Both are stored in the same ChromaDB instance but in different collections
# Path: ~/.local/share/CrewAI/{project}/knowledge/
#   ├── crew/                  # Crew knowledge collection
#   ├── Technical Specialist/  # Agent knowledge collection
#   └── Another Agent Role/    # Another agent's collection
```
Complete Working Examples
Example 1: Agent-Only Knowledge
```python
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent-specific knowledge
agent_knowledge = StringKnowledgeSource(
    content="Agent-specific information that only this agent needs"
)

agent = Agent(
    role="Specialist",
    goal="Use specialized knowledge",
    backstory="Expert with specific knowledge",
    knowledge_sources=[agent_knowledge],
    embedder={  # Agent can have its own embedder
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
)

task = Task(
    description="Answer using your specialized knowledge",
    agent=agent,
    expected_output="Answer based on agent knowledge",
)

# No crew knowledge needed
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()  # Runs with agent-level knowledge only
```
Example 2: Agent and Crew Knowledge Together
```python
from crewai import Agent, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Crew-wide knowledge (shared by all agents)
crew_knowledge = StringKnowledgeSource(
    content="Company policies and general information for all agents"
)

# Agent-specific knowledge
specialist_knowledge = StringKnowledgeSource(
    content="Technical specifications only the specialist needs"
)

specialist = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Technical expert",
    knowledge_sources=[specialist_knowledge],  # Agent-specific
)

generalist = Agent(
    role="General Assistant",
    goal="Provide general assistance",
    backstory="General helper",
    # No agent-specific knowledge
)

crew = Crew(
    agents=[specialist, generalist],
    tasks=[...],  # your tasks here
    knowledge_sources=[crew_knowledge],  # Crew-wide knowledge
)

# Result:
# - specialist gets: crew_knowledge + specialist_knowledge
# - generalist gets: crew_knowledge only
```
Example 3: Multiple Agents with Different Knowledge
```python
from crewai import Agent, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Different knowledge for different agents
sales_knowledge = StringKnowledgeSource(content="Sales procedures and pricing")
tech_knowledge = StringKnowledgeSource(content="Technical documentation")
support_knowledge = StringKnowledgeSource(content="Support procedures")

# role, goal and backstory are all required when creating an Agent
sales_agent = Agent(
    role="Sales Representative",
    goal="Answer sales and pricing questions",
    backstory="Experienced sales representative",
    knowledge_sources=[sales_knowledge],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)

tech_agent = Agent(
    role="Technical Expert",
    goal="Answer technical questions",
    backstory="Senior technical expert",
    knowledge_sources=[tech_knowledge],
    embedder={"provider": "ollama", "config": {"model": "mxbai-embed-large"}},
)

support_agent = Agent(
    role="Support Specialist",
    goal="Resolve support requests",
    backstory="Customer support specialist",
    knowledge_sources=[support_knowledge],
    # No embedder: falls back to the crew embedder
)

crew = Crew(
    agents=[sales_agent, tech_agent, support_agent],
    tasks=[...],  # your tasks here
    embedder={  # Fallback embedder for agents without their own
        "provider": "google-generativeai",
        "config": {"model_name": "gemini-embedding-001"},
    },
)

# Each agent gets only its own knowledge
# Each can use a different embedding provider
```
Add only the knowledge sources that the agent or crew actually needs. Knowledge sources can be added at the agent level or at the crew level. Crew-level knowledge sources are used by every agent in the crew. Agent-level knowledge sources are used only by the specific agent they are preloaded on.
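The sharing rule above can be sketched as a small function. AgentSpec and visible_sources are hypothetical stand-ins that use plain strings in place of knowledge source objects, and the ordering of the combined list is an arbitrary choice of this sketch rather than documented CrewAI behavior. The two agents mirror Example 2.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical stand-in for an agent: a role plus its own sources."""
    role: str
    knowledge_sources: list[str] = field(default_factory=list)


def visible_sources(agent: AgentSpec, crew_sources: list[str]) -> list[str]:
    """Crew-level sources are shared by every agent; agent-level ones stay private."""
    # `+` builds a new list, so neither input is modified.
    return crew_sources + agent.knowledge_sources


crew_sources = ["crew_knowledge"]
specialist = AgentSpec("Technical Specialist", ["specialist_knowledge"])
generalist = AgentSpec("General Assistant")

for agent in (specialist, generalist):
    print(f"{agent.role}: {visible_sources(agent, crew_sources)}")
```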
Knowledge Configuration
You can configure knowledge settings for a crew or an agent.
```python
from crewai import Agent
from crewai.knowledge.knowledge_config import KnowledgeConfig

knowledge_config = KnowledgeConfig(results_limit=10, score_threshold=0.5)

agent = Agent(
    ...,  # role, goal, backstory, and other settings
    knowledge_config=knowledge_config,
)
```
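- results_limit: the number of relevant documents to return. Default: 3.
- score_threshold: the minimum score a document must reach to be considered relevant. Default: 0.35.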
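A minimal sketch of how the two settings could interact, assuming that higher scores mean more relevant documents and that the threshold is inclusive (neither is stated above). filter_results is a hypothetical helper that takes (document, score) pairs; it is not CrewAI's retrieval code.

```python
def filter_results(
    results: list[tuple[str, float]],
    results_limit: int = 3,
    score_threshold: float = 0.35,
) -> list[tuple[str, float]]:
    """Keep documents scoring at least score_threshold, best first, at most results_limit."""
    relevant = [r for r in results if r[1] >= score_threshold]
    relevant.sort(key=lambda r: r[1], reverse=True)
    return relevant[:results_limit]


hits = [("a", 0.9), ("b", 0.2), ("c", 0.5), ("d", 0.4), ("e", 0.36)]
print(filter_results(hits))  # defaults: results_limit=3, score_threshold=0.35
print(filter_results(hits, results_limit=10, score_threshold=0.5))  # the KnowledgeConfig values above
```

With the defaults, four documents clear the threshold but only the top three are returned; raising the threshold to 0.5 leaves two.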
Supported Knowledge Parameters
"knowledge".
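Knowledge Storage Transparency
Where CrewAI Stores Knowledge Files
By default, CrewAI uses the same storage system as memory and stores knowledge in platform-specific directories:
Default Storage Locations by Platform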
macOS:
```text
~/Library/Application Support/CrewAI/{project_name}/
└── knowledge/                    # Knowledge ChromaDB files
    ├── chroma.sqlite3            # ChromaDB metadata
    ├── {collection_id}/          # Vector embeddings
    └── knowledge_{collection}/   # Named collections
```
Linux:
```text
~/.local/share/CrewAI/{project_name}/
└── knowledge/
    ├── chroma.sqlite3
    ├── {collection_id}/
    └── knowledge_{collection}/
```
Windows:
```text
C:\Users\{username}\AppData\Local\CrewAI\{project_name}\
└── knowledge\
    ├── chroma.sqlite3
    ├── {collection_id}\
    └── knowledge_{collection}\
```
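The three layouts above follow one pattern per platform. The sketch below rebuilds them with pathlib's pure path classes so it runs on any OS. default_knowledge_dir is a hypothetical helper, and the platform strings follow sys.platform conventions ("darwin", "win32", "linux"); CrewAI computes the real location itself, so use db_storage_path() to confirm it on your machine.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def default_knowledge_dir(platform: str, project_name: str, home: str) -> PurePath:
    """Rebuild the per-platform knowledge directory listed above (illustrative only)."""
    if platform == "darwin":
        return PurePosixPath(home, "Library", "Application Support", "CrewAI", project_name, "knowledge")
    if platform == "win32":
        return PureWindowsPath(home, "AppData", "Local", "CrewAI", project_name, "knowledge")
    # Linux and other POSIX systems
    return PurePosixPath(home, ".local", "share", "CrewAI", project_name, "knowledge")


print(default_knowledge_dir("linux", "my_project", "/home/alice"))
print(default_knowledge_dir("win32", "my_project", r"C:\Users\alice"))
```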
Finding Your Knowledge Storage Location
To see exactly where CrewAI stores knowledge files:
```python
from crewai.utilities.paths import db_storage_path
import os

# Get the knowledge storage path
knowledge_path = os.path.join(db_storage_path(), "knowledge")
print(f"Knowledge storage location: {knowledge_path}")

# List knowledge collections and files
if os.path.exists(knowledge_path):
    print("\nKnowledge storage contents:")
    for item in os.listdir(knowledge_path):
        item_path = os.path.join(knowledge_path, item)
        if os.path.isdir(item_path):
            print(f"📁 Collection: {item}/")
            # Show collection contents
            try:
                for subitem in os.listdir(item_path):
                    print(f"   └── {subitem}")
            except PermissionError:
                print("   └── (permission denied)")
        else:
            print(f"📄 {item}")
else:
    print("No knowledge storage found yet.")
```
Controlling Knowledge Storage Location
Option 1: Environment Variable (Recommended)
```python
import os
from crewai import Crew

# Set a custom storage location for all CrewAI data
os.environ["CREWAI_STORAGE_DIR"] = "./my_project_storage"

# All knowledge will now be stored in ./my_project_storage/knowledge/
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...],
)
```
Option 2: Custom Knowledge Storage
```python
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create custom storage with a specific embedder
custom_storage = KnowledgeStorage(
    embedder={
        "provider": "ollama",
        "config": {"model": "mxbai-embed-large"},
    },
    collection_name="my_custom_knowledge",
)

# Use it with knowledge sources
knowledge_source = StringKnowledgeSource(content="Your knowledge content here")
knowledge_source.storage = custom_storage
```
Option 3: Project-Specific Knowledge Storage
```python
import os
from pathlib import Path

# Store knowledge in the project directory
project_root = Path(__file__).parent
knowledge_dir = project_root / "knowledge_storage"

os.environ["CREWAI_STORAGE_DIR"] = str(knowledge_dir)

# Now all knowledge will be stored in your project directory
```
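Both environment-variable options rely on the same precedence: an explicit CREWAI_STORAGE_DIR wins, otherwise the platform default is used, and knowledge lives in a knowledge subfolder of that root. The sketch below models that rule as described in the comments above; knowledge_storage_dir is a hypothetical helper, and CrewAI's internal path resolution may differ in detail.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def knowledge_storage_dir(default_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Sketch: CREWAI_STORAGE_DIR overrides the platform default root when set and non-empty."""
    env = os.environ if env is None else env
    override = env.get("CREWAI_STORAGE_DIR")
    root = Path(override) if override else default_root
    return root / "knowledge"


default_root = Path.home() / ".local" / "share" / "CrewAI" / "my_project"
print(knowledge_storage_dir(default_root, {}))
print(knowledge_storage_dir(default_root, {"CREWAI_STORAGE_DIR": "./my_project_storage"}))
```

Passing the environment as a mapping keeps the function testable without mutating os.environ.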
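Default Embedding Provider Behavior
By default, CrewAI uses OpenAI embeddings (text-embedding-3-small) to store knowledge, even when you use a different LLM provider. You can easily change this to match your setup.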
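Combined with the agent-level and crew-level embedder examples earlier, the effective embedder can be thought of as a simple precedence chain. resolve_embedder and DEFAULT_EMBEDDER are hypothetical names used for illustration. Note that the LLM provider appears nowhere in the chain, which is why a Claude-based agent still gets OpenAI embeddings unless you configure otherwise.

```python
from typing import Optional

# The default described above: OpenAI's text-embedding-3-small.
DEFAULT_EMBEDDER = {"provider": "openai", "config": {"model": "text-embedding-3-small"}}


def resolve_embedder(agent_embedder: Optional[dict] = None, crew_embedder: Optional[dict] = None) -> dict:
    """Agent embedder first, then crew embedder, then the OpenAI default."""
    for candidate in (agent_embedder, crew_embedder):
        if candidate is not None:
            return candidate
    return DEFAULT_EMBEDDER


ollama = {"provider": "ollama", "config": {"model": "mxbai-embed-large"}}
google = {"provider": "google-generativeai", "config": {"model_name": "gemini-embedding-001"}}

print(resolve_embedder(ollama, google)["provider"])  # agent-level setting wins
print(resolve_embedder(None, google)["provider"])    # falls back to the crew
print(resolve_embedder()["provider"])                # nothing configured: OpenAI default
```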
Understanding the Default Behavior
```python
from crewai import Agent, Crew, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# When using Claude as your LLM...
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    llm=LLM(provider="anthropic", model="claude-3-sonnet"),  # Using Claude
)

# CrewAI will still use OpenAI embeddings by default for knowledge
# This ensures consistency but may not match your LLM provider preference
knowledge_source = StringKnowledgeSource(content="Research data...")

crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    # Default: uses OpenAI embeddings even with a Claude LLM
)
```
Customizing Knowledge Embedding Providers
```python
from crewai import Agent, Crew

# Reuses `agent` and `knowledge_source` from the previous example.

# Option 1: Use Voyage AI (recommended by Anthropic for Claude users)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "voyageai",  # Recommended for Claude users
        "config": {
            "api_key": "your-voyage-api-key",
            "model": "voyage-3",  # or "voyage-3-large" for best quality
        },
    },
)

# Option 2: Use local embeddings (no external API calls)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "mxbai-embed-large",
            "url": "http://localhost:11434/api/embeddings",
        },
    },
)

# Option 3: Agent-level embedding customization
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "google-generativeai",
        "config": {
            "model_name": "gemini-embedding-001",
            "api_key": "your-google-key",
        },
    },
)
```
Configuring Azure OpenAI Embeddings
When using Azure OpenAI embeddings:
- First, make sure the embedding model is deployed on the Azure platform
- Then use the following configuration:
```python
from crewai import Agent

# Assumes `knowledge_source` is defined as in the previous examples.
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "azure",
        "config": {
            "api_key": "your-azure-api-key",
            "model": "text-embedding-ada-002",  # change to the model you have deployed in Azure
            "api_base": "https://your-azure-endpoint.openai.azure.com/",
            "api_version": "2024-02-01",
        },
    },
)
```
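Advanced Features
Query Rewriting
CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search its knowledge sources, the raw task prompt is automatically transformed into a more effective search query.
How Query Rewriting Works
- When an agent executes a task and knowledge sources are available, the _get_knowledge_search_query method is triggered
- The agent's LLM transforms the original task prompt into an optimized search query
- This optimized query is then used to retrieve relevant information from the knowledge sources
Benefits of Query Rewriting
Example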
```python
# Original task prompt
task_prompt = "Answer the following questions about the user's favorite movies: What movie did John watch last week? Format your answer in JSON."

# Behind the scenes, this might be rewritten as:
rewritten_query = "What movies did John watch last week?"
```
The rewritten query focuses on the core information need and drops the irrelevant instructions about output formatting.
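The flow can be sketched with the LLM injected as a plain callable, which keeps the example runnable without an API key. The rewrite prompt wording, the name knowledge_search_query, and falling back to the raw prompt when the model returns nothing are all assumptions made for this sketch; CrewAI's _get_knowledge_search_query is internal, and its exact prompt is not documented here.

```python
from typing import Callable

# Hypothetical prompt wording; not CrewAI's actual rewrite prompt.
REWRITE_INSTRUCTIONS = (
    "Rewrite the following task as a concise search query for a knowledge base. "
    "Keep only the information need; drop output-format instructions.\n\nTask: {task}"
)


def knowledge_search_query(task_prompt: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a focused query; fall back to the raw prompt if it returns nothing."""
    rewritten = llm(REWRITE_INSTRUCTIONS.format(task=task_prompt)).strip()
    return rewritten or task_prompt


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call, returning the rewrite from the example above.
    return "What movies did John watch last week?"


task_prompt = (
    "Answer the following questions about the user's favorite movies: "
    "What movie did John watch last week? Format your answer in JSON."
)
print(knowledge_search_query(task_prompt, fake_llm))
```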
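Knowledge Events
CrewAI emits events during knowledge retrieval, and you can listen for them through the event system. This lets you monitor, debug, and analyze how your agents retrieve and use knowledge.
Available Knowledge Events
- KnowledgeRetrievalStartedEvent: emitted when an agent starts retrieving information from knowledge sources
- KnowledgeRetrievalCompletedEvent: emitted when knowledge retrieval completes; includes the query used and the retrieved content
- KnowledgeQueryStartedEvent: emitted when a query to a knowledge source starts
- KnowledgeQueryCompletedEvent: emitted when a query completes successfully
- KnowledgeQueryFailedEvent: emitted when a query to a knowledge source fails
- KnowledgeSearchQueryFailedEvent: emitted when a search query fails
Example: Monitoring Knowledge Retrieval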
```python
from crewai.events import (
    KnowledgeRetrievalStartedEvent,
    KnowledgeRetrievalCompletedEvent,
    BaseEventListener,
)


class KnowledgeMonitorListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(KnowledgeRetrievalStartedEvent)
        def on_knowledge_retrieval_started(source, event):
            print(f"Agent '{event.agent.role}' started retrieving knowledge")

        @crewai_event_bus.on(KnowledgeRetrievalCompletedEvent)
        def on_knowledge_retrieval_completed(source, event):
            print(f"Agent '{event.agent.role}' completed knowledge retrieval")
            print(f"Query: {event.query}")
            print(f"Retrieved {len(event.retrieved_knowledge)} knowledge chunks")


# Create an instance of your listener
knowledge_monitor = KnowledgeMonitorListener()
```
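To see the decorator-based registration pattern in isolation, here is a toy event bus with the same @bus.on(EventType) shape. ToyEventBus and RetrievalCompleted are hypothetical and much simpler than CrewAI's event bus (no threading, exact-type dispatch only); RetrievalCompleted carries fields similar to the ones the listener above reads.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


class ToyEventBus:
    """Minimal pub/sub with a `@bus.on(EventType)` decorator (not CrewAI's bus)."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[Any, Any], None]]] = defaultdict(list)

    def on(self, event_type: type) -> Callable:
        def register(handler: Callable[[Any, Any], None]) -> Callable[[Any, Any], None]:
            self._handlers[event_type].append(handler)
            return handler
        return register

    def emit(self, source: Any, event: Any) -> None:
        # Dispatch on the exact event class only.
        for handler in self._handlers.get(type(event), []):
            handler(source, event)


@dataclass
class RetrievalCompleted:
    """Hypothetical event with fields similar to KnowledgeRetrievalCompletedEvent."""
    agent_role: str
    query: str
    retrieved_knowledge: list


bus = ToyEventBus()
log: list[str] = []


@bus.on(RetrievalCompleted)
def record_retrieval(source, event):
    log.append(f"{event.agent_role}: {len(event.retrieved_knowledge)} chunks for {event.query!r}")


bus.emit(source=None, event=RetrievalCompleted("Researcher", "reward hacking", ["c1", "c2"]))
print(log)
```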
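For more information on using events, see the Event Listeners documentation.
Custom Knowledge Sources
CrewAI lets you create custom knowledge sources for any type of data by subclassing the BaseKnowledgeSource class. The following practical example fetches and processes space news articles.
Space News Knowledge Source Example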
```python
from typing import Any, Dict

import requests
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
from pydantic import Field


class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that fetches data from the Space News API."""

    api_endpoint: str = Field(description="API endpoint URL")
    limit: int = Field(default=10, description="Number of articles to fetch")

    def load_content(self) -> Dict[Any, str]:
        """Fetch and format space news articles."""
        try:
            response = requests.get(
                f"{self.api_endpoint}?limit={self.limit}"
            )
            response.raise_for_status()

            data = response.json()
            articles = data.get("results", [])

            formatted_data = self.validate_content(articles)
            return {self.api_endpoint: formatted_data}
        except Exception as e:
            raise ValueError(f"Failed to fetch space news: {str(e)}") from e

    def validate_content(self, articles: list) -> str:
        """Format articles into readable text."""
        formatted = "Space News Articles:\n\n"
        for article in articles:
            formatted += f"""
                Title: {article['title']}
                Published: {article['published_at']}
                Summary: {article['summary']}
                News Site: {article['news_site']}
                URL: {article['url']}
                -------------------"""
        return formatted

    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)

        self._save_documents()


# Create knowledge source
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=10,
)

# Create specialized agent
space_analyst = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately and comprehensively",
    backstory="""You are a space industry analyst with expertise in space exploration,
    satellite technology, and space industry trends. You excel at answering questions
    about space news and providing detailed, accurate information.""",
    knowledge_sources=[recent_news],
    llm=LLM(model="gpt-4", temperature=0.0),
)

# Create task that handles user questions
analysis_task = Task(
    description="Answer this question about space news: {user_question}",
    expected_output="A detailed answer based on the recent space news articles",
    agent=space_analyst,
)

# Create and run the crew
crew = Crew(
    agents=[space_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential,
)

# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the latest developments in space exploration?"}
)
```
Example output:

````text
# Agent: Space News Analyst
## Task: Answer this question about space news: What are the latest developments in space exploration?
# Agent: Space News Analyst
## Final Answer:
The latest developments in space exploration, based on recent space news articles, include the following:
1. SpaceX has received the final regulatory approvals to proceed with the second integrated Starship/Super Heavy launch, scheduled for as soon as the morning of Nov. 17, 2023. This is a significant step in SpaceX’s ambitious plans for space exploration and colonization. Source: SpaceNews
2. SpaceX has also informed the US Federal Communications Commission (FCC) that it plans to begin launching its first next-generation Starlink Gen2 satellites. This represents a major upgrade to the Starlink satellite internet service, which aims to provide high-speed internet access worldwide. Source: Teslarati
3. AI startup Synthetaic has raised $15 million in Series B funding. The company uses artificial intelligence to analyze data from space and air sensors, which could have significant applications in space exploration and satellite technology. Source: SpaceNews
4. The Space Force has formally established a unit within the U.S. Indo-Pacific Command, marking a permanent presence in the Indo-Pacific region. This could have significant implications for space security and geopolitics. Source: SpaceNews
5. Slingshot Aerospace, a space tracking and data analytics company, is expanding its network of ground-based optical telescopes to increase coverage of low Earth orbit. This could improve our ability to track and analyze objects in low Earth orbit, including satellites and space debris. Source: SpaceNews
6. The National Natural Science Foundation of China has outlined a five-year project for researchers to study the assembly of ultra-large spacecraft. This could lead to significant advancements in spacecraft technology and space exploration capabilities. Source: SpaceNews
7. The Center for AEroSpace Autonomy Research (CAESAR) at Stanford University is focusing on spacecraft autonomy. The center held a kickoff event on May 22, 2024, to highlight the industry, academia, and government collaboration it seeks to foster. This could lead to significant advancements in autonomous spacecraft technology. Source: SpaceNews
````
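Because load_content depends on the live API, it can help to check the formatting loop offline first. The sketch below copies the loop from validate_content into a standalone function (format_articles is a hypothetical name) with the same fields in the same order but without the leading indentation, and runs it on a made-up placeholder article that uses the field names the example expects.

```python
def format_articles(articles: list[dict]) -> str:
    """Standalone copy of validate_content's loop, for offline checks."""
    formatted = "Space News Articles:\n\n"
    for article in articles:
        formatted += (
            f"\nTitle: {article['title']}\n"
            f"Published: {article['published_at']}\n"
            f"Summary: {article['summary']}\n"
            f"News Site: {article['news_site']}\n"
            f"URL: {article['url']}\n"
            "-------------------"
        )
    return formatted


# Placeholder data, not a real article.
sample = [{
    "title": "Example headline",
    "published_at": "2024-01-01T00:00:00Z",
    "summary": "Placeholder summary.",
    "news_site": "Example Site",
    "url": "https://example.com/article",
}]
text = format_articles(sample)
print(text)
```

An article missing any of these keys raises KeyError, which load_content above would surface as "Failed to fetch space news".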
Debugging and Troubleshooting
Debugging Knowledge Issues
Check Agent Knowledge Initialization
```python
from crewai import Agent, Crew, Task
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

knowledge_source = StringKnowledgeSource(content="Test knowledge")
agent = Agent(
    role="Test Agent",
    goal="Test knowledge",
    backstory="Testing",
    knowledge_sources=[knowledge_source],
)

crew = Crew(agents=[agent], tasks=[Task(...)])

# Before kickoff - knowledge not initialized
print(f"Before kickoff - Agent knowledge: {getattr(agent, 'knowledge', None)}")

crew.kickoff()

# After kickoff - knowledge initialized
print(f"After kickoff - Agent knowledge: {agent.knowledge}")
print(f"Agent knowledge collection: {agent.knowledge.storage.collection_name}")
print(f"Number of sources: {len(agent.knowledge.sources)}")
```
Verify Knowledge Storage Locations
```python
import os
from crewai.utilities.paths import db_storage_path

# Check storage structure
storage_path = db_storage_path()
knowledge_path = os.path.join(storage_path, "knowledge")

if os.path.exists(knowledge_path):
    print("Knowledge collections found:")
    for collection in os.listdir(knowledge_path):
        collection_path = os.path.join(knowledge_path, collection)
        if os.path.isdir(collection_path):
            print(f"  - {collection}/")
            # Show collection contents
            for item in os.listdir(collection_path):
                print(f"    └── {item}")
```
Test Knowledge Retrieval
```python
# Define the query once so both checks can use it
test_query = ["test query"]

# Test agent knowledge retrieval
if hasattr(agent, 'knowledge') and agent.knowledge:
    results = agent.knowledge.query(test_query)
    print(f"Agent knowledge results: {len(results)} documents found")

# Test crew knowledge retrieval (if it exists)
if hasattr(crew, 'knowledge') and crew.knowledge:
    crew_results = crew.query_knowledge(test_query)
    print(f"Crew knowledge results: {len(crew_results)} documents found")
```
Inspect Knowledge Collections
```python
import os

import chromadb
from crewai.utilities.paths import db_storage_path

# Connect to CrewAI's knowledge ChromaDB
knowledge_path = os.path.join(db_storage_path(), "knowledge")

if os.path.exists(knowledge_path):
    client = chromadb.PersistentClient(path=knowledge_path)
    collections = client.list_collections()

    print("Knowledge Collections:")
    for collection in collections:
        print(f"  - {collection.name}: {collection.count()} documents")

        # Sample a few documents to verify content
        if collection.count() > 0:
            sample = collection.peek(limit=2)
            print(f"    Sample content: {sample['documents'][0][:100]}...")
else:
    print("No knowledge storage found")
```
Inspect Knowledge Processing
```python
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a test knowledge source
test_source = StringKnowledgeSource(
    content="Test knowledge content for debugging",
    chunk_size=100,  # Small chunks for testing
    chunk_overlap=20,
)

# Check chunking behavior
print(f"Original content length: {len(test_source.content)}")
print(f"Chunk size: {test_source.chunk_size}")
print(f"Chunk overlap: {test_source.chunk_overlap}")

# Process and inspect chunks
test_source.add()
print(f"Number of chunks created: {len(test_source.chunks)}")

for i, chunk in enumerate(test_source.chunks[:3]):  # Show the first 3 chunks
    print(f"Chunk {i+1}: {chunk[:50]}...")
```
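To predict how many chunks add() should produce, it helps to know how a fixed-size sliding window behaves. The sketch below assumes character-based chunks, each starting chunk_size - chunk_overlap characters after the previous one; chunk_text is a hypothetical helper, and CrewAI's own chunker may split differently. Under this assumption, the 36-character test string above yields a single chunk.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size sliding window over characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between chunk starts
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
print(len(chunk_text("Test knowledge content for debugging", chunk_size=100, chunk_overlap=20)))
```

Note the short trailing chunk in the first call: a window that starts near the end of the text is kept even if it is shorter than chunk_size.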
Common Knowledge Storage Issues
"File not found" errors:
```python
# Ensure files are in the correct location
from crewai.utilities.constants import KNOWLEDGE_DIRECTORY
import os

knowledge_dir = KNOWLEDGE_DIRECTORY  # Usually "knowledge"
file_path = os.path.join(knowledge_dir, "your_file.pdf")

if not os.path.exists(file_path):
    print(f"File not found: {file_path}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Expected knowledge directory: {os.path.abspath(knowledge_dir)}")
```
"Embedding dimension mismatch" errors:
```python
# This happens when switching embedding providers
# Reset knowledge storage to clear old embeddings
crew.reset_memories(command_type='knowledge')

# Or use consistent embedding providers
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
)
```
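Why the mismatch happens: a vector collection stores vectors of one fixed length, and different embedding models produce different lengths (for example, OpenAI's text-embedding-3-small returns 1536-dimensional vectors by default, while mxbai-embed-large returns 1024). The ToyCollection class below is hypothetical and only mimics that constraint; resetting it is analogous to clearing knowledge storage before re-indexing with a new embedder.

```python
class ToyCollection:
    """Hypothetical collection that fixes its vector dimension on the first insert."""

    def __init__(self) -> None:
        self.dimension = None  # unknown until the first vector arrives
        self.vectors = []

    def add(self, vector: list[float]) -> None:
        if self.dimension is None:
            self.dimension = len(vector)
        elif len(vector) != self.dimension:
            raise ValueError(
                f"Embedding dimension mismatch: collection expects {self.dimension}, got {len(vector)}"
            )
        self.vectors.append(vector)

    def reset(self) -> None:
        # Analogous to clearing knowledge storage before switching embedders.
        self.dimension = None
        self.vectors.clear()


collection = ToyCollection()
collection.add([0.0] * 1536)      # vectors from the first embedder
try:
    collection.add([0.0] * 1024)  # vectors from a different embedder
except ValueError as err:
    print(err)
collection.reset()
collection.add([0.0] * 1024)      # accepted after the reset
```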
"ChromaDB permission denied" errors:
```bash
# Fix storage permissions
chmod -R 755 ~/.local/share/CrewAI/
```
Knowledge not persisting between runs:
```python
# Verify storage location consistency
import os
from crewai.utilities.paths import db_storage_path

print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Computed storage path:", db_storage_path())
print("Knowledge path:", os.path.join(db_storage_path(), "knowledge"))
```
Knowledge Reset Commands
```python
# Reset only agent-specific knowledge
crew.reset_memories(command_type='agent_knowledge')

# Reset both crew and agent knowledge
crew.reset_memories(command_type='knowledge')

# CLI commands
# crewai reset-memories --agent-knowledge  # Agent knowledge only
# crewai reset-memories --knowledge        # All knowledge
```
Clearing Knowledge
If you need to clear the knowledge stored in CrewAI, use the crewai reset-memories command with the --knowledge option.
```bash
crewai reset-memories --knowledge
```
This is useful when you have updated your knowledge sources and want to make sure your agents use the latest information.
