Knowledge

What Knowledge is in CrewAI and how to use it.

Overview

Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources while performing tasks. Think of it as giving your agents a reference library they can consult as they work.

Key benefits of using Knowledge:

  • Enhance agents with domain-specific information
  • Support decisions with real-world data
  • Maintain context across conversations
  • Ground responses in factual information

Quickstart Example

For file-based knowledge sources, make sure to place your files in a knowledge directory at the root of your project.
Also, use relative paths from the knowledge directory when creating the source.
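As a concrete sketch of that layout (the file name `user_preference.txt` and its contents are only illustrations, not part of CrewAI), you can create the directory and a source file with the standard library, then reference the file by its path relative to the knowledge directory:

```python
from pathlib import Path

# Create the knowledge directory at the project root and add a source file.
Path("knowledge").mkdir(exist_ok=True)
Path("knowledge/user_preference.txt").write_text(
    "User is interested in AI agent frameworks."
)

# When creating the source, pass the path relative to the knowledge directory,
# e.g.: TextFileKnowledgeSource(file_paths=["user_preference.txt"])
print(Path("knowledge/user_preference.txt").exists())  # True
```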

Vector Store (RAG) Client Configuration

CrewAI provides a provider-agnostic RAG client abstraction for working with vector stores. The default provider is ChromaDB, and Qdrant is also supported. You can switch providers via the configuration utilities.

Currently supported:

  • ChromaDB (default)
  • Qdrant
```python
from crewai.rag.config.utils import set_rag_config, get_rag_client, clear_rag_config

# ChromaDB (default)
from crewai.rag.chromadb.config import ChromaDBConfig

set_rag_config(ChromaDBConfig())
chromadb_client = get_rag_client()

# Qdrant
from crewai.rag.qdrant.config import QdrantConfig

set_rag_config(QdrantConfig())
qdrant_client = get_rag_client()

# Example operations (same API for any provider)
client = qdrant_client  # or chromadb_client
client.create_collection(collection_name="docs")
client.add_documents(
    collection_name="docs",
    documents=[{"id": "1", "content": "CrewAI enables collaborative AI agents."}],
)
results = client.search(collection_name="docs", query="collaborative agents", limit=3)

clear_rag_config()  # optional reset
```

This RAG client is independent of the Knowledge system's built-in storage. Use it when you need direct control over a vector store or want to build custom retrieval pipelines.

Basic String Knowledge Example

```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],  # Enable knowledge by adding the sources here
)

result = crew.kickoff(inputs={"question": "What city does John live in and how old is he?"})
```

Web Content Knowledge Example

To run the following example, you need to install docling first: `uv add docling`
```python
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a knowledge source from web content
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="You are a master at understanding papers and their content.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[content_source],
)

result = crew.kickoff(
    inputs={"question": "What is the reward hacking paper about? Be sure to provide sources."}
)
```

Supported Knowledge Sources

CrewAI supports various types of knowledge sources out of the box:

  • Raw strings
  • Text files (.txt)
  • PDF documents
  • CSV files
  • Excel spreadsheets
  • JSON documents

Text File Knowledge Source

```python
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

text_source = TextFileKnowledgeSource(
    file_paths=["document.txt", "another.txt"]
)
```

PDF Knowledge Source

```python
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource

pdf_source = PDFKnowledgeSource(
    file_paths=["document.pdf", "another.pdf"]
)
```

CSV Knowledge Source

```python
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource

csv_source = CSVKnowledgeSource(
    file_paths=["data.csv"]
)
```

Excel Knowledge Source

```python
from crewai.knowledge.source.excel_knowledge_source import ExcelKnowledgeSource

excel_source = ExcelKnowledgeSource(
    file_paths=["spreadsheet.xlsx"]
)
```

JSON Knowledge Source

```python
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)
```

Make sure to create the ./knowledge folder. All source files (e.g., .txt, .pdf, .xlsx, .json) should be placed in this folder for centralized management.

Agent and Crew Level Knowledge: Complete Guide

Understanding Knowledge Levels: CrewAI supports knowledge at both the agent and crew levels. This section explains exactly how each works, when each is initialized, and clears up common misconceptions about the dependencies between them.

How Knowledge Initialization Actually Works

Here is what actually happens when you use knowledge:

Agent-Level Knowledge (Independent)

```python
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent with its own knowledge - NO crew knowledge needed
specialist_knowledge = StringKnowledgeSource(
    content="Specialized technical information for this agent only"
)

specialist_agent = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Expert in specialized technical domains",
    knowledge_sources=[specialist_knowledge]  # Agent-specific knowledge
)

task = Task(
    description="Answer technical questions",
    agent=specialist_agent,
    expected_output="Technical answer"
)

# No crew-level knowledge required
crew = Crew(
    agents=[specialist_agent],
    tasks=[task]
)

result = crew.kickoff()  # Agent knowledge works independently
```

What Happens During crew.kickoff()

When you call crew.kickoff(), the following sequence occurs:

```python
# During kickoff
for agent in self.agents:
    agent.crew = self  # Agent gets reference to crew
    agent.set_knowledge(crew_embedder=self.embedder)  # Agent knowledge initialized
    agent.create_agent_executor()
```

Storage Independence

Each knowledge level uses independent storage collections:

```python
# Agent knowledge storage
agent_collection_name = agent.role  # e.g., "Technical Specialist"

# Crew knowledge storage
crew_collection_name = "crew"

# Both stored in same ChromaDB instance but different collections
# Path: ~/.local/share/CrewAI/{project}/knowledge/
# ├── crew/                    # Crew knowledge collection
# ├── Technical Specialist/    # Agent knowledge collection
# └── Another Agent Role/      # Another agent's collection
```

Complete Working Examples

Example 1: Agent-Only Knowledge

```python
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent-specific knowledge
agent_knowledge = StringKnowledgeSource(
    content="Agent-specific information that only this agent needs"
)

agent = Agent(
    role="Specialist",
    goal="Use specialized knowledge",
    backstory="Expert with specific knowledge",
    knowledge_sources=[agent_knowledge],
    embedder={  # Agent can have its own embedder
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)

task = Task(
    description="Answer using your specialized knowledge",
    agent=agent,
    expected_output="Answer based on agent knowledge"
)

# No crew knowledge needed
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()  # Works perfectly
```

Example 2: Both Agent and Crew Knowledge

```python
# Crew-wide knowledge (shared by all agents)
crew_knowledge = StringKnowledgeSource(
    content="Company policies and general information for all agents"
)

# Agent-specific knowledge
specialist_knowledge = StringKnowledgeSource(
    content="Technical specifications only the specialist needs"
)

specialist = Agent(
    role="Technical Specialist",
    goal="Provide technical expertise",
    backstory="Technical expert",
    knowledge_sources=[specialist_knowledge]  # Agent-specific
)

generalist = Agent(
    role="General Assistant",
    goal="Provide general assistance",
    backstory="General helper"
    # No agent-specific knowledge
)

crew = Crew(
    agents=[specialist, generalist],
    tasks=[...],
    knowledge_sources=[crew_knowledge]  # Crew-wide knowledge
)

# Result:
# - specialist gets: crew_knowledge + specialist_knowledge
# - generalist gets: crew_knowledge only
```

Example 3: Multiple Agents with Different Knowledge

```python
# Different knowledge for different agents
sales_knowledge = StringKnowledgeSource(content="Sales procedures and pricing")
tech_knowledge = StringKnowledgeSource(content="Technical documentation")
support_knowledge = StringKnowledgeSource(content="Support procedures")

sales_agent = Agent(
    role="Sales Representative",
    knowledge_sources=[sales_knowledge],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)

tech_agent = Agent(
    role="Technical Expert",
    knowledge_sources=[tech_knowledge],
    embedder={"provider": "ollama", "config": {"model": "mxbai-embed-large"}}
)

support_agent = Agent(
    role="Support Specialist",
    knowledge_sources=[support_knowledge]
    # Will use crew embedder as fallback
)

crew = Crew(
    agents=[sales_agent, tech_agent, support_agent],
    tasks=[...],
    embedder={  # Fallback embedder for agents without their own
        "provider": "google-generativeai",
        "config": {"model_name": "gemini-embedding-001"}
    }
)

# Each agent gets only their specific knowledge
# Each can use different embedding providers
```
Unlike retrieval from a vector database using a tool, agents preloaded with knowledge do not need an extra retrieval persona or task.
Simply add the relevant knowledge sources your agent or crew needs to function. Knowledge sources can be added at the agent or crew level. Crew-level knowledge sources are used by all agents in the crew. Agent-level knowledge sources are used only by the specific agent that preloads them.

Knowledge Configuration

You can configure the knowledge settings for the crew or agent.

```python
from crewai.knowledge.knowledge_config import KnowledgeConfig

knowledge_config = KnowledgeConfig(results_limit=10, score_threshold=0.5)

agent = Agent(
    ...
    knowledge_config=knowledge_config
)
```

results_limit: the number of relevant documents to return. Default is 3.
score_threshold: the minimum score a document must reach to be considered relevant. Default is 0.35.
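Conceptually, the two parameters combine as a relevance filter plus a cap on the number of results. The following standard-library sketch illustrates that behavior; it is not CrewAI's actual retrieval code:

```python
def select_results(scored_docs, results_limit=3, score_threshold=0.35):
    """Keep docs scoring at or above the threshold, best first, capped at the limit."""
    relevant = [d for d in scored_docs if d[1] >= score_threshold]
    relevant.sort(key=lambda d: d[1], reverse=True)
    return relevant[:results_limit]

# Hypothetical (document, score) pairs for illustration
docs = [("a", 0.9), ("b", 0.5), ("c", 0.4), ("d", 0.36), ("e", 0.2)]
print(select_results(docs))                    # 3 best docs above the threshold
print(select_results(docs, results_limit=10))  # all 4 docs above the threshold
```

Raising `score_threshold` trades recall for precision; raising `results_limit` gives the agent more context at the cost of a longer prompt.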

Supported Knowledge Parameters

  • sources: a list of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content.
  • collection_name: the name of the collection where the knowledge is stored, used to identify different sets of knowledge. Defaults to "knowledge" if not provided.
  • storage: a custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage is created automatically.

Knowledge Storage Transparency

Understanding Knowledge Storage: CrewAI automatically stores knowledge sources in platform-specific directories, using ChromaDB as the vector store. Understanding these default locations and behaviors helps with production deployments, debugging, and storage management.

Where CrewAI Stores Knowledge Files

By default, CrewAI uses the same storage system as memory, storing knowledge in platform-specific directories:

Default Storage Locations by Platform

macOS:

```
~/Library/Application Support/CrewAI/{project_name}/
└── knowledge/                   # Knowledge ChromaDB files
    ├── chroma.sqlite3           # ChromaDB metadata
    ├── {collection_id}/         # Vector embeddings
    └── knowledge_{collection}/  # Named collections
```

Linux:

```
~/.local/share/CrewAI/{project_name}/
└── knowledge/
    ├── chroma.sqlite3
    ├── {collection_id}/
    └── knowledge_{collection}/
```

Windows:

```
C:\Users\{username}\AppData\Local\CrewAI\{project_name}\
└── knowledge\
    ├── chroma.sqlite3
    ├── {collection_id}\
    └── knowledge_{collection}\
```
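The three locations above follow the usual per-platform application-data conventions. As a rough standard-library sketch of how such a path is derived (illustrative only, not CrewAI's own code):

```python
import os
import sys

def knowledge_storage_dir(project_name: str) -> str:
    """Mirror the per-platform default locations listed above (illustrative only)."""
    if sys.platform == "darwin":                      # macOS
        base = os.path.expanduser("~/Library/Application Support")
    elif sys.platform.startswith("win"):              # Windows
        base = os.environ.get("LOCALAPPDATA", r"C:\Users\you\AppData\Local")
    else:                                             # Linux and other POSIX
        base = os.path.expanduser("~/.local/share")
    return os.path.join(base, "CrewAI", project_name, "knowledge")

print(knowledge_storage_dir("my_project"))
```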

Finding Your Knowledge Storage Location

To see exactly where CrewAI stores your knowledge files:

```python
from crewai.utilities.paths import db_storage_path
import os

# Get the knowledge storage path
knowledge_path = os.path.join(db_storage_path(), "knowledge")
print(f"Knowledge storage location: {knowledge_path}")

# List knowledge collections and files
if os.path.exists(knowledge_path):
    print("\nKnowledge storage contents:")
    for item in os.listdir(knowledge_path):
        item_path = os.path.join(knowledge_path, item)
        if os.path.isdir(item_path):
            print(f"📁 Collection: {item}/")
            # Show collection contents
            try:
                for subitem in os.listdir(item_path):
                    print(f"   └── {subitem}")
            except PermissionError:
                print("   └── (permission denied)")
        else:
            print(f"📄 {item}")
else:
    print("No knowledge storage found yet.")
```

Controlling Knowledge Storage Location

Option 1: Environment Variable (Recommended)

```python
import os
from crewai import Crew

# Set custom storage location for all CrewAI data
os.environ["CREWAI_STORAGE_DIR"] = "./my_project_storage"

# All knowledge will now be stored in ./my_project_storage/knowledge/
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...]
)
```

Option 2: Custom Knowledge Storage

```python
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create custom storage with specific embedder
custom_storage = KnowledgeStorage(
    embedder={
        "provider": "ollama",
        "config": {"model": "mxbai-embed-large"}
    },
    collection_name="my_custom_knowledge"
)

# Use with knowledge sources
knowledge_source = StringKnowledgeSource(
    content="Your knowledge content here"
)
knowledge_source.storage = custom_storage
```

Option 3: Project-Specific Knowledge Storage

```python
import os
from pathlib import Path

# Store knowledge in project directory
project_root = Path(__file__).parent
knowledge_dir = project_root / "knowledge_storage"

os.environ["CREWAI_STORAGE_DIR"] = str(knowledge_dir)

# Now all knowledge will be stored in your project directory
```

Default Embedding Provider Behavior

Default Embedding Provider: CrewAI defaults to OpenAI embeddings (text-embedding-3-small) for knowledge storage, even when you are using a different LLM provider. You can easily customize this to match your setup.

Understanding Default Behavior

```python
from crewai import Agent, Crew, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# When using Claude as your LLM...
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    llm=LLM(provider="anthropic", model="claude-3-sonnet")  # Using Claude
)

# CrewAI will still use OpenAI embeddings by default for knowledge
# This ensures consistency but may not match your LLM provider preference
knowledge_source = StringKnowledgeSource(content="Research data...")

crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source]
    # Default: Uses OpenAI embeddings even with Claude LLM
)
```

Customizing Knowledge Embedding Providers

```python
# Option 1: Use Voyage AI (recommended by Anthropic for Claude users)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "voyageai",  # Recommended for Claude users
        "config": {
            "api_key": "your-voyage-api-key",
            "model": "voyage-3"  # or "voyage-3-large" for best quality
        }
    }
)

# Option 2: Use local embeddings (no external API calls)
crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "ollama",
        "config": {
            "model": "mxbai-embed-large",
            "url": "http://localhost:11434/api/embeddings"
        }
    }
)

# Option 3: Agent-level embedding customization
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "google-generativeai",
        "config": {
            "model_name": "gemini-embedding-001",
            "api_key": "your-google-key"
        }
    }
)
```

Configuring Azure OpenAI Embeddings

When using Azure OpenAI embeddings:

  1. Make sure you have deployed the embedding model in the Azure platform first
  2. Then use the following configuration:
```python
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "azure",
        "config": {
            "api_key": "your-azure-api-key",
            "model": "text-embedding-ada-002",  # change to the model you are using and is deployed in Azure
            "api_base": "https://your-azure-endpoint.openai.azure.com/",
            "api_version": "2024-02-01"
        }
    }
)
```

Advanced Features

Query Rewriting

CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search through knowledge sources, the raw task prompt is automatically transformed into a more effective search query.

How Query Rewriting Works

  1. When an agent executes a task with knowledge sources available, the _get_knowledge_search_query method is triggered
  2. The agent's LLM transforms the original task prompt into an optimized search query
  3. This optimized query is then used to retrieve relevant information from the knowledge sources

Benefits of Query Rewriting

Query rewriting helps retrieve more relevant information by focusing on key concepts and removing irrelevant content. Rewritten queries are more specific and context-aware when searching vector databases.

Example

```python
# Original task prompt
task_prompt = "Answer the following questions about the user's favorite movies: What movie did John watch last week? Format your answer in JSON."

# Behind the scenes, this might be rewritten as:
rewritten_query = "What movies did John watch last week?"
```

The rewritten query is more focused on the core information need and drops the irrelevant instructions about output formatting.

This mechanism is fully automatic and requires no configuration. The agent's LLM performs the rewriting, so using a more capable LLM tends to improve the quality of the rewritten queries.

Knowledge Events

CrewAI emits events during the knowledge retrieval process that you can listen for using the event system. These events let you monitor, debug, and analyze how knowledge is retrieved and used by your agents.

Available Knowledge Events

  • KnowledgeRetrievalStartedEvent: emitted when an agent starts retrieving knowledge from sources
  • KnowledgeRetrievalCompletedEvent: emitted when knowledge retrieval is completed, including the query used and the retrieved content
  • KnowledgeQueryStartedEvent: emitted when a query to knowledge sources starts
  • KnowledgeQueryCompletedEvent: emitted when a query completes successfully
  • KnowledgeQueryFailedEvent: emitted when a query to knowledge sources fails
  • KnowledgeSearchQueryFailedEvent: emitted when a search query fails

Example: Monitoring Knowledge Retrieval

```python
from crewai.events import (
    KnowledgeRetrievalStartedEvent,
    KnowledgeRetrievalCompletedEvent,
    BaseEventListener,
)

class KnowledgeMonitorListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(KnowledgeRetrievalStartedEvent)
        def on_knowledge_retrieval_started(source, event):
            print(f"Agent '{event.agent.role}' started retrieving knowledge")

        @crewai_event_bus.on(KnowledgeRetrievalCompletedEvent)
        def on_knowledge_retrieval_completed(source, event):
            print(f"Agent '{event.agent.role}' completed knowledge retrieval")
            print(f"Query: {event.query}")
            print(f"Retrieved {len(event.retrieved_knowledge)} knowledge chunks")

# Create an instance of your listener
knowledge_monitor = KnowledgeMonitorListener()
```

For more information on using events, see the Event Listeners documentation.

Custom Knowledge Sources

CrewAI allows you to create custom knowledge sources for any type of data by extending the BaseKnowledgeSource class. Let's walk through a practical example that fetches and processes space news articles.

Space News Knowledge Source Example

```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests
from datetime import datetime
from typing import Dict, Any
from pydantic import BaseModel, Field

class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that fetches data from Space News API."""

    api_endpoint: str = Field(description="API endpoint URL")
    limit: int = Field(default=10, description="Number of articles to fetch")

    def load_content(self) -> Dict[Any, str]:
        """Fetch and format space news articles."""
        try:
            response = requests.get(
                f"{self.api_endpoint}?limit={self.limit}"
            )
            response.raise_for_status()

            data = response.json()
            articles = data.get('results', [])

            formatted_data = self.validate_content(articles)
            return {self.api_endpoint: formatted_data}
        except Exception as e:
            raise ValueError(f"Failed to fetch space news: {str(e)}")

    def validate_content(self, articles: list) -> str:
        """Format articles into readable text."""
        formatted = "Space News Articles:\n\n"
        for article in articles:
            formatted += f"""
                Title: {article['title']}
                Published: {article['published_at']}
                Summary: {article['summary']}
                News Site: {article['news_site']}
                URL: {article['url']}
                ----------------------------"""
        return formatted

    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)
        self._save_documents()

# Create knowledge source
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=10,
)

# Create specialized agent
space_analyst = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately and comprehensively",
    backstory="""You are a space industry analyst with expertise in space exploration,
    satellite technology, and space industry trends. You excel at answering questions
    about space news and providing detailed, accurate information.""",
    knowledge_sources=[recent_news],
    llm=LLM(model="gpt-4", temperature=0.0)
)

# Create task that handles user questions
analysis_task = Task(
    description="Answer this question about space news: {user_question}",
    expected_output="A detailed answer based on the recent space news articles",
    agent=space_analyst
)

# Create and run the crew
crew = Crew(
    agents=[space_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential
)

# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the latest developments in space exploration?"}
)
```

Output:

```
# Agent: Space News Analyst
## Task: Answer this question about space news: What are the latest developments in space exploration?

# Agent: Space News Analyst
## Final Answer:
The latest developments in space exploration, based on recent space news articles, include the following:

1. SpaceX has received the final regulatory approvals to proceed with the second integrated Starship/Super Heavy launch, scheduled for as soon as the morning of Nov. 17, 2023. This is a significant step in SpaceX's ambitious plans for space exploration and colonization. Source: SpaceNews

2. SpaceX has also informed the US Federal Communications Commission (FCC) that it plans to begin launching its first next-generation Starlink Gen2 satellites. This represents a major upgrade to the Starlink satellite internet service, which aims to provide high-speed internet access worldwide. Source: Teslarati

3. AI startup Synthetaic has raised $15 million in Series B funding. The company uses artificial intelligence to analyze data from space and air sensors, which could have significant applications in space exploration and satellite technology. Source: SpaceNews

4. The Space Force has formally established a unit within the U.S. Indo-Pacific Command, marking a permanent presence in the Indo-Pacific region. This could have significant implications for space security and geopolitics. Source: SpaceNews

5. Slingshot Aerospace, a space tracking and data analytics company, is expanding its network of ground-based optical telescopes to increase coverage of low Earth orbit. This could improve our ability to track and analyze objects in low Earth orbit, including satellites and space debris. Source: SpaceNews

6. The National Natural Science Foundation of China has outlined a five-year project for researchers to study the assembly of ultra-large spacecraft. This could lead to significant advancements in spacecraft technology and space exploration capabilities. Source: SpaceNews

7. The Center for AEroSpace Autonomy Research (CAESAR) at Stanford University is focusing on spacecraft autonomy. The center held a kickoff event on May 22, 2024, to highlight the industry, academia, and government collaboration it seeks to foster. This could lead to significant advancements in autonomous spacecraft technology. Source: SpaceNews
```

Debugging and Troubleshooting

Debugging Knowledge Issues

Check Agent Knowledge Initialization

```python
from crewai import Agent, Crew, Task
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

knowledge_source = StringKnowledgeSource(content="Test knowledge")

agent = Agent(
    role="Test Agent",
    goal="Test knowledge",
    backstory="Testing",
    knowledge_sources=[knowledge_source]
)

crew = Crew(agents=[agent], tasks=[Task(...)])

# Before kickoff - knowledge not initialized
print(f"Before kickoff - Agent knowledge: {getattr(agent, 'knowledge', None)}")

crew.kickoff()

# After kickoff - knowledge initialized
print(f"After kickoff - Agent knowledge: {agent.knowledge}")
print(f"Agent knowledge collection: {agent.knowledge.storage.collection_name}")
print(f"Number of sources: {len(agent.knowledge.sources)}")
```

Verify Knowledge Storage Locations

```python
import os
from crewai.utilities.paths import db_storage_path

# Check storage structure
storage_path = db_storage_path()
knowledge_path = os.path.join(storage_path, "knowledge")

if os.path.exists(knowledge_path):
    print("Knowledge collections found:")
    for collection in os.listdir(knowledge_path):
        collection_path = os.path.join(knowledge_path, collection)
        if os.path.isdir(collection_path):
            print(f"  - {collection}/")
            # Show collection contents
            for item in os.listdir(collection_path):
                print(f"    └── {item}")
```

Test Knowledge Retrieval

```python
# Test agent knowledge retrieval
if hasattr(agent, 'knowledge') and agent.knowledge:
    test_query = ["test query"]
    results = agent.knowledge.query(test_query)
    print(f"Agent knowledge results: {len(results)} documents found")

# Test crew knowledge retrieval (if exists)
if hasattr(crew, 'knowledge') and crew.knowledge:
    crew_results = crew.query_knowledge(test_query)
    print(f"Crew knowledge results: {len(crew_results)} documents found")
```

Inspect Knowledge Collections

```python
import chromadb
from crewai.utilities.paths import db_storage_path
import os

# Connect to CrewAI's knowledge ChromaDB
knowledge_path = os.path.join(db_storage_path(), "knowledge")

if os.path.exists(knowledge_path):
    client = chromadb.PersistentClient(path=knowledge_path)
    collections = client.list_collections()

    print("Knowledge Collections:")
    for collection in collections:
        print(f"  - {collection.name}: {collection.count()} documents")

        # Sample a few documents to verify content
        if collection.count() > 0:
            sample = collection.peek(limit=2)
            print(f"    Sample content: {sample['documents'][0][:100]}...")
else:
    print("No knowledge storage found")
```

Check Knowledge Processing

```python
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a test knowledge source
test_source = StringKnowledgeSource(
    content="Test knowledge content for debugging",
    chunk_size=100,   # Small chunks for testing
    chunk_overlap=20
)

# Check chunking behavior
print(f"Original content length: {len(test_source.content)}")
print(f"Chunk size: {test_source.chunk_size}")
print(f"Chunk overlap: {test_source.chunk_overlap}")

# Process and inspect chunks
test_source.add()
print(f"Number of chunks created: {len(test_source.chunks)}")
for i, chunk in enumerate(test_source.chunks[:3]):  # Show first 3 chunks
    print(f"Chunk {i+1}: {chunk[:50]}...")
```
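To build intuition for what `chunk_size` and `chunk_overlap` control, here is a rough standard-library sketch of fixed-size overlapping chunking. It is only an illustration of the idea; CrewAI's actual splitter may differ in its details:

```python
def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size characters, where each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "x" * 250
chunks = chunk_text(text, chunk_size=100, chunk_overlap=20)
print([len(c) for c in chunks])  # [100, 100, 90, 10]
```

Larger overlaps reduce the chance that a fact is cut in half at a chunk boundary, at the cost of storing (and embedding) more redundant text.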

Common Knowledge Storage Issues

"File not found" errors:

```python
# Ensure files are in the correct location
from crewai.utilities.constants import KNOWLEDGE_DIRECTORY
import os

knowledge_dir = KNOWLEDGE_DIRECTORY  # Usually "knowledge"
file_path = os.path.join(knowledge_dir, "your_file.pdf")

if not os.path.exists(file_path):
    print(f"File not found: {file_path}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Expected knowledge directory: {os.path.abspath(knowledge_dir)}")
```

"Embedding dimension mismatch" errors:

```python
# This happens when switching embedding providers
# Reset knowledge storage to clear old embeddings
crew.reset_memories(command_type='knowledge')

# Or use consistent embedding providers
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)
```

"ChromaDB permission denied" errors:

```shell
# Fix storage permissions
chmod -R 755 ~/.local/share/CrewAI/
```

Knowledge not persisting between runs:

```python
# Verify storage location consistency
import os
from crewai.utilities.paths import db_storage_path

print("CREWAI_STORAGE_DIR:", os.getenv("CREWAI_STORAGE_DIR"))
print("Computed storage path:", db_storage_path())
print("Knowledge path:", os.path.join(db_storage_path(), "knowledge"))
```

Knowledge Reset Commands

```python
# Reset only agent-specific knowledge
crew.reset_memories(command_type='agent_knowledge')

# Reset both crew and agent knowledge
crew.reset_memories(command_type='knowledge')

# CLI commands
# crewai reset-memories --agent-knowledge  # Agent knowledge only
# crewai reset-memories --knowledge        # All knowledge
```

Clearing Knowledge

If you need to clear the knowledge stored in CrewAI, you can use the crewai reset-memories command with the --knowledge option.

```shell
crewai reset-memories --knowledge
```

This is useful when you have updated your knowledge sources and want to ensure the agents are using the most recent information.