Goals
This guide covers the different ways to import data from a relational database into Neo4j. After completing it, you will be able to choose an approach for importing your relational data and transforming it into a graph.
Prerequisites
You should be familiar with graph database concepts and the property graph model. It also helps to know how relational and graph concepts, and their data models, differ.
Importing Data from a Relational Database
In a company setting, you often have existing data in a system that needs to be transferred or manipulated for a new project. It is rare for a new project to start with none of its data already captured somewhere. To get existing data where the new process, application, or system needs it, you perform an extract-transform-load (ETL) process: export the data from the existing system(s), apply any manipulations the new structure requires, and then import the transformed data into the new data store.
Depending on your environment, some tools for importing relational data into a graph will provide better or faster solutions than others. This guide discusses each of the options and why you can or should choose some over others for your use case.
Relational to Graph Import Tools
There are six main approaches to moving relational data to a graph. This page briefly covers how each one operates; more detailed walkthroughs are in the linked pages.
1) LOAD CSV: possibly the simplest way to import data from your relational database. Requires a dump of the individual entity tables and join tables, formatted as CSV files.
2) APOC: Awesome Procedures on Cypher, an extension library that provides common procedures and functions to developers. This library is especially helpful for complex transformations and data manipulations. Useful procedures include apoc.load.jdbc, apoc.load.json, and others.
3) ETL Tool: internally built UI tool that translates relational data to a graph over a JDBC connection. Allows bulk data import for large data sets with fast performance and a simple user experience.
4) Kettle: open-source tool for enterprise-scale data export and import. Handles a variety of data sources and large data sets easily and organizes the data flow process.
5) Other ETL tools: a few vendor and community tools offer similar ETL processes and GUI interaction for getting data in various formats into and out of Neo4j. Some of these tools can also map out the flow and transformation of data through the system.
6) Programmatic via drivers: retrieve data from a relational database (or other tabular structure) and write it to Neo4j over the Bolt protocol through one of the drivers, in your programming language of choice.
You should create and understand your graph data model before transferring data from an existing relational structure to a graph. Without a good data model, jumping into the import can lead to frustrating data cleanup later.
LOAD CSV
This built-in Cypher clause lets you take existing or exported CSV files and read, transform, and import their data into the graph database with Cypher statements. You can run the statements individually or batch them in a Cypher script. Because LOAD CSV is part of Cypher out of the box, you do not need any additional plugins or configuration, and those already familiar with Cypher may prefer this route.
However, some difficult or complex transformations are not easy to express in Cypher, or are not provided by it at all. In those cases, you might need to add an APOC procedure to the LOAD CSV statements or use another import tool.
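As a rough illustration of the LOAD CSV route, the sketch below dumps one entity table to CSV and pairs it with a Cypher statement that could load the file. It is only a sketch under assumptions: the `person` table, its `id` and `name` columns, the `Person` label, and the `persons.csv` file name are all hypothetical, and SQLite stands in for whatever relational database you actually export from.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, columns, out):
    """Write the given columns of a table to `out` as CSV with a header row.

    Table and column names are interpolated directly into the SQL, so this
    sketch assumes they come from trusted code, not from user input.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    column_list = ", ".join(columns)
    # Order by the first column so repeated exports produce the same file.
    cursor = conn.execute(f"SELECT {column_list} FROM {table} ORDER BY {columns[0]}")
    writer.writerows(cursor)


# Cypher to run in Neo4j once the file sits in the server's import directory.
# The file name, label, and property names are assumptions for this example.
LOAD_PERSONS = """
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
MERGE (p:Person {id: toInteger(row.id)})
SET p.name = row.name
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    buffer = io.StringIO()
    export_table_to_csv(conn, "person", ["id", "name"], buffer)
    print(buffer.getvalue())
    print(LOAD_PERSONS)
```

LOAD CSV reads every field as a string, which is why the statement converts `row.id` with `toInteger` before merging.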
LOAD CSV Resources
- Cypher Manual: LOAD CSV
- Guide: CSV Import
- Docs Tutorial: LOAD CSV for import
APOC
APOC is Neo4j’s utility library for handling data import, as well as data transformations and manipulations. From converting values to altering the data model, this library can manage it all, allowing you to combine and chain procedures in order to get exactly the results you are looking for.
For data import, APOC offers several options depending on your data source and format. It can import files or data from a URL in CSV, JSON, or XML formats, and it can load data straight from a database (using JDBC). When you call these procedures, you pass in the data source, then use other procedures to manipulate the data or regular Cypher to insert into or update the database. There are also procedures for batching data, adding wait/sleep commands, and handling large data sets or temperamental data sources.
The transformation procedures in this library are nearly endless, allowing the developer to process dynamic labels or relationships, correct or skip null or empty values, format dates or other values, generate hashes, and handle other tricky data scenarios. If you need flexible, custom data handling, APOC could be the way to go. The downside of using this library for complicated scenarios is that handling multiple data transformations may take many lines of code.
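To make the JDBC option concrete, here is a minimal sketch of a Cypher statement that calls apoc.load.jdbc, skips rows with a null or empty name, and merges the rest as nodes, together with the parameters it expects. It assumes the APOC library (in some Neo4j versions the procedure ships in the extended APOC package) and a matching JDBC driver jar are installed on the server. The `Person` label, the `id` and `name` columns, and the helper name `jdbc_import_params` are hypothetical.

```python
# Cypher that reads rows over JDBC with APOC and merges them as nodes.
# Assumption: the source table has `id` and `name` columns.
JDBC_IMPORT = """
CALL apoc.load.jdbc($jdbcUrl, $table) YIELD row
WITH row
WHERE row.name IS NOT NULL AND row.name <> ''
MERGE (p:Person {id: row.id})
SET p.name = row.name
"""


def jdbc_import_params(jdbc_url, table):
    """Build the parameter map that JDBC_IMPORT expects (hypothetical helper)."""
    return {"jdbcUrl": jdbc_url, "table": table}


if __name__ == "__main__":
    # The JDBC URL below is a placeholder, not a real server.
    params = jdbc_import_params("jdbc:postgresql://localhost/shop", "person")
    print(JDBC_IMPORT)
    print(params)
```

Passing the URL and table name as parameters keeps connection details out of the statement text, so the same statement can be reused across sources.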
APOC Resources
- Documentation: APOC
- Videos: APOC Video Series
- Source code: GitHub project
ETL Tool
Neo4j’s ETL tool provides a simple GUI for loading data from nearly any type of relational database into a Neo4j instance. You first set up a JDBC connection to the relational database. The tool then auto-maps the schema to a graph data model, rendered as a visualization that you can edit to fit your use case. Finally, you choose whether the load runs against a running or shut-down Neo4j instance and import the data.
This tool provides a simple, straightforward process for a quick and efficient initial import from a relational database to Neo4j. However, it cannot currently handle incremental loads or updates to existing data. It is a community-driven tool, so updates are made as needed rather than on a scheduled timeline.
ETL Tool Resources
- Developer guide: Neo4j ETL Tool
- Blog post: Translating Relational Data to Graph
- Source code: GitHub project
Kettle
This highly diverse and flexible data loading tool has several connection options to and from Neo4j, as well as capabilities to generate CSV files from other systems to load into your graph database. Its goal is to help you create and manage a simple, self-describing, and maintainable data integration process from beginning to end.
Kettle builds a data loading process that is self-documenting and transparent. It is especially helpful if the data import requires data retrieval from multiple sources or if there are multiple dependent steps to build or update your graph. If you need to transform the data coming in or going out, Kettle can handle different kinds of manipulations, including aggregations. Processes that need to log information to Neo4j, or that need the flexibility to be embedded in various environments, are also excellent cases for using Kettle.
All of this functionality comes bundled out of the box in a simple yet powerful GUI for your ETL developers. Working with Neo4j simply requires the plugins for our graph data integration.
Kettle Resources
- Kettle Download: Open-source project on SourceForge
- Neo4j Plugins: Integrate Kettle with Neo4j
- Blog post: Getting Started with Kettle and Neo4j
Other ETL Tools
A few data integration tools from other individuals or companies also work well with Neo4j. Open-source options such as Talend or NiFi let you build simple import processes with tools you may already know.
Other Resources
- Talend: Writing data to Neo4j
- Documentation: Talend Neo4j Connector
- Blog post: Fun with music, Talend, and Neo4j
- Source code: Apache Nifi / Neo4j Connector
Import Programmatically with Drivers
To import data using a programming language, you can use the Neo4j driver for your preferred language and execute Cypher statements against the database. This approach is also helpful if you do not have access to the Cypher shell or if the data is not available as an accessible file.
You set up the driver connection to Neo4j, then execute Cypher statements that pass from the application level through the driver to the database for various operations, including large numbers of inserts and updates. Using a driver and a programming language can be very useful for incremental updates to data passed from other systems into Neo4j.
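The driver-based flow described above can be sketched roughly as follows: read rows from the relational source, group them into batches, and send each batch to Neo4j as the `$rows` parameter of a single UNWIND statement. This is a sketch under assumptions, not a definitive implementation. SQLite stands in for the source database, the `Person` label and the `id` and `name` columns are hypothetical, and `write_batches` assumes the official `neo4j` Python driver package is installed.

```python
import sqlite3
from itertools import islice

# One statement per batch: UNWIND turns the list parameter into rows,
# and MERGE makes repeated (incremental) runs update instead of duplicate.
UPSERT_PERSONS = """
UNWIND $rows AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name
"""


def fetch_rows(conn, sql):
    """Run a query against the relational source and yield each row as a dict."""
    cursor = conn.execute(sql)
    names = [column[0] for column in cursor.description]
    for record in cursor:
        yield dict(zip(names, record))


def batched(rows, size):
    """Group an iterable of rows into lists of at most `size` items."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def write_batches(uri, user, password, batches):
    """Send each batch to Neo4j in one statement (requires the `neo4j` package)."""
    # Imported here so the helpers above can be used without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for batch in batches:
                session.run(UPSERT_PERSONS, rows=batch).consume()
```

A call might look like `write_batches("bolt://localhost:7687", "neo4j", "<password>", batched(fetch_rows(conn, "SELECT id, name FROM person"), 1000))`, where the URI, credentials, and batch size are placeholders to adapt. Sending rows in batches rather than one statement per row cuts the number of round trips to the database.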
