Welcome to CS224W, Machine Learning with Graphs. My name is Jure Leskovec. I'm an Associate Professor of Computer Science at Stanford University, and I will be your instructor.

What I'm going to do in this first lecture is to motivate and get you excited about graph-structured data, and about how we can apply novel machine learning methods to it.

So why graphs? Graphs are a general language for describing and analyzing entities with their relations and interactions. This means that rather than thinking of the world, or a given domain, as a set of isolated data points, we really think of it in terms of networks and relations between these entities. This means that there is an underlying graph of relations between the entities, and these entities are related to each other according to these connections, or the structure of the graph.

And there are many types of data that can naturally be represented as graphs, and modeling these graphical relations, this relational structure of the underlying domain, allows us to build much more faithful, much more accurate models of the phenomena underlying the data.

So for example, we can think of computer networks, disease pathways, networks of particles in physics, networks of organisms in food webs, infrastructure, as well as events: all of these can be represented as graphs.

Similarly, we can think of social networks, economic networks, communication networks, say, citations between different papers, the Internet as a giant communication network, as well as the way neurons in our brain are connected. Again, all these domains are inherently networks, or graphs, and that representation allows us to capture the relationships between different objects or entities in these different domains.

And last, we can take knowledge and represent facts as relationships between different entities. We can describe the regulatory mechanisms in our cells as processes governed by the connections between different entities. We can even take scenes from the real world and represent them as graphs of relationships between the objects in the scene; these are called scene graphs.

We can take computer code, software, and represent it as a graph of, let's say, calls between different functions, or as the structure of the code captured by the abstract syntax tree.

We can also naturally take molecules, which are composed of atoms and bonds, and represent them as graphs, where we represent atoms as nodes and their bonds as edges between them.
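To make this concrete, here is a minimal sketch of a molecule as a graph. The water molecule and the plain dict-of-sets adjacency representation are illustrative choices, not something prescribed in the lecture:

```python
# Sketch of a molecule as a graph: atoms become nodes, bonds become edges.
# The example molecule (water, H2O) and the dict-of-sets adjacency
# structure are illustrative choices.

def molecule_graph(atoms, bonds):
    """Build an undirected adjacency list from atom labels and bond pairs."""
    adj = {i: set() for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)
    return adj

atoms = ["O", "H", "H"]   # node labels: one oxygen, two hydrogens
bonds = [(0, 1), (0, 2)]  # the two O-H bonds, stored as edges
adj = molecule_graph(atoms, bonds)
print(adj[0])  # the oxygen is bonded to both hydrogens: {1, 2}
```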

And of course, in computer graphics, we can take three-dimensional shapes and represent them as graphs. So in all these domains, the graphical structure is the important part that allows us to model the underlying domain, the underlying phenomena, in a faithful way.

So the way we are going to think about graph-relational data in this class is that there are essentially two big categories of data that can be represented as graphs. First are what are called natural graphs, or networks, where the underlying domain can naturally be represented as a graph. For example, social networks: societies are collections of seven billion individuals and the connections between them. Communications and transactions between electronic devices, phone calls, and financial transactions all naturally form graphs.

In biomedicine, we have genes and proteins regulating biological processes, and we can represent the interactions between these different biological entities with a graph. And, as I mentioned, the connections between neurons in our brains are essentially a network of connections. If we want to model these domains, we really need to represent them as networks.

A second category consists of domains that also have relational structure, where we can use graphs to represent that relational structure. So for example, information and knowledge are many times organized and linked. Software can be represented as a graph. We can often take data points and connect similar data points, and this will create a graph, a similarity network.
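As an illustration, here is a minimal sketch of building such a similarity network. The toy 2D points and the single-nearest-neighbor rule under Euclidean distance are illustrative choices; real similarity graphs often use k nearest neighbors or a distance threshold:

```python
import math

# Sketch of a similarity network: connect each data point to its single
# nearest neighbor under Euclidean distance. Both the toy points and the
# 1-nearest-neighbor rule are illustrative choices.

def nearest_neighbor_edges(points):
    edges = set()
    for i, p in enumerate(points):
        # distance from point i to every other point
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        _, j = min(dists)
        edges.add((min(i, j), max(i, j)))  # undirected edge, stored once
    return edges

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(nearest_neighbor_edges(points))  # two tight clusters -> {(0, 1), (2, 3)}
```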

And we can take other domains that have a natural relational structure, like molecules, scene graphs, 3D shapes, as well as, you know, in physics, where we can take particle-based simulations that simulate how particles are related to each other, and represent this with a graph.

So this means that there are many different domains, either natural graphs or natural networks, as well as other domains that can naturally be modeled as graphs to capture the relational structure.

And the main question for this class that we are going to address is how we take advantage of this relational structure to make better, more accurate predictions. This is especially important because many domains have a rich relational structure, which can be represented with a graph.

And by explicitly modeling these relationships, we will be able to achieve better performance, build more accurate models, and make more accurate predictions.

This is especially interesting and important in the age of deep learning, where today's modern deep learning toolbox is specialized for simple data types. It is specialized for simple sequences and grids. A sequence, like text or speech, has a linear structure, and amazing tools have been developed to analyze this type of structure. Images can be resized and have spatial locality, so they can be represented as fixed-size grids, and again, deep learning methodology has been very good at processing this type of fixed-size image.

However, graphs and networks are much harder to process because they are more complex. First, they have arbitrary size and complex topology. There is also no spatial locality as in grids or in text: in text we know left and right, and in grids we have up and down, left and right, but in networks there is no reference point, no notion of spatial locality.

The second important thing is that there is no reference point, no fixed node ordering, that would allow us to do deep learning. And often, these networks are dynamic and have multimodal features.

So in this course, we are really going to talk about how we develop neural networks that are much more broadly applicable: how do we develop neural networks that are applicable to complex data types like graphs? And really, it is relational data and graphs that are the new frontier of deep learning and representation learning research.

Intuitively, what we would like to do is build neural networks that take our graph as input, and on the output they will be able to make predictions. These predictions can be at the level of individual nodes, at the level of pairs of nodes, or links, or they can be something much more complex, like a brand-new generated graph, or a prediction of a property of a given molecule that is represented as a graph on the input.

And the question is, how do we design this neural network architecture that will allow us to do this end to end, meaning there will be no human feature engineering needed?

What I mean by that is that in traditional machine learning approaches, a lot of effort goes into designing proper features, proper ways to capture the structure of the data, so that machine learning models can take advantage of it.

In this class, we will talk mostly about representation learning, where this feature engineering step is taken away. Basically, as soon as we have our graph data, we can automatically learn a good representation of the graph so that it can be used by downstream machine learning algorithms. So representation learning is about automatically extracting or learning features from the graph.

The way we can think of representation learning is that we map the nodes of our graph to a d-dimensional embedding, to d-dimensional vectors, such that similar nodes in the network are embedded close together in the embedding space. So the goal is to learn this function f that takes the nodes and maps them into these d-dimensional real-valued vectors, where we call this vector the representation, or feature representation, or embedding of a given node, an embedding of an entire graph, an embedding of a given link, and so on.
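A minimal sketch of this mapping f, implemented as a simple lookup table of vectors. The random initialization, the toy node names, and the dot-product similarity are illustrative assumptions; in practice the vectors would be optimized so that similar nodes land close together:

```python
import random

# Sketch of the embedding map f: node -> R^d as a lookup table.
# Random initialization stands in for learned values; training would
# adjust these vectors so similar nodes end up close in embedding space.

random.seed(0)
d = 4  # embedding dimension (illustrative choice)
nodes = ["a", "b", "c"]
embedding = {v: [random.gauss(0, 1) for _ in range(d)] for v in nodes}

def f(v):
    """Map a node to its d-dimensional real-valued vector."""
    return embedding[v]

def dot(u, w):
    """Inner product: a simple notion of similarity in embedding space."""
    return sum(x * y for x, y in zip(u, w))

print(len(f("a")))  # each node maps to a d-dimensional vector: 4
```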

A big part of our class will be investigating and learning about the latest representation learning and deep learning approaches that can be applied to graph-structured data. We are going to talk about many different topics in machine learning and representation learning for graph-structured data.

First, we're going to talk about traditional methods for machine learning on graphs, like graphlets and graph kernels. We are then going to talk about methods to generate generic node embeddings, methods like DeepWalk and Node2Vec.
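As a preview, DeepWalk-style methods generate short random walks over the graph and feed them, like sentences, into a word-embedding model. A minimal sketch of just the walk-generation step, on a hypothetical toy graph with an illustrative walk length:

```python
import random

# Sketch of the random-walk step behind DeepWalk-style node embeddings:
# walks over the graph are later treated like sentences by a word-embedding
# model. The toy graph, walk length, and seed are illustrative choices.

def random_walk(adj, start, length, rng):
    """Generate one random walk of `length` nodes starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))  # hop to a uniform neighbor
    return walk

adj = {                # toy undirected graph as an adjacency list
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walk = random_walk(adj, "a", 5, random.Random(42))
print(walk)  # a length-5 walk starting at "a"
```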

We are going to spend quite a bit of time talking about graph neural networks and popular graph neural network architectures, like the graph convolutional neural network, the GraphSAGE architecture, or the Graph Attention Network architecture. We are also going to study the expressive power of graph neural networks, the theory behind them, and how we scale them up to very large graphs.
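To give a flavor of what these architectures share, here is a minimal sketch of one message-passing step, where each node aggregates its neighbors' feature vectors. The toy graph, the features, and the plain mean aggregator are illustrative; real GCN-style layers add learned weight matrices and nonlinearities:

```python
# Sketch of one message-passing step, the core operation shared by
# GCN-style graph neural networks: each node averages the feature vectors
# of its neighbors (plus its own). Toy graph and mean aggregator are
# illustrative; real layers add learned weights and nonlinearities.

def message_passing_step(adj, features):
    new_features = {}
    for v, neighbors in adj.items():
        msgs = [features[u] for u in neighbors] + [features[v]]
        # elementwise mean over all incoming messages
        new_features[v] = [sum(col) / len(msgs) for col in zip(*msgs)]
    return new_features

adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
out = message_passing_step(adj, features)
print(out["a"])  # mean of a's and b's features: [0.5, 0.5]
```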

Then, in the second part of this course, we are also going to talk about heterogeneous graphs, knowledge graphs, and applications to logical reasoning, learning about methods like TransE and BetaE. We are also going to talk about how we build deep generative models for graphs, where we can think of the prediction of the model as an entire, newly generated graph.

And we are also going to discuss applications to biomedicine, various scientific applications, as well as applications in industry, in terms of recommender systems, fraud detection, and so on.

So here is the outline of this course: week by week, 10 weeks, starting today and going all the way to the middle of March, when the course will finish. We will have 20 lectures, and we will cover all the topics that I have discussed, with a particular focus on graph neural networks and representation learning on graphs.