欢迎回到斯坦福CS224W,
Welcome back to Stanford CS224W,
呃,图的机器学习。
uh, Machine Learning with Graphs.
嗯,在这部分的演讲中,
Um, in this part of the lecture,
我将讨论图形机器学习研究的应用及其影响,
I’m going to discuss applications of graph machine learning research and its impact,
在许多不同的应用程序中
uh, across many different, uh, applications.
因此,在图机器学习中,
So in mach- graph machine learning,
我们可以制定不同类型的任务。
we can formulate different types of tasks.
我们可以在单个节点的级别上制定任务。
We can formulate tasks at the level of individual nodes.
我们可以在以下级别制定任务,
We can formulate tasks at the level of,
呃,边缘,嗯,这是成对的节点。
uh, edges, uh, which is pairs of nodes.
我们可以在节点的子图级别识别或定义任务,
We can identify or define tasks at the level of subgraphs of nodes,
以及整个um级的任务
as well as the tasks at the level of the entire, um,
图,例如用于图级别预测或图生成。
graphs like for a graph level prediction or, uh, graph generation.
我要说的是,
And what I’m going to talk, uh,
接下来是完成这些不同级别的任务,然后向您展示,
next is go through these different levels of tasks and show you, uh,
不同,嗯,应用程序,嗯,
different, uh, applications, uh,
以及可以应用此类方法模型的不同领域。
and different domains where this type of methods models can be applied.
因此,对于节点级任务,
So for node level tasks,
我们通常谈论节点分类,
we generally talk about node classification,
我们试图预测节点的属性的地方。
where we are trying to predict a property of a node.
例如,对,
For example, categorize, uh,
在线用户或对项目进行分类。
online users or categorize items.
在链接预测中,我们试图预测是否存在
In link prediction, we try to predict whether there
缺少一对节点之间的链接。
are missing links between a pair of nodes.
此任务的一个此类示例是知识图完成。
One such example of this task is knowledge graph completion.
在诸如图形分类之类的图形级任务中,
In, uh, graph level task like graph classification,
我们尝试对不同的图进行分类。
we try to categorize different graphs.
嗯,例如,我们可能想将分子表示为
Uh, for example, we may want to represent molecules as,
嗯,先绘制图表,然后预测分子的特性。
uh, graphs and then predict properties of molecules.
这是特别有趣和重要的任务
This is especially interesting and important task
对于我们尝试预测的药物设计,
for drug design where we try to predict,
呃,属性不同,
uh, properties of different,
嗯,分子,不同的药物。
uh, molecules, different drugs.
我们还可以执行聚类或社区检测,
We can also perform clustering or community detection,
目标是确定,嗯,
where the goal is to identify, um,
紧密相连的,呃,图的各个子部分,嗯,
closely knit, uh, subparts of the graph, uh,
节点之间紧密连接或高度连接的地方。
where nodes are densely connected or highly connected with each other.
嗯,这些方法的应用可能是社交圈检测。
Um, an application of these could be social circle detection.
然后还有其他类型的任务。
And- and then there are also other types of tasks.
例如,图生成或图,嗯,嗯,进化,
For example, graph generation or graph, um, uh, evolution,
例如,图生成可以是
where graph generation could be, for example,
用于药物发现以产生新的分子结构。
used for drug discovery to generate novel molecular structures.
图和预测图的演变非常有用,嗯,
And graph- and predicting graph evolution is very useful, uh,
在物理学中,我们想对各种物理现象进行精确的模拟,
in physics where we wanna run accurate simulations of various kinds of physics phenomena,
可以用图表来表示
and that can be represented, um, as a graph.
因此,在所有这些机器学习任务中,我们使用
So in all these machine learning tasks, uh, we use, uh,
我们使用,嗯,图表,嗯,
we use, uh, graphs, uh,
这会导致高影响力的应用程序。
which leads to high, uh, impact applications.
现在我想给你一些例子。
And now I wanna give you some examples of them.
因此,首先,我将为您提供一些示例
So first, I’m going to give you some examples of
节点级机器学习应用程序。
node level machine learning applications.
所以,嗯,最近的一个是12月底宣布的,
So, um, a very recent one announced at the end of December,
嗯,今年是
uh, this year is,
嗯,以下问题。
uh, the following problem.
这就是所谓的蛋白质折叠
It’s called protein folding,
基本上在我们体内
where basically in our bodies,
我们有这些叫做蛋白质的分子
we have these molecules called proteins that
调节各种生物过程,例如,
regulate various biological processes, and for example,
药物,呃,
the way that drugs, uh,
起作用的方式是结合或改变行为
work is to bind or change behavior of
不同的蛋白质然后改变我们体内的生物过程,
different proteins which then changes the biological processes in our body,
这样,例如,
and this way, uh, for example,
我们-我们-我们得到了治愈或我们得到了治愈。
we- we- we get cured or we we heal.
嗯,蛋白质是由-组成的,
Um, proteins are- are composed,
嗯,是氨基酸。
uh, um, of amino acids.
我们可以将蛋白质视为氨基酸序列。
And we can think of our protein as a sequence of amino acids.
但是,由于磁力和不同种类的力,
However, due to magnetic and different kinds of forces,
这些-这些蛋白质不是这些,嗯,嗯,
these- these proteins are not these, um, uh,
链或线,但实际上它们是折叠的,
chains or strands, but the- they are actually- they actually fold,
嗯,形状非常复杂。
um, in very complex, uh, shapes.
这是生物学中非常重要的问题之一,非常,嗯,
And one of the very important problems in biology, a very, uh, um,
一个尚未解决的问题是:给定一个氨基酸序列,
a problem that hasn’t yet been solved is, given a- a sequence of amino acids,
您可以预测潜在蛋白质的3D结构吗?
can you predict the 3D structure of the underlying protein?
所以计算任务,嗯,
So the computational task that, um,
自从呃,呃,
scientists have been running competitions about since, um, uh,
70年代是关于我们如何计算-计算预测
‘70s is about how do we computation- computationally predict
蛋白质的3D结构完全基于其氨基酸序列。
protein’s 3D structure based solely on its amino acid sequence.
嗯,在这里,我告诉你一些,嗯,
Um, and here I show you a few, um, uh,
两种不同蛋白质的三维结构,
the three-dimensional structure of two different proteins,
而且您可以看到的是-这种折叠
and what you can see is that- that this folding of
蛋白质是-根据其蛋白质非常复杂,
a protein is- is very complex based on its,
呃,氨基酸结构。
uh, amino acid structure.
所以问题是
So the question is,
给定氨基酸序列,
given a sequence of amino acids,
我们可以预测三维结构吗
can we predict the three-dimensional structure,
嗯,蛋白质呢?
um, of the protein?
这是最近才解决的问题。
And this is the problem that has been just recently solved.
在2020年12月中旬,
In the middle of December of, uh, 2020, uh,
DeepMind宣布推出Alpa- AlphaFold,可提高性能,
DeepMind announced Alpa- AlphaFold that increased the performance, um,
或准确性
or the accuracy of this,
嗯,蛋白质折叠,
uh, protein folding, uh,
一路提升30%,直到90多分的高值。
applications by 30 percent all the way up to the values that are in high 90s.
在这里,我只展示几个,嗯,
And here I just show a couple of, um, uh,
媒体上的文章标题,
titles of articles in media,
嗯,关于多么重要,
uh, about how an important, uh,
成就这个-这是
achievement this- this has been,
它如何永久改变生物学,
how it changed the biology forever,
它是如何解决最大的科学开放性问题之一的,
how it solved the- one of the largest scientific open problems,
以及这将如何加速药物研发以及各种
and how this will turbocharge drug discovery and all kinds of,
嗯,很重要,嗯,这意味着什么。
uh, important, um, implications that this has.
有趣的是,呃,科学的,
And what is interesting in this, uh, scientific, uh,
AI机器学习的突破在于
AI machine learning breakthrough is that the key idea that
使之成为可能是为了代表基础,
made this possible was to represent the underlying,
恩,蛋白质为图表。
uh, protein as a graph.
嗯,在这里他们将其表示为空间图,
Uh, and here they represented it as a spatial graph,
该图中的节点是蛋白质序列中的氨基酸,
where nodes in this graph were amino acids in the protein sequence,
边缘对应于um
and the edges corresponded to, um,
氨基酸在空间上彼此靠近。
ami- to nodes- to amino acids that are spatially close to each other.
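To make the spatial graph idea concrete, here is a minimal sketch in Python. The coordinates, the 5.0 distance cutoff, and the function name `build_spatial_graph` are illustrative assumptions, not details of AlphaFold; the lecture only states that nodes are amino acids and edges connect amino acids that are spatially close.

```python
import math
from itertools import combinations

def build_spatial_graph(coords, cutoff):
    """Return edges (i, j) for every pair of nodes within `cutoff` of each other."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            edges.append((i, j))
    return edges

# Hypothetical 3D positions for four amino acids (made-up numbers).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.0, 0.0)]

# Assumed cutoff; a real pipeline would choose this from domain knowledge.
edges = build_spatial_graph(coords, cutoff=5.0)
print(edges)
```

Nodes 0 and 2 are 7.6 units apart, so they get no edge, while every other pair falls under the cutoff.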
因此,这意味着现在给定位置,
So this means that now given the positions, um,
所有氨基酸和它们之间的边缘接近度,
of all the amino acids and the edges proximities between them,
图神经网络
the graph neural network, uh,
经过训练的方法可以预测新的位置,
approach was trained so that it predicted the new positions,
嗯,是um的氨基酸。
uh, of the- of the, um, amino acids.
通过这种方式,可以模拟蛋白质的折叠,并且-
And this way, uh, the folding of the protein was able to be simulated and the-
并且可以预测分子的最终位置。
and the posi- the final positions of the molecules were able to be, uh, predicted.
因此,完成这项工作的关键要素
So the key ingredient in making this work,
在蛋白质折叠方面取得科学突破是
in making this scientific breakthrough in protein folding was
图表示法和图神经网络的使用,嗯,技术。
the use of graph representation and the graph neural network, uh, technology.
现在,这是在节点级别上,
Now, uh, this was on the level of nodes,
基本上对于图中的每个节点,
where basically for every node in the graph,
我们试图预测它,嗯,
we tried to predict its, um,
嗯,在太空中的位置,
uh, position in space,
这样,嗯,
and this way, uh,
告诉什么是蛋白质的三维组织。
tell what is the three-dimensional organization of a protein.
现在我们要讨论边缘级机器学习任务,
Now we are going to talk about edge-level machine learning task,
我们基本上是在进行链接预测或尝试
where we are basically doing link prediction or trying to
了解不同节点之间的关系。
understand relationship between different nodes.
第一个示例是在推荐系统中,
The first example of this is in recommender systems,
基本上,我们可以在用户与商品互动时将其视为
where basically we can think of these as users interacting with items,
产品,电影,
items being products, movies,
嗯,歌曲,等等。
um, songs, and so on.
节点,嗯,将是-我们将有两种类型的节点。
And nodes, uh, will be- we’ll have two types of nodes.
我们将有用户,也将有项目。
We will have users, and we would have items.
如果用户消费,购买,
And there is an edge between a user and an item if a user consumed, bought,
审查,给定的项目或听了给定的歌曲,或者,
reviewed, uh, a given item or listened to a given song or,
嗯,看了一部电影。
uh, watched a given movie.
根据此图的结构以及用户和项目的属性,
And based on the structure of this graph and the properties of the users and the items,
我们想预测或推荐
we would like to predict or recommend
将来用户还会对哪些其他项目感兴趣?
what other items given users might be interested in, uh, in the future.
因此,我们自然有一个二部图,还有一个图问题。
So we naturally have a bipartite graph and, um, a graph problem.
还有像这样的公司中使用的现代推荐系统
And the modern recommender systems used in companies like, uh,
Pinterest,LinkedIn,呃,Facebook,
Pinterest, LinkedIn, uh, Facebook,
嗯,Instagram,嗯,阿里巴巴,
uh, Instagram, uh, Alibaba,
嗯,其他地方都基于这些图形表示
um, and elsewhere are all based on these graphical representations
并使用图表示学习和图神经网络进行预测。
and use graph representation learning and graph neural networks to make predictions.
而这里的关键见解是,我们基本上可以
And the key insight here is that we can basically
了解如何嵌入或表示节点,
learn how to embed or how to represent nodes,
嗯,这个图,这样相关的节点是
um, of this graph such that related nodes are
嵌入比不相关的节点彼此靠近。
embedded closer to each other than nodes that are not related.
例如,在Pinterest的情况下,
And for example, in case of Pinterest,
我们可以想到,嗯,
we can think of, uh,
Pinterest图片作为图中的节点,
Pinterest images as nodes in the graph,
而目标是嵌入,嗯,
and the goal is to embed, um,
相关的节点-相关的图像
nodes that are related- images that are related
比不相关的图像靠得更近。
closer together than images that are not related.
例如,这个,
For example, this, uh,
毛衣和蛋糕。
sweater and the cake.
而做到这一点的方法是创建这种类型的二部网络,
And the way one can do this is to create this type of bipartite network,
我们在顶部有图像,我们可以有,
where we have the images on the top, and we can have,
例如,底部的用户或Pinterest板。
for example, users or Pinterest boards at the bottom.
然后我们可以定义一个神经网络方法
And then we can define a neural network approach that will take
这些不同Pin的特征信息或属性信息,
the feature information or attribute information of these different pins,
所以基本上是图片的内容
so basically the content of the image,
并将其转化为基础图,以进行强大的嵌入,
and transform it across the underlying graph to come up with a robust embedding,
恩,给定的,恩,图像。
uh, of a given, uh, image.
事实证明,这种方法很有用,
And it turns out that this approach works much,
比仅考虑图像本身要好得多。
much better than if you would just consider images by themselves.
因此图像加上图结构导致
So images plus the graph structure leads to
比图片本身更好的建议。
much better recommendations than the images themselves.
因此,在此任务示例中,
So here in this example of the task,
它是关于了解节点对或节点对之间的关系
it is about understanding relationships between pairs of nodes or pairs of
基本上是说节点是
images by basically saying that nodes that are
相关内容应更紧密地嵌入在一起,
related should be embedded closer together,
它们之间的距离应小于它们之间的距离,
the distance between them should be smaller than the distance between,
嗯,彼此不相关的图像对。
uh, pairs of images that are not related to each other.
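The "related closer than unrelated" objective can be sketched as a margin-based loss. This is one common way to express it, assumed here for illustration; the lecture does not say which loss is used in production. The 2D embedding vectors and the names `margin_loss`, `sweater_a`, `sweater_b`, and `cake` are hypothetical.

```python
import math

def margin_loss(anchor, related, unrelated, margin=1.0):
    """Zero when the related item is closer than the unrelated one by at least `margin`."""
    d_related = math.dist(anchor, related)
    d_unrelated = math.dist(anchor, unrelated)
    return max(0.0, d_related - d_unrelated + margin)

# Hypothetical embeddings: two sweater images and one cake image.
sweater_a = (0.9, 0.1)
sweater_b = (1.0, 0.2)
cake = (-0.8, 0.7)

# Related pair is already close, unrelated image is far away: no penalty.
loss_good = margin_loss(sweater_a, sweater_b, cake)

# Swapping the roles violates the objective and gives a positive penalty.
loss_bad = margin_loss(sweater_a, cake, sweater_b)
print(loss_good, loss_bad)
```

Training would adjust the embeddings to push such losses toward zero across many (anchor, related, unrelated) triples.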
嗯,链接级别预测任务的另一个示例非常不同。
Um, another example of a link level prediction task is very different.
这是关于,呃,
This is about, uh,
药物组合的副作用。
drug combination side effects.
嗯,这里的问题是很多病人服用
Uh, the problem here is that many patients take
同时使用多种药物来治疗复杂和并存的疾病。
multiple drugs simultaneously to trick- to treat complex and coexisting diseases.
例如,在美国,基本上,
For example, in the United States, basically,
70岁以上的人中有50%的人同时服用四或
fif- 50 percent of people over 70 years of age simultaneously take four or,
嗯,五种或更多药物。
uh, five or more drugs.
而且有很多病人服用
And there are many patients who take
20-20多种药物可治疗多种复杂的并存疾病。
20- 20 plus drugs to treat many complex coexisting diseases.
例如,某人受苦,
For example, somebody who suffers,
呃,失眠,沮丧,
uh, insomnia, suffers depression,
患有心脏病
and has a heart disease,
所有人都会同时服用多种药物,
all simultaneously will- will take many different drugs,
呃,一共。
uh, altogether at once.
问题是这些药物,呃,
And the problem is that these drugs, uh,
彼此互动,嗯,
interact with each other, um,
并导致新的不良副作用。
and they lead to new adverse side effects.
因此,基本上,药物之间的相互作用会导致额外的
So basically, the interactions between drugs leads to additional, uh,
疾病,嗯,呃或其他问题,
diseases, um, uh, or additional problems,
呃,在那个人中。
uh, in that human.
嗯,当然,不同药物的组合数量太多了,
Uh, and of course, the number of combinations of different drugs is too big,
因此我们无法进行实验或临床试验测试
so we cannot experimentally or in clinical trials test
每种药物的组合,看看会导致什么样的副作用。
every combination of drugs to see what kind of side effects does it lead to.
所以问题是,我们可以建立一个预测引擎
So the question is, can we build a predictive engine that for
任意药物对都会预测这些药物如何相互作用,
an arbitrary pair of drugs will predict how these drugs are going to interact,
以及什么样的不良副作用,
and what kind of adverse side effects,
嗯,它们可能会引起什么?
uh, they may cause?
这也是一个图形问题。
And this is also a graph problem.
因此,让我告诉您我们如何制定它。
So let me tell you how we formulate it.
嗯,我们创建这个,嗯,
Um, we create this, um,
两级异构网络,其中三角形是
two-level heterogeneous network where triangles are the, uh, uh,
不同的药物,嗯
different drugs and, um,
圆圈是我们体内的蛋白质。
circles are proteins in our bodies.
然后,药物起作用的方式是它们靶向不同的蛋白质。
And then the way drugs work is that they target the different proteins.
因此,这些是三角形和圆形之间的边。
So these are the edges between triangles and the circles.
而且,嗯,生物学家一直在计划
And, um, biologists have been mapping out
他们之间的蛋白质-蛋白质相互作用网络
the protein-protein interaction network where they
实验性地测试两种蛋白质是否物理存在
experimentally test whether two proteins physically come
在一起并相互作用以调节给定的生物学过程或功能。
together and interact to regulate a given biological process or function.
所以我们也从实验上知道
So we also know, experimentally,
哪些蛋白质彼此相互作用。
which proteins interact with each other.
这就是所谓的蛋白质-蛋白质相互作用网络,
And this is called a protein-protein interaction network,
或也称为相互作用组。
or also called the interactome.
然后是我们拥有的最后一组链接
And then the last set of links we have in
该图是已知的副作用,例如,
this graph are the known side-effects where basically, for example,
节点C和节点M之间的链接
the link between the node C and node M
说,如果您同时服用这两种药物,
says that if you take these two drus- drugs together,
已知会发生R型的副作用。
the side-effect of type R is knowing- known to occur.
当然,众所周知,这种副作用网络并不完整,
Of course, this network up here of side-effects is notoriously incomplete and,
呃,有很多丢失的连接。
uh, has a lot of missing connections.
所以问题就变成了,我们可以推算吗,
So the question becomes, can we impute,
我们可以预测缺失的边缘吗?
can we predict the missing edges,
缺少连接,嗯,在这,嗯,
missing connections, um, in this, uh,
基本上可以告诉我们的网络
network that would basically tell us, um,
如果采取的话,我们可以期待什么样的副作用,
how lay- what kind of side-effects can we expect if we take,
呃,或者一个人同时服用两种药物?
uh, or if a person takes two drugs simultaneously?
所以我们这样想
So the way we think of this,
我们认为它是三角形节点之间的链接预测,
we think of it as a link prediction between triangular nodes of g- um,
在图表中,基本上是在给出问题的地方,嗯,
in the graph, where basically the question is, given, uh,
这两种药物可能会发生什么样的副作用?
the two drugs, what kind of side effects, uh, may occur?
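As a rough illustration of this formulation, the sketch below stores the three edge types of the heterogeneous network and ranks drug pairs that have no recorded side effect. All names and data are hypothetical, and the shared-target (Jaccard) score is a deliberately simple stand-in for the learned graph neural network model; it ignores the protein-protein edges entirely.

```python
from itertools import combinations

# Drug -> protein target edges (triangles to circles); made-up data.
drug_targets = {"C": {"p1", "p2"}, "M": {"p2", "p3"}, "D": {"p3", "p4"}}

# Protein-protein interaction edges; kept for completeness, unused by the toy score.
protein_edges = {("p1", "p2"), ("p3", "p4")}

# Known drug-drug side effects: pair -> set of side-effect types.
known_side_effects = {("C", "M"): {"r"}}

def shared_target_score(a, b):
    """Jaccard overlap of protein targets; a stand-in for a learned model."""
    ta, tb = drug_targets[a], drug_targets[b]
    return len(ta & tb) / len(ta | tb)

def rank_unlabeled_pairs():
    """Rank drug pairs with no recorded side effect, highest score first."""
    pairs = [p for p in combinations(sorted(drug_targets), 2)
             if p not in known_side_effects]
    return sorted(pairs, key=lambda p: shared_target_score(*p), reverse=True)

ranking = rank_unlabeled_pairs()
print(ranking)
```

A real model would also predict which type of side effect is likely, not just whether a pair deserves attention.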
有趣的是,您可以应用这种方法
And what is interesting is that you can apply this method, um,
非常准确,您可以发现
very accurately and you can discover
过去还不知道的新副作用。
new side effects that haven’t been known, uh, in the past.
例如,在这个,
For example, in this, uh,
在这种情况下,
in this case, um,
这个模型,嗯,
the mo- the model, uh,
输出了最可靠的十大预测,嗯,可以肯定的是,
outputted the top ten predictions it is most, uh, certain about,
基本上,您的阅读方式是说如果您服用这两种药物,
where basically the way you read it is to say if you take these two drugs,
那么很可能会发生这种特殊的副作用。
then this particular side effect is likely to occur.
而且,嗯,这些副作用实际上都不在官方FDA数据库中的da-中。
And, uh, none of these side-effects are actually in the da- in the official FDA database.
所以作者在这里所做的是
So what the authors did here is they took
该模型的前10个预测,然后它们
the top 10 predictions from the model and then they
看了医学文献,
looked in the medical literature and
临床医学笔记,看看是否有,
clinical medical notes to see if there- are there any,
嗯,任何可能的报告,
um, any reports that could,
嗯,告诉我们是否,
uh, tell us whether, uh,
并提供证据证明该特殊情况,
and provide evidence of whether this particular uh,
一对药物可能导致给定的副作用。
pair of drugs could lead to a given side-effect.
然后,实际上,对于前10名中的5名,
Then actually, for the five out of top 10,
我们实际上,嗯,发现,嗯,
we actually, um, found, uh,
有一些研究证据表明这一点,
that there is some research evidence that points that this,
嗯,这个预测,
um, that this predictions,
嗯,实际上可能是对的。
um, might actually be true.
因此,这些是节点对级别的机器学习任务。
So these were the machine learning tasks at the level of pairs of nodes.
因此,我们谈到了推荐系统,也谈到了副作用预测。
So we talked about recommender systems and I talked about the side effect prediction.
现在,我想谈谈子图级机器学习任务。
Now, I wanna talk about the sub-graph level machine learning task.
嗯,这是一个,嗯,
Um, and here is one, um,
最近,我们每天都在使用。
very recent that we are all using every day.
关于流量预测。
It’s about traffic prediction.
例如,如果今天您打开
So for example, if today you open
Google Maps,您说我想开车去斯坦福,呃,
Google Maps and you say I wanna drive- drive from Stanford, uh,
一直到伯克利,嗯,
all the way up to Berkeley, uh,
Google会告诉您要花多长时间
Google will tell you how long it will take you to get
那里,您预计到达的时间是什么。
there and what is your estimated time of arrival.
我不确定你是否知道
And I’m not sure you knew,
但实际上,最后
but actually, uh, in the end,
图机学习用于做出旅行时间的这些预测,
graph machine learning is used to make these predictions of the travel time,
图表的创建方式是节点代表路段,
and the way the graph is created is that nodes represent road segments and,
嗯,路段之间的连通性
uh, connectivity between road segments,
um,被该网络的边缘捕获。
um, is captured by the edges of this network.
然后,嗯,我们的图神经网络方法是-
And then, um, our graph neural network approach is-
根据条件进行训练,嗯,
is trained that based on the conditions, uh, uh,
和每个路段的交通方式,嗯,
and traffic patterns on each of the road segment, um,
以及来源与目的地之间的路径um,
as well as the path between the source and the destination, um,
嗯,在旅途中,
uh, of the- of the journey, uh,
图神经网络方法经过训练可以预测估计的,
the graph neural network approach is trained to predict the estimated,
呃,到达时间,或者,是旅行时间。
uh, time of arrival or, uh, travel time.
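The graph construction described here can be sketched as follows. The segment names, travel times, and the function `baseline_travel_time` are hypothetical, and summing per-segment times is only a naive baseline, not the graph neural network approach the lecture refers to; it just shows what the nodes, edges, and path look like as data.

```python
# Nodes: road segments, each with a made-up current travel time in minutes.
segment_minutes = {"s1": 4.0, "s2": 6.5, "s3": 3.0, "s4": 10.0}

# Edges: which segments can be driven directly after a given segment.
connects_to = {"s1": {"s2", "s4"}, "s2": {"s3"}, "s3": set(), "s4": {"s3"}}

def baseline_travel_time(path):
    """Check that consecutive segments are connected, then sum their times."""
    for a, b in zip(path, path[1:]):
        if b not in connects_to[a]:
            raise ValueError(f"segments {a} and {b} are not connected")
    return sum(segment_minutes[s] for s in path)

eta = baseline_travel_time(["s1", "s2", "s3"])
print(eta)
```

A learned model would replace the fixed per-segment times with predictions that account for traffic patterns on neighboring segments.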
嗯,这个-并宣布实际上,这个,
Um, and this- and it has been announced that actually this, um,
在Google地图的生产中使用基于图的方法,因此,无论何时,
graph-based approach is used in production in Google Maps, so whenever, uh,
你在问路,
you are asking for directions,
实际上有一种基于图机器学习的方法
there is actually a graph machine learning-based approach
告诉你什么时候到达
that tells you when are you going to arrive,
嗯,到给定的位置。
uh, to a given location.
最后,我想谈谈图形级机器学习任务,
And last, I wanna talk about graph-level machine learning tasks,
嗯,还有一些有趣的图形级任务的有影响的应用程序。
uh, and some interesting impactful applications of graph-level tasks.
嗯,最近是关于药物发现的。
Um, one very recent is around drug discovery.
实际上,基于图的图式机器学习是
And actually, graph- graph-based machine learning was
用来发现新药,新抗生素,对吗?
used to discover new drugs, new antibiotics, right?
抗生素是小分子图,我们可以代表分子
Antibiotics are small molecular graphs and we can represent molecules
如图,其中节点是原子,边对应于化学键。
as graphs where the nodes are atoms and edges correspond to chemical bonds.
因此,每个分子都可以表示为图形。
So each molecule can be represented as a graph.
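For instance, a water molecule can be written down as a tiny graph. This is a minimal sketch with plain Python lists; the variable names and the helper `adjacency_list` are illustrative assumptions, and real chemistry toolkits carry much richer atom and bond features.

```python
# Nodes are atoms, indexed by position in this list.
atoms = ["O", "H", "H"]

# Edges are chemical bonds between atom indices (two O-H bonds).
bonds = [(0, 1), (0, 2)]

def adjacency_list(num_atoms, bonds):
    """Build an undirected neighbor list from the bond list."""
    neighbors = {i: [] for i in range(num_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors

water = adjacency_list(len(atoms), bonds)
print(water)
```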
但是后来我们有了这些分子库,
But then we ca- we have these banks,
呃,或者数十亿个分子的集合。
uh, or collections of billions of molecules.
问题是
And the question is,
哪些分子可以起到治疗作用。
which molecules could have, uh, therapeutic effect.
因此从本质上讲,应该对哪些分子进行优先排序,以便
So essentially, which molecules should be prioritized so that
生物学家可以在实验室中测试它们以验证,或者
biologists can test them in the laboratory to validate or,
嗯,他们的治疗效果。
um, their ther- therapeutic effect.
实际上,麻省理工学院的一个团队正在使用um,graph
And actually, a team at MIT was using, um, graph, uh,
基于深度学习的方法
based deep learning approach for
他们使用图神经网络的抗生素发现,嗯,
antibiotic discovery where they used a graph neural network, uh,
对不同的分子进行分类并从一组分子中预测有前途的分子
to classify different molecules and predict promising molecules from a pool of,
呃,数十亿的候选人。
uh, billions of candidates.
然后这些预测会进一步
And then these predictions were further,
呃,在实验室里验证过了。
uh, validated, uh, in the lab.
有一个非常令人兴奋的,嗯,
And there is a very exciting, um,
突破性论文发表于
breakthrough paper published in,
呃,《Cell》期刊,
uh, the journal Cell, uh,
就在今年
just this year about how
这些基于图的方法使我们能够快速有效地发现,
these graph-based approach allows us to efficiently and quickly discover,
嗯,不同的新药和新治疗用途,
uh, new drugs and new therapeutic uses of different,
呃,分子的类型。
uh, types of molecules.
嗯,进一步谈论药物发现,嗯,
Um, to further talk about drug discovery, uh,
我们也可以考虑将图生成作为一种方法
we can think also about graph generation as a way to
发现以前从未合成或考虑过的新分子。
discover new molecules that have never been synthesized or considered, uh, before.
这非常有用,因为它允许我们生成新的结构,
And this is very useful because it allows us to generate new structures,
以各种有针对性的方式开发新分子。
new molecules in various kinds of targeted ways.
例如,我们可以说产生了无毒的新分子,
For example, we can say generate new molecules that are non-toxic,
产生具有高溶解度的新分子,
generate new molecules that have high solubility,
产生具有高度药物相似性的新分子。
generate new molecules that have high drug likeness.
因此,我们现在可以有针对性地将分子生成为图形。不仅如此。
So we can generate now molecules as graphs in a targeted way. Not only that.
第二个用例是我们可以优化
The second use case is that we can optimize
现有分子具有理想的特性。
existing molecules to have a desirable property.
因此,基本上,这里的用例是您拥有分子中的一小部分,
So basically, the use case here is that you have a small part of the molecule that has,
嗯,例如给定的治疗效果。
uh, a given therapeutic effect, for example.
现在我们要完成,嗯,
And now we wanna complete, uh,
其余的分子支架使您可以改善,
the rest of the molecule scaffold so that you improve,
嗯,给定的属性。
um, a given property.
例如-例如,
For example- for example, uh,
溶解度和这种类型的深图,呃,
solubility and this type of deep graph, uh,
生成模型,呃,可以用于任务,
generative models, uh, can be used for tasks,
嗯,就像嗯,分子的产生和优化。
uh, like uh, molecule generation and optimization.
嗯,这是我最后的图形级任务
So, um, and the last graph-level task that I
我想说的是一个现实的,呃,基于物理的模拟。
want to talk about is a realistic, uh, physics-based simulation.
在这种情况下,我们基本上可以使用不同的材料。
In this case, we can basically have different materials.
我们将材料表示为一组粒子,然后
We represent the material as a set of particles and then
我们可以在上面定义一个图
we can have a graph defined on top of,
嗯,这些um,一组粒子,捕获了哪些粒子彼此相互作用。
uh, these, um, set of particles that capture which particles interact with each other.
现在,机器学习的基本任务是说
And now the underlying task for the machine learning is to say,
预测此图在将来将如何发展。
predict how this graph is going to evolve in the future.
这使我们能够预测这种材料将如何变形。
And this allows us to predict how this material is going to deform.
嗯,让我告诉你这是怎么做的。
Um, so let me tell you how this is done.
完成此操作的方式实质上是我们进行迭代,
The way this is done is that essentially we iterate,
嗯,下面的方法。
um, the following approach.
我们采用材料并将其表示为一组粒子。
We take the material and we represent it as a set of particles.
根据邻近度,
Based on the proximities,
粒子之间的相互作用
interactions between the particles,
我们生成了接近图。
we generated the proximity graph.
现在,我们有了这个,接近图,
Now, that we have this, uh, proximity graph,
我们应用图机器学习,
we apply graph machine learning,
图神经网络
a graph neural network,
具有当前属性,即位置,
that takes the current properties, meaning positions,
以及粒子的速度,并预测会是什么,
as well as velocities of the particles and predict what will be the,
嗯,未来的粒子位置和速度。
uh, positions and velocities of the particles in the future.
现在,基于此预测,我们可以进行以下操作:
And now based on this prediction, we can move,
将粒子进化到新位置,然后再次
evolve the particles to their new positions, and then again,
我们进入第一步,现在基于这个新的邻近点,
we go to the first step where now based on this new proximities,
我们创建新图,
we create the new graph,
预测新的位置,
predict the new positions,
嗯,移动粒子并保持迭代。
um, move the particles and keep iterating this.
这样就可以进行非常快速,非常准确的基于物理的仿真。
And this allows for very fast and very accurate physics-based simulations.
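The iterate-and-rebuild loop described above can be sketched like this. Everything here is an assumption for illustration: the function names, the radius, the time step, and especially `predict_next_state`, which is a hand-written neighbor-averaging rule standing in for the trained graph neural network.

```python
import math
from itertools import combinations

def build_proximity_graph(positions, radius):
    """Connect every pair of particles that lie within `radius` of each other."""
    return [(i, j) for i, j in combinations(range(len(positions)), 2)
            if math.dist(positions[i], positions[j]) <= radius]

def predict_next_state(positions, velocities, edges, dt):
    """Stand-in for the learned model: average velocity with neighbors, then move."""
    n = len(positions)
    sums = [list(v) for v in velocities]
    counts = [1] * n
    for i, j in edges:
        for k in range(len(velocities[i])):
            sums[i][k] += velocities[j][k]
            sums[j][k] += velocities[i][k]
        counts[i] += 1
        counts[j] += 1
    new_velocities = [tuple(s / counts[i] for s in sums[i]) for i in range(n)]
    new_positions = [tuple(p + dt * v for p, v in zip(positions[i], new_velocities[i]))
                     for i in range(n)]
    return new_positions, new_velocities

def rollout(positions, velocities, radius, steps, dt=0.1):
    """Repeat: rebuild the graph from current positions, predict, move the particles."""
    for _ in range(steps):
        edges = build_proximity_graph(positions, radius)
        positions, velocities = predict_next_state(positions, velocities, edges, dt)
    return positions, velocities

# Two nearby particles and one isolated particle (made-up 2D state).
positions = [(0.0, 0.0), (0.5, 0.0), (5.0, 0.0)]
velocities = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
final_positions, final_velocities = rollout(positions, velocities, radius=1.0, steps=3)
print(final_positions)
```

The key structural point is that the graph is rebuilt at every step, because the set of interacting particles changes as they move.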
这些是图表的例子
So these were some examples of graph,
在图形级任务中,
uh, of a graph-level tasks and, uh,
图机器学习在各个领域的重要应用,
important applications of graph machine learning to various domains, um,
横跨,呃,横跨,呃,
across, uh, across, uh,
科学,工业以及不同的消费产品。
sciences, industry, as well as different consumer products.
