Graph as Matrix: PageRank, Random Walks and Embeddings

本节将进行图分析和从矩阵角度学习。将图表示成矩阵,我们可以:

  • 通过 random walk (PageRank),决定节点重要性;
  • 通过 matrix factorization (MF),获得节点的嵌入;
  • 通过其他节点嵌入(如Node2Vec) 作为MF

Random Walk、matrix factorization和节点嵌入密切相关。

PageRank (aka the Google Algorithm)

aka:abbr. 又名,亦称(also known as)
例子:Web as a graph,Node = web pages,Edges = hyperlinks
Web是有向图,所有的web pages不是同样的重要的,在web-graph节点连接中有巨大的多样性,所以在web graph中使用pages排名。我们将介绍以下链接分析方法来计算图中节点的重要性:

  • PageRank
  • Personalized PageRank(PPR)
  • Random Walk with Restarts

当有更多的链接时,页面会更重要。来自重要页面的投票价值更高,每个链接的投票数与其源页面的重要性成正比。若重要性为04 Link Analysis PageRank - 图1的页面04 Link Analysis PageRank - 图204 Link Analysis PageRank - 图3个出去的链接,每个链接有04 Link Analysis PageRank - 图4个投票,页面04 Link Analysis PageRank - 图5的本身重要性04 Link Analysis PageRank - 图6是其进入链接上的投票总数。
image.png image.png image.png
如果一个页面被其他重要页面指向,则该页面是重要的。定义节点04 Link Analysis PageRank - 图10的rank 04 Link Analysis PageRank - 图1104 Link Analysis PageRank - 图12是节点04 Link Analysis PageRank - 图13的out-degree。
假设页面04 Link Analysis PageRank - 图1404 Link Analysis PageRank - 图15个外部链接,若04 Link Analysis PageRank - 图16,则04 Link Analysis PageRank - 图17,其中04 Link Analysis PageRank - 图18是一随机矩阵,其每列的和为1。
rank向量04 Link Analysis PageRank - 图19:每页一条,04 Link Analysis PageRank - 图20是页面04 Link Analysis PageRank - 图21的重要性分数,04 Link Analysis PageRank - 图22,flow equation可写为:04 Link Analysis PageRank - 图23

例子

image.png

Connect to Random Walk

考虑重要一个随机网页冲浪者:

  • 在任何时候04 Link Analysis PageRank - 图25,冲浪者都在某一个页面04 Link Analysis PageRank - 图26上;
  • 04 Link Analysis PageRank - 图27,冲浪者遵循来自04 Link Analysis PageRank - 图28的均匀随机的外部链接(out-links);
  • 从页面04 Link Analysis PageRank - 图29开始,到最后出现在链接的某个页面04 Link Analysis PageRank - 图30上;
  • 这个过程无限期地重复。

设:

  • 04 Link Analysis PageRank - 图31是一个向量,其第04 Link Analysis PageRank - 图32个坐标表示冲浪者在04 Link Analysis PageRank - 图33时刻在页面04 Link Analysis PageRank - 图34上的概率;
  • 因此04 Link Analysis PageRank - 图35是页面上的概率分布。

image.png image.png
冲浪者在04 Link Analysis PageRank - 图38时刻,随机一致地跟随一个链接的概率04 Link Analysis PageRank - 图39。假设随机游走达到一个状态04 Link Analysis PageRank - 图40,则04 Link Analysis PageRank - 图41是一个随机游走的平稳分布。我们原始的rank向量04 Link Analysis PageRank - 图42满足:04 Link Analysis PageRank - 图43,所以04 Link Analysis PageRank - 图44是随机游走的平稳分布。
04 Link Analysis PageRank - 图45是无向图的邻接矩阵,邻接矩阵的特征向量满足04 Link Analysis PageRank - 图46,其中04 Link Analysis PageRank - 图47是特征向量,04 Link Analysis PageRank - 图48是特征值。
image.png
由此,rank向量04 Link Analysis PageRank - 图50是随机邻接矩阵04 Link Analysis PageRank - 图51的一个特征向量,对应的特征值为1。
从任一向量04 Link Analysis PageRank - 图52开始,极限04 Link Analysis PageRank - 图53是冲浪者的长期分布。PageRank=极限分布=M的主要特征向量。若04 Link Analysis PageRank - 图5404 Link Analysis PageRank - 图55的极限,则04 Link Analysis PageRank - 图56满足flow equation 04 Link Analysis PageRank - 图57,所以04 Link Analysis PageRank - 图58是特征值为1的矩阵04 Link Analysis PageRank - 图59的主要特征向量。
接下来,我们可以用幂迭代方法来有效解决04 Link Analysis PageRank - 图60

PageRank: Summary

  • 使用web的链接结构来衡量图中节点的重要性;
  • 使用随机邻接矩阵04 Link Analysis PageRank - 图61对随机上网页冲浪者进行建模;
  • PageRank解决了04 Link Analysis PageRank - 图62问题,04 Link Analysis PageRank - 图63可以看作是04 Link Analysis PageRank - 图64的主特征向量,也可以看作是图上随机游走的平稳分布。

    PageRank: How to solve?

    给定节点04 Link Analysis PageRank - 图65的一个图,我们使用迭代方法:

  • 为每个节点分配一个初始页面排名;

  • 计算每个节点的页面排名04 Link Analysis PageRank - 图66%7D%3D%5Csum%7Bi%5Crightarrow%20j%7D%5Cfrac%7Br_i%5E%7B(t)%7D%7D%7Bd_i%7D%3C%2Ftitle%3E%0A%3Cdefs%20aria-hidden%3D%22true%22%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-72%22%20d%3D%22M21%20287Q22%20290%2023%20295T28%20317T38%20348T53%20381T73%20411T99%20433T132%20442Q161%20442%20183%20430T214%20408T225%20388Q227%20382%20228%20382T236%20389Q284%20441%20347%20441H350Q398%20441%20422%20400Q430%20381%20430%20363Q430%20333%20417%20315T391%20292T366%20288Q346%20288%20334%20299T322%20328Q322%20376%20378%20392Q356%20405%20342%20405Q286%20405%20239%20331Q229%20315%20224%20298T190%20165Q156%2025%20151%2016Q138%20-11%20108%20-11Q95%20-11%2087%20-5T76%207T74%2017Q74%2030%20114%20189T154%20366Q154%20405%20128%20405Q107%20405%2092%20377T68%20316T57%20280Q55%20278%2041%20278H27Q21%20284%2021%20287Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-28%22%20d%3D%22M94%20250Q94%20319%20104%20381T127%20488T164%20576T202%20643T244%20695T277%20729T302%20750H315H319Q333%20750%20333%20741Q333%20738%20316%20720T275%20667T226%20581T184%20443T167%20250T184%2058T225%20-81T274%20-167T316%20-220T333%20-241Q333%20-250%20318%20-250H315H302L274%20-226Q180%20-141%20137%20-14T94%20250Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-74%22%20d%3D%22M26%20385Q19%20392%2019%20395Q19%20399%2022%20411T27%20425Q29%20430%2036%20430T87%20431H140L159%20511Q162%20522%20166%20540T173%20566T179%20586T187%20603T197%20615T211%20624T229%20626Q247%20625%20254%20615T261%20596Q261%20589%20252%20549T232%20470L222%20433Q222%20431%20272%20431H323Q330%20424%20330%20420Q330%20398%20317%20385H210L174%20240Q135%2080%20135%2068Q135%2026%20162%2026Q197%2026%20230%2060T283%20144Q285%20150%20288%20151T303%20153H307Q322%20153%20322%20145Q322%20142%20319%20133Q314%20117%20301%2095T267%2048T216%206T155%20-11Q125%20-11%2098%204T59%2056Q57%2064%2057%2083V101L92%20241Q127%20382%20128%20383Q128%20385%2077%20385H26Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-2B%22%20d%3D%22M56%20237T56%20250T70%20270H369V420L370%20570Q380%20583%20389%20583Q402%20583%20409%20568V270H707Q722%20262%20722%20250T707%20230H409V-68Q401%20-82%20391%20-82H389H387Q375%20-82%20369%20-68V230H70Q56%20237%2056%20250Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-31%22%20d%3D%22M213%20578L200%20573Q186%20568%20160%20563T102%20556H83V602H102Q149%20604%20189%20617T245%20641T273%20663Q275%20666%20285%20666Q294%20666%20302%20660V361L303%2061Q310%2054%20315%2052T339%2048T401%2046H427V0H416Q395%203%20257%203Q121%203%20100%200H88V46H114Q136%2046%20152%2046T177%2047T193%2050T201%2052T207%2057T213%2061V578Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-29%22%20d%3D%22M60%20749L64%20750Q69%20750%2074%20750H86L114%20726Q208%20641%20251%20514T294%20250Q294%20182%20284%20119T261%2012T224%20-76T186%20-143T145%20-194T113%20-227T90%20-246Q87%20-249%2086%20-250H74Q66%20-250%2063%20-250T58%20-247T55%20-238Q56%20-237%2066%20-225Q221%20-64%20221%20250T66%20725Q56%20737%2055%20738Q55%20746%2060%20749Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-6A%22%20d%3D%22M297%20596Q297%20627%20318%20644T361%20661Q378%20661%20389%20651T403%20623Q403%20595%20384%20576T340%20557Q322%20557%20310%20567T297%20596ZM288%20376Q288%20405%20262%20405Q240%20405%20220%20393T185%20362T161%20325T144%20293L137%20279Q135%20278%20121%20278H107Q101%20284%20101%20286T105%20299Q126%20348%20164%20391T252%20441Q253%20441%20260%20441T272%20442Q296%20441%20316%20432Q341%20418%20354%20401T367%20348V332L318%20133Q267%20-67%20264%20-75Q246%20-125%20194%20-164T75%20-204Q25%20-204%207%20-183T-12%20-137Q-12%20-110%207%20-91T53%20-71Q70%20-71%2082%20-81T95%20-112Q95%20-148%2063%20-167Q69%20-168%2077%20-168Q111%20-168%20139%20-140T182%20-74L193%20-32Q204%2011%20219%2072T251%20197T278%20308T289%20365Q289%20372%20288%20376Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-3D%22%20d%3D%22M56%20347Q56%20360%2070%20367H707Q722%20359%20722%20347Q722%20336%20708%20328L390%20327H72Q56%20332%2056%20347ZM56%20153Q56%20168%2072%20173H708Q722%20163%20722%20153Q722%20140%20707%20133H70Q56%20140%2056%20153Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJSZ2-2211%22%20d%3D%22M60%20948Q63%20950%20665%20950H1267L1325%20815Q1384%20677%201388%20669H1348L1341%20683Q1320%20724%201285%20761Q1235%20809%201174%20838T1033%20881T882%20898T699%20902H574H543H251L259%20891Q722%20258%20724%20252Q725%20250%20724%20246Q721%20243%20460%20-56L196%20-356Q196%20-357%20407%20-357Q459%20-357%20548%20-357T676%20-358Q812%20-358%20896%20-353T1063%20-332T1204%20-283T1307%20-196Q1328%20-170%201348%20-124H1388Q1388%20-125%201381%20-145T1356%20-210T1325%20-294L1267%20-449L666%20-450Q64%20-450%2061%20-448Q55%20-446%2055%20-439Q55%20-437%2057%20-433L590%20177Q590%20178%20557%20222T452%20366T322%20544L56%20909L55%20924Q55%20945%2060%20948Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-69%22%20d%3D%22M184%20600Q184%20624%20203%20642T247%20661Q265%20661%20277%20649T290%20619Q290%20596%20270%20577T226%20557Q211%20557%20198%20567T184%20600ZM21%20287Q21%20295%2030%20318T54%20369T98%20420T158%20442Q197%20442%20223%20419T250%20357Q250%20340%20236%20301T196%20196T154%2083Q149%2061%20149%2051Q149%2026%20166%2026Q175%2026%20185%2029T208%2043T235%2078T260%20137Q263%20149%20265%20151T282%20153Q302%20153%20302%20143Q302%20135%20293%20112T268%2061T223%2011T161%20-11Q129%20-11%20102%2010T74%2074Q74%2091%2079%20106T122%20220Q160%20321%20166%20341T173%20380Q173%20404%20156%20404H154Q124%20404%2099%20371T61%20287Q60%20286%2059%20284T58%20281T56%20279T53%20278T49%20278T41%20278H27Q21%20284%2021%20287Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMAIN-2192%22%20d%3D%22M56%20237T56%20250T70%20270H835Q719%20357%20692%20493Q692%20494%20692%20496T691%20499Q691%20511%20708%20511H711Q720%20511%20723%20510T729%20506T732%20497T735%20481T743%20456Q765%20389%20816%20336T935%20261Q944%20258%20944%20250Q944%20244%20939%20241T915%20231T877%20212Q836%20186%20806%20152T761%2085T740%2035T732%204Q730%20-6%20727%20-8T711%20-11Q691%20-11%20691%200Q691%207%20696%2025Q728%20151%20835%20230H70Q56%20237%2056%20250Z%22%3E%3C%2Fpath%3E%0A%3Cpath%20stroke-width%3D%221%22%20id%3D%22E1-MJMATHI-64%22%20d%3D%22M366%20683Q367%20683%20438%20688T511%20694Q523%20694%20523%20686Q523%20679%20450%20384T375%2083T374%2068Q374%2026%20402%2026Q411%2027%20422%2035Q443%2055%20463%20131Q469%20151%20473%20152Q475%20153%20483%20153H487H491Q506%20153%20506%20145Q506%20140%20503%20129Q490%2079%20473%2048T445%208T417%20-8Q409%20-10%20393%20-10Q359%20-10%20336%205T306%2036L300%2051Q299%2052%20296%2050Q294%2048%20292%2046Q233%20-10%20172%20-10Q117%20-10%2075%2030T33%20157Q33%20205%2053%20255T101%20341Q148%20398%20195%20420T280%20442Q336%20442%20364%20400Q369%20394%20369%20396Q370%20400%20396%20505T424%20616Q424%20629%20417%20632T378%20637H357Q351%20643%20351%20645T353%20664Q358%20683%20366%20683ZM352%20326Q329%20405%20277%20405Q242%20405%20210%20374T160%20293Q131%20214%20119%20129Q119%20126%20119%20118T118%20106Q118%2061%20136%2044T179%2026Q233%2026%20290%2098L298%20109L352%20326Z%22%3E%3C%2Fpath%3E%0A%3C%2Fdefs%3E%0A%3Cg%20stroke%3D%22currentColor%22%20fill%3D%22currentColor%22%20stroke-width%3D%220%22%20transform%3D%22matrix(1%200%200%20-1%200%200)%22%20aria-hidden%3D%22true%22%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-72%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3Cg%20transform%3D%22translate(451%2C521)%22%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMAIN-28%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMATHI-74%22%20x%3D%22389%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMAIN-2B%22%20x%3D%22751%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMAIN-31%22%20x%3D%221529%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMAIN-29%22%20x%3D%222030%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3C%2Fg%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMATHI-6A%22%20x%3D%22638%22%20y%3D%22-430%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMAIN-3D%22%20x%3D%222540%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3Cg%20transform%3D%22translate(3596%2C0)%22%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJSZ2-2211%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3Cg%20transform%3D%22translate(100%2C-1086)%22%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMATHI-69%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMAIN-2192%22%20x%3D%22345%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMATHI-6A%22%20x%3D%221346%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3C%2Fg%3E%0A%3C%2Fg%3E%0A%3Cg%20transform%3D%22translate(5040%2C0)%22%3E%0A%3Cg%20transform%3D%22translate(286%2C0)%22%3E%0A%3Crect%20stroke%3D%22none%22%20width%3D%221477%22%20height%3D%2260%22%20x%3D%220%22%20y%3D%22220%22%3E%3C%2Frect%3E%0A%3Cg%20transform%3D%22translate(60%2C831)%22%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-72%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3Cg%20transform%3D%22translate(451%2C521)%22%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMAIN-28%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMATHI-74%22%20x%3D%22389%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMAIN-29%22%20x%3D%22751%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%3C%2Fg%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMATHI-69%22%20x%3D%22638%22%20y%3D%22-430%22%3E%3C%2Fuse%3E%0A%3C%2Fg%3E%0A%3Cg%20transform%3D%22translate(306%2C-715)%22%3E%0A%20%3Cuse%20xlink%3Ahref%3D%22%23E1-MJMATHI-64%22%20x%3D%220%22%20y%3D%220%22%3E%3C%2Fuse%3E%0A%20%3Cuse%20transform%3D%22scale(0.707)%22%20xlink%3Ahref%3D%22%23E1-MJMATHI-69%22%20x%3D%22736%22%20y%3D%22-213%22%3E%3C%2Fuse%3E%0A%3C%2Fg%3E%0A%3C%2Fg%3E%0A%3C%2Fg%3E%0A%3C%2Fg%3E%0A%3C%2Fsvg%3E#card=math&code=r_j%5E%7B%28t%2B1%29%7D%3D%5Csum%7Bi%5Crightarrow%20j%7D%5Cfrac%7Bri%5E%7B%28t%29%7D%7D%7Bd_i%7D&id=yAmUr),04 Link Analysis PageRank - 图67是节点04 Link Analysis PageRank - 图68的出度,重复至聚合:![](https://cdn.nlark.com/yuque/__latex/6391dba51788b3c16bf85b84b21bd4e2.svg#card=math&code=%5Csum%7Bi%7D%7Cr_i%5E%7Bt%2B1%7D-r_i%5Et%7C%3C%5Cepsilon&id=Eqq2P)。

    幂迭代方法

    给定一个有04 Link Analysis PageRank - 图69个节点的网络图,其中节点是页面,边是超链接
    幂迭代:一种简单的迭代方案,

  • 初始化:04 Link Analysis PageRank - 图70

  • 迭代:04 Link Analysis PageRank - 图71
  • 04 Link Analysis PageRank - 图72时,停止。

04 Link Analysis PageRank - 图7304 Link Analysis PageRank - 图74范数。大约50次迭代就足以估计极限解。
image.png
04 Link Analysis PageRank - 图76
04 Link Analysis PageRank - 图77,等价地,04 Link Analysis PageRank - 图78
三个question:

  1. 会聚合吗?
  2. 会聚合到我们期望的结果吗?
  3. 这些结果合理吗?

两个problem:

  1. 有些页面是死胡同,没有外部链接,这样的页面会导致重要性“泄露”;
  2. 蜘蛛陷阱:所有外部链接都在组内,最终蜘蛛陷阱吸收了所有的重要性。

Problem1 蜘蛛陷阱:
image.png

Iteration 0 1 2 3
r_a 1 0 0 0
r_b 0 1 1 1

Problem2 死胡同:
image.png

Iteration 0 1 2 3
r_a 1 0 0 0
r_b 0 1 0 0

Solution 蜘蛛陷阱:
在每个时间步,随机冲浪者有两个选择:

  1. 以概率04 Link Analysis PageRank - 图81随机跟随一个链接;
  2. 以概率04 Link Analysis PageRank - 图82随机跳到一个页面;

04 Link Analysis PageRank - 图83的常用值在0.8到0.9之间。
冲浪者将在几个时间步内传送出蜘蛛陷阱。
image.png

Solution 死胡同:
以1.0的总概率从死胡同跟着随机传送链接。
image.png
为什么死胡同和蜘蛛陷阱是个problem,为什么远程传送解决了这个问题?

  • 蜘蛛陷阱不是problem,但有了陷阱,PageRank分数并不是我们想要的。解决方案:永远不要被困在蜘蛛陷阱中,通过有限的步数从陷阱中传送出来。
  • 死胡同是个problem,矩阵不是列随机的,因此不符合我们的初始假设。解决方案:当无处可去时,总是通过远程传送使矩阵列随机。

    Solution 随机传送

    谷歌的解决方案做到了这一切,在每一步中,随机冲浪者都有两个选项:
  1. 以概率04 Link Analysis PageRank - 图86随机跟随一个链接;
  2. 以概率04 Link Analysis PageRank - 图87随机跳到一个页面;

image.png
这个公式假设04 Link Analysis PageRank - 图89没有死胡同,我们可以对矩阵04 Link Analysis PageRank - 图90进行预处理,移除所有死胡同或明确遵循概率为1.0的随机传送链接。
谷歌矩阵04 Link Analysis PageRank - 图9104 Link Analysis PageRank - 图9204 Link Analysis PageRank - 图93的矩阵,每个元素都是04 Link Analysis PageRank - 图94
我们有一个递归问题:04 Link Analysis PageRank - 图95,幂迭代方法仍然有效。
实践中04 Link Analysis PageRank - 图96(平均跳5步)。
随机传送(04 Link Analysis PageRank - 图97)示例
image.png
04 Link Analysis PageRank - 图99
04 Link Analysis PageRank - 图100
04 Link Analysis PageRank - 图101
image.png

Sovling PageRank: Summary

  • PageRank解决了04 Link Analysis PageRank - 图103,通过幂迭代,能够有效计算随机邻接矩阵04 Link Analysis PageRank - 图104
  • 添加随机均匀传送解决了死胡同和蜘蛛陷阱的问题。

    Random Walk with Restarts and Personalized PageRank

    推荐例子

    给定一个表示用户和物品交互(例如购买)的二分图,目标是图上的接近度(Proximity),我们应该向与物品04 Link Analysis PageRank - 图105交互的用户推荐哪些物品?
    直觉上,如果物品04 Link Analysis PageRank - 图106和物品04 Link Analysis PageRank - 图107被相似的用户交互,当用户与04 Link Analysis PageRank - 图108交互时,推荐物品04 Link Analysis PageRank - 图109
    image.png
    哪个与A,A’或B,B’更相关?
    image.png
    A,A’,B,B’或C,C’哪个更相关?
    image.png

    图上的接近度 proximity

  • PageRank:根据节点重要性排序,以相同的概率传送到网络中的任何节点;

  • Personalized PageRank:将节点04 Link Analysis PageRank - 图113与传送节点的接近程度排序;
  • Proximity on graphs:与物品04 Link Analysis PageRank - 图114最相关的是什么?重启的随机游走,传送回起始节点,04 Link Analysis PageRank - 图115

    随机游走

  1. 每个节点都有一定的重要性;
  2. 重要性在所有边上平均分配,并推送到相邻的边上:

给定一个QUERY_NODES集合,模拟一个随机游走:

  • 向随机邻居迈出一步,并记录访问(访问次数);
  • 使用概率ALPHA,从QUERY_NODES集中的某个节点重新开始行走;
  • 访问次数最高的节点与查询节点的接近度最高。

image.png
为什么这是一个好的解法?因为“相似性”考虑了:

  • 多连接;
  • 多路径;
  • 有向和无向连接;
  • 节点度。

    Summary: Page Rank Variants

    PageRank:

  • 传送到任何节点

  • 节点可以具有相同的冲浪者着陆概率:04 Link Analysis PageRank - 图117

Topic-Specific PageRank aka Personalized PageRank:

  • 传送到一组特定的节点
  • 节点可以有不同的冲浪者着陆概率:04 Link Analysis PageRank - 图118

Random Walk with Restarts:

  • Topic-Specific PageRank,其中总是传送到相同的节点04 Link Analysis PageRank - 图119

图可自然地表示为矩阵,在图上我们定义了一个随机游走过程:

  • 随机冲浪者通过链接和随机传送移动;
  • 随机邻接矩阵04 Link Analysis PageRank - 图120

PageRank=冲浪者位置的极限分布,表示节点的重要性,对应于变换后的邻接矩阵04 Link Analysis PageRank - 图121的主要特征向量。

Matrix Factorization and Node Embeddings

image.png
最简单的节点相似性:若节点04 Link Analysis PageRank - 图123有一条边连接,则其相似。这意味着04 Link Analysis PageRank - 图124,因此04 Link Analysis PageRank - 图125
image.png

Matrix Factorization

  • 嵌入维度04 Link Analysis PageRank - 图12704 Link Analysis PageRank - 图128中的行数) 远小于节点数04 Link Analysis PageRank - 图129
  • 精确分解04 Link Analysis PageRank - 图130一般是不可能的;
  • 但我们可以学习近似04 Link Analysis PageRank - 图131:目标函数04 Link Analysis PageRank - 图132,优化04 Link Analysis PageRank - 图133也即最小化04 Link Analysis PageRank - 图134的L2范数。(在lecture3中使用softmax而不是L2,但目标都是用04 Link Analysis PageRank - 图135来近似04 Link Analysis PageRank - 图136
  • 结论:用边连通度定义节点相似性的内积解码器,等价于矩阵分解04 Link Analysis PageRank - 图137

    Random Walk-based Similarity

    基于随机游走,DeepWalk和node2vec有一个更复杂的的节点相似性定义。DeepWalk的矩阵分解,相当于以下复杂的矩阵表达式:Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec, WSDM 18
    04 Link Analysis PageRank - 图138
    image.png
    Node2vec也可以表示为矩阵分解(尽管是更复杂的矩阵)。

    Limitation

    Limitation1: 基于矩阵分解和随机游走的节点嵌入的局限性,在训练集中不能获取到节点嵌入。
    image.png
    Limitation2: DeepWalk和node2vec不能捕获到结构相似性。
    image.png
    节点1和11在结构上相似,是三角形的一部分,度为2。但是,节点1和11有不同的嵌入,随机游走不太可能从节点1到达节点11。
    Limitation3: 不能使用节点、度和图特征。
    image.png
    这些限制的解决方案:深度表征学习和图神经网络。

    Summary

    PageRank

  • 图中测量节点重要性

  • 通过幂迭代邻接矩阵,能够有效计算节点重要性

Personalized PageRank(PPR)

  • 通过特殊节点或节点集合来测量节点重要性
  • 通过随机游走,能够有效计算节点重要性

Node embeddings

  • 基于随机游走的节点嵌入,可以表示为矩阵分解

将图视为矩阵在所有上述算法中起着关键作用!