Introduction
1. Data acquisition and cleaning
- 1.1 Cleaning the university list data
- 1.2 Crawling JSON data
- 1.3 Cleaning the JSON data
2. Generating the graph
3. Local deployment
4. Downstream tasks

Author: Armor
Reference: http://www.openkg.cn/dataset/2020

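# Introduction

Building a knowledge graph is a complex piece of systems engineering. There is no single way to design and implement one, and no fixed paradigm has emerged yet.
On the algorithm side, this demo does not cover entity extraction, relation extraction, knowledge fusion, or embedding algorithms.
On the data side, it does not handle unstructured data; the only data source is semi-structured data crawled from Baidu Baike.
The demo therefore assumes that good-quality data is already available. It takes the data out of Neo4j or another graph database, visualizes it on the web, and builds some basic features or downstream tasks on top of that.
It is adapted, with some modifications, from an open-source project on OpenKG.
It has two goals:

- To serve as a small knowledge-graph showcase demo for CQUSTKG.
- To provide a simple development workflow for knowledge-graph visualization.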

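# 1. Data acquisition and cleaning

The list of regular higher-education institutions nationwide comes from publicly released government information.
As of June 30, 2020, there were 3,005 higher-education institutions nationwide, of which 2,740 were regular higher-education institutions, comprising 1,258 undergraduate institutions and 1,482 higher vocational (junior college) institutions.

## 1.1 Cleaning the university list data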

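Before cleaning, open the Excel file and delete the first two rows, then clean the data with Python:
```python
import numpy as np
import pandas as pd

# Open the file with pandas
df = pd.read_excel("全国高等学校名单.xls")

# Replace every NaN in the "备注" (remarks) column with "公办" (public)
df["备注"] = df["备注"].fillna("公办")

# Drop the remaining rows that still contain NaN
df = df.dropna(axis=0)

# Convert the school identifier code to int64
df["学校标识码"] = df["学校标识码"].astype(np.int64)

# Save the file
df.to_csv("School_List_2020.csv", index=False)

# Take a look at the result
print(df.shape)
df.head()
```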

![image.png](https://cdn.nlark.com/yuque/0/2021/png/2655886/1617184909988-0531a5e6-fccb-47a7-8896-0924ae11a03b.png#align=left&display=inline&height=194&margin=%5Bobject%20Object%5D&name=image.png&originHeight=194&originWidth=529&size=15471&status=done&style=shadow&width=529)

🚀 OK! We now have a clean CSV file. Next comes the crawler.
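
The cleaning steps can be tried out on a small in-memory table before touching the real spreadsheet. The sketch below wraps them in a hypothetical helper, `clean_school_list`, and runs it on three made-up rows. The school names and identifier codes are invented for illustration and do not come from the official list.

```python
import numpy as np
import pandas as pd


def clean_school_list(df):
    """Apply the three cleaning steps to a copy of the table (hypothetical helper)."""
    df = df.copy()
    df["备注"] = df["备注"].fillna("公办")  # missing remark means public
    df = df.dropna(axis=0)  # drop rows that still have gaps
    df["学校标识码"] = df["学校标识码"].astype(np.int64)
    return df.reset_index(drop=True)


# Made-up sample rows; the last one imitates an empty spreadsheet row
sample = pd.DataFrame({
    "学校名称": ["示例大学A", "示例学院B", None],
    "学校标识码": [4111010001.0, 4111010002.0, np.nan],
    "备注": [np.nan, "民办", np.nan],
})
cleaned = clean_school_list(sample)
print(cleaned)
print(cleaned.dtypes)
```

The order of the steps matters: the remarks column is filled first, so a school with an empty remark survives, while the empty third row is still dropped because its name and code are missing.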
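## 1.2 Crawling JSON data
A university's Baidu Baike page URL has the form `https://baike.baidu.com/item/` + `"university name"`.
So the list of URLs for the crawler to visit can be built from the CSV file above: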
```python
df = pd.read_csv("School_List_2020.csv")
urls = []
for i in df["学校名称"]:
    url = "https://baike.baidu.com/item/" + str(i)
    urls.append(url)
```

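University names are Chinese text, so the resulting URLs contain non-ASCII characters. `requests` generally copes with that, but other tools and logs may not. As a minimal sketch, the names can be percent-encoded explicitly with the standard library; `build_url` is a hypothetical helper name, and the two names are just examples.

```python
from urllib.parse import quote, unquote

BASE_URL = "https://baike.baidu.com/item/"


def build_url(name):
    """Return the item URL with the name percent-encoded as UTF-8."""
    return BASE_URL + quote(str(name).strip(), safe="")


urls = [build_url(name) for name in ["清华大学", "北京大学"]]
for url in urls:
    print(url)

# Decoding the last path segment gives the original name back
decoded = unquote(urls[0].rsplit("/", 1)[1])
print(decoded)
```

Calling `str(name).strip()` also guards against stray whitespace carried over from the spreadsheet cells.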

If an import fails, install the missing package with pip. The complete crawler code:

```python
import requests
import json
import time
from tqdm import tqdm
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
# Timing helper
def run_time(start_time):
    current_time = time.strftime("%Y-%m-%d %H:%M:%S",time.localtime())
    print(f"Current time: {current_time}")
    print("Elapsed: %.3f sec" %(time.time()-start_time))
# Fetch the URL and return the response
def url_open(url):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}
    r = requests.get(url, headers = headers)
    return r 
# get the school list
def school_list(filename):
    schools = []
    df = pd.read_csv(filename)
    for i in df["学校名称"]:
        schools.append(i)
    return schools
if __name__ == "__main__":
    school = school_list("School_List_2020.csv")
    # print(school)
    result_data = []
    start_time = time.time()
    for index in tqdm(school):
        url = 'https://baike.baidu.com/item/' + index 
        print(url)
        data = url_open(url)
        soup = BeautifulSoup(data.content, 'html.parser', from_encoding='utf-8')
        # Field names (<dt>) and field values (<dd>) of the basic-info box
        name_node = soup.find_all('dt', class_='basicInfo-item name')
        name_data = [node.get_text().replace('\xa0', '') for node in name_node]
        value_node = soup.find_all('dd', class_='basicInfo-item value')
        value_data = [node.get_text().replace('\n', '') for node in value_node]
        result = {'中文名': '无信息', '英文名': '无信息', '简称':'无信息','创办时间': '无信息', '类型': '综合', '主管部门': '无信息'}
        for i in range(len(name_data)):
            if name_data[i] == '中文名':
                result['中文名'] = value_data[i]
            if name_data[i] in ['英文名','外文名']:
                result['英文名'] = value_data[i]
            if name_data[i] == '简称':
                result['简称'] = value_data[i]
            if name_data[i] == '创办时间':
                result['创办时间'] = value_data[i]
            if name_data[i] == '类型':
                result['类型'] = value_data[i]
            if name_data[i] == '主管部门':
                result['主管部门'] = value_data[i]
        result_data.append({'中文名': result['中文名'], '英文名': result['英文名'], '简称': result['简称'], '创办时间': result['创办时间'], '类型': result['类型'], '主管部门': result['主管部门']})
        # print('reading the website...')
        # print(result_data)
    with open('all.json', 'w', encoding='utf-8') as fw:
        json.dump(result_data, fw, ensure_ascii=False)
    print('complete!')
    run_time(start_time)
```

Expect the crawl to take about 15 minutes.
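
The crawler pairs field names and field values by list index, which assumes both lists always have the same length. A slightly more defensive variant of that pairing step is sketched below. `build_record`, `DEFAULTS`, and `ALIASES` are hypothetical names, and the sample lists are made up rather than taken from a real page.

```python
# Default values, matching the ones used by the crawler
DEFAULTS = {
    '中文名': '无信息', '英文名': '无信息', '简称': '无信息',
    '创办时间': '无信息', '类型': '综合', '主管部门': '无信息',
}
# Alternative field names that map onto one of the fields above
ALIASES = {'外文名': '英文名'}


def build_record(names, values):
    """Pair field names with values and keep only the wanted fields."""
    record = dict(DEFAULTS)
    # zip stops at the shorter list, so a length mismatch cannot raise IndexError
    for name, value in zip(names, values):
        key = ALIASES.get(name, name)
        if key in record:
            record[key] = value
    return record


# Made-up example: three field names but only two values
record = build_record(['中文名', '外文名', '校训'], ['示例大学', 'Example University'])
print(record)
```

Fields that never appear on the page simply keep their default, which is the same behavior as the original chain of `if` statements.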

## 1.3 Cleaning the JSON data

JSON cleaning has two steps: feature extraction and link-node construction.

- Feature extraction: write each node attribute out to its own txt file, which makes it easier to build the node-link-node triples later.
```python
import json

# Crawler output; adjust the path if all.json sits in the current directory
with open('./spider/all.json', 'r', encoding='utf-8') as fr:
    full_data = json.load(fr)  # decode the JSON

# Attribute key -> output file, one txt file per attribute
files = {
    '中文名': './dataprocess/Name.txt',      # name list
    '英文名': './dataprocess/English.txt',   # English-name list
    '简称': './dataprocess/Abbr.txt',        # abbreviation list
    '创办时间': './dataprocess/Time.txt',    # founding-time list
    '类型': './dataprocess/Type.txt',        # type list
    '主管部门': './dataprocess/Admin.txt',   # governing-department list
}
writers = {key: open(path, 'w', encoding='utf-8') for key, path in files.items()}

for record in full_data:
    for key, value in record.items():
        if key in writers:
            # json.dumps escapes quotes inside the value, so every line stays parseable
            writers[key].write(json.dumps({key: value}, ensure_ascii=False) + "\n")

for fw in writers.values():
    fw.close()
```

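Each line of these txt files is later parsed back into a one-entry dict, so the way a line is written matters. The sketch below shows the round trip for a value that contains a quote character. The English name is only an example value, and `ast.literal_eval` is used here as a safer stand-in for `eval`.

```python
import ast
import json

value = "Xi'an Jiaotong University"  # example value containing a quote character

# Line built by plain string concatenation: the inner quote ends the string early
naive_line = "{'英文名': '" + value + "'}"
try:
    ast.literal_eval(naive_line)
    naive_ok = True
except (SyntaxError, ValueError):
    naive_ok = False

# Line built with json.dumps: quoting and escaping are handled by the library
safe_line = json.dumps({'英文名': value}, ensure_ascii=False)
parsed = ast.literal_eval(safe_line)

print(naive_ok)
print(safe_line)
print(parsed)
```

A line produced by `json.dumps` for a dict of plain strings is also a valid Python dict literal, so it can be read back with `json.loads`, `ast.literal_eval`, or `eval`.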
- Link-node construction
```python
import ast
import json

nodes = []
links = []


def add_link_pair(a, b):
    """Add a link in both directions between two node ids."""
    links.append({'source': a, 'target': b, 'value': 3})
    links.append({'source': b, 'target': a, 'value': 3})


def add_attribute_nodes(path, node_class, group, size, link_to=None):
    """Add one node per distinct value found in an attribute file."""
    seen = set()
    with open(path, 'r', encoding='utf-8') as fr:
        for line in fr:
            # Each line is a one-entry dict literal; literal_eval parses it without running code
            for value in ast.literal_eval(line.strip('\n')).values():
                if value in seen:
                    continue
                seen.add(value)
                nodes.append({'id': value, 'class': node_class, 'group': group, 'size': size})
                if link_to is not None:
                    add_link_pair(link_to, value)


# Central node
nodes.append({'id': '大学', 'class': 'university', 'group': 0, 'size': 22})
# Attribute nodes; only the type nodes are linked to the central node
add_attribute_nodes('./dataprocess/Type.txt', 'type', 5, 18, link_to='大学')
add_attribute_nodes('./dataprocess/English.txt', 'english', 2, 15)
add_attribute_nodes('./dataprocess/Abbr.txt', 'abbr', 3, 15)
add_attribute_nodes('./dataprocess/Time.txt', 'time', 4, 11)
add_attribute_nodes('./dataprocess/Admin.txt', 'admin', 6, 11)

with open('./spider/all.json', 'r', encoding='utf-8') as fr:
    full_data = json.load(fr)

for record in full_data:
    name, abbr = record['中文名'], record['简称']
    # Name node
    nodes.append({'id': name, 'class': 'names', 'group': 1, 'size': 20})
    add_link_pair(record['类型'], name)       # type <-> name
    add_link_pair(name, record['英文名'])     # name <-> English name
    add_link_pair(name, abbr)                 # name <-> abbreviation
    add_link_pair(abbr, record['创办时间'])   # abbreviation <-> founding time
    add_link_pair(abbr, record['主管部门'])   # abbreviation <-> governing department

with open('./nodes.json', 'w', encoding='utf-8') as fw:
    json.dump({'nodes': nodes, 'links': links}, fw, ensure_ascii=False)
```

After this processing we have nodes.json, which holds the node and link information. Next, we use D3.js to generate the graph visualization.
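
Before handing nodes.json to D3, it can help to check that the graph is consistent. To my knowledge, `d3.forceLink` with an id accessor raises an error when a link points at an id that no node has, and the placeholder value `无信息` can end up on several nodes because it is added once per attribute class. The sketch below reports both problems; `check_graph` is a hypothetical helper and the sample graph is made up.

```python
from collections import Counter


def check_graph(graph):
    """Return (duplicate node ids, links that reference a missing node id)."""
    id_counts = Counter(node['id'] for node in graph['nodes'])
    duplicates = sorted(node_id for node_id, n in id_counts.items() if n > 1)
    dangling = [
        link for link in graph['links']
        if link['source'] not in id_counts or link['target'] not in id_counts
    ]
    return duplicates, dangling


# Made-up sample: one duplicated id and one link to a node that does not exist
sample = {
    'nodes': [
        {'id': '大学'}, {'id': '综合'}, {'id': '示例大学'},
        {'id': '无信息'}, {'id': '无信息'},
    ],
    'links': [
        {'source': '大学', 'target': '综合', 'value': 3},
        {'source': '示例大学', 'target': 'Example University', 'value': 3},
    ],
}
duplicates, dangling = check_graph(sample)
print(duplicates)
print(dangling)
```

To run it on the real output, load the file with `json.load` and pass the resulting dict to `check_graph`.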

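# 2. Generating the graph

The graph can be generated with either Echarts or D3.js. Echarts is simple and easy to pick up but less flexible to customize; D3.js is flexible and customizable but harder to learn. Following the reference material, the overall graph is first implemented with D3.js, with some modifications and visual polish.

For background on force-directed graphs, see: https://blog.csdn.net/tengxing007/article/details/59712572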

```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8"/>
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>2020年中国普通高等学校图谱可视化</title>
    <meta name="description" content=""/>
    <meta name="keywords" content=""/>
    <meta name="author" content=""/>
    <link rel="shortcut icon" href="">
    <script src="http://cdn.bootcss.com/jquery/2.1.4/jquery.min.js"></script>
    <link href="http://cdn.bootcss.com/bootstrap/3.3.4/css/bootstrap.min.css" rel="stylesheet">
    <script src="http://cdn.bootcss.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
    <script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
</head>
<style>
    body {
        background-color: #333333;
        padding: 30px 40px;
        text-align: center;
        font-family: OpenSans-Light, PingFang SC, Hiragino Sans GB, Microsoft Yahei, Microsoft Jhenghei, sans-serif;
    }
    .links line {
        stroke: rgb(240, 240, 240);
        stroke-opacity: 0.8;
    }
    .links line.inactive {
        /*display: none !important;*/
        stroke-opacity: 0;
    }
    .nodes circle {
        stroke: #fff;
        stroke-width: 1.5px;
    }
    .nodes circle:hover {
        cursor: pointer;
    }
    .nodes circle.inactive {
        display: none !important;
    }
    .texts text {
        display: none;
    }
    .texts text:hover {
        cursor: pointer;
    }
    .texts text.inactive {
        display: none !important;
    }
    #indicator {
        position: absolute;
        left: 45px;
        bottom: 50px;
        text-align: left;
        color: #f2f2f2;
        font-size: 20px;
    }
    #indicator > div {
        margin-bottom: 4px;
    }
    #indicator span {
        display: inline-block;
        width: 30px;
        height: 14px;
        position: relative;
        top: 2px;
        margin-right: 8px;
    }
    #mode {
        position: absolute;
        top: 60px;
        left: 45px;
    }
    #mode span {
        display: inline-block;
        border: 1px solid #fff;
        color: #fff;
        padding: 6px 10px;
        border-radius: 4px;
        font-size: 14px;
        transition: color, background-color .3s;
        -o-transition: color, background-color .3s;
        -ms-transition: color, background-color .3s;
        -moz-transition: color, background-color .3s;
        -webkit-transition: color, background-color .3s;
    }
    #mode span.active, #mode span:hover {
        background-color: #fff;
        color: #333;
        cursor: pointer;
    }
    #info {
        position: absolute;
        bottom: 40px;
        right: 30px;
        text-align: right;
        width: 270px;
    }
    #info p {
        color: #fff;
        font-size: 12px;
        margin-bottom: 5px;
        margin-top: 0px;
    }
    #info p span {
        color: #888;
        margin-right: 10px;
    }
    #search input {
        position: absolute;
        top: 100px;
        left: 45px;
        color: #000;
        border: none;
        outline: none;
        box-shadow: none;
        width: 160px;
        background-color: #FFF;
    }
    #svg2 g.row:hover {
        stroke-width: 1px;
        stroke: #fff;
    }
</style>
<body>
    <h1 style="color: #fff;font-size: 32px;text-align: left;margin-left:40px;">2020年中国普通高等学校知识图谱</h1>
    <div style="text-align: center;position: relative;">
        <svg width="1600" height="1200" style="margin-left: 0px;margin-bottom: 0px;" id="svg1"></svg>
        <div id="indicator"></div>
        <div id="mode">
            <span class="active" style="border-top-right-radius: 0;border-bottom-right-radius: 0; ">图形</span>
            <span style="border-top-left-radius: 0;border-bottom-left-radius: 0; position: relative;left: -5px;">文字</span>
        </div>
        <div id="search">
            <input type="text" class="form-control">
        </div>
    </div>
    <div style="text-align: center;position: relative;"></div>
    <div id="info">
        <h4></h4>
    </div>
    <!-- <div id="main" style="width: 600px;height:400px;"></div> -->
</body>
<script src="https://d3js.org/d3.v4.min.js"></script>
<script>
    $(document).ready(function () {
        var svg = d3.select("#svg1"), width = svg.attr('width'), height = svg.attr('height');
        var names = ['大学', '中文名','英文名','简称','创办时间','类型','主管部门'];
        var colors = ['#bd0404','#b7d28d', '#b8f1ed', '#ca635f', '#5153ee','#836FFF', '#f0b631'];
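        // Legend
        for (var i = 0; i < names.length; i++) {
            $('#indicator').append("<div><span style='background-color: " + colors[i] + "'></span>" + names[i] + "</div>");
        }
        // Force simulation; it also drives how the graph reacts while a node is dragged
        var simulation = d3.forceSimulation()
            // Velocity decay factor, similar to friction: 0 means no friction, 1 freezes the nodes
            .velocityDecay(0.6)
            // Alpha decay controls how quickly the simulation cools down and stops.
            // The range is 0-1; the D3 default (about 0.0228) stops after roughly 300 iterations.
            // Setting it to 0 here keeps the simulation running indefinitely.
            .alphaDecay(0)
            // Link force: pulls linked nodes toward the link distance; links are matched to nodes by id
            .force("link", d3.forceLink().id(function (d) {
                return d.id;
            }))
            // Many-body force: with the default negative strength, nodes repel each other
            .force("charge", d3.forceManyBody())
            // Centering force
            .force("center", d3.forceCenter(width / 2, height / 2));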
        var graph;
        d3.json("nodes.json", function (error, data) {
            if (error) throw error;
            graph = data;
            console.log(graph);
            var link = svg.append("g").attr("class", "links").selectAll("line").data(graph.links).enter().append("line").attr("stroke-width", function (d) {
                return 1;
            });
            var node = svg.append("g").attr("class", "nodes").selectAll("circle").data(graph.nodes).enter().append('circle').attr('r', function (d) {
                return d.size; 
            }).attr("fill", function (d) {
                return colors[d.group];
            }).attr("stroke", "none").attr("name", function (d) {
                return d.id;
            }).call(d3.drag().on("start", dragstarted).on("drag", dragged).on("end", dragended));
            var text =
                svg.append("g").attr("class", "texts").selectAll("text").data(graph.nodes).enter().append('text').attr("font-size", function (d) {
                    return d.size;
                }).attr("fill", function (d) {
                    return colors[d.group];
                }).attr("name", function (d) {
                    return d.id;
                }).text(function (d) {
                    return d.id;
                }).attr("text-anchor", 'middle').call(d3.drag().on("start", dragstarted).on("drag", dragged).on("end", dragended));
            // Native tooltip: show the node id on hover
            node.append("title").text(function (d) {
                return d.id;
            });
            simulation
                .nodes(graph.nodes)
                .on("tick", ticked);
            simulation.force("link")
                .links(graph.links);
            // The force-directed graph keeps moving, so the tick handler must keep updating node and link positions.
            // Update the positions on every tick
            function ticked() {
                link
                    .attr("x1", function (d) {
                        return d.source.x;
                    })
                    .attr("y1", function (d) {
                        return d.source.y;
                    })
                    .attr("x2", function (d) {
                        return d.target.x;
                    })
                    .attr("y2", function (d) {
                        return d.target.y;
                    });
                node
                    .attr("cx", function (d) {
                        return d.x;
                    })
                    .attr("cy", function (d) {
                        return d.y;
                    });
                text.attr('transform', function (d) {
                    return 'translate(' + d.x + ',' + (d.y + d.size / 2) + ')';
                });
            }
        });
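        // Drag handlers
        var dragging = false;
        // Drag start: reheat the simulation and pin the node at its current position
        function dragstarted(d) {
            if (!d3.event.active) simulation.alphaTarget(0.6).restart();
            d.fx = d.x;
            d.fy = d.y;
            dragging = true;
        }
        // Dragging: move the pinned node with the pointer
        function dragged(d) {
            d.fx = d3.event.x;
            d.fy = d3.event.y;
        }
        // Drag end: alphaTarget is the alpha value the simulation settles toward; reset it to 0 and release the node
        function dragended(d) {
            if (!d3.event.active) simulation.alphaTarget(0);
            d.fx = null;
            d.fy = null;
            dragging = false;
        }
        // Graphics / text toggle buttons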
        $('#mode span').click(function (event) {
            $('#mode span').removeClass('active');
            $(this).addClass('active');
            if ($(this).text() == '图形') {
                $('.texts text').hide();
                $('.nodes circle').show();
            }
            else {
                $('.texts text').show();
                $('.nodes circle').show();
            }
        });
        $('#svg1').on('mouseenter', '.nodes circle', function (event) {
            if (!dragging) {
                var name = $(this).attr('name');
                $('#info h4').css('color', $(this).attr('fill')).text(name);
                $('#info p').remove();
                console.log(info[name]);
                for (var key in info[name]) {
                    if (typeof(info[name][key]) == 'object') {
                        continue;
                    }
                    if (key == 'url' || key == 'title' || key == 'name' || key == 'edited' || key == 'created' || key == 'homeworld') {
                        continue;
                    }
                    $('#info').append('<p><span>' + key + '</span>' + info[name][key] + '</p>');
                }
                d3.select("#svg1 .nodes").selectAll('circle').attr('class', function (d) {
                    if (d.id == name) {
                        return '';
                    }
                    for (var i = 0; i < graph.links.length; i++) {
                        if (graph.links[i]['source'].id == name && graph.links[i]['target'].id == d.id) {
                            return '';
                        }
                        if (graph.links[i]['target'].id == name && graph.links[i]['source'].id == d.id) {
                            return '';
                        }
                    }
                    return 'inactive';
                });
                d3.select("#svg1 .links").selectAll('line').attr('class', function (d) {
                    if (d.source.id == name || d.target.id == name) {
                        return '';
                    } else {
                        return 'inactive';
                    }
                });
            }
        });
        $('#svg1').on('mouseleave', '.nodes circle', function (event) {
            if (!dragging) {
                d3.select('#svg1 .nodes').selectAll('circle').attr('class', '');
                d3.select('#svg1 .links').selectAll('line').attr('class', '');
            }
        });
        $('#svg1').on('mouseenter', '.texts text', function (event) {
            if (!dragging) {
                var name = $(this).attr('name');
                $('#info h4').css('color', $(this).attr('fill')).text(name);
                $('#info p').remove();
                for (var key in info[name]) {
                    if (typeof(info[name][key]) == 'object') {
                        continue;
                    }
                    if (key == 'url' || key == 'title' || key == 'name' || key == 'edited' || key == 'created' || key == 'homeworld') {
                        continue;
                    }
                    $('#info').append('<p><span>' + key + '</span>' + info[name][key] + '</p>');
                }
                d3.select('#svg1 .texts').selectAll('text').attr('class', function (d) {
                    if (d.id == name) {
                        return '';
                    }
                    for (var i = 0; i < graph.links.length; i++) {
                        if (graph.links[i]['source'].id == name && graph.links[i]['target'].id == d.id) {
                            return '';
                        }
                        if (graph.links[i]['target'].id == name && graph.links[i]['source'].id == d.id) {
                            return '';
                        }
                    }
                    return 'inactive';
                });
                d3.select("#svg1 .links").selectAll('line').attr('class', function (d) {
                    if (d.source.id == name || d.target.id == name) {
                        return '';
                    } else {
                        return 'inactive';
                    }
                });
            }
        });
        $('#svg1').on('mouseleave', '.texts text', function (event) {
            if (!dragging) {
                d3.select('#svg1 .texts').selectAll('text').attr('class', '');
                d3.select('#svg1 .links').selectAll('line').attr('class', '');
            }
        });
        $('#search input').keyup(function (event) {
            if ($(this).val() == '') {
                d3.select('#svg1 .texts').selectAll('text').attr('class', '');
                d3.select('#svg1 .nodes').selectAll('circle').attr('class', '');
                d3.select('#svg1 .links').selectAll('line').attr('class', '');
            }
            else {
                var name = $(this).val();
                d3.select('#svg1 .nodes').selectAll('circle').attr('class', function (d) {
                    if (d.id.toLowerCase().indexOf(name.toLowerCase()) >= 0) {
                        return '';
                    } else {
                        return 'inactive';
                    }
                });
                d3.select('#svg1 .texts').selectAll('text').attr('class', function (d) {
                    if (d.id.toLowerCase().indexOf(name.toLowerCase()) >= 0) {
                        return '';
                    } else {
                        return 'inactive';
                    }
                });
                d3.select("#svg1 .links").selectAll('line').attr('class', function (d) {
                    return 'inactive';
                });
            }
        });
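        // all.json is an array of records; index it by Chinese name so that info[name] resolves
        var info = {};
        d3.json("all.json", function (error, data) {
            if (error) throw error;
            data.forEach(function (d) {
                info[d['中文名']] = d;
            });
        });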
    });
</script>
</html>
```

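The `alphaDecay` setting in the page is easier to reason about with a small model. As far as I understand d3-force, every tick applies `alpha += (alphaTarget - alpha) * alphaDecay`, and the simulation stops once `alpha` falls below `alphaMin` (0.001 by default). The sketch below reproduces only that update rule in Python; `ticks_until_stop` is a hypothetical helper, not part of D3.

```python
def ticks_until_stop(alpha_decay, alpha=1.0, alpha_min=0.001,
                     alpha_target=0.0, max_ticks=10_000):
    """Count ticks until alpha drops below alpha_min; None if it never does."""
    for tick in range(1, max_ticks + 1):
        alpha += (alpha_target - alpha) * alpha_decay
        if alpha < alpha_min:
            return tick
    return None


# D3's documented default decay, chosen so that cooling takes about 300 ticks
default_decay = 1 - 0.001 ** (1 / 300)  # about 0.0228

print(round(default_decay, 4))
print(ticks_until_stop(default_decay))  # roughly 300 ticks
print(ticks_until_stop(1.0))            # alpha reaches the target after a single tick
print(ticks_until_stop(0.0))            # None: alpha never changes, the simulation never stops
```

Under this model, a decay of 0 leaves `alpha` at its starting value, which is why the graph keeps moving and why the `alphaTarget` calls in the drag handlers do not change it.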
# 3. Local deployment

The current directory is laid out as follows:

```
│  all.json
│  creatNodeLink.ipynb
│  getFeature.ipynb
│  index.html
│  nodes.json
│  School_list_2020.csv
│  Spider_school.ipynb
│  全国高等学校名单.xls
│
└─dataprocess
        Abbr.txt
        Admin.txt
        English.txt
        Name.txt
        Time.txt
        Type.txt
```

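Change into the directory that contains index.html, open a command prompt (cmd) there, and run the command below to quickly set up a local server.
With Python 3, one line is enough: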
```
python -m http.server 8000
```
Open http://localhost:8000/ in a browser to view your graph.
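
A server is needed because the page loads nodes.json and all.json through `d3.json`, and browsers typically refuse such requests when the page is opened straight from disk. For reference, the same kind of server can also be started from Python code with the standard library. This is a minimal sketch; `start_server` is a hypothetical helper, and the self-check at the end uses a port chosen by the operating system instead of 8000.

```python
import functools
import http.server
import threading
import urllib.request


def start_server(directory=".", port=8000):
    """Serve `directory` over HTTP in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Quick self-check: port 0 lets the OS pick any free port
server = start_server(".", port=0)
port = server.server_address[1]
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxy settings
with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as response:
    status = response.status
server.shutdown()
server.server_close()
print(status)
```

For everyday use the one-line command is simpler; the programmatic form is mainly handy in scripts or notebooks that need to start and stop the server themselves.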

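Demo preview: http://yzy616.xyz/
The next steps could be adding downstream tasks or transactional features.
Either way, basic create, read, update, and delete operations come first. I am not yet very familiar with graph databases, so a labmate, Lao Wu, will later help with the Neo4j-related development.
# 4. Downstream tasks
To be developed.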
