前缀树

前缀树

一. 基础知识

Trie树。又称之为单词查找树或者键树，是一种树形结构。应用于统计和排序大量的字符串。常被搜索引擎系统用于文本词频统计。它的优点：能够最大限度的减少无谓的字符串比较，查询效率比哈希表高。

核心思想是以空间换时间。利用记录字符串公共前缀来降低查询时间的开销。

3个基本性质

根节点不包含字符，除根节点外每一个节点都只包含一个字符。
从根节点到某一节点，路径上经过的字符连接起来，为该节点所对应的字符串
每个节点的所有子节点所包含的字符都不同。
每个节点对应一个前缀，叶节点对应最长前缀，即单词本身。

二. 功能

应该实现查询，插入，前缀查询的功能。

三. 数据结构组成

Trie，又称前缀树或字典树，是一棵有根树，其每个节点包含以下字段：

通过次数 int pass

作为结尾的次数 int end

指向子节点的指针数组children。对于本题而言，数组长度为26，即小写英文字母的数量。此时children[0]对应小写字母 a。
布尔字段isEnd，表示该节点是否为字符串的结尾。

16. 前缀树 - 图1

四. 实现

1. 插入

我们从字典树的根开始，插入字符串。对于当前字符对应的子节点，有两种情况：

子节点存在。沿着指针移动到子节点，继续处理下一个字符。
子节点不存在。创建一个新的子节点，记录在children数组的对应位置上，然后沿着指针移动到子节点，继续搜索下一个字符。

重复以上步骤，直到处理字符串的最后一个字符，然后将当前节点标记为字符串的结尾。

2. 查找前缀

我们从字典树的根开始，查找前缀。对于当前字符对应的子节点，有两种情况：

子节点存在。沿着指针移动到子节点，继续搜索下一个字符。
子节点不存在。说明字典树中不包含该前缀，返回空指针。
重复以上步骤，直到返回空指针或搜索完前缀的最后一个字符。

若搜索到了前缀的末尾，就说明字典树中存在该前缀。此外，若前缀末尾对应节点的isEnd为真，则说明字典树中存在该字符串。

3. 查找

实现了查找前缀的函数，就可以直接调用这个函数，检查返回的node是否不为空且是叶子节点。若是则说明此时的字符串存在，不然就不存在。

构造一:

public static class Node1 {
    public int pass;
    public int end;
    public Node1[] nexts;
    // char tmp = 'b' (tmp - 'a')
    public Node1() {
        pass = 0;
        end = 0;
        // 0 a
        // 1 b
        // 2 c
        // .. ..
        // 25 z
        // nexts[i] == null i方向的路不存在
        // nexts[i] != null i方向的路存在
        // 26个字母26条路
        nexts = new Node1[26];
    }
}
public static class Trie1 {
    private Node1 root;
    // 构造口节点
    public Trie1() {
        root = new Node1();
    }
    // 加入一个word的时候
    public void insert(String word) {
        // 如果word为空直接返回
        if (word == null) {
            return;
        }
        // 将word转为char数组
        char[] str = word.toCharArray();
        // 从头结点开始添加 引用从头结点出发
        Node1 node = root;
        // 头结点被经过pass++
        node.pass++;
        // 准备好一个变量path,获取对应路
        int path = 0;
        for (int i = 0; i < str.length; i++) { // 从左往右遍历字符
            path = str[i] - 'a'; // 由字符，对应成走向哪条路
            // 判断当前方向路上是否是空的
            if (node.nexts[path] == null) {
                // 如果是空的新建一个节点
                node.nexts[path] = new Node1();
            }
            // node指针,跳到新节点上
            // 或者之前建过节点,所以直接复用
            node = node.nexts[path];
            // 新节点被经过pass++
            node.pass++;
        }
        // 如果遍历完成,代表以此节点为结束end++
        node.end++;
    }
    // 删除一个元素
    public void delete(String word) {
        // 先查一下这个字符串加入几次,如果加入次数为0就不删除
        if (search(word) != 0) {
            // word变为char数组
            char[] chs = word.toCharArray();
            // node指针,指向头结点
            Node1 node = root;
            // 通过的每个路径都要减一
            node.pass--;
            // 准备index变量,存储路径
            int path = 0;
            // 遍历char数组
            for (int i = 0; i < chs.length; i++) {
                // 由字符，对应成走向哪条路
                path = chs[i] - 'a';
                // 如果正在删除的过程中有pass变成0,不考虑后面的元素,直接让node指向null,后面的节点冗余,直接释放
                if (--node.nexts[path].pass == 0) {
                    node.nexts[path] = null;
                    return;
                }
                // node指针指向现在的节点
                node = node.nexts[path];
            }
            // end--
            node.end--;
        }
    }
    // word这个单词之前加入过几次
    public int search(String word) {
        // 如果word为空,直接返回0
        if (word == null) {
            return 0;
        }
        // 将word转换成char数组
        char[] chs = word.toCharArray();
        // node指针,指向头结点
        Node1 node = root;
        // 准备index变量,存储路径
        int index = 0;
        // 遍历char数组
        for (int i = 0; i < chs.length; i++) {
            // 由字符，对应成走向哪条路
            index = chs[i] - 'a';
            // 如果对应路径为空,返回0
            if (node.nexts[index] == null) {
                return 0;
            }
            // 如果对应路径不为空,node 指向对应index的节点
            node = node.nexts[index];
        }
        // 返回节点的结束变量
        return node.end;
    }
    // 所有加入的字符串中，有几个是以pre这个字符串作为前缀的
    public int prefixNumber(String pre) {
        // 如果pre为空返回0
        if (pre == null) {
            return 0;
        }
        char[] chs = pre.toCharArray();
        Node1 node = root;
        int index = 0;
        for (int i = 0; i < chs.length; i++) {
            index = chs[i] - 'a';
            if (node.nexts[index] == null) {
                return 0;
            }
            node = node.nexts[index];
        }
        // 返回有多少个字符串通过了该节点
        return node.pass;
    }
}

构造二:

public static class Node2 {
    public int pass;
    public int end;
    public HashMap<Integer, Node2> nexts;

    public Node2() {
        pass = 0;
        end = 0;
        nexts = new HashMap<>();
    }
}

public static class Trie2 {
    private Node2 root;

    public Trie2() {
        root = new Node2();
    }

    public void insert(String word) {
        if (word == null) {
            return;
        }
        char[] chs = word.toCharArray();
        Node2 node = root;
        node.pass++;
        int index = 0;
        for (int i = 0; i < chs.length; i++) {
            index = (int) chs[i];
            if (!node.nexts.containsKey(index)) {
                node.nexts.put(index, new Node2());
            }
            node = node.nexts.get(index);
            node.pass++;
        }
        node.end++;
    }

    public void delete(String word) {
        if (search(word) != 0) {
            char[] chs = word.toCharArray();
            Node2 node = root;
            node.pass--;
            int index = 0;
            for (int i = 0; i < chs.length; i++) {
                index = (int) chs[i];
                if (--node.nexts.get(index).pass == 0) {
                    node.nexts.remove(index);
                    return;
                }
                node = node.nexts.get(index);
            }
            node.end--;
        }
    }

    // word这个单词之前加入过几次
    public int search(String word) {
        if (word == null) {
            return 0;
        }
        char[] chs = word.toCharArray();
        Node2 node = root;
        int index = 0;
        for (int i = 0; i < chs.length; i++) {
            index = (int) chs[i];
            if (!node.nexts.containsKey(index)) {
                return 0;
            }
            node = node.nexts.get(index);
        }
        return node.end;
    }

    // 所有加入的字符串中，有几个是以pre这个字符串作为前缀的
    public int prefixNumber(String pre) {
        if (pre == null) {
            return 0;
        }
        char[] chs = pre.toCharArray();
        Node2 node = root;
        int index = 0;
        for (int i = 0; i < chs.length; i++) {
            index = (int) chs[i];
            if (!node.nexts.containsKey(index)) {
                return 0;
            }
            node = node.nexts.get(index);
        }
        return node.pass;
    }
}

public static class Right {

    private HashMap<String, Integer> box;

    public Right() {
        box = new HashMap<>();
    }

    public void insert(String word) {
        if (!box.containsKey(word)) {
            box.put(word, 1);
        } else {
            box.put(word, box.get(word) + 1);
        }
    }

    public void delete(String word) {
        if (box.containsKey(word)) {
            if (box.get(word) == 1) {
                box.remove(word);
            } else {
                box.put(word, box.get(word) - 1);
            }
        }
    }

    public int search(String word) {
        if (!box.containsKey(word)) {
            return 0;
        } else {
            return box.get(word);
        }
    }

    public int prefixNumber(String pre) {
        int count = 0;
        for (String cur : box.keySet()) {
            if (cur.startsWith(pre)) {
                count += box.get(cur);
            }
        }
        return count;
    }
}

构造三:

package JavaCode.leetcode.DataStructure.Tree;

 class Trie {
     //Trie的两个属性，指向子节点的指针数组和表示该节点是否为结尾的布尔值
     private Trie[] children;
     private boolean isEnd;

     //构造
     public Trie() {
         children = new Trie[26];
         isEnd = false;
     }

     //插入节点。
     public void insert(String word) {
         Trie node = this;//指针指向当前的根
         for (int i = 0; i < word.length(); i++) {
             char ch = word.charAt(i);//待插入的字符
             int index = ch - 'a';//参数
             //当前的节点为null，就新建一个节点
             if (node.children[index] == null) {
                 node.children[index] = new Trie();
             }
             //当前节点不为null,就将node沿指针移动到子节点
             node = node.children[index];
         }
         //完成插入后，就将此时node所指向的节点isEnd置为true
         node.isEnd = true;
     }
     //查询前缀树中是否含有本字符串，使用查询前缀和的函数得到节点node,
     //若返回的node不为null,则说明找到了word的前缀，且如果此时isEnd为true，说明node是叶子
     //则说明此时的word存在于前缀树中。
     public boolean search(String word) {
         Trie node = searchPrefix(word);
         return node != null && node.isEnd;
     }

     //查询前缀
     public boolean startsWith(String prefix) {
         //只要返回值不为null，说明搜索到了前缀的末尾就为true，否则为false
         return searchPrefix(prefix) != null;
     }

     private Trie searchPrefix(String prefix) {
         Trie node = this;//指针指向当前的根
         for (int i = 0; i < prefix.length(); i++) {
             //当前访问的字符及其参数
             char ch = prefix.charAt(i);
             int index = ch - 'a';
             //访问的节点不存在，就返回一个null
             if (node.children[index] == null) {
                 return null;
             }
             //访问的节点存在，就沿着指针指向的节点移动
             node = node.children[index];
         }
         return node;//最后搜索到了末尾就返回这个末尾的节点，说明存在这个前缀
     }
}

暴力:

public static class Right {

        private HashMap<String, Integer> box;

        public Right() {
            box = new HashMap<>();
        }

        public void insert(String word) {
            if (!box.containsKey(word)) {
                box.put(word, 1);
            } else {
                box.put(word, box.get(word) + 1);
            }
        }

        public void delete(String word) {
            if (box.containsKey(word)) {
                if (box.get(word) == 1) {
                    box.remove(word);
                } else {
                    box.put(word, box.get(word) - 1);
                }
            }
        }

        public int search(String word) {
            if (!box.containsKey(word)) {
                return 0;
            } else {
                return box.get(word);
            }
        }

        public int prefixNumber(String pre) {
            int count = 0;
            for (String cur : box.keySet()) {
                if (cur.startsWith(pre)) {
                    count += box.get(cur);
                }
            }
            return count;
        }
    }

题

720. 词典中最长的单词

方法一: hashset模拟 —>三叶题解

class Solution {
    public String longestWord(String[] words) {
        //先创建ans变量存储答案
        String ans = "";
        //创建HashSet集合
        Set<String> set = new HashSet<>();
        //将words所有数元素存到set集合中
        for (String s : words) set.add(s);
        //遍历set集合
        for (String s : set) {
            //n:集合元素的长度 //m: 结果长度
            int n = s.length(), m = ans.length();
            //如果集合元素长度小于结果长度,进入下一循环
            if (n < m) continue;
            //如果集合元素长度等于结果长度并且,内容相同,进入下一循环
            if (n == m && s.compareTo(ans) > 0) continue;
            //判断指针ok预设为true
            boolean ok = true;
            //遍历集合元素长度
            for (int i = 1; i <= n && ok; i++) {
                //sub获取前i个元素
                String sub = s.substring(0, i);
                //如果set中含有sub,设置ok为false
                if (!set.contains(sub)) ok = false;
            }
            //如果ok为true 设置ans = s
            if (ok) ans = s;
        }
        //返回答案
        return ans;
    }
}

方法二: 字典树