---
title: Text detection
meta:
  - name: description
    content: Easyswoole provides a content detection component based on the dictionary tree algorithm
  - name: keywords
    content: swoole|swoole extension|swoole framework|easyswoole,Sensitive word,Sensitive word detection
---

Text detection (words-match)

Thanks to the other members of the EasySwoole development team for their patient guidance, and to AbelZhou's open-source dictionary tree, which I learned from.

The words-match component is based on a dictionary tree (trie, used as a DFA) and runs as custom processes that communicate over Unix sockets. It was developed to help developers quickly deploy a sensitive-word detection service, which is essential for content products.
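To make the dictionary-tree idea concrete, here is a minimal single-process trie in PHP. It is an illustrative sketch, not the words-match implementation: the class name `WordTrie` is hypothetical, there is no Unix socket or multi-process layer, and it simply counts how often each inserted word occurs in a piece of text.

```php
<?php
declare(strict_types=1);

// Illustrative dictionary tree (trie); NOT the words-match implementation.
// Words are stored byte by byte, so UTF-8 words work as byte sequences.
final class WordTrie
{
    /** Nested arrays, one level per byte; the key '#end' marks the end of a word. */
    private array $root = [];

    public function insert(string $word): void
    {
        if ($word === '') {
            return;
        }
        $node = &$this->root;
        foreach (str_split($word) as $byte) {
            if (!isset($node[$byte])) {
                $node[$byte] = [];
            }
            $node = &$node[$byte];
        }
        $node['#end'] = $word;
    }

    public function remove(string $word): void
    {
        if ($word === '') {
            return;
        }
        $node = &$this->root;
        foreach (str_split($word) as $byte) {
            if (!isset($node[$byte])) {
                return; // word is not in the tree
            }
            $node = &$node[$byte];
        }
        unset($node['#end']); // empty branches are kept for simplicity
    }

    /** @return array<string, int> matched word => number of occurrences */
    public function search(string $content): array
    {
        $hits = [];
        $length = strlen($content);
        for ($start = 0; $start < $length; $start++) {
            $node = $this->root; // read-only walk, so a copy-on-write copy is fine
            for ($i = $start; $i < $length && isset($node[$content[$i]]); $i++) {
                $node = $node[$content[$i]];
                if (isset($node['#end'])) {
                    $hits[$node['#end']] = ($hits[$node['#end']] ?? 0) + 1;
                }
            }
        }
        return $hits;
    }
}

$trie = new WordTrie();
foreach (['php', 'java', 'golang', 'programmer'] as $word) {
    $trie->insert($word);
}
print_r($trie->search('php is the best language; programmers using java or golang disagree with the php claim'));
```

This naive search restarts the walk from the root at every byte offset, so its cost grows with content length times the longest word; an Aho-Corasick automaton avoids those restarts.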

::: warning
Once the component is stable, we plan to offer the Aho-Corasick (AC) automaton or other detection methods as configurable underlying detection strategies.
:::

Use cases

Blogs: comments and articles

Instant messaging: chat room messages

Any other scenario that involves text content

Installation

```shell
composer require easyswoole/words-match
```

Preparing the thesaurus

When the service starts, the thesaurus file is read line by line. In each line, the first column is the sensitive word and the remaining columns (split by the configured separator) are its additional information.

```
php,is the world's,best language
java
golang
programmer
code
logic
```
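The line format can be illustrated with a small parser. It is a sketch under assumptions: the helper name `parseThesaurusLine` is hypothetical, and trimming whitespace and skipping blank lines are this example's own choices, not documented words-match behaviour.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of words-match): splits one thesaurus line
 * into the sensitive word (first column) and its additional information
 * (remaining columns). Trimming whitespace is this sketch's own choice.
 *
 * @return array{word: string, other: list<string>}|null null for a blank line
 */
function parseThesaurusLine(string $line, string $separator = ','): ?array
{
    $line = trim($line);
    if ($line === '') {
        return null;
    }
    $columns = array_map('trim', explode($separator, $line));
    $word = array_shift($columns); // first column: the sensitive word
    return ['word' => $word, 'other' => $columns];
}

// Read a small thesaurus line by line and show what each line yields.
$thesaurus = "php,is the world's,best language\njava\n\ngolang\n";
foreach (explode("\n", $thesaurus) as $line) {
    $entry = parseThesaurusLine($line);
    if ($entry !== null) {
        echo $entry['word'], ' => [', implode(' | ', $entry['other']), "]\n";
    }
}
```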

::: warning
Note: use the setDefaultWordBank method to specify the thesaurus that is loaded by default when the service starts.
:::

Code Example

```php
<?php
namespace EasySwoole\EasySwoole;

use EasySwoole\EasySwoole\Swoole\EventRegister;
use EasySwoole\EasySwoole\AbstractInterface\Event;
use EasySwoole\Http\Request;
use EasySwoole\Http\Response;
use EasySwoole\WordsMatch\WordsMatchClient;
use EasySwoole\WordsMatch\WordsMatchServer;

class EasySwooleEvent implements Event
{
    public static function initialize()
    {
        date_default_timezone_set('Asia/Shanghai');
    }

    public static function mainServerCreate(EventRegister $register)
    {
        // Register the words-match service with the main server.
        // ServerManager lives in this same namespace, so no `use` is needed.
        WordsMatchServer::getInstance()
            ->setMaxMem('1024M')                                  // maximum memory per process
            ->setProcessNum(5)                                    // number of processes
            ->setServerName('Easyswoole words-match')             // service name
            ->setTempDir(EASYSWOOLE_TEMP_DIR)                     // temporary directory
            ->setWordsMatchPath(EASYSWOOLE_ROOT . '/WordsMatch/') // component root path
            ->setDefaultWordBank('comment.txt')                   // thesaurus loaded by default at startup
            ->setSeparator(',')                                   // separator between a word and its other info
            ->attachToServer(ServerManager::getInstance()->getSwooleServer());
    }

    public static function onRequest(Request $request, Response $response): bool
    {
        // Demo only: search a fixed sentence and dump the hits on every request.
        $res = WordsMatchClient::getInstance()->search('php is the best language in the world; programmers of other languages, such as java and golang, do not agree with the php claim.');
        var_dump($res);
        return true;
    }

    public static function afterRequest(Request $request, Response $response): void
    {
    }
}
```

Hit result

```
array(4) {
  ["e1bfd762321e409cee4ac0b6e841963c"]=>
  array(3) {
    ["word"]=>
    string(3) "php"
    ["other"]=>
    array(2) {
      [0]=>
      string(14) "is the world's"
      [1]=>
      string(13) "best language"
    }
    ["count"]=>
    int(2)
  }
  ["72d9adf4944f23e5efde37f6364c126f"]=>
  array(3) {
    ["word"]=>
    string(10) "programmer"
    ["other"]=>
    array(0) {
    }
    ["count"]=>
    int(1)
  }
  ["93f725a07423fe1c889f448b33d21f46"]=>
  array(3) {
    ["word"]=>
    string(4) "java"
    ["other"]=>
    array(0) {
    }
    ["count"]=>
    int(1)
  }
  ["21cc28409729565fc1a4d2dd92db269f"]=>
  array(3) {
    ["word"]=>
    string(6) "golang"
    ["other"]=>
    array(0) {
    }
    ["count"]=>
    int(1)
  }
}
```

::: warning
word: the matched sensitive word; other: its additional information from the thesaurus; count: the number of times the word occurs in the content.
:::
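A common next step is to mask the hits before displaying the content. The following sketch reads only the `word` field of each entry; `maskHits` is a hypothetical helper, not part of words-match, and it matches case-sensitively via `str_replace()`. The sample array reuses two keys from the result shown above; the keys themselves are ignored.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of words-match): masks every hit word in
 * $content with asterisks, given an array shaped like the search() result
 * (entries with "word", "other" and "count"; the outer keys are ignored).
 * Matching is case-sensitive, like str_replace().
 */
function maskHits(string $content, array $hits): string
{
    $words = array_column($hits, 'word');
    // Longest words first, so a short word cannot break up a longer one.
    usort($words, static fn (string $a, string $b): int => strlen($b) <=> strlen($a));
    foreach ($words as $word) {
        // Count characters rather than bytes when mbstring is available.
        $stars = function_exists('mb_strlen') ? mb_strlen($word, 'UTF-8') : strlen($word);
        $content = str_replace($word, str_repeat('*', $stars), $content);
    }
    return $content;
}

$hits = [
    'e1bfd762321e409cee4ac0b6e841963c' => ['word' => 'php', 'other' => ["is the world's", 'best language'], 'count' => 2],
    '93f725a07423fe1c889f448b33d21f46' => ['word' => 'java', 'other' => [], 'count' => 1],
];
echo maskHits('php is the best language; java disagrees with the php claim', $hits), "\n";
// prints: *** is the best language; **** disagrees with the *** claim
```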

Supported methods

WordsMatchServer

Set the temporary directory

```php
public function setTempDir(string $tempDir): WordsMatchServer
```

Set the number of processes (default: 3)

```php
public function setProcessNum(int $num): WordsMatchServer
```

Set the maximum memory per process (default: '512M')

```php
public function setMaxMem(string $maxMem='512M')
```

Set the backlog queue length of the Unix socket

```php
public function setBacklog(?int $backlog = null)
```

Set the service name

```php
public function setServerName(string $serverName): WordsMatchServer
```

Set the thesaurus that is loaded by default when the service starts

```php
public function setDefaultWordBank(string $defaultWordBank): WordsMatchServer
```

Attach the service to the current main server

```php
function attachToServer(swoole_server $server)
```

Set the separator between a sensitive word and its additional information

```php
public function setSeparator(string $separator): WordsMatchServer
```

Set the component root path

```php
public function setWordsMatchPath(string $path): WordsMatchServer
```

WordsMatchClient

Append a sensitive word (with optional additional information) to the dictionary tree

```php
public function append($word, array $otherInfo=[], float $timeout = 1.0)
```

::: warning
A word only needs to be appended once; the change is synchronized to all processes automatically.
:::

Remove a sensitive word from the dictionary tree

```php
public function remove($word, float $timeout = 1.0)
```

::: warning
A word only needs to be removed once; the change is synchronized to all processes automatically.
:::

Search content for sensitive words

```php
public function search($word, float $timeout = 1.0)
```

Import a thesaurus. With `$isCover = false` (the default) the new words are appended to the running dictionary tree; with `$isCover = true` the tree is overwritten, so thesauri can be switched at runtime.

```php
public function import($fileName, $separator=',', $isCover=false, float $timeout=1.0)
```

::: warning
All processes are synchronized after the thesaurus is imported.
:::
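The difference between appending and overwriting on import can be modeled with plain arrays. This is a hypothetical sketch: `importThesaurus` is not part of words-match, the real service operates on its dictionary tree across processes, and letting the imported entry win when a word appears in both thesauri is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of the two import modes on a word => other-info map
 * (not the words-match code): with $isCover = false the imported thesaurus
 * is appended to the current one; with $isCover = true it replaces it.
 * Letting the imported entry win when a word exists in both is an assumption.
 *
 * @param array<string, list<string>> $current
 * @param array<string, list<string>> $imported
 * @return array<string, list<string>>
 */
function importThesaurus(array $current, array $imported, bool $isCover = false): array
{
    if ($isCover) {
        return $imported; // overwrite: the old dictionary is discarded
    }
    // append: array_replace keeps existing keys (even numeric-looking words)
    // and lets imported entries overwrite duplicates
    return array_replace($current, $imported);
}

$running = ['php' => [], 'java' => []];
$newBank = ['golang' => [], 'php' => ['best language']];
print_r(importThesaurus($running, $newBank));       // keys: php, java, golang
print_r(importThesaurus($running, $newBank, true)); // keys: golang, php
```

`array_replace()` is used instead of `array_merge()` because `array_merge()` would renumber integer keys, which PHP creates for numeric-looking words such as "110".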

Export the thesaurus: writes the sensitive words in the running dictionary tree to a file.

```php
public function export($fileName, $separator=',', float $timeout=1.0)
```
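import() and export() both work with files in the thesaurus format described under "Preparing the thesaurus". The sketch writes such a file from a PHP array, for example to prepare a file for import(). `writeThesaurus` is a hypothetical helper, and whether export() produces byte-identical output is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of words-match): writes word => other-info
 * entries as thesaurus lines (word first, then the additional columns, all
 * joined by the separator). It is not guaranteed to match export() output.
 *
 * @param array<string, list<string>> $entries
 * @return int number of bytes written
 */
function writeThesaurus(string $fileName, array $entries, string $separator = ','): int
{
    $lines = [];
    foreach ($entries as $word => $other) {
        // (string) because PHP turns numeric-looking word keys into integers
        $lines[] = implode($separator, array_merge([(string) $word], $other));
    }
    $bytes = file_put_contents($fileName, implode("\n", $lines) . "\n");
    if ($bytes === false) {
        throw new RuntimeException("Cannot write thesaurus file: {$fileName}");
    }
    return $bytes;
}

$file = tempnam(sys_get_temp_dir(), 'words');
if ($file === false) {
    throw new RuntimeException('Cannot create a temporary file');
}
writeThesaurus($file, ['php' => ["is the world's", 'best language'], 'java' => [], 'golang' => []]);
echo file_get_contents($file);
unlink($file);
```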