To learn Sphinx with PHP, see /var/www/to8to/trunk/gb_php/sphinxapi.php

    include 'sphinxapi.php';
    $sp = new SphinxClient;
    $sp->SetServer('127.0.0.1', 9314);
    $sp->SetConnectTimeout(5);
    $sp->SetLimits(0, 10); // ($start, $limit)
    $keyword = (isset($_GET['kw']) && !empty($_GET['kw'])) ? trim($_GET['kw']) : '搜索内容';
    // various conditions can be added before running the search
    $result = $sp->Query($keyword, 'iiyicms'); // other index arguments: '*', 'iiyicms:iiyicms_increment'

Tutorial: http://www.21andy.com/new/20100928/1973.html

Problems encountered

When compiling coreseek with Python support, the build reports:

    py_layer.h:16:27: fatal error: Python.h: No such file or directory

The Python.h header file cannot be found. Solution: on Ubuntu, install python-dev:

    sudo apt-get install python-dev

PECL sphinx extension: http://pecl.php.net/package/sphinx

New sphinxapi URLs:
https://code.google.com/p/sphinxsearch/source/browse/trunk/api/
http://code.google.com/p/sphinxsearch/source/browse/trunk/api/sphinxapi.php

1. A detailed guide to building an efficient site search engine with PHP + MySQL + Sphinx
http://jingyan.baidu.com/article/95c9d20d9a7176ec4e756119.html

2. Quickly building a Sphinx Chinese word segmentation, PHP-MySQL full-text search engine with Coreseek-4.1
http://www.gretheer.com/2014/07/install-coreseek-sphinx-php-mysql.html

3. Sphinx downloads
http://sphinxsearch.com/downloads/release/
Find the matching version, for example:
http://sphinxsearch.com/files/sphinxsearch_2.2.6-release-1~wheezy_i386.deb
Installation guide:
http://sphinxsearch.com/docs/archives/1.10/installing.html
Installing the two together:
http://blog.atime.me/note/sphinx-coreseek-summary.html
http://github.tiankonguse.com/blog/2014/11/03/coreseek-install-log/

Development environment

Operating system: Ubuntu 12.04 x86-64
Coreseek: 4.1 beta (Sphinx-2.0.1)
Python:
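2.7

Sphinx/Coreseek overview

Sphinx is a high-performance full-text search engine written in C++. It is released under the GPL, commercial licenses can be purchased, and the current stable version is 2.1.7.

Coreseek is a Chinese full-text search engine based on Sphinx. It uses the MMSEG algorithm for Chinese word segmentation and provides a Python data source. Coreseek is released under GPLv2, and commercial licenses can be purchased. The current stable version is 3.2.14, based on Sphinx-0.9.9; the beta version is 4.1, based on Sphinx-2.0.1. (The official Coreseek forum said at the end of 2013 that version 5.0 would be released soon, but there has been no detailed news since.)

Installing Sphinx/Coreseek

Download the Coreseek-4.1 source code:

    wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
    tar xvf coreseek-4.1-beta.tar.gz
    cd coreseek-4.1-beta

After unpacking there are three directories. The main layout is:

    coreseek-4.1-beta/
        csft-4.1/        Coreseek's modified version of the sphinx-2.0.1 code
            api/         implementations of the sphinx searchd query API
        mmseg-3.2.14/    the libmmseg word segmentation library
        testpack/        tests and sample configurations
        README.txt       introduction and installation guide

Following the official installation guide, install mmseg first and then csft. If configure complains about a missing header file, use apt-file to find the package that provides it.

Installing mmseg-3.2.14

http://www.coreseek.cn/uploads/csft/3.2/mmseg-3.2.14.tar.gz

Here the official installation guide can be followed as is:

    cd mmseg-3.2.14
    ./bootstrap
    ./configure --prefix=/usr/local/mmseg3
    make && sudo make install

Installing libiconv-1.14

Install libiconv first; it is used for character set conversion.

    wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz
    tar xvf libiconv-1.14.tar.gz
    cd libiconv-1.14
    ./configure
    make && sudo make install && sudo ldconfig

If your glibc version is 2.16 or later, make will very likely fail with the following error:

    In file included from progname.c:26:0:
    ./stdio.h:1010:1: error: 'gets' undeclared here (not in a function)
    _GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
    ^

The fix is to download a patch file, unpack it and apply it. Run this in the libiconv-1.14 directory:

    wget -O - http://blog.atime.me/static/resource/libiconv-glibc-2.16.patch.gz | gzip -d - | patch -p0

Alternatively, simply comment out line 698 of srclib/stdio.in.h (this should be harmless), that is:

    // _GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");

Installing csft-4.1

The configure arguments here differ slightly from the installation guide: --with-python is added to support the Python data source, and LIBS=-liconv is added to avoid a link error at the end of the build.

    cd csft-4.1
    sh buildconf.sh
    ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql --with-python LIBS=-liconv
    make -j2 && sudo make install

If sh buildconf.sh does not produce a configure script and reports "automake: warnings are treated as errors", change this line in configure.ac:

    AM_INIT_AUTOMAKE([-Wall -Werror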
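foreign])

to

    AM_INIT_AUTOMAKE([-Wall foreign])

that is, delete -Werror, and then run sh buildconf.sh again.

If configure reports that the MySQL header files are not installed, install the libmysql++-dev package.

If your gcc version is 4.7 or later, compilation may fail because of a bug in sphinx:

    sphinxexpr.cpp:1746:43: error: 'ExprEval' was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]

The fix is a patch taken from the bug report. Run this in the csft-4.1 directory:

    wget -O - http://blog.atime.me/static/resource/sphinxexpr-gcc4.7.patch.gz | gzip -d - | patch -p0

Alternatively, edit lines 1746, 1777 and 1823 of src/sphinxexpr.cpp directly and change ExprEval to this->ExprEval on all three lines.

Installing helper tools

Copy the searchd script from csft-4.1/contrib/scripts to /etc/init.d/; the service command can then be used to start and stop the searchd service.

After installing coreseek, copy all files and directories under /usr/local/coreseek/share/man/ to /usr/local/share/man/; the man command can then be used to read the indexer and searchd manuals.

Sphinx/Coreseek directory structure

After Coreseek has been installed correctly with the steps above, /usr/local/coreseek contains the following directories:

    bin/             sphinx program directory
        searchd      the search server
        indexer      the index building tool
    etc/             configuration directory
        csft.conf    default configuration file
    share/man/       sphinx man pages; copying them to the system man directory is recommended
    var/
        data/        default index storage directory
        log/         default directory for logs and the pid file

The typical workflow for using sphinx is roughly:

1. Build or update the index with indexer. If searchd is already running, the --rotate option is required.
2. Run searchd.

For example:

    cd /usr/local/coreseek
    ./bin/indexer --all   # first index build, using the default configuration file /usr/local/coreseek/etc/csft.conf
    ./bin/searchd         # using the default configuration file /usr/local/coreseek/etc/csft.conf

Sphinx/Coreseek configuration

For the configuration file, see the official Sphinx documentation and the sample configuration /usr/local/coreseek/etc/sphinx.conf.dist.

searchd configuration example

    searchd
    {
        listen = 9312
        listen = 9306:mysql41
        log = /usr/local/coreseek/var/log/searchd.log
        query_log = /usr/local/coreseek/var/log/query.log
        read_timeout = 5
        max_children = 30
        pid_file = /usr/local/coreseek/var/log/searchd.pid
        max_matches = 1000
        seamless_rotate = 1
        preopen_indexes = 1
        unlink_old = 1
        workers = threads # for RT to work
    }

The many options used here are described under "searchd program configuration options" in the Sphinx documentation. With the second listen setting, listen = 9306:mysql41, a MySQL client can be used to access the searchd indexes:

    mysql -h 127.0.0.1 -P 9306

The indexes can then be searched with the SphinxQL query language.

indexer configuration example

    indexer
    {
        mem_limit = 1024M
        write_buffer = 16M
    }

The indexer tool has fewer settings; see indexer program configuration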
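options. Note that mem_limit causes problems if it exceeds 2048M [1].

Data source and index configuration

See the sample configuration file /usr/local/coreseek/etc/sphinx.conf.dist and the "Data source configuration options" and "Index configuration options" sections of the official documentation.

Data sources

One thing to note about data sources: the document id of every record must be a unique positive integer (it cannot be 0) [6].

Python data source

Coreseek developed a Python data source that it describes as universal, and it is somewhat more convenient to use than xmlpipe2. In essence, a Python script fetches the data to be indexed. Coreseek provides a configuration document, an interface document and sample programs for it.

Xmlpipe2 data source

This is a "universal" data source officially supported by Sphinx. In essence, the data to be indexed is written to standard output following the xmlpipe2 schema. In the data source configuration, type must be set to xmlpipe2, and an xmlpipe_command option must also be set. The command given in that option must write an XML document that conforms to the xmlpipe2 schema to standard output (stdout), for example:

    source news_src
    {
        type = xmlpipe2
        xmlpipe_command = cat /tmp/xmlpipe2_out.xml
    }

Building the index

Use the indexer command to build an index:

    /usr/local/coreseek/bin/indexer --rotate $INDEX_NAME

Sphinx uses the indexer tool to build and update indexes, and indexer is said to reach an indexing speed of 10 to 15 MB/s [2]. In practice I tried building indexes with both the Python data source and the xmlpipe2 data source, and xmlpipe2 was slightly faster. Indexing 14 GB of text (about 500,000 files) with the Python data source produced a 2.3 GB index at a peak of about 2.8 MB/s; the slowdown is probably in the Chinese word segmentation.

Custom Chinese dictionaries are covered in a separate article.

Querying

Sphinx supports querying data through SphinxAPI and SphinxQL.

SphinxAPI

SphinxAPI is used to communicate with searchd. Official implementations exist for PHP, Python and Java, and the API is described in the official documentation. The API implementations and sample programs shipped with Coreseek are in the csft-4.1/api/ directory.

SphinxQL

SphinxQL is the SQL dialect provided by Sphinx for querying and managing indexes. Compared with SphinxAPI, SphinxQL supports more operations, such as deleting an index; it is also described in the official documentation.

Practical application

Project overview

Some of the project requirements:

- The data that currently needs full-text search consists of HTML web page files, about 10 million in total with a combined size of about 200 GB, growing by a few thousand files per day.
- In the future, other sources such as PDF files and MySQL will very likely need to be searched as well.
- Provide a RESTful search interface that returns query results as JSON.

Because the search service is mainly for internal use, the query load is expected to be low. To shorten the development cycle, the whole project is implemented in Python and uses the Python data source shipped with coreseek to build the index.

The following third-party Python packages were used during development:

- lxml-3.3.4: parsing HTML files
- tornado-3.2: asynchronous HTTP server, asynchronous socket communication and so on

Design considerations

Indexing

As mentioned above, indexer is a single-threaded tool, and the speed of building a Chinese index can hardly exceed 3 MB/s. One option is therefore to split a large index into several small ones, build the small indexes in parallel, and finally merge them into one complete index.

Because the number of documents to index is very large while the number of daily updates is small, it is best to build the index with the officially recommended Main + Delta scheme: the main index is built only once at the start, and a delta index is then rebuilt every day and merged into the main index. See "Delta index updates" in the documentation.

Python

The project uses Python to find and parse HTML files.

File discovery does not use the walk function from the standard library os module, because walk becomes rather inefficient when the number of files is large. If you are interested, have a look at a third-party library called betterwalk, which is said to be considerably faster than os.walk. In the actual project the directory structure of the files to index is fixed and very regular, so os.listdir and os.lstat are sufficient. os.lstat returns the last modification time of a file, which is very useful when building the delta index.

HTML parsing uses the well-regarded lxml library. lxml usually offers several ways to parse an HTML file, so before using it, it is worth reading the benchmarks of the lxml functions to see which approach is faster. For example, when looking up HTML nodes with XPath, lxml's XPath class is several times faster than the xpath() function [3].

In addition, using Python threads for compute-intensive (CPU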
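bound) tasks is a well-known pitfall, for example parsing HTML files in multiple threads. In that case it is better to run the parsing in several processes and then collect the parsed files.

As noted earlier, indexer is fairly slow, so when building an index the bottleneck is usually indexer. To speed up the overall index build as much as possible, a fairly reliable approach is to run the three steps (file scanning, file parsing, and indexing by indexer) at the same time. Because indexer cannot index the parsed files as fast as they are produced, the parsed files must be cached, for example in memory. Memory is a scarce resource, however, so its use must be both capped and economical.

To cap memory use, the implementation can set a threshold for the cache: when the cache is full, pause all file scanning and parsing processes, and resume them when the cache is almost empty. On Linux this is easy to do with the SIGSTOP and SIGCONT signals. By comparison, measuring exactly how much memory the cached objects occupy is harder. A compromise is to measure the memory use of the whole process or use some other indirect method, or simply to limit the number of cached objects (which feels rather crude).

To economize on memory: it is well known that an ordinary Python object automatically gets a __dict__ attribute that stores its other attributes. What is less widely known is that Python's built-in dict type is memory-hungry. With only a few Python objects this is hard to notice, but if you keep a hundred thousand or a million Python objects in memory and profile with a memory profiler (such as Heapy), you will find that __dict__ alone (not counting the data stored in it) consumes a huge amount of memory. Setting the class attribute __slots__ prevents the automatic creation of __dict__. One published success story describes saving 9 GB of memory with __slots__. Note that __slots__ has some side effects. An obvious one is that serializing an object whose class defines __slots__ with pickle protocol version 0 raises an error, while higher pickle protocols work fine [4] (protocol version 0 of cPickle is rarely used anyway, since it is both slow and space-hungry).

The data structure used for the cache also matters. Python's built-in list is not suitable, because the cache should be a FIFO queue and del(list[0]) has O(n) complexity [5]; collections.deque is a better fit.

Problems encountered in use

Corrupted index files crash searchd

During testing, searching for certain keywords made searchd crash on a failed assertion and restart automatically. After debugging and searching online, it turned out that one fairly large index (about 2 GB) had very likely been corrupted during a merge. Checking that index with indextool --check printed a large number of "FAILED, row not found" errors. For now there seems to be no solution other than upgrading sphinx and rebuilding the corrupted index.

Socket receive timeout

When talking to searchd through the interface provided by sphinxapi.py, searchd may respond slowly if the index is large, and a socket timeout exception is then quite likely. The default socket timeout in sphinxapi.py is 1 second. The fix is simple: call the SetConnectTimeout function in sphinxapi to set a longer timeout.

Resources and references

- Sphinx 2.0.1 Documentation
- Coreseek与第四城搜索: contains many performance-related tests and is very thorough
- Coreseek Python data source interface documentation

Footnotes

1. Sphinx indexer program configuration options, mem_limit; retrieved 2014-04-17.
2. Wikipedia: Sphinx; retrieved 2014-04-17.
3. lxml benchmarks and speed, xpath; retrieved 2014-04-18.
4. python pickling slots error; retrieved 2014-04-18.
5. Python Time Complexity; retrieved 2014-04-18.
6. Restrictions on the source data; retrieved 2014-09-18.

Compiling and installing Sphinx, the Chinese word segmenter coreseek, and the PHP sphinx extension on Linux
http://www.icultivator.com/p/6347.html

Using sphinx requires the following:

1. Have some data.
2. Create a sphinx configuration file.
3. Generate the index.
4. Start the searchd service process (the default port is 9312).
5. Connect to the sphinx service from PHP.

Starting sphinx

    cd /usr/local/coreseek/bin/
    ./searchd

searchd command-line options:

    -c          specify the configuration file
    --stop      stop the service
    --pidfile   explicitly specify a PID file
    -p          specify the port

5. Installing the sphinx extension for PHP

    sudo pecl install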
sphinx

If the error "configure: error: Cannot find libsphinxclient headers" appears, build libsphinxclient first:

    cd coreseek-4.1/csft-4.1/api/libsphinxclient/
    ./configure --prefix=/usr/local/libsphinxclient
    make && sudo make install

With that solved, go back and continue:

    ./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx=/usr/local/libsphinxclient
    make && sudo make install

Output similar to "Installing shared extensions: /usr/lib/php5/20090626/sphinx.so" indicates success. That directory now contains a sphinx.so file.

Load the .so file in php.ini:

    extension=/usr/lib/php5/20090626/sphinx.so

Restart apache. If a section named "sphinx" appears in phpinfo(), the extension was installed successfully.

<?php

//
// $Id: sphinxapi.php 1566 2008-11-17 19:06:44Z shodan $
//

//
// Copyright (c) 2001-2008, Andrew Aksyonoff. All rights reserved.
//
// This program is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License. You should have
// received a copy of the GPL license along with this program; if you
// did not, you can find it at http://www.gnu.org/
//

/////////////////////////////////////////////////////////////////////////////
// PHP version of Sphinx searchd client (PHP API)
/////////////////////////////////////////////////////////////////////////////

/// known searchd commands
define ( "SEARCHD_COMMAND_SEARCH", 0 );
define ( "SEARCHD_COMMAND_EXCERPT", 1 );
define ( "SEARCHD_COMMAND_UPDATE", 2 );
define ( "SEARCHD_COMMAND_KEYWORDS",3 );
define ( "SEARCHD_COMMAND_PERSIST", 4 );

/// current client-side command implementation versions
define ( "VER_COMMAND_SEARCH", 0x116 );
define ( "VER_COMMAND_EXCERPT", 0x100 );
define ( "VER_COMMAND_UPDATE", 0x102 );
define ( "VER_COMMAND_KEYWORDS", 0x100 );

/// known searchd status codes
define ( "SEARCHD_OK", 0 );
define ( "SEARCHD_ERROR", 1 );
define ( "SEARCHD_RETRY", 2 );
define ( "SEARCHD_WARNING", 3 );

/// known match modes
define ( "SPH_MATCH_ALL", 0 );
define ( "SPH_MATCH_ANY", 1 );
define ( "SPH_MATCH_PHRASE", 2 );
define ( "SPH_MATCH_BOOLEAN", 3 );
define ( "SPH_MATCH_EXTENDED", 4 );
define ( "SPH_MATCH_FULLSCAN", 5 );
define ( "SPH_MATCH_EXTENDED2", 6 ); // extended engine V2
// (TEMPORARY, WILL BE REMOVED)

/// known ranking modes (ext2 only)
define ( "SPH_RANK_PROXIMITY_BM25", 0 ); ///< default mode, phrase proximity major factor and BM25 minor one
define ( "SPH_RANK_BM25", 1 ); ///< statistical mode, BM25 ranking only (faster but worse quality)
define ( "SPH_RANK_NONE", 2 ); ///< no ranking, all matches get a weight of 1
define ( "SPH_RANK_WORDCOUNT", 3 ); ///< simple word-count weighting, rank is a weighted sum of per-field keyword occurrence counts
define ( "SPH_RANK_PROXIMITY", 4 );
define ( "SPH_RANK_MATCHANY", 5 );

/// known sort modes
define ( "SPH_SORT_RELEVANCE", 0 );
define ( "SPH_SORT_ATTR_DESC", 1 );
define ( "SPH_SORT_ATTR_ASC", 2 );
define ( "SPH_SORT_TIME_SEGMENTS", 3 );
define ( "SPH_SORT_EXTENDED", 4 );
define ( "SPH_SORT_EXPR", 5 );

/// known filter types
define ( "SPH_FILTER_VALUES", 0 );
define ( "SPH_FILTER_RANGE", 1 );
define ( "SPH_FILTER_FLOATRANGE", 2 );

/// known attribute types
define ( "SPH_ATTR_INTEGER", 1 );
define ( "SPH_ATTR_TIMESTAMP", 2 );
define ( "SPH_ATTR_ORDINAL", 3 );
define ( "SPH_ATTR_BOOL", 4 );
define ( "SPH_ATTR_FLOAT", 5 );
define ( "SPH_ATTR_BIGINT", 6 );
define ( "SPH_ATTR_MULTI", 0x40000000 );

/// known grouping functions
define ( "SPH_GROUPBY_DAY", 0 );
define ( "SPH_GROUPBY_WEEK", 1 );
define ( "SPH_GROUPBY_MONTH", 2 );
define ( "SPH_GROUPBY_YEAR", 3 );
define ( "SPH_GROUPBY_ATTR", 4 );
define ( "SPH_GROUPBY_ATTRPAIR", 5 );

// important properties of PHP's integers:
// - always signed (one bit short of PHP_INT_SIZE)
// - conversion from string to int is saturated
// - float is double
// - div converts arguments to floats
// - mod converts arguments to ints

// the packing code below works as follows:
// - when we got an int, just pack it
//   if performance is a problem, this is the branch users should aim for
//
// - otherwise, we got a number in string form
//   this might be due to different reasons, but we assume that this is
//   because it didn't fit into PHP int
//
// - factor the string into high and low ints for packing
//   - if we have
//     bcmath, then it is used
//   - if we don't, we have to do it manually (this is the fun part)
//
//   - x64 branch does factoring using ints
//   - x32 (ab)uses floats, since we can't fit unsigned 32-bit number into an int
//
// unpacking routines are pretty much the same.
// - return ints if we can
// - otherwise format number into a string

/// pack 64-bit signed
function sphPackI64 ( $v )
{
    assert ( is_numeric($v) );

    // x64
    if ( PHP_INT_SIZE>=8 )
    {
        $v = (int)$v;
        return pack ( "NN", $v>>32, $v&0xFFFFFFFF );
    }

    // x32, int
    if ( is_int($v) )
        return pack ( "NN", $v < 0 ? -1 : 0, $v );

    // x32, bcmath
    if ( function_exists("bcmul") )
    {
        if ( bccomp ( $v, 0 ) == -1 )
            $v = bcadd ( "18446744073709551616", $v );
        $h = bcdiv ( $v, "4294967296", 0 );
        $l = bcmod ( $v, "4294967296" );
        return pack ( "NN", (float)$h, (float)$l ); // conversion to float is intentional; int would lose 31st bit
    }

    // x32, no-bcmath
    $p = max(0, strlen($v) - 13);
    $lo = abs((float)substr($v, $p));
    $hi = abs((float)substr($v, 0, $p));

    $m = $lo + $hi*1316134912.0; // (10 ^ 13) % (1 << 32) = 1316134912
    $q = floor($m/4294967296.0);
    $l = $m - ($q*4294967296.0);
    $h = $hi*2328.0 + $q; // (10 ^ 13) / (1 << 32) = 2328

    if ( $v<0 )
    {
        if ( $l==0 )
            $h = 4294967296.0 - $h;
        else
        {
            $h = 4294967295.0 - $h;
            $l = 4294967296.0 - $l;
        }
    }
    return pack ( "NN", $h, $l );
}

/// pack 64-bit unsigned
function sphPackU64 ( $v )
{
    assert ( is_numeric($v) );

    // x64
    if ( PHP_INT_SIZE>=8 )
    {
        assert ( $v>=0 );

        // x64, int
        if ( is_int($v) )
            return pack ( "NN", $v>>32, $v&0xFFFFFFFF );

        // x64, bcmath
        if ( function_exists("bcmul") )
        {
            $h = bcdiv ( $v, 4294967296, 0 );
            $l = bcmod ( $v, 4294967296 );
            return pack ( "NN", $h, $l );
        }

        // x64, no-bcmath
        $p = max ( 0, strlen($v) - 13 );
        $lo = (int)substr ( $v, $p );
        $hi = (int)substr ( $v, 0, $p );

        $m = $lo + $hi*1316134912;
        $l = $m % 4294967296;
        $h = $hi*2328 + (int)($m/4294967296);

        return pack ( "NN", $h, $l );
    }

    // x32, int
    if ( is_int($v) )
        return pack ( "NN", 0, $v );

    // x32, bcmath
    if ( function_exists("bcmul") )
    {
        $h = bcdiv ( $v, "4294967296", 0 );
        $l = bcmod ( $v, "4294967296"
        );
        return pack ( "NN", (float)$h, (float)$l ); // conversion to float is intentional; int would lose 31st bit
    }

    // x32, no-bcmath
    $p = max(0, strlen($v) - 13);
    $lo = (float)substr($v, $p);
    $hi = (float)substr($v, 0, $p);

    $m = $lo + $hi*1316134912.0;
    $q = floor($m / 4294967296.0);
    $l = $m - ($q * 4294967296.0);
    $h = $hi*2328.0 + $q;

    return pack ( "NN", $h, $l );
}

// unpack 64-bit unsigned
function sphUnpackU64 ( $v )
{
    list ( $hi, $lo ) = array_values ( unpack ( "N*N*", $v ) );

    if ( PHP_INT_SIZE>=8 )
    {
        if ( $hi<0 ) $hi += (1<<32); // because php 5.2.2 to 5.2.5 is totally fucked up again
        if ( $lo<0 ) $lo += (1<<32);

        // x64, int
        if ( $hi<=2147483647 )
            return ($hi<<32) + $lo;

        // x64, bcmath
        if ( function_exists("bcmul") )
            return bcadd ( $lo, bcmul ( $hi, "4294967296" ) );

        // x64, no-bcmath
        $C = 100000;
        $h = ((int)($hi / $C) << 32) + (int)($lo / $C);
        $l = (($hi % $C) << 32) + ($lo % $C);
        if ( $l>$C )
        {
            $h += (int)($l / $C);
            $l = $l % $C;
        }

        if ( $h==0 )
            return $l;
        return sprintf ( "%d%05d", $h, $l );
    }

    // x32, int
    if ( $hi==0 )
    {
        if ( $lo>0 )
            return $lo;
        return sprintf ( "%u", $lo );
    }

    $hi = sprintf ( "%u", $hi );
    $lo = sprintf ( "%u", $lo );

    // x32, bcmath
    if ( function_exists("bcmul") )
        return bcadd ( $lo, bcmul ( $hi, "4294967296" ) );

    // x32, no-bcmath
    $hi = (float)$hi;
    $lo = (float)$lo;

    $q = floor($hi/10000000.0);
    $r = $hi - $q*10000000.0;
    $m = $lo + $r*4967296.0;
    $mq = floor($m/10000000.0);
    $l = $m - $mq*10000000.0;
    $h = $q*4294967296.0 + $r*429.0 + $mq;

    $h = sprintf ( "%.0f", $h );
    $l = sprintf ( "%07.0f", $l );
    if ( $h=="0" )
        return sprintf( "%.0f", (float)$l );
    return $h .
        $l;
}

// unpack 64-bit signed
function sphUnpackI64 ( $v )
{
    list ( $hi, $lo ) = array_values ( unpack ( "N*N*", $v ) );

    // x64
    if ( PHP_INT_SIZE>=8 )
    {
        if ( $hi<0 ) $hi += (1<<32); // because php 5.2.2 to 5.2.5 is totally fucked up again
        if ( $lo<0 ) $lo += (1<<32);

        return ($hi<<32) + $lo;
    }

    // x32, int
    if ( $hi==0 )
    {
        if ( $lo>0 )
            return $lo;
        return sprintf ( "%u", $lo );
    }
    // x32, int
    elseif ( $hi==-1 )
    {
        if ( $lo<0 )
            return $lo;
        return sprintf ( "%.0f", $lo - 4294967296.0 );
    }

    $neg = "";
    $c = 0;
    if ( $hi<0 )
    {
        $hi = ~$hi;
        $lo = ~$lo;
        $c = 1;
        $neg = "-";
    }

    $hi = sprintf ( "%u", $hi );
    $lo = sprintf ( "%u", $lo );

    // x32, bcmath
    if ( function_exists("bcmul") )
        return $neg . bcadd ( bcadd ( $lo, bcmul ( $hi, "4294967296" ) ), $c );

    // x32, no-bcmath
    $hi = (float)$hi;
    $lo = (float)$lo;

    $q = floor($hi/10000000.0);
    $r = $hi - $q*10000000.0;
    $m = $lo + $r*4967296.0;
    $mq = floor($m/10000000.0);
    $l = $m - $mq*10000000.0 + $c;
    $h = $q*4294967296.0 + $r*429.0 + $mq;

    $h = sprintf ( "%.0f", $h );
    $l = sprintf ( "%07.0f", $l );
    if ( $h=="0" )
        return $neg . sprintf( "%.0f", (float)$l );
    return $neg . $h .
        $l;
}

/// sphinx searchd client class
class SphinxClient
{
    var $_host; ///< searchd host (default is "localhost")
    var $_port; ///< searchd port (default is 3312)
    var $_offset; ///< how many records to seek from result-set start (default is 0)
    var $_limit; ///< how many records to return from result-set starting at offset (default is 20)
    var $_mode; ///< query matching mode (default is SPH_MATCH_ALL)
    var $_weights; ///< per-field weights (default is 1 for all fields)
    var $_sort; ///< match sorting mode (default is SPH_SORT_RELEVANCE)
    var $_sortby; ///< attribute to sort by (default is "")
    var $_min_id; ///< min ID to match (default is 0, which means no limit)
    var $_max_id; ///< max ID to match (default is 0, which means no limit)
    var $_filters; ///< search filters
    var $_groupby; ///< group-by attribute name
    var $_groupfunc; ///< group-by function (to pre-process group-by attribute value with)
    var $_groupsort; ///< group-by sorting clause (to sort groups in result set with)
    var $_groupdistinct;///< group-by count-distinct attribute
    var $_maxmatches; ///< max matches to retrieve
    var $_cutoff; ///< cutoff to stop searching at (default is 0)
    var $_retrycount; ///< distributed retries count
    var $_retrydelay; ///< distributed retries delay
    var $_anchor; ///< geographical anchor point
    var $_indexweights; ///< per-index weights
    var $_ranker; ///< ranking mode (default is SPH_RANK_PROXIMITY_BM25)
    var $_maxquerytime; ///< max query time, milliseconds (default is 0, do not limit)
    var $_fieldweights; ///< per-field-name weights
    var $_overrides; ///< per-query attribute values overrides
    var $_select; ///< select-list (attributes or expressions, with optional aliases)

    var $_error; ///< last error message
    var $_warning; ///< last warning message
    var $_connerror; ///< connection error vs remote error flag

    var $_reqs; ///< requests array for multi-query
    var $_mbenc; ///< stored mbstring encoding
    var $_arrayresult; ///< whether $result["matches"] should be a hash or an array
    var $_timeout; ///< connect
    // timeout

    /////////////////////////////////////////////////////////////////////////////
    // common stuff
    /////////////////////////////////////////////////////////////////////////////

    /// create a new client object and fill defaults
    function SphinxClient ()
    {
        // per-client-object settings
        $this->_host = "localhost";
        $this->_port = 3312;
        $this->_path = false;
        $this->_socket = false;

        // per-query settings
        $this->_offset = 0;
        $this->_limit = 20;
        $this->_mode = SPH_MATCH_ALL;
        $this->_weights = array ();
        $this->_sort = SPH_SORT_RELEVANCE;
        $this->_sortby = "";
        $this->_min_id = 0;
        $this->_max_id = 0;
        $this->_filters = array ();
        $this->_groupby = "";
        $this->_groupfunc = SPH_GROUPBY_DAY;
        $this->_groupsort = "@group desc";
        $this->_groupdistinct= "";
        $this->_maxmatches = 1000;
        $this->_cutoff = 0;
        $this->_retrycount = 0;
        $this->_retrydelay = 0;
        $this->_anchor = array ();
        $this->_indexweights= array ();
        $this->_ranker = SPH_RANK_PROXIMITY_BM25;
        $this->_maxquerytime= 0;
        $this->_fieldweights= array();
        $this->_overrides = array();
        $this->_select = "*";

        $this->_error = ""; // per-reply fields (for single-query case)
        $this->_warning = "";
        $this->_connerror = false;

        $this->_reqs = array (); // requests storage (for multi-query case)
        $this->_mbenc = "";
        $this->_arrayresult = false;
        $this->_timeout = 0;
    }

    function __destruct()
    {
        if ( $this->_socket !== false )
            fclose ( $this->_socket );
    }

    /// get last error message (string)
    function GetLastError ()
    {
        return $this->_error;
    }

    /// get last warning message (string)
    function GetLastWarning ()
    {
        return $this->_warning;
    }

    /// get last error flag (to tell network connection errors from searchd errors or broken responses)
    function IsConnectError()
    {
        return $this->_connerror;
    }

    /// set searchd host name (string) and port (integer)
    function SetServer ( $host, $port = 0 )
    {
        assert ( is_string($host) );
        if ( $host[0] == '/')
        {
            $this->_path = 'unix://' .
                $host;
            return;
        }
        if ( substr ( $host, 0, 7 )=="unix://" )
        {
            $this->_path = $host;
            return;
        }

        assert ( is_int($port) );
        $this->_host = $host;
        $this->_port = $port;
        $this->_path = '';
    }

    /// set server connection timeout (0 to remove)
    function SetConnectTimeout ( $timeout )
    {
        assert ( is_numeric($timeout) );
        $this->_timeout = $timeout;
    }

    function _Send ( $handle, $data, $length )
    {
        if ( feof($handle) || fwrite ( $handle, $data, $length ) !== $length )
        {
            $this->_error = 'connection unexpectedly closed (timed out?)';
            $this->_connerror = true;
            return false;
        }
        return true;
    }

    /////////////////////////////////////////////////////////////////////////////

    /// enter mbstring workaround mode
    function _MBPush ()
    {
        $this->_mbenc = "";
        if ( ini_get ( "mbstring.func_overload" ) & 2 )
        {
            $this->_mbenc = mb_internal_encoding();
            mb_internal_encoding ( "latin1" );
        }
    }

    /// leave mbstring workaround mode
    function _MBPop ()
    {
        if ( $this->_mbenc )
            mb_internal_encoding ( $this->_mbenc );
    }

    /// connect to searchd server
    function _Connect ()
    {
        if ( $this->_socket !== false )
            return $this->_socket;

        $errno = 0;
        $errstr = "";
        $this->_connerror = false;

        if ( $this->_path )
        {
            $host = $this->_path;
            $port = 0;
        }
        else
        {
            $host = $this->_host;
            $port = $this->_port;
        }

        if ( $this->_timeout<=0 )
            $fp = @fsockopen ( $host, $port, $errno, $errstr );
        else
            $fp = @fsockopen ( $host, $port, $errno, $errstr, $this->_timeout );

        if ( !$fp )
        {
            if ( $this->_path )
                $location = $this->_path;
            else
                $location = "{$this->_host}:{$this->_port}";

            $errstr = trim ( $errstr );
            $this->_error = "connection to $location failed (errno=$errno, msg=$errstr)";
            $this->_connerror = true;
            return false;
        }

        // check version
        list(,$v) = unpack ( "N*", fread ( $fp, 4 ) );
        $v = (int)$v;
        if ( $v<1 )
        {
            fclose ( $fp );
            $this->_error = "expected searchd protocol version 1+, got version '$v'";
            return false;
        }

        // all ok, send my version
        if ( !$this->_Send ( $fp, pack ( "N", 1 ), 4 ) )
            return false;

        return $fp;
    }

    /// get and check response packet from searchd server
    function _GetResponse ( $fp, $client_ver )
    {
        $response = "";
        $len = 0;

        $header =
            fread ( $fp, 8 );
        if ( strlen($header)==8 )
        {
            list ( $status, $ver, $len ) = array_values ( unpack ( "n2a/Nb", $header ) );
            $left = $len;
            while ( $left>0 && !feof($fp) )
            {
                $chunk = fread ( $fp, $left );
                if ( $chunk )
                {
                    $response .= $chunk;
                    $left -= strlen($chunk);
                }
            }
        }
        if ( $this->_socket === false )
            fclose ( $fp );

        // check response
        $read = strlen ( $response );
        if ( !$response || $read!=$len )
        {
            $this->_error = $len
                ? "failed to read searchd response (status=$status, ver=$ver, len=$len, read=$read)"
                : "received zero-sized searchd response";
            return false;
        }

        // check status
        if ( $status==SEARCHD_WARNING )
        {
            list(,$wlen) = unpack ( "N*", substr ( $response, 0, 4 ) );
            $this->_warning = substr ( $response, 4, $wlen );
            return substr ( $response, 4+$wlen );
        }
        if ( $status==SEARCHD_ERROR )
        {
            $this->_error = "searchd error: " . substr ( $response, 4 );
            return false;
        }
        if ( $status==SEARCHD_RETRY )
        {
            $this->_error = "temporary searchd error: " . substr ( $response, 4 );
            return false;
        }
        if ( $status!=SEARCHD_OK )
        {
            $this->_error = "unknown status code '$status'";
            return false;
        }

        // check version
        if ( $ver<$client_ver )
        {
            $this->_warning = sprintf ( "searchd command v.%d.%d older than client's v.%d.%d, some options might not work",
                $ver>>8, $ver&0xff, $client_ver>>8, $client_ver&0xff );
        }

        return $response;
    }

    /////////////////////////////////////////////////////////////////////////////
    // searching
    /////////////////////////////////////////////////////////////////////////////

    /// set offset and count into result set,
    /// and optionally set max-matches and cutoff limits
    function SetLimits ( $offset, $limit, $max=0, $cutoff=0 )
    {
        assert ( is_int($offset) );
        assert ( is_int($limit) );
        assert ( $offset>=0 );
        assert ( $limit>0 );
        assert ( $max>=0 );
        $this->_offset = $offset;
        $this->_limit = $limit;
        if ( $max>0 )
            $this->_maxmatches = $max;
        if ( $cutoff>0 )
            $this->_cutoff = $cutoff;
    }

    /// set maximum query time, in milliseconds, per-index
    /// integer, 0 means "do not limit"
    function SetMaxQueryTime ( $max )
    {
        assert ( is_int($max) );
        assert ( $max>=0
        );
        $this->_maxquerytime = $max;
    }

    /// set matching mode
    function SetMatchMode ( $mode )
    {
        assert ( $mode==SPH_MATCH_ALL
            || $mode==SPH_MATCH_ANY
            || $mode==SPH_MATCH_PHRASE
            || $mode==SPH_MATCH_BOOLEAN
            || $mode==SPH_MATCH_EXTENDED
            || $mode==SPH_MATCH_FULLSCAN
            || $mode==SPH_MATCH_EXTENDED2 );
        $this->_mode = $mode;
    }

    /// set ranking mode
    function SetRankingMode ( $ranker )
    {
        assert ( $ranker==SPH_RANK_PROXIMITY_BM25
            || $ranker==SPH_RANK_BM25
            || $ranker==SPH_RANK_NONE
            || $ranker==SPH_RANK_WORDCOUNT
            || $ranker==SPH_RANK_PROXIMITY );
        $this->_ranker = $ranker;
    }

    /// set matches sorting mode
    function SetSortMode ( $mode, $sortby="" )
    {
        assert (
            $mode==SPH_SORT_RELEVANCE ||
            $mode==SPH_SORT_ATTR_DESC ||
            $mode==SPH_SORT_ATTR_ASC ||
            $mode==SPH_SORT_TIME_SEGMENTS ||
            $mode==SPH_SORT_EXTENDED ||
            $mode==SPH_SORT_EXPR );
        assert ( is_string($sortby) );
        assert ( $mode==SPH_SORT_RELEVANCE || strlen($sortby)>0 );

        $this->_sort = $mode;
        $this->_sortby = $sortby;
    }

    /// bind per-field weights by order
    /// DEPRECATED; use SetFieldWeights() instead
    function SetWeights ( $weights )
    {
        assert ( is_array($weights) );
        foreach ( $weights as $weight )
            assert ( is_int($weight) );

        $this->_weights = $weights;
    }

    /// bind per-field weights by name
    function SetFieldWeights ( $weights )
    {
        assert ( is_array($weights) );
        foreach ( $weights as $name=>$weight )
        {
            assert ( is_string($name) );
            assert ( is_int($weight) );
        }
        $this->_fieldweights = $weights;
    }

    /// bind per-index weights by name
    function SetIndexWeights ( $weights )
    {
        assert ( is_array($weights) );
        foreach ( $weights as $index=>$weight )
        {
            assert ( is_string($index) );
            assert ( is_int($weight) );
        }
        $this->_indexweights = $weights;
    }

    /// set IDs range to match
    /// only match records if document ID is between $min and $max (inclusive)
    function SetIDRange ( $min, $max )
    {
        assert ( is_numeric($min) );
        assert ( is_numeric($max) );
        assert ( $min<=$max );
        $this->_min_id = $min;
        $this->_max_id = $max;
    }

    /// set values set filter
    /// only match records where $attribute value is in given set
    function SetFilter ( $attribute, $values,
        $exclude=false )
    {
        assert ( is_string($attribute) );
        assert ( is_array($values) );
        assert ( count($values) );

        if ( is_array($values) && count($values) )
        {
            foreach ( $values as $value )
                assert ( is_numeric($value) );

            $this->_filters[] = array ( "type"=>SPH_FILTER_VALUES, "attr"=>$attribute, "exclude"=>$exclude, "values"=>$values );
        }
    }

    /// set range filter
    /// only match records if $attribute value is between $min and $max (inclusive)
    function SetFilterRange ( $attribute, $min, $max, $exclude=false )
    {
        assert ( is_string($attribute) );
        assert ( is_numeric($min) );
        assert ( is_numeric($max) );
        assert ( $min<=$max );

        $this->_filters[] = array ( "type"=>SPH_FILTER_RANGE, "attr"=>$attribute, "exclude"=>$exclude, "min"=>$min, "max"=>$max );
    }

    /// set float range filter
    /// only match records if $attribute value is between $min and $max (inclusive)
    function SetFilterFloatRange ( $attribute, $min, $max, $exclude=false )
    {
        assert ( is_string($attribute) );
        assert ( is_float($min) );
        assert ( is_float($max) );
        assert ( $min<=$max );

        $this->_filters[] = array ( "type"=>SPH_FILTER_FLOATRANGE, "attr"=>$attribute, "exclude"=>$exclude, "min"=>$min, "max"=>$max );
    }

    /// setup anchor point for geosphere distance calculations
    /// required to use @geodist in filters and sorting
    /// latitude and longitude must be in radians
    function SetGeoAnchor ( $attrlat, $attrlong, $lat, $long )
    {
        assert ( is_string($attrlat) );
        assert ( is_string($attrlong) );
        assert ( is_float($lat) );
        assert ( is_float($long) );

        $this->_anchor = array ( "attrlat"=>$attrlat, "attrlong"=>$attrlong, "lat"=>$lat, "long"=>$long );
    }

    /// set grouping attribute and function
    function SetGroupBy ( $attribute, $func, $groupsort="@group desc" )
    {
        assert ( is_string($attribute) );
        assert ( is_string($groupsort) );
        assert ( $func==SPH_GROUPBY_DAY
            || $func==SPH_GROUPBY_WEEK
            || $func==SPH_GROUPBY_MONTH
            || $func==SPH_GROUPBY_YEAR
            || $func==SPH_GROUPBY_ATTR
            || $func==SPH_GROUPBY_ATTRPAIR );

        $this->_groupby = $attribute;
        $this->_groupfunc = $func;
        $this->_groupsort = $groupsort;
    }

    ///
    /// set count-distinct attribute for group-by queries
    function SetGroupDistinct ( $attribute )
    {
        assert ( is_string($attribute) );
        $this->_groupdistinct = $attribute;
    }

    /// set distributed retries count and delay
    function SetRetries ( $count, $delay=0 )
    {
        assert ( is_int($count) && $count>=0 );
        assert ( is_int($delay) && $delay>=0 );
        $this->_retrycount = $count;
        $this->_retrydelay = $delay;
    }

    /// set result set format (hash or array; hash by default)
    /// PHP specific; needed for group-by-MVA result sets that may contain duplicate IDs
    function SetArrayResult ( $arrayresult )
    {
        assert ( is_bool($arrayresult) );
        $this->_arrayresult = $arrayresult;
    }

    /// set attribute values override
    /// there can be only one override per attribute
    /// $values must be a hash that maps document IDs to attribute values
    function SetOverride ( $attrname, $attrtype, $values )
    {
        assert ( is_string ( $attrname ) );
        assert ( in_array ( $attrtype, array ( SPH_ATTR_INTEGER, SPH_ATTR_TIMESTAMP, SPH_ATTR_BOOL, SPH_ATTR_FLOAT, SPH_ATTR_BIGINT ) ) );
        assert ( is_array ( $values ) );

        $this->_overrides[$attrname] = array ( "attr"=>$attrname, "type"=>$attrtype, "values"=>$values );
    }

    /// set select-list (attributes or expressions), SQL-like syntax
    function SetSelect ( $select )
    {
        assert ( is_string ( $select ) );
        $this->_select = $select;
    }

    //////////////////////////////////////////////////////////////////////////////

    /// clear all filters (for multi-queries)
    function ResetFilters ()
    {
        $this->_filters = array();
        $this->_anchor = array();
    }

    /// clear groupby settings (for multi-queries)
    function ResetGroupBy ()
    {
        $this->_groupby = "";
        $this->_groupfunc = SPH_GROUPBY_DAY;
        $this->_groupsort = "@group desc";
        $this->_groupdistinct= "";
    }

    /// clear all attribute value overrides (for multi-queries)
    function ResetOverrides ()
    {
        $this->_overrides = array ();
    }

    //////////////////////////////////////////////////////////////////////////////

    /// connect to searchd server, run given search query through given indexes,
    /// and return the search results
    function Query ( $query,
        $index="*", $comment="" )
    {
        assert ( empty($this->_reqs) );

        $this->AddQuery ( $query, $index, $comment );
        $results = $this->RunQueries ();
        $this->_reqs = array (); // just in case it failed too early

        if ( !is_array($results) )
            return false; // probably network error; error message should be already filled

        $this->_error = $results[0]["error"];
        $this->_warning = $results[0]["warning"];
        if ( $results[0]["status"]==SEARCHD_ERROR )
            return false;
        else
            return $results[0];
    }

    /// helper to pack floats in network byte order
    function _PackFloat ( $f )
    {
        $t1 = pack ( "f", $f ); // machine order
        list(,$t2) = unpack ( "L*", $t1 ); // int in machine order
        return pack ( "N", $t2 );
    }

    /// add query to multi-query batch
    /// returns index into results array from RunQueries() call
    function AddQuery ( $query, $index="*", $comment="" )
    {
        // mbstring workaround
        $this->_MBPush ();

        // build request
        $req = pack ( "NNNNN", $this->_offset, $this->_limit, $this->_mode, $this->_ranker, $this->_sort ); // mode and limits
        $req .= pack ( "N", strlen($this->_sortby) ) . $this->_sortby;
        $req .= pack ( "N", strlen($query) ) . $query; // query itself
        $req .= pack ( "N", count($this->_weights) ); // weights
        foreach ( $this->_weights as $weight )
            $req .= pack ( "N", (int)$weight );
        $req .= pack ( "N", strlen($index) ) . $index; // indexes
        $req .= pack ( "N", 1 ); // id64 range marker
        $req .= sphPackU64 ( $this->_min_id ) . sphPackU64 ( $this->_max_id ); // id64 range

        // filters
        $req .= pack ( "N", count($this->_filters) );
        foreach ( $this->_filters as $filter )
        {
            $req .= pack ( "N", strlen($filter["attr"]) ) . $filter["attr"];
            $req .= pack ( "N", $filter["type"] );
            switch ( $filter["type"] )
            {
                case SPH_FILTER_VALUES:
                    $req .= pack ( "N", count($filter["values"]) );
                    foreach ( $filter["values"] as $value )
                        $req .= sphPackI64 ( $value );
                    break;

                case SPH_FILTER_RANGE:
                    $req .= sphPackI64 ( $filter["min"] ) . sphPackI64 ( $filter["max"] );
                    break;

                case SPH_FILTER_FLOATRANGE:
                    $req .= $this->_PackFloat ( $filter["min"] ) .
                        $this->_PackFloat ( $filter["max"] );
                    break;

                default:
                    assert ( 0 && "internal error: unhandled filter type" );
            }
            $req .= pack ( "N", $filter["exclude"] );
        }

        // group-by clause, max-matches count, group-sort clause, cutoff count
        $req .= pack ( "NN", $this->_groupfunc, strlen($this->_groupby) ) . $this->_groupby;
        $req .= pack ( "N", $this->_maxmatches );
        $req .= pack ( "N", strlen($this->_groupsort) ) . $this->_groupsort;
        $req .= pack ( "NNN", $this->_cutoff, $this->_retrycount, $this->_retrydelay );
        $req .= pack ( "N", strlen($this->_groupdistinct) ) . $this->_groupdistinct;

        // anchor point
        if ( empty($this->_anchor) )
        {
            $req .= pack ( "N", 0 );
        } else
        {
            $a =& $this->_anchor;
            $req .= pack ( "N", 1 );
            $req .= pack ( "N", strlen($a["attrlat"]) ) . $a["attrlat"];
            $req .= pack ( "N", strlen($a["attrlong"]) ) . $a["attrlong"];
            $req .= $this->_PackFloat ( $a["lat"] ) . $this->_PackFloat ( $a["long"] );
        }

        // per-index weights
        $req .= pack ( "N", count($this->_indexweights) );
        foreach ( $this->_indexweights as $idx=>$weight )
            $req .= pack ( "N", strlen($idx) ) . $idx . pack ( "N", $weight );

        // max query time
        $req .= pack ( "N", $this->_maxquerytime );

        // per-field weights
        $req .= pack ( "N", count($this->_fieldweights) );
        foreach ( $this->_fieldweights as $field=>$weight )
            $req .= pack ( "N", strlen($field) ) . $field . pack ( "N", $weight );

        // comment
        $req .= pack ( "N", strlen($comment) ) . $comment;

        // attribute overrides
        $req .= pack ( "N", count($this->_overrides) );
        foreach ( $this->_overrides as $key => $entry )
        {
            $req .= pack ( "N", strlen($entry["attr"]) ) .
                $entry["attr"];
            $req .= pack ( "NN", $entry["type"], count($entry["values"]) );
            foreach ( $entry["values"] as $id=>$val )
            {
                assert ( is_numeric($id) );
                assert ( is_numeric($val) );

                $req .= sphPackU64 ( $id );
                switch ( $entry["type"] )
                {
                    case SPH_ATTR_FLOAT: $req .= $this->_PackFloat ( $val ); break;
                    case SPH_ATTR_BIGINT: $req .= sphPackI64 ( $val ); break;
                    default: $req .= pack ( "N", $val ); break;
                }
            }
        }

        // select-list
        $req .= pack ( "N", strlen($this->_select) ) . $this->_select;

        // mbstring workaround
        $this->_MBPop ();

        // store request to requests array
        $this->_reqs[] = $req;
        return count($this->_reqs)-1;
    }

    /// connect to searchd, run queries batch, and return an array of result sets
    function RunQueries ()
    {
        if ( empty($this->_reqs) )
        {
            $this->_error = "no queries defined, issue AddQuery() first";
            return false;
        }

        // mbstring workaround
        $this->_MBPush ();

        if (!( $fp = $this->_Connect() ))
        {
            $this->_MBPop ();
            return false;
        }

        ////////////////////////////
        // send query, get response
        ////////////////////////////

        $nreqs = count($this->_reqs);
        $req = join ( "", $this->_reqs );
        $len = 4+strlen($req);
        $req = pack ( "nnNN", SEARCHD_COMMAND_SEARCH, VER_COMMAND_SEARCH, $len, $nreqs ) .
            $req; // add header

        if ( !( $this->_Send ( $fp, $req, $len+8 ) ) ||
             !( $response = $this->_GetResponse ( $fp, VER_COMMAND_SEARCH ) ) )
        {
            $this->_MBPop ();
            return false;
        }

        $this->_reqs = array ();

        //////////////////
        // parse response
        //////////////////

        $p = 0; // current position
        $max = strlen($response); // max position for checks, to protect against broken responses

        $results = array ();
        for ( $ires=0; $ires<$nreqs && $p<$max; $ires++ )
        {
            $results[] = array();
            $result =& $results[$ires];

            $result["error"] = "";
            $result["warning"] = "";

            // extract status
            list(,$status) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
            $result["status"] = $status;
            if ( $status!=SEARCHD_OK )
            {
                list(,$len) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                $message = substr ( $response, $p, $len ); $p += $len;

                if ( $status==SEARCHD_WARNING )
                {
                    $result["warning"] = $message;
                } else
                {
                    $result["error"] = $message;
                    continue;
                }
            }

            // read schema
            $fields = array ();
            $attrs = array ();

            list(,$nfields) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
            while ( $nfields-->0 && $p<$max )
            {
                list(,$len) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                $fields[] = substr ( $response, $p, $len ); $p += $len;
            }
            $result["fields"] = $fields;

            list(,$nattrs) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
            while ( $nattrs-->0 && $p<$max )
            {
                list(,$len) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                $attr = substr ( $response, $p, $len ); $p += $len;
                list(,$type) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                $attrs[$attr] = $type;
            }
            $result["attrs"] = $attrs;

            // read match count
            list(,$count) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
            list(,$id64) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;

            // read matches
            $idx = -1;
            while ( $count-->0 && $p<$max )
            {
                // index into result array
                $idx++;

                // parse document id and weight
                if ( $id64 )
                {
                    $doc = sphUnpackU64 ( substr ( $response, $p, 8 ) ); $p += 8;
                    list(,$weight) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                }
                else
                {
                    list ( $doc, $weight ) =
                        array_values ( unpack ( "N*N*",
                            substr ( $response, $p, 8 ) ) );
                    $p += 8;

                    if ( PHP_INT_SIZE>=8 )
                    {
                        // x64 route, workaround broken unpack() in 5.2.2+
                        if ( $doc<0 ) $doc += (1<<32);
                    } else
                    {
                        // x32 route, workaround php signed/unsigned braindamage
                        $doc = sprintf ( "%u", $doc );
                    }
                }
                $weight = sprintf ( "%u", $weight );

                // create match entry
                if ( $this->_arrayresult )
                    $result["matches"][$idx] = array ( "id"=>$doc, "weight"=>$weight );
                else
                    $result["matches"][$doc]["weight"] = $weight;

                // parse and create attributes
                $attrvals = array ();
                foreach ( $attrs as $attr=>$type )
                {
                    // handle 64bit ints
                    if ( $type==SPH_ATTR_BIGINT )
                    {
                        $attrvals[$attr] = sphUnpackI64 ( substr ( $response, $p, 8 ) ); $p += 8;
                        continue;
                    }

                    // handle floats
                    if ( $type==SPH_ATTR_FLOAT )
                    {
                        list(,$uval) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                        list(,$fval) = unpack ( "f*", pack ( "L", $uval ) );
                        $attrvals[$attr] = $fval;
                        continue;
                    }

                    // handle everything else as unsigned ints
                    list(,$val) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                    if ( $type & SPH_ATTR_MULTI )
                    {
                        $attrvals[$attr] = array ();
                        $nvalues = $val;
                        while ( $nvalues-->0 && $p<$max )
                        {
                            list(,$val) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                            $attrvals[$attr][] = sprintf ( "%u", $val );
                        }
                    } else
                    {
                        $attrvals[$attr] = sprintf ( "%u", $val );
                    }
                }

                if ( $this->_arrayresult )
                    $result["matches"][$idx]["attrs"] = $attrvals;
                else
                    $result["matches"][$doc]["attrs"] = $attrvals;
            }

            list ( $total, $total_found, $msecs, $words ) =
                array_values ( unpack ( "N*N*N*N*", substr ( $response, $p, 16 ) ) );
            $result["total"] = sprintf ( "%u", $total );
            $result["total_found"] = sprintf ( "%u", $total_found );
            $result["time"] = sprintf ( "%.3f", $msecs/1000 );
            $p += 16;

            while ( $words-->0 && $p<$max )
            {
                list(,$len) = unpack ( "N*", substr ( $response, $p, 4 ) ); $p += 4;
                $word = substr ( $response, $p, $len ); $p += $len;
                list ( $docs, $hits ) = array_values ( unpack ( "N*N*", substr ( $response, $p, 8 ) ) ); $p += 8;
                $result["words"][$word] = array (
                    "docs"=>sprintf ( "%u", $docs ),
                    "hits"=>sprintf ( "%u", $hits ) );
            }
        }

        $this->_MBPop ();
        return $results;
    }

    /////////////////////////////////////////////////////////////////////////////
    // excerpts generation
    /////////////////////////////////////////////////////////////////////////////

    /// connect to searchd server, and generate excerpts (snippets)
    /// of given documents for given query. returns false on failure,
    /// an array of snippets on success
    function BuildExcerpts ( $docs, $index, $words, $opts=array() )
    {
        assert ( is_array($docs) );
        assert ( is_string($index) );
        assert ( is_string($words) );
        assert ( is_array($opts) );

        $this->_MBPush ();

        if (!( $fp = $this->_Connect() ))
        {
            $this->_MBPop();
            return false;
        }

        /////////////////
        // fixup options
        /////////////////

        if ( !isset($opts["before_match"]) ) $opts["before_match"] = "<b>";
        if ( !isset($opts["after_match"]) ) $opts["after_match"] = "</b>";
        if ( !isset($opts["chunk_separator"]) ) $opts["chunk_separator"] = " ... ";
        if ( !isset($opts["limit"]) ) $opts["limit"] = 256;
        if ( !isset($opts["around"]) ) $opts["around"] = 5;
        if ( !isset($opts["exact_phrase"]) ) $opts["exact_phrase"] = false;
        if ( !isset($opts["single_passage"]) ) $opts["single_passage"] = false;
        if ( !isset($opts["use_boundaries"]) ) $opts["use_boundaries"] = false;
        if ( !isset($opts["weight_order"]) ) $opts["weight_order"] = false;

        /////////////////
        // build request
        /////////////////

        // v.1.0 req
        $flags = 1; // remove spaces
        if ( $opts["exact_phrase"] ) $flags |= 2;
        if ( $opts["single_passage"] ) $flags |= 4;
        if ( $opts["use_boundaries"] ) $flags |= 8;
        if ( $opts["weight_order"] ) $flags |= 16;
        $req = pack ( "NN", 0, $flags ); // mode=0, flags=$flags
        $req .= pack ( "N", strlen($index) ) . $index; // req index
        $req .= pack ( "N", strlen($words) ) . $words; // req words

        // options
        $req .= pack ( "N", strlen($opts["before_match"]) ) . $opts["before_match"];
        $req .= pack ( "N", strlen($opts["after_match"]) ) . $opts["after_match"];
        $req .= pack ( "N", strlen($opts["chunk_separator"]) ) .
            $opts["chunk_separator"];
        $req .= pack ( "N", (int)$opts["limit"] );
        $req .= pack ( "N", (int)$opts["around"] );

        // documents
        $req .= pack ( "N", count($docs) );
        foreach ( $docs as $doc )
        {
            assert ( is_string($doc) );
            $req .= pack ( "N", strlen($doc) ) . $doc;
        }

        ////////////////////////////
        // send query, get response
        ////////////////////////////

        $len = strlen($req);
        $req = pack ( "nnN", SEARCHD_COMMAND_EXCERPT, VER_COMMAND_EXCERPT, $len ) . $req; // add header
        if ( !( $this->_Send ( $fp, $req, $len+8 ) ) ||
             !( $response = $this->_GetResponse ( $fp, VER_COMMAND_EXCERPT ) ) )
        {
            $this->_MBPop ();
            return false;
        }

        //////////////////
        // parse response
        //////////////////

        $pos = 0;
        $res = array ();
        $rlen = strlen($response);
        for ( $i=0; $i<count($docs); $i++ )
        {
            list(,$len) = unpack ( "N*", substr ( $response, $pos, 4 ) );
            $pos += 4;

            if ( $pos+$len > $rlen )
            {
                $this->_error = "incomplete reply";
                $this->_MBPop ();
                return false;
            }
            $res[] = $len ? substr ( $response, $pos, $len ) : "";
            $pos += $len;
        }

        $this->_MBPop ();
        return $res;
    }

    /////////////////////////////////////////////////////////////////////////////
    // keyword generation
    /////////////////////////////////////////////////////////////////////////////

    /// connect to searchd server, and generate keyword list for a given query
    /// returns false on failure,
    /// an array of words on success
    function BuildKeywords ( $query, $index, $hits )
    {
        assert ( is_string($query) );
        assert ( is_string($index) );
        assert ( is_bool($hits) );

        $this->_MBPush ();

        if (!( $fp = $this->_Connect() ))
        {
            $this->_MBPop();
            return false;
        }

        /////////////////
        // build request
        /////////////////

        // v.1.0 req
        $req = pack ( "N", strlen($query) ) . $query; // req query
        $req .= pack ( "N", strlen($index) ) . $index; // req index
        $req .= pack ( "N", (int)$hits );

        ////////////////////////////
        // send query, get response
        ////////////////////////////

        $len = strlen($req);
        $req = pack ( "nnN", SEARCHD_COMMAND_KEYWORDS, VER_COMMAND_KEYWORDS, $len ) .
            $req; // add header
        if ( !( $this->_Send ( $fp, $req, $len+8 ) ) ||
             !( $response = $this->_GetResponse ( $fp, VER_COMMAND_KEYWORDS ) ) )
        {
            $this->_MBPop ();
            return false;
        }

        //////////////////
        // parse response
        //////////////////

        $pos = 0;
        $res = array ();
        $rlen = strlen($response);
        list(,$nwords) = unpack ( "N*", substr ( $response, $pos, 4 ) );
        $pos += 4;
        for ( $i=0; $i<$nwords; $i++ )
        {
            list(,$len) = unpack ( "N*", substr ( $response, $pos, 4 ) ); $pos += 4;
            $tokenized = $len ? substr ( $response, $pos, $len ) : "";
            $pos += $len;

            list(,$len) = unpack ( "N*", substr ( $response, $pos, 4 ) ); $pos += 4;
            $normalized = $len ? substr ( $response, $pos, $len ) : "";
            $pos += $len;

            $res[] = array ( "tokenized"=>$tokenized, "normalized"=>$normalized );

            if ( $hits )
            {
                list($ndocs,$nhits) = array_values ( unpack ( "N*N*", substr ( $response, $pos, 8 ) ) );
                $pos += 8;
                $res [$i]["docs"] = $ndocs;
                $res [$i]["hits"] = $nhits;
            }

            if ( $pos > $rlen )
            {
                $this->_error = "incomplete reply";
                $this->_MBPop ();
                return false;
            }
        }

        $this->_MBPop ();
        return $res;
    }

    function EscapeString ( $string )
    {
        $from = array ( '(',')','|','-','!','@','~','"','&', '/', '\\' );
        $to = array ( '\(','\)','\|','\-','\!','\@','\~','\"', '\&', '\/', '\\\\' );

        return str_replace ( $from, $to, $string );
    }

    /////////////////////////////////////////////////////////////////////////////
    // attribute updates
    /////////////////////////////////////////////////////////////////////////////

    /// batch update given attributes in given rows in given indexes
    /// returns amount of updated documents (0 or more) on success, or -1 on failure
    function UpdateAttributes ( $index, $attrs, $values, $mva=false )
    {
        // verify everything
        assert ( is_string($index) );
        assert ( is_bool($mva) );

        assert ( is_array($attrs) );
        foreach ( $attrs as $attr )
            assert ( is_string($attr) );

        assert ( is_array($values) );
        foreach ( $values as $id=>$entry )
        {
            assert ( is_numeric($id) );
            assert ( is_array($entry) );
            assert ( count($entry)==count($attrs) );
            foreach ( $entry as $v )
            {
                if ( $mva )
                {
                    assert ( is_array($v)
                    );
                    foreach ( $v as $vv )
                        assert ( is_int($vv) );
                } else
                    assert ( is_int($v) );
            }
        }

        // build request
        $req = pack ( "N", strlen($index) ) . $index;

        $req .= pack ( "N", count($attrs) );
        foreach ( $attrs as $attr )
        {
            $req .= pack ( "N", strlen($attr) ) . $attr;
            $req .= pack ( "N", $mva ? 1 : 0 );
        }

        $req .= pack ( "N", count($values) );
        foreach ( $values as $id=>$entry )
        {
            $req .= sphPackU64 ( $id );
            foreach ( $entry as $v )
            {
                $req .= pack ( "N", $mva ? count($v) : $v );
                if ( $mva )
                    foreach ( $v as $vv )
                        $req .= pack ( "N", $vv );
            }
        }

        // connect, send query, get response
        if (!( $fp = $this->_Connect() ))
            return -1;

        $len = strlen($req);
        $req = pack ( "nnN", SEARCHD_COMMAND_UPDATE, VER_COMMAND_UPDATE, $len ) . $req; // add header
        if ( !$this->_Send ( $fp, $req, $len+8 ) )
            return -1;

        if (!( $response = $this->_GetResponse ( $fp, VER_COMMAND_UPDATE ) ))
            return -1;

        // parse response
        list(,$updated) = unpack ( "N*", substr ( $response, 0, 4 ) );
        return $updated;
    }

    /////////////////////////////////////////////////////////////////////////////
    // persistent connections
    /////////////////////////////////////////////////////////////////////////////

    function Open()
    {
        if ( $this->_socket !== false )
        {
            $this->_error = 'already connected';
            return false;
        }
        if ( !$fp = $this->_Connect() )
            return false;

        // command, command version = 0, body length = 4, body = 1
        $req = pack ( "nnNN", SEARCHD_COMMAND_PERSIST, 0, 4, 1 );
        if ( !$this->_Send ( $fp, $req, 12 ) )
            return false;

        $this->_socket = $fp;
        return true;
    }

    function Close()
    {
        if ( $this->_socket === false )
        {
            $this->_error = 'not connected';
            return false;
        }

        fclose ( $this->_socket );
        $this->_socket = false;

        return true;
    }
}

//
// $Id: sphinxapi.php 1566 2008-11-17 19:06:44Z shodan $
//

?>
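
The request-building code in the listing above keeps reusing three small encoding patterns: a string is sent as a 4-byte big-endian length followed by its bytes, a float is sent as its 32-bit pattern in network byte order, and every command body is prefixed with a header of command id, command version and body length. The sketch below reproduces those patterns in isolation so they can be tried without a running searchd. The function names (`packString`, `packFloat`, `unpackFloat`, `frameRequest`) and the command id and version passed to `frameRequest` are illustrative assumptions, not part of sphinxapi.php.

```php
<?php
// Standalone sketch; needs neither sphinxapi.php nor a running searchd.
// All function names here are hypothetical helpers for illustration only.

/// length-prefixed string: 4-byte big-endian length, then the raw bytes
function packString ( $s )
{
    return pack ( "N", strlen($s) ) . $s;
}

/// float as 4 bytes in network byte order (same steps as _PackFloat)
function packFloat ( $f )
{
    $t1 = pack ( "f", $f );            // float bytes in machine order
    list(,$t2) = unpack ( "L*", $t1 ); // same bits read back as an unsigned 32-bit int
    return pack ( "N", $t2 );          // written out big-endian
}

/// inverse of packFloat, mirroring the SPH_ATTR_FLOAT branch of RunQueries
function unpackFloat ( $bytes )
{
    list(,$uval) = unpack ( "N*", $bytes );
    list(,$fval) = unpack ( "f*", pack ( "L", $uval ) );
    return $fval;
}

/// header: 2-byte command id, 2-byte command version, 4-byte body length
function frameRequest ( $command, $version, $body )
{
    return pack ( "nnN", $command, $version, strlen($body) ) . $body;
}

$body   = packString ( "iiyicms" ) . packFloat ( 1.5 );
$packet = frameRequest ( 1, 0x100, $body ); // placeholder command id and version

echo bin2hex ( packString ( "iiyicms" ) ), "\n"; // 0000000769697969636d73
echo bin2hex ( packFloat ( 1.5 ) ), "\n";        // 3fc00000
echo unpackFloat ( packFloat ( 1.5 ) ), "\n";    // 1.5
echo strlen ( $packet ) - 8, "\n";               // 15 (body length; the header is 8 bytes)
```

The header is always 8 bytes, which is why the listing passes `$len+8` to `_Send`. In `RunQueries` the body additionally starts with the 4-byte query count, hence `$len = 4+strlen($req)` there.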