Common Optimization Techniques

For any database, as business data keeps accumulating, sooner or later the growing volume will slow down the business systems that depend on it. How to store and use data sensibly therefore becomes an important problem that every engineer must confront.

At present, database speed problems can be tackled along two dimensions, vertical and horizontal, to improve response times.

Vertical (scale-up) means mining the potential of a single server by optimizing the technical details of database operations, for example tuning indexes to speed up queries. This approach only brings local improvements, however, and because it is bounded by the overall hardware capacity of one server, the gains it can deliver are ultimately limited.

Horizontal (scale-out) means spreading data that a single server used to handle across multiple servers, with reads separated from writes. This can improve database response times dramatically, and horizontal expansion scales close to linearly, with very little constraint from any single machine's hardware. For MongoDB, which is designed for distributed deployment, this is a natural technical approach.

As for improving MongoDB's operational performance: top-tier engineers take read and write performance fully into account at the design stage and make accurate, well-reasoned optimization decisions up front; competent engineers use tools to discover problems proactively and fix them in a targeted way; less experienced engineers discover problems passively and only then react.

The sections below offer technical approaches from the angles of design, early problem detection, and targeted tuning. MongoDB optimization covers a broad range of topics; common ones include optimizing command execution on a single machine (e.g., indexes), optimizing the data itself, and optimizing distributed reads and writes.

While the data volume still permits, start with vertical optimization on a single server. This is the most common class of read/write speed optimization, and it is part of the basic skill set that distinguishes a solid database engineer.

1. Enabling MongoDB slow-operation detection (Database Profiler)

When certain screens of an e-commerce platform respond slowly, the underlying database is a prime suspect; but how do we inspect and test the relevant database operations? MongoDB provides a way to automatically detect which operations run too slowly: enable profiling with db.setProfilingLevel to record the problematic operations automatically, then list them with db.system.profile.find. With that list in hand, attention can quickly be focused on the offending operations for targeted testing and tuning.

Parameters of db.setProfilingLevel(level, slowms):

  • level specifies the profiling level: 0 disables profiling; 1 records slow operations only, i.e., when a time threshold is in effect (over 100 milliseconds by default), every operation that exceeds it is written to the system.profile collection; 2 records all operations.
  • slowms is an optional parameter specifying the time threshold (in milliseconds); any operation that takes longer is considered slow.

Basic usage, executed against mongod:

  1. # Enable slow-operation recording; by default, every operation over 100 ms is logged
  2. > db.setProfilingLevel(1)
  3. # Enable slow-operation recording with the threshold set to 300 ms
  4. > db.setProfilingLevel(1,300)

In production, after profiling has run for a while, use db.system.profile.find to review the recorded slow operations:

  1. > db.system.profile.find( { millis: { $gt:100 } } )
  2. # millis is the key that stores execution time in a profile entry
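Two related shell calls can help narrow things down further: db.getProfilingStatus() reports the current level and slowms threshold, and sorting the profile collection surfaces the slowest entries first. A short sketch (the recorded contents will of course differ per deployment):

```javascript
// Check the profiler's current level and slowms threshold
> db.getProfilingStatus()
// Show the five slowest operations recorded so far
> db.system.profile.find().sort({millis:-1}).limit(5).pretty()
```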

For demonstration purposes, set db.setProfilingLevel(2) and inspect what gets recorded:

  1. > db.setProfilingLevel(2)
  2. > db.stats()
  3. > db.system.profile.find().pretty()
  4. # Output:
  5. {
  6. "op" : "command",
  7. "ns" : "test",
  8. "command" : {
  9. "dbstats" : 1,
  10. "scale" : undefined,
  11. "lsid" : {
  12. "id" : UUID("1258dc1d-cd38-42af-ac60-f2f34c273690")
  13. },
  14. "$clusterTime" : {
  15. "clusterTime" : Timestamp(1580479289, 1),
  16. "signature" : {
  17. "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
  18. "keyId" : NumberLong(0)
  19. }
  20. },
  21. "$db" : "test"
  22. },
  23. "numYield" : 0,
  24. "locks" : {
  25. "ParallelBatchWriterMode" : {
  26. "acquireCount" : {
  27. "r" : NumberLong(1)
  28. }
  29. },
  30. "ReplicationStateTransition" : {
  31. "acquireCount" : {
  32. "w" : NumberLong(1)
  33. }
  34. },
  35. "Global" : {
  36. "acquireCount" : {
  37. "r" : NumberLong(1)
  38. }
  39. },
  40. "Database" : {
  41. "acquireCount" : {
  42. "r" : NumberLong(1)
  43. }
  44. },
  45. "Collection" : {
  46. "acquireCount" : {
  47. "r" : NumberLong(3)
  48. }
  49. },
  50. "Mutex" : {
  51. "acquireCount" : {
  52. "r" : NumberLong(1)
  53. }
  54. }
  55. },
  56. "flowControl" : {
  57. },
  58. "responseLength" : 386,
  59. "protocol" : "op_msg",
  60. "millis" : 0,
  61. "ts" : ISODate("2020-01-31T14:01:38.820Z"),
  62. "client" : "127.0.0.1",
  63. "appName" : "MongoDB Shell",
  64. "allUsers" : [ ],
  65. "user" : ""
  66. }

The main fields of a slow-operation profile entry are:

  • op: the type of operation that was recorded
  • ns: the namespace (database.collection) the operation ran against
  • ts: the time at which the operation started executing
  • millis: how long the operation took to execute (in milliseconds)

Note
Once profiling is enabled, it adds runtime overhead on the server and can affect performance. After testing is complete, be sure to turn it off with db.setProfilingLevel(0).
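Turning the profiler off can be combined with discarding the collected data; a minimal sketch (dropping system.profile is optional, and only works while profiling is off):

```javascript
// Disable profiling
> db.setProfilingLevel(0)
// Optionally discard the recorded entries (profiling must be off first)
> db.system.profile.drop()
```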

2. Analyzing problem statements with the explain command

Here, assume slow-operation detection has turned up a find() that runs slowly despite using an index. The next step is to examine the operation's execution details with the explain command; a simulated run looks like this:

  1. > db.books.createIndex({name:1}) # index the name key; the books collection already holds some documents
  2. > db.books.createIndex({price:1}) # index the price key
  3. > db.books.find({price:{$gt:30}}, {name:1}).explain("executionStats") # run explain() analysis on the find
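When reading the executionStats section of explain output, the key signals are the execution stage (COLLSCAN means no index was used, IXSCAN means an index scan) and the ratio of totalDocsExamined to nReturned. As an illustration only, the helper below (not part of MongoDB; it merely encodes a common rule of thumb over those fields) shows how one might judge the result:

```javascript
// Illustrative helper, not a MongoDB API: judge index efficiency from the
// executionStats sub-document returned by explain("executionStats").
function indexEfficiency(stats, maxScanRatio = 10) {
  const returned = stats.nReturned || 0;      // documents returned
  const keys = stats.totalKeysExamined || 0;  // index entries scanned
  const docs = stats.totalDocsExamined || 0;  // documents scanned
  if (keys === 0 && docs > 0) {
    return "collection scan";                 // no index was used at all
  }
  if (returned > 0 && docs / returned > maxScanRatio) {
    return "poorly selective index";          // scans far more than it returns
  }
  return "efficient";
}
```

A totalDocsExamined-to-nReturned ratio near 1 usually means the index serves the query well; a large ratio suggests the index, or the query shape, needs rethinking.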

3. Analyzing logs with mtools

mtools is a set of scripts, maintained by a MongoDB engineer, that operators use for database health checks: it can present monitoring data graphically so that problems are found and fixed promptly, and it is particularly useful for targeted analysis of MongoDB log files.
Repository: https://github.com/rueckstiess/mtools
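A typical workflow might look like the following session sketch (mtools installs via pip; the log path shown is only an example):

```
# Install the toolset
pip install mtools
# Summarize a mongod log file; --queries lists query patterns by duration
mloginfo /var/log/mongodb/mongod.log --queries
```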

4. The mongoreplay monitoring tool

mongoreplay is a powerful traffic-capture and replay tool that can inspect and record the commands sent to a MongoDB database. Official documentation: https://docs.mongodb.com/manual/reference/program/mongoreplay/index.html

5. The mongostat monitoring tool

mongostat is a status-checking tool that ships with MongoDB. Run from the command line, it samples MongoDB's current running state at a fixed interval and prints the results. When MongoDB responds slowly or shows other problems, it is a good first tool to reach for to get a quick picture of the database's state. Its output consists mostly of counters and statistics.

  1. # Sample the status at 1-second intervals, 20 times
  2. [root@slave01 ~]# mongostat -n 20 1
  3. insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
  4. *0 *0 *0 *0 0 1|0 0.0% 0.0% 0 1.48G 83.0M 0|0 1|0 165b 35.5k 3 Feb 1 15:53:35.523
  5. *0 *0 *0 *0 0 0|0 0.0% 0.0% 0 1.48G 83.0M 0|0 1|0 111b 35.4k 3 Feb 1 15:53:36.528
  6. *0 *0 *0 *0 0 1|0 0.0% 0.0% 0 1.48G 83.0M 0|0 1|0 112b 35.7k 3 Feb 1 15:53:37.525
  7. *0 *0 *0 *0 0 1|0 0.0% 0.0% 0 1.48G 83.0M 0|0 1|0 112b 35.7k 3 Feb 1 15:53:38.521
  8. *0 *0 *0 *0 0 0|0 0.0% 0.0% 0 1.48G 83.0M 0|0 1|0 111b 35.4k 3 Feb 1 15:53:39.525
  9. *0 *0 *0 *0 0 1|0 0.0% 0.0% 0 1.48G 83.0M 0|0 1|0 112b 35.7k 3 Feb 1 15:53:40.522

6. The db.serverStatus command

db.serverStatus is another way to understand the running state of a MongoDB database; it reports disk usage, memory usage, connections, logging, available indexes, and more. The command responds quickly and has very little impact on MongoDB's performance.

  1. > db.serverStatus()
  2. {
  3. "host" : "slave01", # server hostname
  4. "version" : "4.2.2", # version of this MongoDB instance
  5. "process" : "mongod", # process name
  6. "pid" : NumberLong(1777), # ID of the current process
  7. "uptime" : 528, # time this instance has been running (seconds)
  8. "uptimeMillis" : NumberLong(527409), # time this instance has been running (milliseconds)
  9. "uptimeEstimate" : NumberLong(527), # uptime as tracked by MongoDB's internal metrics (seconds)
  10. "localTime" : ISODate("2020-02-01T07:25:54.031Z"), # current server time, in UTC
  11. "asserts" : { # counts of assertions raised since MongoDB started
  12. "regular" : 0,
  13. "warning" : 0,
  14. "msg" : 0,
  15. "user" : 12,
  16. "rollovers" : 0
  17. },
  18. "connections" : { # statistics on client connections to the server
  19. "current" : 1,
  20. "available" : 51199,
  21. "totalCreated" : 1,
  22. "active" : 1
  23. },
  24. "electionMetrics" : {
  25. "stepUpCmd" : {
  26. "called" : NumberLong(0),
  27. "successful" : NumberLong(0)
  28. },
  29. "priorityTakeover" : {
  30. "called" : NumberLong(0),
  31. "successful" : NumberLong(0)
  32. },
  33. "catchUpTakeover" : {
  34. "called" : NumberLong(0),
  35. "successful" : NumberLong(0)
  36. },
  37. "electionTimeout" : {
  38. "called" : NumberLong(0),
  39. "successful" : NumberLong(0)
  40. },
  41. "freezeTimeout" : {
  42. "called" : NumberLong(0),
  43. "successful" : NumberLong(0)
  44. },
  45. "numStepDownsCausedByHigherTerm" : NumberLong(0),
  46. "numCatchUps" : NumberLong(0),
  47. "numCatchUpsSucceeded" : NumberLong(0),
  48. "numCatchUpsAlreadyCaughtUp" : NumberLong(0),
  49. "numCatchUpsSkipped" : NumberLong(0),
  50. "numCatchUpsTimedOut" : NumberLong(0),
  51. "numCatchUpsFailedWithError" : NumberLong(0),
  52. "numCatchUpsFailedWithNewTerm" : NumberLong(0),
  53. "numCatchUpsFailedWithReplSetAbortPrimaryCatchUpCmd" : NumberLong(0),
  54. "averageCatchUpOps" : 0
  55. },
  56. "extra_info" : { # low-level system statistics for the database process
  57. "note" : "fields vary by platform",
  58. "user_time_us" : NumberLong(1180422),
  59. "system_time_us" : NumberLong(3063335),
  60. "maximum_resident_set_kb" : NumberLong(82932),
  61. "input_blocks" : NumberLong(14352),
  62. "output_blocks" : NumberLong(2736),
  63. "page_reclaims" : NumberLong(8378),
  64. "page_faults" : NumberLong(2),
  65. "voluntary_context_switches" : NumberLong(18599),
  66. "involuntary_context_switches" : NumberLong(33)
  67. },
  68. "flowControl" : { # flow control status
  69. "enabled" : true,
  70. "targetRateLimit" : 1000000000,
  71. "timeAcquiringMicros" : NumberLong(20),
  72. "locksPerOp" : 0,
  73. "sustainerRate" : 0,
  74. "isLagged" : false,
  75. "isLaggedCount" : 0,
  76. "isLaggedTimeMicros" : NumberLong(0)
  77. },
  78. "freeMonitoring" : {
  79. "state" : "undecided"
  80. },
  81. "globalLock" : { # detailed statistics on database lock state
  82. "totalTime" : NumberLong(527407000),
  83. "currentQueue" : {
  84. "total" : 0,
  85. "readers" : 0,
  86. "writers" : 0
  87. },
  88. "activeClients" : {
  89. "total" : 0,
  90. "readers" : 0,
  91. "writers" : 0
  92. }
  93. },
  94. "locks" : { # detailed statistics for each lock type
  95. "ParallelBatchWriterMode" : {
  96. "acquireCount" : {
  97. "r" : NumberLong(32)
  98. }
  99. },
  100. "ReplicationStateTransition" : {
  101. "acquireCount" : {
  102. "w" : NumberLong(1625)
  103. }
  104. },
  105. "Global" : {
  106. "acquireCount" : {
  107. "r" : NumberLong(1609),
  108. "w" : NumberLong(12),
  109. "W" : NumberLong(4)
  110. }
  111. },
  112. "Database" : {
  113. "acquireCount" : {
  114. "r" : NumberLong(556),
  115. "w" : NumberLong(9),
  116. "W" : NumberLong(3)
  117. }
  118. },
  119. "Collection" : {
  120. "acquireCount" : {
  121. "r" : NumberLong(592),
  122. "w" : NumberLong(9)
  123. }
  124. },
  125. "Mutex" : {
  126. "acquireCount" : {
  127. "r" : NumberLong(570)
  128. }
  129. },
  130. "oplog" : {
  131. "acquireCount" : {
  132. "r" : NumberLong(528)
  133. }
  134. }
  135. },
  136. "logicalSessionRecordCache" : {
  137. "activeSessionsCount" : 1,
  138. "sessionsCollectionJobCount" : 2,
  139. "lastSessionsCollectionJobDurationMillis" : 12,
  140. "lastSessionsCollectionJobTimestamp" : ISODate("2020-02-01T07:22:07.417Z"),
  141. "lastSessionsCollectionJobEntriesRefreshed" : 1,
  142. "lastSessionsCollectionJobEntriesEnded" : 0,
  143. "lastSessionsCollectionJobCursorsClosed" : 0,
  144. "transactionReaperJobCount" : 2,
  145. "lastTransactionReaperJobDurationMillis" : 0,
  146. "lastTransactionReaperJobTimestamp" : ISODate("2020-02-01T07:22:07.417Z"),
  147. "lastTransactionReaperJobEntriesCleanedUp" : 0,
  148. "sessionCatalogSize" : 0
  149. },
  150. "network" : { # MongoDB network usage statistics
  151. "bytesIn" : NumberLong(2233),
  152. "bytesOut" : NumberLong(44003),
  153. "physicalBytesIn" : NumberLong(2233),
  154. "physicalBytesOut" : NumberLong(44003),
  155. "numRequests" : NumberLong(19),
  156. "compression" : {
  157. "snappy" : {
  158. "compressor" : {
  159. "bytesIn" : NumberLong(0),
  160. "bytesOut" : NumberLong(0)
  161. },
  162. "decompressor" : {
  163. "bytesIn" : NumberLong(0),
  164. "bytesOut" : NumberLong(0)
  165. }
  166. },
  167. "zstd" : {
  168. "compressor" : {
  169. "bytesIn" : NumberLong(0),
  170. "bytesOut" : NumberLong(0)
  171. },
  172. "decompressor" : {
  173. "bytesIn" : NumberLong(0),
  174. "bytesOut" : NumberLong(0)
  175. }
  176. },
  177. "zlib" : {
  178. "compressor" : {
  179. "bytesIn" : NumberLong(0),
  180. "bytesOut" : NumberLong(0)
  181. },
  182. "decompressor" : {
  183. "bytesIn" : NumberLong(0),
  184. "bytesOut" : NumberLong(0)
  185. }
  186. }
  187. },
  188. "serviceExecutorTaskStats" : {
  189. "executor" : "passthrough",
  190. "threadsRunning" : 1
  191. }
  192. },
  193. "opLatencies" : { # operation latency summaries
  194. "reads" : {
  195. "latency" : NumberLong(0),
  196. "ops" : NumberLong(0)
  197. },
  198. "writes" : {
  199. "latency" : NumberLong(0),
  200. "ops" : NumberLong(0)
  201. },
  202. "commands" : {
  203. "latency" : NumberLong(58311),
  204. "ops" : NumberLong(18)
  205. },
  206. "transactions" : {
  207. "latency" : NumberLong(0),
  208. "ops" : NumberLong(0)
  209. }
  210. },
  211. "opReadConcernCounters" : {
  212. "available" : NumberLong(0),
  213. "linearizable" : NumberLong(0),
  214. "local" : NumberLong(0),
  215. "majority" : NumberLong(0),
  216. "snapshot" : NumberLong(0),
  217. "none" : NumberLong(3)
  218. },
  219. "opcounters" : { # counts of operations by type since the database started
  220. "insert" : NumberLong(0),
  221. "query" : NumberLong(3),
  222. "update" : NumberLong(1),
  223. "delete" : NumberLong(0),
  224. "getmore" : NumberLong(0),
  225. "command" : NumberLong(23)
  226. },
  227. "opcountersRepl" : { # counts of replicated operations by type since startup; used in replica sets
  228. "insert" : NumberLong(0),
  229. "query" : NumberLong(0),
  230. "update" : NumberLong(0),
  231. "delete" : NumberLong(0),
  232. "getmore" : NumberLong(0),
  233. "command" : NumberLong(0)
  234. },
  235. "storageEngine" : { # storage engine
  236. "name" : "wiredTiger",
  237. "supportsCommittedReads" : true,
  238. "oldestRequiredTimestampForCrashRecovery" : Timestamp(0, 0),
  239. "supportsPendingDrops" : true,
  240. "dropPendingIdents" : NumberLong(0),
  241. "supportsSnapshotReadConcern" : true,
  242. "readOnly" : false,
  243. "persistent" : true,
  244. "backupCursorOpen" : false
  245. },
  246. "tcmalloc" : {
  247. "generic" : {
  248. "current_allocated_bytes" : 95329872,
  249. "heap_size" : 98521088
  250. },
  251. "tcmalloc" : {
  252. "pageheap_free_bytes" : 2093056,
  253. "pageheap_unmapped_bytes" : 0,
  254. "max_total_thread_cache_bytes" : 514850816,
  255. "current_total_thread_cache_bytes" : 851688,
  256. "total_free_bytes" : 1098160,
  257. "central_cache_free_bytes" : 151240,
  258. "transfer_cache_free_bytes" : 95232,
  259. "thread_cache_free_bytes" : 851688,
  260. "aggressive_memory_decommit" : 0,
  261. "pageheap_committed_bytes" : 98521088,
  262. "pageheap_scavenge_count" : 0,
  263. "pageheap_commit_count" : 51,
  264. "pageheap_total_commit_bytes" : 98521088,
  265. "pageheap_decommit_count" : 0,
  266. "pageheap_total_decommit_bytes" : 0,
  267. "pageheap_reserve_count" : 51,
  268. "pageheap_total_reserve_bytes" : 98521088,
  269. "spinlock_total_delay_ns" : 0,
  270. "formattedString" : "------------------------------------------------\nMALLOC: 95330448 ( 90.9 MiB) Bytes in use by application\nMALLOC: + 2093056 ( 2.0 MiB) Bytes in page heap freelist\nMALLOC: + 151240 ( 0.1 MiB) Bytes in central cache freelist\nMALLOC: + 95232 ( 0.1 MiB) Bytes in transfer cache freelist\nMALLOC: + 851112 ( 0.8 MiB) Bytes in thread cache freelists\nMALLOC: + 2752512 ( 2.6 MiB) Bytes in malloc metadata\nMALLOC: ------------\nMALLOC: = 101273600 ( 96.6 MiB) Actual memory used (physical + swap)\nMALLOC: + 0 ( 0.0 MiB) Bytes released to OS (aka unmapped)\nMALLOC: ------------\nMALLOC: = 101273600 ( 96.6 MiB) Virtual address space used\nMALLOC:\nMALLOC: 690 Spans in use\nMALLOC: 27 Thread heaps in use\nMALLOC: 4096 Tcmalloc page size\n------------------------------------------------\nCall ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).\nBytes released to the OS take up virtual address space but no physical memory.\n"
  271. }
  272. },
  273. "trafficRecording" : {
  274. "running" : false
  275. },
  276. "transactions" : {
  277. "retriedCommandsCount" : NumberLong(0),
  278. "retriedStatementsCount" : NumberLong(0),
  279. "transactionsCollectionWriteCount" : NumberLong(0),
  280. "currentActive" : NumberLong(0),
  281. "currentInactive" : NumberLong(0),
  282. "currentOpen" : NumberLong(0),
  283. "totalAborted" : NumberLong(0),
  284. "totalCommitted" : NumberLong(0),
  285. "totalStarted" : NumberLong(0),
  286. "totalPrepared" : NumberLong(0),
  287. "totalPreparedThenCommitted" : NumberLong(0),
  288. "totalPreparedThenAborted" : NumberLong(0),
  289. "currentPrepared" : NumberLong(0)
  290. },
  291. "transportSecurity" : {
  292. "1.0" : NumberLong(0),
  293. "1.1" : NumberLong(0),
  294. "1.2" : NumberLong(0),
  295. "1.3" : NumberLong(0),
  296. "unknown" : NumberLong(0)
  297. },
  298. "twoPhaseCommitCoordinator" : {
  299. "totalCreated" : NumberLong(0),
  300. "totalStartedTwoPhaseCommit" : NumberLong(0),
  301. "totalAbortedTwoPhaseCommit" : NumberLong(0),
  302. "totalCommittedTwoPhaseCommit" : NumberLong(0),
  303. "currentInSteps" : {
  304. "writingParticipantList" : NumberLong(0),
  305. "waitingForVotes" : NumberLong(0),
  306. "writingDecision" : NumberLong(0),
  307. "waitingForDecisionAcks" : NumberLong(0),
  308. "deletingCoordinatorDoc" : NumberLong(0)
  309. }
  310. },
  311. "wiredTiger" : { # detailed WiredTiger engine usage statistics
  312. "uri" : "statistics:",
  313. "async" : {
  314. "current work queue length" : 0,
  315. "maximum work queue length" : 0,
  316. "number of allocation state races" : 0,
  317. "number of flush calls" : 0,
  318. "number of operation slots viewed for allocation" : 0,
  319. "number of times operation allocation failed" : 0,
  320. "number of times worker found no work" : 0,
  321. "total allocations" : 0,
  322. "total compact calls" : 0,
  323. "total insert calls" : 0,
  324. "total remove calls" : 0,
  325. "total search calls" : 0,
  326. "total update calls" : 0
  327. },
  328. "block-manager" : {
  329. "blocks pre-loaded" : 12,
  330. "blocks read" : 40,
  331. "blocks written" : 63,
  332. "bytes read" : 282624,
  333. "bytes written" : 512000,
  334. "bytes written for checkpoint" : 512000,
  335. "mapped blocks read" : 0,
  336. "mapped bytes read" : 0
  337. },
  338. "cache" : {
  339. "application threads page read from disk to cache count" : 6,
  340. "application threads page read from disk to cache time (usecs)" : 41,
  341. "application threads page write from cache to disk count" : 33,
  342. "application threads page write from cache to disk time (usecs)" : 2962,
  343. "bytes belonging to page images in the cache" : 205911,
  344. "bytes belonging to the cache overflow table in the cache" : 182,
  345. "bytes currently in the cache" : 236372,
  346. "bytes dirty in the cache cumulative" : 231288,
  347. "bytes not belonging to page images in the cache" : 30460,
  348. "bytes read into cache" : 190659,
  349. "bytes written from cache" : 328519,
  350. "cache overflow cursor application thread wait time (usecs)" : 0,
  351. "cache overflow cursor internal thread wait time (usecs)" : 0,
  352. "cache overflow score" : 0,
  353. "cache overflow table entries" : 0,
  354. "cache overflow table insert calls" : 0,
  355. "cache overflow table max on-disk size" : 0,
  356. "cache overflow table on-disk size" : 0,
  357. "cache overflow table remove calls" : 0,
  358. "checkpoint blocked page eviction" : 0,
  359. "eviction calls to get a page" : 3,
  360. "eviction calls to get a page found queue empty" : 3,
  361. "eviction calls to get a page found queue empty after locking" : 0,
  362. "eviction currently operating in aggressive mode" : 0,
  363. "eviction empty score" : 0,
  364. "eviction passes of a file" : 0,
  365. "eviction server candidate queue empty when topping up" : 0,
  366. "eviction server candidate queue not empty when topping up" : 0,
  367. "eviction server evicting pages" : 0,
  368. "eviction server slept, because we did not make progress with eviction" : 0,
  369. "eviction server unable to reach eviction goal" : 0,
  370. "eviction server waiting for a leaf page" : 0,
  371. "eviction state" : 128,
  372. "eviction walk target pages histogram - 0-9" : 0,
  373. "eviction walk target pages histogram - 10-31" : 0,
  374. "eviction walk target pages histogram - 128 and higher" : 0,
  375. "eviction walk target pages histogram - 32-63" : 0,
  376. "eviction walk target pages histogram - 64-128" : 0,
  377. "eviction walk target strategy both clean and dirty pages" : 0,
  378. "eviction walk target strategy only clean pages" : 0,
  379. "eviction walk target strategy only dirty pages" : 0,
  380. "eviction walks abandoned" : 0,
  381. "eviction walks gave up because they restarted their walk twice" : 0,
  382. "eviction walks gave up because they saw too many pages and found no candidates" : 0,
  383. "eviction walks gave up because they saw too many pages and found too few candidates" : 0,
  384. "eviction walks reached end of tree" : 0,
  385. "eviction walks started from root of tree" : 0,
  386. "eviction walks started from saved location in tree" : 0,
  387. "eviction worker thread active" : 4,
  388. "eviction worker thread created" : 0,
  389. "eviction worker thread evicting pages" : 0,
  390. "eviction worker thread removed" : 0,
  391. "eviction worker thread stable number" : 0,
  392. "files with active eviction walks" : 0,
  393. "files with new eviction walks started" : 0,
  394. "force re-tuning of eviction workers once in a while" : 0,
  395. "forced eviction - pages evicted that were clean count" : 0,
  396. "forced eviction - pages evicted that were clean time (usecs)" : 0,
  397. "forced eviction - pages evicted that were dirty count" : 0,
  398. "forced eviction - pages evicted that were dirty time (usecs)" : 0,
  399. "forced eviction - pages selected because of too many deleted items count" : 0,
  400. "forced eviction - pages selected count" : 0,
  401. "forced eviction - pages selected unable to be evicted count" : 0,
  402. "forced eviction - pages selected unable to be evicted time" : 0,
  403. "hazard pointer blocked page eviction" : 0,
  404. "hazard pointer check calls" : 0,
  405. "hazard pointer check entries walked" : 0,
  406. "hazard pointer maximum array length" : 0,
  407. "in-memory page passed criteria to be split" : 0,
  408. "in-memory page splits" : 0,
  409. "internal pages evicted" : 0,
  410. "internal pages queued for eviction" : 0,
  411. "internal pages seen by eviction walk" : 0,
  412. "internal pages seen by eviction walk that are already queued" : 0,
  413. "internal pages split during eviction" : 0,
  414. "leaf pages split during eviction" : 0,
  415. "maximum bytes configured" : 1523580928,
  416. "maximum page size at eviction" : 0,
  417. "modified pages evicted" : 0,
  418. "modified pages evicted by application threads" : 0,
  419. "operations timed out waiting for space in cache" : 0,
  420. "overflow pages read into cache" : 0,
  421. "page split during eviction deepened the tree" : 0,
  422. "page written requiring cache overflow records" : 0,
  423. "pages currently held in the cache" : 26,
  424. "pages evicted by application threads" : 0,
  425. "pages queued for eviction" : 0,
  426. "pages queued for eviction post lru sorting" : 0,
  427. "pages queued for urgent eviction" : 0,
  428. "pages queued for urgent eviction during walk" : 0,
  429. "pages read into cache" : 19,
  430. "pages read into cache after truncate" : 3,
  431. "pages read into cache after truncate in prepare state" : 0,
  432. "pages read into cache requiring cache overflow entries" : 0,
  433. "pages read into cache requiring cache overflow for checkpoint" : 0,
  434. "pages read into cache skipping older cache overflow entries" : 0,
  435. "pages read into cache with skipped cache overflow entries needed later" : 0,
  436. "pages read into cache with skipped cache overflow entries needed later by checkpoint" : 0,
  437. "pages requested from the cache" : 1319,
  438. "pages seen by eviction walk" : 0,
  439. "pages seen by eviction walk that are already queued" : 0,
  440. "pages selected for eviction unable to be evicted" : 0,
  441. "pages selected for eviction unable to be evicted as the parent page has overflow items" : 0,
  442. "pages selected for eviction unable to be evicted because of active children on an internal page" : 0,
  443. "pages selected for eviction unable to be evicted because of failure in reconciliation" : 0,
  444. "pages selected for eviction unable to be evicted due to newer modifications on a clean page" : 0,
  445. "pages walked for eviction" : 0,
  446. "pages written from cache" : 34,
  447. "pages written requiring in-memory restoration" : 0,
  448. "percentage overhead" : 8,
  449. "tracked bytes belonging to internal pages in the cache" : 4672,
  450. "tracked bytes belonging to leaf pages in the cache" : 231700,
  451. "tracked dirty bytes in the cache" : 0,
  452. "tracked dirty pages in the cache" : 0,
  453. "unmodified pages evicted" : 0
  454. },
  455. "capacity" : {
  456. "background fsync file handles considered" : 0,
  457. "background fsync file handles synced" : 0,
  458. "background fsync time (msecs)" : 0,
  459. "bytes read" : 196608,
  460. "bytes written for checkpoint" : 284627,
  461. "bytes written for eviction" : 0,
  462. "bytes written for log" : 1258265984,
  463. "bytes written total" : 1258550611,
  464. "threshold to call fsync" : 0,
  465. "time waiting due to total capacity (usecs)" : 0,
  466. "time waiting during checkpoint (usecs)" : 0,
  467. "time waiting during eviction (usecs)" : 0,
  468. "time waiting during logging (usecs)" : 0,
  469. "time waiting during read (usecs)" : 0
  470. },
  471. "connection" : {
  472. "auto adjusting condition resets" : 40,
  473. "auto adjusting condition wait calls" : 3268,
  474. "detected system time went backwards" : 0,
  475. "files currently open" : 14,
  476. "memory allocations" : 22271,
  477. "memory frees" : 21286,
  478. "memory re-allocations" : 2800,
  479. "pthread mutex condition wait calls" : 8703,
  480. "pthread mutex shared lock read-lock calls" : 6405,
  481. "pthread mutex shared lock write-lock calls" : 654,
  482. "total fsync I/Os" : 51,
  483. "total read I/Os" : 1302,
  484. "total write I/Os" : 87
  485. },
  486. "cursor" : {
  487. "cached cursor count" : 13,
  488. "cursor bulk loaded cursor insert calls" : 0,
  489. "cursor close calls that result in cache" : 33,
  490. "cursor create calls" : 295,
  491. "cursor insert calls" : 16,
  492. "cursor insert key and value bytes" : 11847,
  493. "cursor modify calls" : 0,
  494. "cursor modify key and value bytes affected" : 0,
  495. "cursor modify value bytes modified" : 0,
  496. "cursor next calls" : 434,
  497. "cursor operation restarted" : 0,
  498. "cursor prev calls" : 8,
  499. "cursor remove calls" : 1,
  500. "cursor remove key bytes removed" : 22,
  501. "cursor reserve calls" : 0,
  502. "cursor reset calls" : 1218,
  503. "cursor search calls" : 1236,
  504. "cursor search near calls" : 15,
  505. "cursor sweep buckets" : 96,
  506. "cursor sweep cursors closed" : 0,
  507. "cursor sweep cursors examined" : 1,
  508. "cursor sweeps" : 16,
  509. "cursor truncate calls" : 0,
  510. "cursor update calls" : 0,
  511. "cursor update key and value bytes" : 0,
  512. "cursor update value size change" : 0,
  513. "cursors reused from cache" : 19,
  514. "open cursor count" : 22
  515. },
  516. "data-handle" : {
  517. "connection data handle size" : 432,
  518. "connection data handles currently active" : 20,
  519. "connection sweep candidate became referenced" : 0,
  520. "connection sweep dhandles closed" : 0,
  521. "connection sweep dhandles removed from hash list" : 8,
  522. "connection sweep time-of-death sets" : 41,
  523. "connection sweeps" : 52,
  524. "session dhandles swept" : 0,
  525. "session sweep attempts" : 107
  526. },
  527. "lock" : {
  528. "checkpoint lock acquisitions" : 9,
  529. "checkpoint lock application thread wait time (usecs)" : 0,
  530. "checkpoint lock internal thread wait time (usecs)" : 0,
  531. "dhandle lock application thread time waiting (usecs)" : 0,
  532. "dhandle lock internal thread time waiting (usecs)" : 0,
  533. "dhandle read lock acquisitions" : 2171,
  534. "dhandle write lock acquisitions" : 36,
  535. "durable timestamp queue lock application thread time waiting (usecs)" : 0,
  536. "durable timestamp queue lock internal thread time waiting (usecs)" : 0,
  537. "durable timestamp queue read lock acquisitions" : 0,
  538. "durable timestamp queue write lock acquisitions" : 0,
  539. "metadata lock acquisitions" : 7,
  540. "metadata lock application thread wait time (usecs)" : 0,
  541. "metadata lock internal thread wait time (usecs)" : 0,
  542. "read timestamp queue lock application thread time waiting (usecs)" : 0,
  543. "read timestamp queue lock internal thread time waiting (usecs)" : 0,
  544. "read timestamp queue read lock acquisitions" : 0,
  545. "read timestamp queue write lock acquisitions" : 0,
  546. "schema lock acquisitions" : 29,
  547. "schema lock application thread wait time (usecs)" : 0,
  548. "schema lock internal thread wait time (usecs)" : 0,
  549. "table lock application thread time waiting for the table lock (usecs)" : 0,
  550. "table lock internal thread time waiting for the table lock (usecs)" : 0,
  551. "table read lock acquisitions" : 0,
  552. "table write lock acquisitions" : 11,
  553. "txn global lock application thread time waiting (usecs)" : 0,
  554. "txn global lock internal thread time waiting (usecs)" : 0,
  555. "txn global read lock acquisitions" : 27,
  556. "txn global write lock acquisitions" : 21
  557. },
  558. "log" : {
  559. "busy returns attempting to switch slots" : 0,
  560. "force archive time sleeping (usecs)" : 0,
  561. "log bytes of payload data" : 6925,
  562. "log bytes written" : 9728,
  563. "log files manually zero-filled" : 0,
  564. "log flush operations" : 5163,
  565. "log force write operations" : 5730,
  566. "log force write operations skipped" : 5727,
  567. "log records compressed" : 6,
  568. "log records not compressed" : 1,
  569. "log records too small to compress" : 19,
  570. "log release advances write LSN" : 9,
  571. "log scan operations" : 6,
  572. "log scan records requiring two reads" : 0,
  573. "log server thread advances write LSN" : 3,
  574. "log server thread write LSN walk skipped" : 1505,
  575. "log sync operations" : 12,
  576. "log sync time duration (usecs)" : 6699,
  577. "log sync_dir operations" : 1,
  578. "log sync_dir time duration (usecs)" : 1,
  579. "log write operations" : 26,
  580. "logging bytes consolidated" : 9216,
  581. "maximum log file size" : 104857600,
  582. "number of pre-allocated log files to create" : 2,
  583. "pre-allocated log files not ready and missed" : 1,
  584. "pre-allocated log files prepared" : 2,
  585. "pre-allocated log files used" : 0,
  586. "records processed by log scan" : 13,
  587. "slot close lost race" : 0,
  588. "slot close unbuffered waits" : 0,
  589. "slot closures" : 12,
  590. "slot join atomic update races" : 0,
  591. "slot join calls atomic updates raced" : 0,
  592. "slot join calls did not yield" : 26,
  593. "slot join calls found active slot closed" : 0,
  594. "slot join calls slept" : 0,
  595. "slot join calls yielded" : 0,
  596. "slot join found active slot closed" : 0,
  597. "slot joins yield time (usecs)" : 0,
  598. "slot transitions unable to find free slot" : 0,
  599. "slot unbuffered writes" : 0,
  600. "total in-memory size of compressed records" : 11665,
  601. "total log buffer size" : 33554432,
  602. "total size of compressed records" : 6140,
  603. "written slots coalesced" : 0,
  604. "yields waiting for previous log file close" : 0
  605. },
  606. "perf" : {
  607. "file system read latency histogram (bucket 1) - 10-49ms" : 7,
  608. "file system read latency histogram (bucket 2) - 50-99ms" : 0,
  609. "file system read latency histogram (bucket 3) - 100-249ms" : 0,
  610. "file system read latency histogram (bucket 4) - 250-499ms" : 0,
  611. "file system read latency histogram (bucket 5) - 500-999ms" : 0,
  612. "file system read latency histogram (bucket 6) - 1000ms+" : 0,
  613. "file system write latency histogram (bucket 1) - 10-49ms" : 0,
  614. "file system write latency histogram (bucket 2) - 50-99ms" : 0,
  615. "file system write latency histogram (bucket 3) - 100-249ms" : 0,
  616. "file system write latency histogram (bucket 4) - 250-499ms" : 0,
  617. "file system write latency histogram (bucket 5) - 500-999ms" : 0,
  618. "file system write latency histogram (bucket 6) - 1000ms+" : 0,
  619. "operation read latency histogram (bucket 1) - 100-249us" : 0,
  620. "operation read latency histogram (bucket 2) - 250-499us" : 0,
  621. "operation read latency histogram (bucket 3) - 500-999us" : 0,
  622. "operation read latency histogram (bucket 4) - 1000-9999us" : 0,
  623. "operation read latency histogram (bucket 5) - 10000us+" : 0,
  624. "operation write latency histogram (bucket 1) - 100-249us" : 0,
  625. "operation write latency histogram (bucket 2) - 250-499us" : 0,
  626. "operation write latency histogram (bucket 3) - 500-999us" : 0,
  627. "operation write latency histogram (bucket 4) - 1000-9999us" : 0,
  628. "operation write latency histogram (bucket 5) - 10000us+" : 0
  629. },
  630. "reconciliation" : {
  631. "fast-path pages deleted" : 0,
  632. "page reconciliation calls" : 36,
  633. "page reconciliation calls for eviction" : 0,
  634. "pages deleted" : 2,
  635. "split bytes currently awaiting free" : 0,
  636. "split objects currently awaiting free" : 0
  637. },
  638. "session" : {
  639. "open session count" : 19,
  640. "session query timestamp calls" : 0,
  641. "table alter failed calls" : 0,
  642. "table alter successful calls" : 0,
  643. "table alter unchanged and skipped" : 0,
  644. "table compact failed calls" : 0,
  645. "table compact successful calls" : 0,
  646. "table create failed calls" : 0,
  647. "table create successful calls" : 1,
  648. "table drop failed calls" : 0,
  649. "table drop successful calls" : 0,
  650. "table import failed calls" : 0,
  651. "table import successful calls" : 0,
  652. "table rebalance failed calls" : 0,
  653. "table rebalance successful calls" : 0,
  654. "table rename failed calls" : 0,
  655. "table rename successful calls" : 0,
  656. "table salvage failed calls" : 0,
  657. "table salvage successful calls" : 0,
  658. "table truncate failed calls" : 0,
  659. "table truncate successful calls" : 0,
  660. "table verify failed calls" : 0,
  661. "table verify successful calls" : 0
  662. },
  663. "thread-state" : {
  664. "active filesystem fsync calls" : 0,
  665. "active filesystem read calls" : 0,
  666. "active filesystem write calls" : 0
  667. },
  668. "thread-yield" : {
  669. "application thread time evicting (usecs)" : 0,
  670. "application thread time waiting for cache (usecs)" : 0,
  671. "connection close blocked waiting for transaction state stabilization" : 0,
  672. "connection close yielded for lsm manager shutdown" : 0,
  673. "data handle lock yielded" : 0,
  674. "get reference for page index and slot time sleeping (usecs)" : 0,
  675. "log server sync yielded for log write" : 0,
  676. "page access yielded due to prepare state change" : 0,
  677. "page acquire busy blocked" : 0,
  678. "page acquire eviction blocked" : 0,
  679. "page acquire locked blocked" : 0,
  680. "page acquire read blocked" : 0,
  681. "page acquire time sleeping (usecs)" : 0,
  682. "page delete rollback time sleeping for state change (usecs)" : 0,
  683. "page reconciliation yielded due to child modification" : 0
  684. },
  685. "transaction" : {
  686. "Number of prepared updates" : 0,
  687. "Number of prepared updates added to cache overflow" : 0,
  688. "durable timestamp queue entries walked" : 0,
  689. "durable timestamp queue insert to empty" : 0,
  690. "durable timestamp queue inserts to head" : 0,
  691. "durable timestamp queue inserts total" : 0,
  692. "durable timestamp queue length" : 0,
  693. "number of named snapshots created" : 0,
  694. "number of named snapshots dropped" : 0,
  695. "prepared transactions" : 0,
  696. "prepared transactions committed" : 0,
  697. "prepared transactions currently active" : 0,
  698. "prepared transactions rolled back" : 0,
  699. "query timestamp calls" : 527,
  700. "read timestamp queue entries walked" : 0,
  701. "read timestamp queue insert to empty" : 0,
  702. "read timestamp queue inserts to head" : 0,
  703. "read timestamp queue inserts total" : 0,
  704. "read timestamp queue length" : 0,
  705. "rollback to stable calls" : 0,
  706. "rollback to stable updates aborted" : 0,
  707. "rollback to stable updates removed from cache overflow" : 0,
  708. "set timestamp calls" : 0,
  709. "set timestamp durable calls" : 0,
  710. "set timestamp durable updates" : 0,
  711. "set timestamp oldest calls" : 0,
  712. "set timestamp oldest updates" : 0,
  713. "set timestamp stable calls" : 0,
  714. "set timestamp stable updates" : 0,
  715. "transaction begins" : 35,
  716. "transaction checkpoint currently running" : 0,
  717. "transaction checkpoint generation" : 8,
  718. "transaction checkpoint max time (msecs)" : 11,
  719. "transaction checkpoint min time (msecs)" : 2,
  720. "transaction checkpoint most recent time (msecs)" : 2,
  721. "transaction checkpoint scrub dirty target" : 0,
  722. "transaction checkpoint scrub time (msecs)" : 0,
  723. "transaction checkpoint total time (msecs)" : 35,
  724. "transaction checkpoints" : 9,
  725. "transaction checkpoints skipped because database was clean" : 2,
  726. "transaction failures due to cache overflow" : 0,
  727. "transaction fsync calls for checkpoint after allocating the transaction ID" : 7,
  728. "transaction fsync duration for checkpoint after allocating the transaction ID (usecs)" : 0,
  729. "transaction range of IDs currently pinned" : 0,
  730. "transaction range of IDs currently pinned by a checkpoint" : 0,
  731. "transaction range of IDs currently pinned by named snapshots" : 0,
  732. "transaction range of timestamps currently pinned" : 0,
  733. "transaction range of timestamps pinned by a checkpoint" : 0,
  734. "transaction range of timestamps pinned by the oldest active read timestamp" : 0,
  735. "transaction range of timestamps pinned by the oldest timestamp" : 0,
  736. "transaction read timestamp of the oldest active reader" : 0,
  737. "transaction sync calls" : 0,
  738. "transactions committed" : 4,
  739. "transactions rolled back" : 31,
  740. "update conflicts" : 0
  741. },
  742. "concurrentTransactions" : {
  743. "write" : {
  744. "out" : 0,
  745. "available" : 128,
  746. "totalTickets" : 128
  747. },
  748. "read" : {
  749. "out" : 1,
  750. "available" : 127,
  751. "totalTickets" : 128
  752. }
  753. },
  754. "snapshot-window-settings" : {
  755. "cache pressure percentage threshold" : 95,
  756. "current cache pressure percentage" : NumberLong(0),
  757. "total number of SnapshotTooOld errors" : NumberLong(0),
  758. "max target available snapshots window size in seconds" : 5,
  759. "target available snapshots window size in seconds" : 5,
  760. "current available snapshots window size in seconds" : 0,
  761. "latest majority snapshot timestamp available" : "Jan 1 08:00:00:0",
  762. "oldest majority snapshot timestamp available" : "Jan 1 08:00:00:0"
  763. }
  764. },
  765. "mem" : { #对MongoDB和当前内存使用情况的统计报告
  766. "bits" : 64,
  767. "resident" : 80,
  768. "virtual" : 1513,
  769. "supported" : true
  770. },
  771. "metrics" : { #当前MongoDB实例运行的各种状态的统计报告
  772. "commands" : {
  773. "buildInfo" : {
  774. "failed" : NumberLong(0),
  775. "total" : NumberLong(3)
  776. },
  777. "dbStats" : {
  778. "failed" : NumberLong(0),
  779. "total" : NumberLong(1)
  780. },
  781. "find" : {
  782. "failed" : NumberLong(0),
  783. "total" : NumberLong(3)
  784. },
  785. "getCmdLineOpts" : {
  786. "failed" : NumberLong(0),
  787. "total" : NumberLong(1)
  788. },
  789. "getFreeMonitoringStatus" : {
  790. "failed" : NumberLong(0),
  791. "total" : NumberLong(1)
  792. },
  793. "getLog" : {
  794. "failed" : NumberLong(0),
  795. "total" : NumberLong(1)
  796. },
  797. "isMaster" : {
  798. "failed" : NumberLong(0),
  799. "total" : NumberLong(5)
  800. },
  801. "listCollections" : {
  802. "failed" : NumberLong(0),
  803. "total" : NumberLong(3)
  804. },
  805. "listIndexes" : {
  806. "failed" : NumberLong(0),
  807. "total" : NumberLong(4)
  808. },
  809. "replSetGetStatus" : {
  810. "failed" : NumberLong(1),
  811. "total" : NumberLong(1)
  812. },
  813. "serverStatus" : {
  814. "failed" : NumberLong(0),
  815. "total" : NumberLong(2)
  816. },
  817. "update" : {
  818. "failed" : NumberLong(0),
  819. "total" : NumberLong(1)
  820. },
  821. "whatsmyuri" : {
  822. "failed" : NumberLong(0),
  823. "total" : NumberLong(1)
  824. }
  825. },
  826. "cursor" : {
  827. "timedOut" : NumberLong(0),
  828. "open" : {
  829. "noTimeout" : NumberLong(0),
  830. "pinned" : NumberLong(0),
  831. "total" : NumberLong(0)
  832. }
  833. },
  834. "document" : {
  835. "deleted" : NumberLong(0),
  836. "inserted" : NumberLong(0),
  837. "returned" : NumberLong(0),
  838. "updated" : NumberLong(0)
  839. },
  840. "getLastError" : {
  841. "wtime" : {
  842. "num" : 0,
  843. "totalMillis" : 0
  844. },
  845. "wtimeouts" : NumberLong(0)
  846. },
  847. "operation" : {
  848. "scanAndOrder" : NumberLong(0),
  849. "writeConflicts" : NumberLong(0)
  850. },
  851. "query" : {
  852. "planCacheTotalSizeEstimateBytes" : NumberLong(0),
  853. "updateOneOpStyleBroadcastWithExactIDCount" : NumberLong(0)
  854. },
  855. "queryExecutor" : {
  856. "scanned" : NumberLong(0),
  857. "scannedObjects" : NumberLong(0)
  858. },
  859. "record" : {
  860. "moves" : NumberLong(0)
  861. },
  862. "repl" : {
  863. "executor" : {
  864. "pool" : {
  865. "inProgressCount" : 0
  866. },
  867. "queues" : {
  868. "networkInProgress" : 0,
  869. "sleepers" : 0
  870. },
  871. "unsignaledEvents" : 0,
  872. "shuttingDown" : false,
  873. "networkInterface" : "DEPRECATED: getDiagnosticString is deprecated in NetworkInterfaceTL"
  874. },
  875. "apply" : {
  876. "attemptsToBecomeSecondary" : NumberLong(0),
  877. "batchSize" : NumberLong(0),
  878. "batches" : {
  879. "num" : 0,
  880. "totalMillis" : 0
  881. },
  882. "ops" : NumberLong(0)
  883. },
  884. "buffer" : {
  885. "count" : NumberLong(0),
  886. "maxSizeBytes" : NumberLong(0),
  887. "sizeBytes" : NumberLong(0)
  888. },
  889. "initialSync" : {
  890. "completed" : NumberLong(0),
  891. "failedAttempts" : NumberLong(0),
  892. "failures" : NumberLong(0)
  893. },
  894. "network" : {
  895. "bytes" : NumberLong(0),
  896. "getmores" : {
  897. "num" : 0,
  898. "totalMillis" : 0
  899. },
  900. "notMasterLegacyUnacknowledgedWrites" : NumberLong(0),
  901. "notMasterUnacknowledgedWrites" : NumberLong(0),
  902. "ops" : NumberLong(0),
  903. "readersCreated" : NumberLong(0)
  904. },
  905. "stepDown" : {
  906. "userOperationsKilled" : NumberLong(0),
  907. "userOperationsRunning" : NumberLong(0)
  908. }
  909. },
  910. "ttl" : {
  911. "deletedDocuments" : NumberLong(0),
  912. "passes" : NumberLong(8)
  913. }
  914. },
  915. "ok" : 1
  916. }

七. The db.stats command

db.stats() reports the running statistics of a database instance.

  1. > db.stats()
  2. {
  3. "db" : "test", #database name
  4. "collections" : 5, #number of collections
  5. "views" : 0,
  6. "objects" : 14, #number of documents, summed over all collections
  7. "avgObjSize" : 87.14285714285714, #average object size, computed as dataSize/objects
  8. "dataSize" : 1220, #total data size of all collections in this database
  9. "storageSize" : 155648, #storage size on disk
  10. "numExtents" : 0, #total count of extents across all collections
  11. "indexes" : 9, #number of indexes created
  12. "indexSize" : 229376, #total index size
  13. "scaleFactor" : 1,
  14. "fsUsedSize" : 5491126272,
  15. "fsTotalSize" : 18238930944,
  16. "ok" : 1
  17. }

Index Queries and Optimization

With large-scale data storage and querying, indexes are a key technical factor in database performance. An index used well can multiply retrieval speed; an index used badly slows the database's responses and can even leave the business system unable to operate normally.

1. Pay attention to what you index

An index pre-sorts the values of a field so that matching data can be located and read quickly. How fast queries run once an index is built depends closely on the cardinality of the indexed field, that is, the number of distinct values it contains. The more heavily values repeat (the larger the share of the column each value occupies), the less selective, and therefore less effective, the index.

Suppose a collection named GoodsInf holds 10,000 product records like those below; indexes built on different keys will perform very differently. Assume that every value of the name key is unique, that the value 50 makes up 30% of price, and that every press value is identical. Each value's share of its column is then:

  • _id: 0.01% (1 in 10,000), no value repeats (a property of the _id key)
  • name: 0.01% (1 in 10,000), no value repeats (a property of this data set)
  • price: 30% for the value 50, roughly a one-third repetition rate (a property of this data set)
  • press: 100%, all 10,000 values repeat (a property of this data set)
  1. { "_id" : ObjectId("5e353bed68e115bedd289f93"), "name" : "C语言", "price" : 32.5, "press" : "水利水电出版社" }
  2. { "_id" : ObjectId("5e353bfe68e115bedd289f94"), "name" : "B语言", "price" : 50, "press" : "水利水电出版社" }
  3. { "_id" : ObjectId("5e353c1068e115bedd289f95"), "name" : "D语言", "price" : 40, "press" : "水利水电出版社" }
  4. { "_id" : ObjectId("5e353c1d68e115bedd289f96"), "name" : "E语言", "price" : 50, "press" : "水利水电出版社" }
  5. { "_id" : ObjectId("5e353c2f68e115bedd289f97"), "name" : "G语言", "price" : 50, "press" : "水利水电出版社" }
  6. { "_id" : ObjectId("5e353c3c68e115bedd289f98"), "name" : "T语言", "price" : 35, "press" : "水利水电出版社" }
  7. { "_id" : ObjectId("5e353c4968e115bedd289f99"), "name" : "F语言", "price" : 36.1, "press" : "水利水电出版社" }
  8. { "_id" : ObjectId("5e353c5668e115bedd289f9a"), "name" : "H语言", "price" : 29, "press" : "水利水电出版社" }
  9. { "_id" : ObjectId("5e353c6568e115bedd289f9b"), "name" : "I语言", "price" : 50, "press" : "水利水电出版社" }
  10. { "_id" : ObjectId("5e353c7e68e115bedd289f9c"), "name" : "J语言", "price" : 50, "press" : "水利水电出版社" }
  11. ...

For fast lookups, indexes on _id and name are fastest, because each of their values takes up the smallest share of its column. An index on press is the slowest case: finding a value there still means scanning all 10,000 records. With an index on price, looking up price=50 falls between the two extremes.

In practical experience, when a column's values are highly duplicated (low cardinality), building an index adds little value and can even hurt the database's read/write performance. So do not index a key with press's characteristics; index a key like name instead. A field's cardinality is only a rough guide: to determine whether an index on a field will actually help or hurt system responsiveness, run comparative tests with explain() and decide from the measurements. As a rule of thumb, once a value repeats in more than about 30% of a column, query performance drops noticeably, so avoid making such a field an index field.
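Before creating an index, the reasoning above can be estimated from a sample of documents. The sketch below is plain JavaScript over an in-memory array standing in for the GoodsInf collection; the helper name `selectivity` is illustrative, not a MongoDB API.

```javascript
// Estimate how selective an index on `field` would be:
// selectivity = number of distinct values / number of documents.
// Values near 1 (like _id or name) make good index candidates;
// values near 0 (like press) make poor ones.
function selectivity(docs, field) {
  const distinct = new Set(docs.map(d => d[field]));
  return distinct.size / docs.length;
}

// A small stand-in for the GoodsInf collection.
const docs = [
  { name: "C语言", price: 32.5, press: "水利水电出版社" },
  { name: "B语言", price: 50,   press: "水利水电出版社" },
  { name: "D语言", price: 40,   press: "水利水电出版社" },
  { name: "E语言", price: 50,   press: "水利水电出版社" },
];

console.log(selectivity(docs, "name"));  // 1    -> every value unique, good candidate
console.log(selectivity(docs, "press")); // 0.25 -> heavy duplication, poor candidate
```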

2. Comparative testing with explain()

The best way to decide whether to build an index on a particular field of a collection is a comparative test with the explain() command.

Continuing the earlier example: is the price=50 query faster with an index on price, or without one?

  • Step 1: with no index, query price=50 and use explain() to time the first run:

    1. > db.GoodsInf.find({price:50}).explain("executionStats")
  • Step 2: first create an index on the price field, then query price=50 again and use explain() to time the second run:

    1. > db.GoodsInf.createIndex({price:1})
    2. > db.GoodsInf.find({price:50}).explain("executionStats")

Comparing the two timings supports a sound decision. Using explain() to compare command execution speeds is a good, highly reliable optimization technique, and one that database engineers value for performance tuning.
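The two explain("executionStats") results are large documents; the fields worth comparing are executionTimeMillis, totalDocsExamined, and nReturned. A small helper can pull them out side by side (plain JavaScript; the two result objects below are hand-made mock-ups shaped like real explain output, with assumed numbers):

```javascript
// Summarize the parts of an explain("executionStats") result that matter
// for an index comparison: elapsed time, documents examined vs returned.
function summarize(explainResult) {
  const s = explainResult.executionStats;
  return {
    millis: s.executionTimeMillis,
    examined: s.totalDocsExamined,
    returned: s.nReturned,
    // A full collection scan examines far more documents than it returns.
    overscan: s.totalDocsExamined / Math.max(s.nReturned, 1),
  };
}

// Mock results shaped like real explain output: before and after indexing price.
const noIndex   = { executionStats: { executionTimeMillis: 14, totalDocsExamined: 10000, nReturned: 3000 } };
const withIndex = { executionStats: { executionTimeMillis: 4,  totalDocsExamined: 3000,  nReturned: 3000 } };

console.log(summarize(noIndex));   // overscan ~3.3: the whole collection was scanned
console.log(summarize(withIndex)); // overscan 1: only matching documents were examined
```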

3. Use indexes sparingly

More indexes do not automatically mean better query performance. Build only the indexes the workload actually needs, and verify them with simulated performance tests.

In production, some systems' response times turn bad not because an index is missing, but because an ill-chosen index field causes read/write performance to collapse.

Problem cases include indexing a field whose values are heavily duplicated; data volumes beyond the range where indexing pays off (past ten million documents, say, the overhead of the index itself becomes huge); and indexing a frequently modified field (with large data volumes, repeatedly rebuilding index entries becomes expensive). Be cautious when creating field indexes: the fewer indexes a collection carries while still meeting its operational requirements, the better.

4. Return only the fields a query needs

Suppose a collection has a dozen or more fields but the application needs only 3 of them returned. Then the code should return just those 3 fields rather than lazily returning them all. Returning every field adds extra network traffic, and under highly concurrent access this causes unnecessary network congestion. For example:

  1. > db.GoodsInf.find({price:50}) #returns all fields of each matching document
  2. { "_id" : ObjectId("5e353bfe68e115bedd289f94"), "name" : "B语言", "price" : 50, "press" : "水利水电出版社" }
  3. { "_id" : ObjectId("5e353c1d68e115bedd289f96"), "name" : "E语言", "price" : 50, "press" : "水利水电出版社" }
  4. { "_id" : ObjectId("5e353c2f68e115bedd289f97"), "name" : "G语言", "price" : 50, "press" : "水利水电出版社" }
  5. { "_id" : ObjectId("5e353c6568e115bedd289f9b"), "name" : "I语言", "price" : 50, "press" : "水利水电出版社" }
  6. { "_id" : ObjectId("5e353c7e68e115bedd289f9c"), "name" : "J语言", "price" : 50, "press" : "水利水电出版社" }
  7. > db.GoodsInf.find({price:50},{name:1,_id:0}) #explicitly project the fields to return
  8. { "name" : "B语言" }
  9. { "name" : "E语言" }
  10. { "name" : "G语言" }
  11. { "name" : "I语言" }
  12. { "name" : "J语言" }
  13. > db.GoodsInf.find({price:50},{name:1,_id:0}).limit(3) #return at most 3 documents
  14. { "name" : "B语言" }
  15. { "name" : "E语言" }
  16. { "name" : "G语言" }

5. Use the $inc operator for server-side increments

For updates to numeric values, prefer the $inc operator over $set wherever the change can be expressed as an increment, because $inc updates faster.

  1. > db.GoodsInf.updateOne({name:"C语言"},{$set: {"price": 35 }})
  2. #when either form would achieve the modification, prefer the $inc form below to the $set form above
  3. > db.GoodsInf.updateOne({name:"C语言"},{$inc: {"price": 25 }})

6. Usage notes for $or and $and

When a statement matches on multiple conditions, pay attention to how $or and $and are used, so that conditional retrieval stays fast. Both combinators are sensitive to the order of their conditions: a well-chosen order improves retrieval speed, while a poor combination hurts response performance.

$or: when arranging a multi-condition search, put the condition that matches the most documents in the leftmost position, the second-most in the second position, and so on; this improves retrieval speed.

  1. > db.GoodsInf.find({ $or: [ {price:50},{name:"C语言"}]})
  2. { "_id" : ObjectId("5e353bed68e115bedd289f93"), "name" : "C语言", "price" : 75, "press" : "水利水电出版社" }
  3. { "_id" : ObjectId("5e353bfe68e115bedd289f94"), "name" : "B语言", "price" : 50, "press" : "水利水电出版社" }
  4. { "_id" : ObjectId("5e353c1d68e115bedd289f96"), "name" : "E语言", "price" : 50, "press" : "水利水电出版社" }
  5. { "_id" : ObjectId("5e353c2f68e115bedd289f97"), "name" : "G语言", "price" : 50, "press" : "水利水电出版社" }
  6. { "_id" : ObjectId("5e353c6568e115bedd289f9b"), "name" : "I语言", "price" : 50, "press" : "水利水电出版社" }
  7. { "_id" : ObjectId("5e353c7e68e115bedd289f9c"), "name" : "J语言", "price" : 50, "press" : "水利水电出版社" }

$and: when arranging a multi-condition search, put the condition that matches the fewest documents in the leftmost position, the second-fewest in the second position, and so on; this improves retrieval speed.

  1. > db.GoodsInf.find({ $and: [ {name:"B语言"},{"price":50} ]})
  2. { "_id" : ObjectId("5e353bfe68e115bedd289f94"), "name" : "B语言", "price" : 50, "press" : "水利水电出版社" }

$or and $and call for these orderings because of the way their matching algorithms evaluate conditions.
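The ordering advice can be modeled with short-circuit evaluation: an OR stops at the first condition that matches, so the condition that matches most often should come first; an AND stops at the first condition that fails. The sketch below counts condition checks under two orderings (plain JavaScript, a model of the principle rather than MongoDB's actual query planner):

```javascript
// Count how many individual condition checks an OR needs
// for two different condition orderings over the same documents.
function orChecks(docs, conds) {
  let checks = 0;
  for (const d of docs) {
    for (const c of conds) {
      checks++;
      if (c(d)) break; // OR short-circuits on the first match
    }
  }
  return checks;
}

const docs = [];
for (let i = 0; i < 100; i++) docs.push({ price: i < 90 ? 50 : 40, name: "X" + i });

const cheapFirst = orChecks(docs, [d => d.price === 50, d => d.name === "X0"]); // frequent condition first
const rareFirst  = orChecks(docs, [d => d.name === "X0", d => d.price === 50]); // rare condition first

console.log(cheapFirst, rareFirst); // fewer checks when the frequent condition comes first
```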

MongoDB Read/Write Splitting

Building replica set nodes in MongoDB, so that the primary's data is asynchronously replicated to the secondaries, greatly improves data safety and operability. Under a default installation, however, client applications read and write only against the primary node and do not operate on the secondaries directly.

As read/write pressure on the primary grows, in some scenarios engineers assign part of the read workload to the secondaries and let the primary concentrate on writes, which further improves responsiveness to client applications. The primary focuses on ingesting data and replicating it to the secondaries, and the secondaries serve reads to client applications. This achieves a true read/write split and makes the server nodes cooperate effectively and efficiently.

In practice, the technical constraints of the implementation must be considered so that it is applied only to suitable business scenarios.

1. Read preference modes supported by MongoDB

MongoDB provides five read preference modes:

  • primary: the default; all read requests go to the primary, so reads and writes are both served by the primary;
  • primaryPreferred: prefer the primary, falling back to a secondary if the primary is unreachable. This maximizes availability, but the gain in read performance is modest, and reads served by a secondary may see stale data;
  • secondary: all read requests go to secondaries. This is the most thorough read/write split and can greatly improve read responsiveness, but because of replication lag, reads may see stale data;
  • secondaryPreferred: prefer secondaries, falling back to the primary when no secondary is reachable; reads may see stale data;
  • nearest: read requests go to the reachable node with the lowest network latency (determined by ping probes), which may turn out to be the primary or a secondary. This minimizes read latency but does not guarantee consistency.

The client drivers and language APIs for MongoDB, for C, C++, Java, Python, Go, Lua, and so on, all support these read preference modes.

2. Read/write splitting in practice

According to the MongoDB documentation, read behavior is controlled mainly from application code, and the control point is flexible: the read preference can be set when the database connection is created, applied to just one database, or attached to a single read command. Reads against specific nodes can also be issued from the MongoDB shell.

Before implementing read/write splitting in code or in the shell, a replica set should already be running, with at least one primary and two secondaries.

2.1 Reads from the shell

Run the following in the MongoDB shell on the primary; it queries the collection on the current primary node.

  1. rs0:PRIMARY> db.books.find()
  2. { "_id" : ObjectId("5e33e15836a42c3aa797c72b"), "name" : "C语言基础", "price" : 30 }
  3. { "_id" : ObjectId("5e33e3ed36a42c3aa797c72c"), "name" : "python语言基础", "price" : 45 }

Running the same command on a secondary produces an error:

  1. rs0:SECONDARY> db.books.find()
  2. Error: error: {
  3. "operationTime" : Timestamp(1580568822, 1),
  4. "ok" : 0,
  5. "errmsg" : "not master and slaveOk=false",
  6. "code" : 13435,
  7. "codeName" : "NotMasterNoSlaveOk",
  8. "$clusterTime" : {
  9. "clusterTime" : Timestamp(1580568822, 1),
  10. "signature" : {
  11. "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
  12. "keyId" : NumberLong(0)
  13. }
  14. }
  15. }

To make the secondary's data readable, run the following on the secondary:

  1. rs0:SECONDARY> db.getMongo().setSlaveOk()
  2. rs0:SECONDARY> db.books.find()
  3. { "_id" : ObjectId("5e33e15836a42c3aa797c72b"), "name" : "C语言基础", "price" : 30 }
  4. { "_id" : ObjectId("5e33e3ed36a42c3aa797c72c"), "name" : "python语言基础", "price" : 45 }

or

  1. rs0:SECONDARY> rs.slaveOk()
  2. rs0:SECONDARY> db.books.find()
  3. { "_id" : ObjectId("5e33e15836a42c3aa797c72b"), "name" : "C语言基础", "price" : 30 }
  4. { "_id" : ObjectId("5e33e3ed36a42c3aa797c72c"), "name" : "python语言基础", "price" : 45 }

Alternatively, append the readPref() modifier to find() to control which node serves the query.

  1. > db.books.find().readPref("secondary")

The first parameter, mode, of readPref(mode, tagset) takes primary, primaryPreferred, secondary, secondaryPreferred, or nearest.

The optional second parameter, tagset, is a set of tag documents used to route the query to specific secondaries.

2.2 Problems introduced by asynchronous replication

Read/write splitting achieves its goal, but data read from a secondary may be inconsistent, which follows from MongoDB's asynchronous replication design. The characteristics of asynchronous replication must be understood precisely, so that read/write splitting is applied to suitable business scenarios and kept away from unsuitable ones.

2.2.1 How asynchronous replication works

Assume a replica set has been built with one primary node, A, and two secondaries, B and C.

  1. The business system sends a write command from the client, for example a product review on an e-commerce platform. The data is first written to primary A, and A synchronously appends the operation to its own Oplog.rs capped collection;
  2. Through the heartbeat mechanism, secondaries B and C learn that the primary has new data and send it replication requests;
  3. The primary copies its oplog records to the local Oplog.rs capped collections of secondaries B and C;
  4. Each secondary applies the newest entries from its Oplog.rs capped collection to its own database, which completes one replication round.

The oplog (operations log) is a special capped collection (a fixed-size collection that automatically overwrites its oldest entries once it reaches its maximum size). It keeps a rolling record of all operations that modify the data stored in the database.

The oplog collection is generated automatically the first time the replica set starts. On Unix and Windows systems the default sizes are:

  Storage Engine             Default Oplog Size      Lower Bound   Upper Bound
  In-Memory Storage Engine   5% of physical memory   50 MB         50 GB
  WiredTiger Storage Engine  5% of free disk space   990 MB        50 GB

To keep inserts fast, the oplog uses the capped-collection mechanism to hold its size fixed: once the cap is reached, newly inserted records overwrite the oldest ones.

Oplog location: under a replica set architecture, local.oplog.rs.
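When sizing the oplog, what matters is the replication window it provides: roughly its capacity divided by the rate at which oplog bytes are produced. A back-of-the-envelope estimate in plain JavaScript (the write rate and average entry size are assumed figures for illustration):

```javascript
// Estimate how many hours of writes a capped oplog can hold.
// window = oplog capacity / (entries per second * average entry bytes)
function oplogWindowHours(oplogSizeMB, writesPerSec, avgEntryBytes) {
  const capacityBytes = oplogSizeMB * 1024 * 1024;
  const bytesPerSec = writesPerSec * avgEntryBytes;
  return capacityBytes / bytesPerSec / 3600;
}

// Assumed workload: 200 writes/s, ~512 bytes per oplog entry,
// with the 990 MB WiredTiger lower bound from the table above.
const hours = oplogWindowHours(990, 200, 512);
console.log(hours.toFixed(1)); // ~2.8 hours before the oldest entries are overwritten
```

A secondary that falls further behind than this window can no longer catch up from the oplog alone, which is why heavy write loads call for a larger oplog.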

2.2.2 Problems

Problem 1: replication from primary A to secondaries B and C incurs a delay, affected by the heartbeat notification interval and by network conditions during transfer, and during that delay the primary and secondaries disagree. This is inherent to the mechanics of distributed communication: strictly speaking, all distributed data transfer is asynchronous and carries a time lag.

Problem 2: applying data from the local oplog to the secondary's database is also delayed (60 seconds by default; the refresh rate can be raised through the configuration, at some cost to overall responsiveness). If a secondary has not finished applying its local oplog before the oplog wraps around once, the secondary can no longer keep up with the primary and replication stops (it can be forcibly restarted with db.runCommand({"resync":1})). That produces serious inconsistency. The following measures help prevent it.

  • When first creating the replica set, allocate an oplog of a suitable size.

    1. #specified in the configuration file
    2. replication:
    3. oplogSizeMB: <int> #choose a suitable size
    4. replSetName: <string>
    5. secondaryIndexPrefetch: <string>
    6. enableMajorityReadConcern: <boolean>
  • Resize the oplog while MongoDB is running in production.

    • First check the current oplog size, in bytes:

      1. use local
      2. db.oplog.rs.stats().maxSize
    • Then resize it; this takes effect without a restart, and the size is given in MB (16384 MB is 16 GB):

      1. db.adminCommand({replSetResizeOplog:1, size: 16384})

Problem 3: when the primary fails, the secondary with the most up-to-date oplog is elected the new primary. If part of the old primary's oplog had not yet been replicated to any secondary, then when the old primary recovers it rolls back its own oplog data to stay consistent with the new primary's. The replica set performs this during automatic failover, so that portion of data is lost. The main preventive measures are:

  • When writing from the client, use a write concern rule requiring the write to succeed on a majority of members, which avoids the rollback problem.
    Constrain an individual write command directly:

    1. db.products.insert(
    2. { item: "envelopes", qty : 100, type: "Clasp" },
    3. { writeConcern: { w: "majority" , wtimeout: 5000 } }
    4. )

    Alternatively, set a default in the replica set configuration, so that every write either replicates to a majority or reports failure, preserving consistency:

    1. > rsconfig=rs.conf()
    2. > rsconfig.settings.getLastErrorDefaults = { w: "majority", wtimeout: 5000}
    3. > rs.reconfig(rsconfig)
  • The old primary writes the rolled-back data to a separate rollback directory; database engineers can restore it with mongorestore as needed.

Problem 4: adding a secondary in production can sharply degrade the primary's responsiveness. A brand-new secondary joins the replica set with an empty database and must obtain its data by copying from the primary to become consistent with it. That bulk copy generates heavy network traffic, adds read/write load on the primary, can in severe cases impair the business system's access to the primary, and may run for hours or even days. To avoid this, before attaching a new node to a production replica set, use one of the following approaches:

  • Method 1: manually copy the primary's data onto the new node (while it is standalone), then attach it to the replica set;
  • Method 2: exploit an off-peak window, for example overnight, to attach the new secondary and let its initial sync mechanism (Initial Sync) synchronize the data automatically.

2.3 Scenarios for read/write splitting

Overall, read/write splitting should only be applied where the business does not demand strict consistency, that is, where a small amount of stale data can be tolerated.

2.3.1 Data that is not consistency-sensitive

A large e-commerce platform, for example, carries a huge volume of product reviews. If reviews for a product are read from a secondary and a small amount of the data is stale, the customers browsing them can tolerate it: they cannot perceive the inconsistency, and whether a product shows 20 reviews or 19 makes little difference to them. What matters to customers is getting the reviews quickly.

2.3.2 Back-office statistics and analytics

A large e-commerce platform may periodically analyze product-view rankings, data of interest to the platform operator and the merchants on it. For this kind of data, neither cares much whether a product was clicked 50 times or 49; they care about the trend in total clicks. If a product's click trend is rising while its inventory is dropping fast, decision makers have extracted real value from the data: they can restock early to speed up shipping, and may even consider raising the retail price, since the product is in short supply. In such cases, reading the data from secondaries for analysis is a reasonable design.

2.3.3 Rarely written, heavily read data

On a typical e-commerce platform, basic product information is rarely modified once written and stays relatively fixed, yet it must withstand hundreds of thousands, millions, or even tens of millions of visits a day. Under that kind of read pressure, read/write splitting is appropriate: it greatly relieves the primary, while the chance of inconsistency is very small (the data is already on the secondaries, and replication from the primary for it is infrequent).