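1. This week I read Part 1, a high-level look at distributed systems: [Distributed systems at a high level](http://book.mixu.net/distsys/intro.html)
  2. A single machine is subject to many limits. Once a real-world or business problem reaches a certain scale (where upgrading hardware, money, and time all become issues), a distributed system is a good option, and serving from commodity, mid-range hardware is a sensible choice. In economic terms, it comes down to marginal returns.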

What goals/metrics are we after when using a distributed system? What are the benefits?

Scalability

Concretely: scalability in size (number of nodes), in geography, and in administrative cost.

- Size scalability: adding more nodes should make the system linearly faster; growing the dataset should not increase latency.
- Geographic scalability: it should be possible to use multiple data centers to reduce the time it takes to respond to user queries, while dealing with cross-data center latency in some sensible manner.
- Administrative scalability: adding more nodes should not increase the administrative costs of the system (e.g. the administrators-to-machines ratio).

Performance (and latency)

Concretely: shorter response time, higher throughput, and low utilization of computing resources.

- Short response time/low latency for a given piece of work
- High throughput (rate of processing work)
- Low utilization of computing resource(s)

Availability (and fault tolerance)

Concretely: the system stays available and must not go down (crash).

Availability = uptime / (uptime + downtime)
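To make the formula concrete for myself, here is a minimal sketch in Python. The function names and the 365-day year are my own assumptions for illustration, not something from the book.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, 8760 hours


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    # A service that was down 8.76 hours in a year
    print(availability(HOURS_PER_YEAR - 8.76, 8.76))
    # How many hours of downtime a 99.9% target leaves per year
    print(allowed_downtime_hours(0.999))
```

Read the other way around, an availability target is just a downtime budget for a given period.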

Of course, distributed systems also run into various constraints.

Real-world factors

The number of nodes and the transmission distance (even light does not arrive instantly; it takes time).

- the number of nodes (which increases with the required storage and computation capacity)
- the distance between nodes (information travels, at best, at the speed of light)
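The speed-of-light point can be turned into a lower bound on latency. This is a rough sketch under my own assumptions: the signal travels in a straight line at the vacuum speed of light, which real networks never achieve (fiber is slower and routes are longer), so actual latency is higher than what this computes.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in vacuum


def min_one_way_latency_ms(distance_km: float) -> float:
    """Best-case one-way delay over a straight-line distance, in milliseconds."""
    seconds = distance_km * 1000 / SPEED_OF_LIGHT_M_PER_S
    return seconds * 1000


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case request/response delay: there and back again."""
    return 2 * min_one_way_latency_ms(distance_km)


if __name__ == "__main__":
    # Hypothetical distance between two far-apart data centers
    print(f"{min_round_trip_ms(10_000):.1f} ms minimum round trip")
```

No amount of hardware removes this floor; the only fix is to move the data closer to the user.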

Specific constraints

Adding nodes makes it more likely that some node is down, raises the cost of communication between machines, and brings partition and physical-distance problems.

- an increase in the number of independent nodes increases the probability of failure in a system (reducing availability and increasing administrative costs)
- an increase in the number of independent nodes may increase the need for communication between nodes (reducing performance as scale increases)
- an increase in geographic distance increases the minimum latency for communication between distant nodes (reducing performance for certain operations)
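The first two constraints are easy to put numbers on. The sketch below rests on two assumptions of mine: nodes fail independently with the same probability, and in the worst case every node talks to every other node (a full mesh). Real systems are messier on both counts.

```python
def prob_any_node_down(p_node_down: float, n: int) -> float:
    """Probability that at least one of n independent nodes is down.

    All nodes are up with probability (1 - p)^n, so the complement
    is the chance that something, somewhere, has failed.
    """
    return 1.0 - (1.0 - p_node_down) ** n


def full_mesh_links(n: int) -> int:
    """Number of node pairs if every node communicates with every other."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(prob_any_node_down(0.01, n), 3), full_mesh_links(n))
```

With a 1% chance of any single node being down, one node is almost always fine, but in a 100-node cluster it is more likely than not that something is broken at any given moment.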

Design principles: partitioning and replication

part-repl.png

Partitioning

Example: database sharding.
Bounding the size of each partition makes lookups faster and makes it less likely that a single failure takes everything down.

- Partitioning improves performance by limiting the amount of data to be examined and by locating related data in the same partition
- Partitioning improves availability by allowing partitions to fail independently, increasing the number of nodes that need to fail before availability is sacrificed
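A toy version of hash partitioning helped me see both claims. Everything here (`partition_for`, `PartitionedStore`, the `down` set) is a name I made up for illustration; it is an in-memory sketch, not how any particular database shards its data.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """Key-value store split across several in-memory partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [dict() for _ in range(num_partitions)]
        self.down = set()  # indices of partitions we pretend have failed

    def _live_partition(self, key: str) -> dict:
        idx = partition_for(key, len(self.partitions))
        if idx in self.down:
            raise RuntimeError(f"partition {idx} is unavailable")
        return self.partitions[idx]

    def put(self, key: str, value: str) -> None:
        self._live_partition(key)[key] = value

    def get(self, key: str) -> str:
        # Only one partition is examined, never the whole dataset.
        return self._live_partition(key)[key]


if __name__ == "__main__":
    store = PartitionedStore(4)
    store.put("user:1", "alice")
    store.down.add((partition_for("user:1", 4) + 1) % 4)  # some other partition fails
    print(store.get("user:1"))  # still served
```

A failed partition only affects the keys that hash to it; every other key is still readable and writable.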

Replication

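Keeping copies of the data. The copies have to be kept in sync, which means following some consistency model (strong consistency is one option, weaker models are another). It is a bit like double-entry bookkeeping in accounting.

- Replication improves performance by making additional computing power and bandwidth applicable to a new copy of the data
- Replication improves availability by creating additional copies of the data, increasing the number of nodes that need to fail before availability is sacrificed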
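The availability side of replication can be sketched in a few lines. `ReplicatedStore` and its `down` set are hypothetical names of mine, and the write rule (every replica must accept the write) is a deliberate simplification that keeps the copies identical at the price of write availability; real systems choose among many other trade-offs.

```python
class ReplicatedStore:
    """Key-value store that keeps a full copy of the data on every replica."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas = [dict() for _ in range(num_replicas)]
        self.down = set()  # indices of replicas we pretend have failed

    def put(self, key: str, value: str) -> None:
        # Simplification: refuse the write unless every replica can take it,
        # so that no copy ever falls behind.
        if self.down:
            raise RuntimeError("write rejected: not all replicas are up")
        for replica in self.replicas:
            replica[key] = value

    def get(self, key: str) -> str:
        # Any live replica can answer, since all copies are identical.
        for idx, replica in enumerate(self.replicas):
            if idx not in self.down:
                return replica[key]
        raise RuntimeError("read failed: all replicas are down")


if __name__ == "__main__":
    store = ReplicatedStore(3)
    store.put("k", "v")
    store.down.update({0, 1})  # two of three replicas fail
    print(store.get("k"))      # the last copy still answers
```

Reads survive until the last replica fails, which is the "more nodes need to fail" point in the quote. The rejected writes show where the consistency cost comes in.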

Abstraction and models

I didn't understand this part...

Related links:

Distributed systems
The meaning of the CAP theorem - 阮一峰 (Ruan Yifeng)
Google Translate