Introduction

The year was 2015. I was writing a batch of ML training scripts as well as several production scripts, and they all needed financial data. That data was spread across multiple tables and multiple datastores: intraday market data was stored in a Cassandra cluster, while daily and monthly data lived in a MySQL database. Similarly, different types of securities (futures, options, stocks, etc.) were stored in different locations.

So I decided to write a data library that I could use in my scripts. It turned out to be quite popular with my team. It had everything we needed at that point:

  • A single interface for all data types: futures, stocks, ETFs, currencies, indexes, and funds from different exchanges.
  • An easy-to-use interface.
  • Flexibility in the data intervals supported. It worked flawlessly for intraday, daily and monthly time periods.
  • It could be used for both live ingestion/consumption and historical data requirements.
  • It was easy to add support for a new type of data, for example macroeconomic indicators.

However, it had some fatal flaws that I could not foresee at that point. Over time, the number of production scripts relying on this library grew exponentially, and the library issued database queries directly. That caused two problems:

  • Changing anything in the database would break existing production processes. So there was no way of changing the database without incurring downtime.
  • Additionally, the rapidly growing number of production processes put a strain on the database. Because database access was woven throughout the rest of the codebase, it was not possible to optimize or load balance properly.
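
One way to picture the coupling problem above is to contrast it with a thin client that only knows a transport function, so that table layouts and datastores stay behind a single boundary. The sketch below is a minimal illustration under assumed names (`DataClient`, `InMemoryBackend`, the sample prices); none of it comes from the original library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical request shape: (symbol, interval) -> list of prices.
Transport = Callable[[str, str], List[float]]


@dataclass
class DataClient:
    """What production scripts would import. It knows nothing about tables or datastores."""
    transport: Transport

    def get_prices(self, symbol: str, interval: str) -> List[float]:
        return self.transport(symbol, interval)


class InMemoryBackend:
    """Stand-in for the service side, where all storage details live."""

    def __init__(self) -> None:
        # Made-up sample data, used only for this illustration.
        self._rows: Dict[Tuple[str, str], List[float]] = {
            ("ES", "daily"): [4500.0, 4512.5],
        }

    def handle(self, symbol: str, interval: str) -> List[float]:
        return list(self._rows.get((symbol, interval), []))


backend = InMemoryBackend()
client = DataClient(transport=backend.handle)
print(client.get_prices("ES", "daily"))
```

Swapping `backend.handle` for a function that performs an RPC would leave every caller of `DataClient` untouched, which is the property the direct-query library lacked.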

About a year ago, I was asked whether we should convert that library into a service. I brushed the idea off, not realizing the problems I was going to face over the next year. To be fair, I didn't fully understand services or microservices at that point, which made me skeptical of using them for something like data fetching. I was still convinced that flexibility and rapid changes would only come from keeping that code as a library.

But a few days ago I finally took another look at services, and spent those days with gRPC, Thrift and RPyC. I am summarizing my initial findings in this post. Because I use Python for almost everything, I am approaching these frameworks from that point of view.

You can find the code for the subsequent examples in this repo.

gRPC

gRPC uses Protocol Buffers for serialization and deserialization. It was developed by Google, which released it as open source software while rewriting its internal framework called Stubby. At the moment, several companies, including Netflix and Square, use this framework to implement their services.

Let's jump directly into the simplest example.

We will use the same toy example for all 3 frameworks:

  • We will define a service called Time.
  • It implements a single RPC call named GetTime.
  • GetTime doesn't take any arguments and returns the current server time as a string.

Simple gRPC Example

Create a time.proto Protocol Buffers file describing our service.

```protobuf
syntax = "proto3";

package time;

service Time {
  // Time is the service name, GetTime is the RPC call,
  // TimeRequest is its input type and TimeReply is its output type.
  rpc GetTime (TimeRequest) returns (TimeReply) {}
}

// Empty request message
message TimeRequest {
}

// The response message containing the time
message TimeReply {
  string message = 1;  // string field
}
```

And here's a bit of explanation of the above code.

![image.png](https://cdn.nlark.com/yuque/0/2021/png/125726/1629011793322-8cffc381-b6fb-454d-b1f4-f058e123ff81.png#clientId=u1f7008f2-f7ca-4&from=paste&id=u440e58c3&margin=%5Bobject%20Object%5D&name=image.png&originHeight=336&originWidth=649&originalType=url&ratio=1&size=155760&status=done&style=none&taskId=u80703b64-361c-488f-90cd-4f1509e004d)

Now, use the above protobuf file to generate the Python files time_pb2.py and time_pb2_grpc.py. We will use them in both the server and the client code. Here's the command to do so (you will need the grpcio-tools Python package):

```bash
python -m grpc_tools.protoc --python_out=. --grpc_python_out=. time.proto
```

Create the server script server.py.

```python
import time
from concurrent import futures

import grpc

# Import the generated code
import time_pb2
import time_pb2_grpc

_ONE_DAY_IN_SECONDS = 60 * 60 * 24


# Define the Timer class
class Timer(time_pb2_grpc.TimeServicer):
    def GetTime(self, request, context):  # implement the RPC call
        return time_pb2.TimeReply(message=time.ctime())  # return the current time


def serve():
    # Create a thread pool, register our service instance and start the server
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    time_pb2_grpc.add_TimeServicer_to_server(Timer(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    try:
        while True:  # sleep so that the main thread does not exit
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == '__main__':
    serve()
```

Add the client code to the client.py file.

```python
import grpc

import time_pb2
import time_pb2_grpc


def run():
    channel = grpc.insecure_channel('localhost:50051')  # connect to the server
    stub = time_pb2_grpc.TimeStub(channel)
    response = stub.GetTime(time_pb2.TimeRequest())  # make the RPC call
    print('Client received: {}'.format(response.message))


if __name__ == '__main__':
    run()
```


More Details

gRPC uses HTTP/2 for client-server communication. Every RPC call is a separate stream in the same TCP/IP connection.

gRPC supports 4 different types of RPCs:

  • Unary RPC - a single request followed by a single response from the server. Our Time service example uses a unary RPC.

    rpc GetTime (TimeRequest) returns (TimeReply) {}

  • Server Streaming RPC - the client sends a request and gets a stream to read from.

    rpc GetTime (TimeRequest) returns (stream TimeReply) {}

  • Client Streaming RPC - the client writes a sequence of messages and the server answers with a single response.

    rpc GetTime (stream TimeRequest) returns (TimeReply) {}

  • Bidirectional Streaming RPC - both sides send a sequence of messages using a read-write stream.

    rpc GetTime (stream TimeRequest) returns (stream TimeReply) {}
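
In gRPC's Python API, these four shapes show up in the handler signatures: a streaming request arrives as an iterator, and a streaming response is written as a generator. The sketch below mirrors those shapes with plain functions and plain strings instead of generated protobuf messages, so it runs without gRPC installed. The function names other than the idea of `GetTime` are hypothetical.

```python
import time
from typing import Iterable, Iterator

# Plain strings stand in for TimeRequest/TimeReply messages (an assumption
# made to keep the sketch free of generated protobuf code).


def get_time(request: str, context=None) -> str:
    """Unary: one request in, one reply out."""
    return time.ctime()


def stream_time(request: str, context=None) -> Iterator[str]:
    """Server streaming: one request in, a generator of replies out."""
    for _ in range(3):
        yield time.ctime()


def count_requests(request_iterator: Iterable[str], context=None) -> str:
    """Client streaming: an iterator of requests in, one reply out."""
    return "received {} requests".format(sum(1 for _ in request_iterator))


def echo_with_time(request_iterator: Iterable[str], context=None) -> Iterator[str]:
    """Bidirectional streaming: an iterator in, a generator out."""
    for request in request_iterator:
        yield "{} @ {}".format(request, time.ctime())


print(get_time("now"))
print(list(stream_time("now")))
print(count_requests(iter(["a", "b"])))
print(list(echo_with_time(iter(["a", "b"]))))
```

In a real servicer these would be methods taking `self` as well, and the messages would be the classes generated from the .proto file.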
    

gRPC comes with built-in timeout functionality. This is quite handy in practice, since many applications require a response within a certain time interval.
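
Here is a minimal sketch of how a per-call timeout behaves in gRPC's Python API. To stay self-contained it skips the generated time_pb2 modules and registers a generic handler that exchanges raw bytes. The artificial 1-second delay and the helper names `make_server` and `call_get_time` are assumptions for illustration, and the grpcio package is required.

```python
import time
from concurrent import futures

import grpc

SERVICE = "time.Time"  # matches "package time" and "service Time" in time.proto
METHOD = "GetTime"


def make_server(delay_seconds):
    """Start a server whose GetTime handler sleeps before replying."""

    def get_time(request, context):
        time.sleep(delay_seconds)  # simulate a slow backend
        return time.ctime().encode("utf-8")

    # No serializers are given, so requests and replies travel as raw bytes.
    handler = grpc.method_handlers_generic_handler(
        SERVICE, {METHOD: grpc.unary_unary_rpc_method_handler(get_time)}
    )
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=2))
    server.add_generic_rpc_handlers((handler,))
    port = server.add_insecure_port("127.0.0.1:0")  # port 0: pick a free port
    server.start()
    return server, port


def call_get_time(port, timeout):
    """Return the reply text, or the status code name if the call fails."""
    with grpc.insecure_channel("127.0.0.1:{}".format(port)) as channel:
        rpc = channel.unary_unary("/{}/{}".format(SERVICE, METHOD))
        try:
            return rpc(b"", timeout=timeout).decode("utf-8")
        except grpc.RpcError as err:
            return err.code().name


server, port = make_server(delay_seconds=1.0)
result = call_get_time(port, timeout=0.2)
print(result)
server.stop(0)
```

Because the handler takes longer than the 0.2-second deadline, the call is expected to fail with the DEADLINE_EXCEEDED status rather than hang. With the generated stub from the earlier example, the same keyword applies: `stub.GetTime(time_pb2.TimeRequest(), timeout=0.2)`.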

Pros and Cons

Pros:

  • Multiple language support for both servers and clients.
  • It uses HTTP/2 by default for connections.
  • Abundant documentation.
  • The project is actively supported by Google and others.

Cons:

  • Less flexibility (especially compared to RPyC).

Links:

  • Official Website and Tutorial - https://grpc.io/docs/guides/.
  • gRPC Concepts.

Thrift

Thrift is quite popular at Facebook and in the Hadoop/Java services world. It was created at Facebook, which later open sourced it as an Apache project.

Simple Thrift Example

Create a time_service.thrift file describing the interface using the Thrift Interface Description Language (IDL).

```thrift
service TimeService {
  string get_time()
}
```

Run the following command to generate the Python code. It will create a gen-py directory, which we will use to build the server and client scripts.

```bash
thrift -r --gen py time_service.thrift
```
    
Write the following server code in server.py.

```python
import sys
import time

from thrift.protocol import TBinaryProtocol
from thrift.server import TServer
from thrift.transport import TSocket, TTransport

sys.path.append('gen-py')
from time_service import TimeService


class TimeHandler:
    def __init__(self):
        self.log = {}

    def get_time(self):
        return time.ctime()


if __name__ == '__main__':
    handler = TimeHandler()
    processor = TimeService.Processor(handler)
    transport = TSocket.TServerSocket(host='127.0.0.1', port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()

    server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)

    print('Starting the server...')
    server.serve()
    print('done.')
```

Write the following code in client.py.

```python
import sys

from thrift import Thrift
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport

sys.path.append('gen-py')
from time_service import TimeService


def main():
    # Create the socket
    transport = TSocket.TSocket('localhost', 9090)

    # Buffering is critical. Raw sockets are very slow
    transport = TTransport.TBufferedTransport(transport)

    # Wrap the transport in a protocol
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    # Create a client
    client = TimeService.Client(protocol)

    # Connect!
    transport.open()

    ts = client.get_time()
    print('Client Received {}'.format(ts))

    # Close!
    transport.close()


if __name__ == '__main__':
    try:
        main()
    except Thrift.TException as tx:
        print('%s' % tx.message)
```

Simple thriftPy Example

thriftPy seems to be more popular than the default Python support. It also solves some common issues with the default Python support, including a more Pythonic approach to creating server and client code. For example, check out the following server and client code.

Server code

```python
import time

import thriftpy
from thriftpy.rpc import make_server


class Dispatcher(object):
    def get_time(self):
        return time.ctime()


time_thrift = thriftpy.load('time_service.thrift', module_name='time_thrift')
server = make_server(time_thrift.TimeService, Dispatcher(), '127.0.0.1', 6000)
server.serve()
```

Client code

```python
import thriftpy
from thriftpy.rpc import make_client

time_thrift = thriftpy.load('time_service.thrift', module_name='time_thrift')
client = make_client(time_thrift.TimeService, '127.0.0.1', 6000)
print(client.get_time())
```

Pros and Cons

Pros:

  • Thrift supports the container types list, set and map, as well as constants. Protocol Buffers does not support these. RPyC, however, supports all Python and Python library types; you can even send a numpy array in an RPC call. (Edit: proto3 supports those types too. Thanks Barak Michener for pointing this out.)

Cons:

  • Python doesn't feel like a primary language for Thrift. Having to add sys.path.append('gen-py') doesn't make for a smooth Python experience.
  • Documentation and online discussions seem relatively scarce compared to gRPC.

RPyC

RPyC is a pure Python RPC framework. It does not support multiple languages. If your entire codebase is in Python, this could be an easy and flexible framework for you.

Simple RPyC Example

### server.py

```python
import time

from rpyc import Service
from rpyc.utils.server import ThreadedServer


# Define the TimeService class
class TimeService(Service):
    def exposed_get_time(self):  # RPC call names carry the exposed_ prefix
        return time.ctime()


if __name__ == '__main__':
    s = ThreadedServer(TimeService, port=18871)  # start the server
    s.start()
```

Here's the annotated server code:

![image.png](https://cdn.nlark.com/yuque/0/2021/png/125726/1629011793679-692526bf-0864-4bc2-b03a-7016ab4e24b0.png#clientId=u1f7008f2-f7ca-4&from=paste&id=uba67ade5&margin=%5Bobject%20Object%5D&name=image.png&originHeight=318&originWidth=643&originalType=url&ratio=1&size=143756&status=done&style=none&taskId=u30fa6b37-bee8-4916-b959-aade753ac2d)

### client.py

```python
import rpyc

conn = rpyc.connect('localhost', 18871)  # connect to the server
print('Time is {}'.format(conn.root.get_time()))
```

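
The naming convention in the RPyC example (the server defines `exposed_get_time`, the client calls `get_time` on `conn.root`) can be pictured with a few lines of plain Python. This is only a toy model of the convention, not RPyC's actual implementation: the `Root` class and the `internal_helper` method are hypothetical, and no network is involved.

```python
import time


class TimeService:
    def exposed_get_time(self):
        return time.ctime()

    def internal_helper(self):
        # No exposed_ prefix, so the toy root below will not forward to it.
        return "not reachable by clients"


class Root:
    """Hypothetical stand-in for conn.root: maps name to exposed_<name>."""

    def __init__(self, service):
        self._service = service

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return getattr(self._service, "exposed_" + name)
        except AttributeError:
            raise AttributeError("{!r} is not exposed".format(name)) from None


root = Root(TimeService())
print(root.get_time())
```

Nothing here is declared ahead of time: whether a call works is decided by attribute lookup at call time, which is the duck-typing flexibility the framework builds on.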

Pros and Cons

Pros:

  • Probably the easiest to get started with. No need to understand Protocol Buffers or Thrift syntax.
  • Extremely flexible. No need to formally use an IDL (Interface Definition Language) to define the client-server interfaces. Simply start implementing your code; it embraces Python's duck typing.

Cons:

  • Lack of multiple client languages.
  • The lack of a formally defined service interface can potentially cause maintenance issues if the codebase becomes large enough.
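
To make the maintenance concern concrete: without a declared interface, a renamed method is only noticed when some client finally calls it. The sketch below shows one plain-Python way to get an earlier failure, using an abstract base class as a stand-in for a formal interface. It is an illustration under assumed names (`TimeInterface`, `RenamedService`, `build`), not a feature of RPyC.

```python
import time
from abc import ABC, abstractmethod


class TimeInterface(ABC):
    """Hypothetical explicit contract, playing the role an IDL would play."""

    @abstractmethod
    def get_time(self) -> str:
        ...


class GoodService(TimeInterface):
    def get_time(self) -> str:
        return time.ctime()


class RenamedService(TimeInterface):
    # get_time was renamed by mistake, so the contract is no longer met.
    def fetch_time(self) -> str:
        return time.ctime()


def build(cls):
    """Instantiate a service, reporting contract violations at startup."""
    try:
        return cls()
    except TypeError as err:
        return "rejected at startup: {}".format(err)


print(build(GoodService).get_time())
print(build(RenamedService))
```

With pure duck typing, `RenamedService` would start without complaint and fail only when a client asked for `get_time`.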

gRPC vs Thrift vs RPyC comparison matrix

Let me summarize my experiences with the three frameworks here.

| | gRPC | Thrift | RPyC |
| --- | --- | --- | --- |
| Getting Started | [image] | [image] | [image] |
| Documentation | [image] | [image] | [image] |
| Language Support | C++, Python, ... | C++, Python, ... | Python only |
| Maintenance | [image] | [image] | [image] |
| Streaming | [image] | [image] | [image] |
| Can work without IDL | [image] | [image] | [image] |

Notes on the above table:

  • I found it relatively hard to get the basic Thrift example working. The few Python examples I found were targeted at older Thrift versions (and Python 2).
  • My opinion on "Maintenance" is based on the fact that RPyC doesn't have an IDL (gRPC uses protobuf, Thrift uses the Thrift IDL); it embraces duck typing. While this makes it really easy to get started, it can be a bad thing when it comes to maintenance.

My preferences are:

  • I would personally prefer to use RPyC if Python is the only language I am going to use.
  • I would prefer to use gRPC if I needed robustness, reliability, and scalability from my services.
  • The best thing about Thrift is that it supports so many languages. If that's what you're targeting, go for Thrift.

Other important things to note:

  1. I did not compare speed. This might be the most relevant criterion for some.
  2. I do not have experience with very large services, so I am not the right person to comment on the maintainability of each framework. However, this is an important criterion when deciding which RPC framework to choose.

You can find the code for the above examples in this repo.