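1 Overview
In Wireshark, the Export Objects submenu of the File menu exports data carried by protocols such as HTTP and TFTP.
For example, choosing HTTP opens a dialog that lists the exportable files. Each file can be previewed or saved to disk.
TShark supports the same feature through the --export-objects option: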
```
--export-objects <protocol>,<destdir>
    Export all objects within a protocol into directory destdir. The available
    values for protocol can be listed with --export-objects help.

    The objects are directly saved in the given directory. Filenames are
    dependent on the dissector, but typically it is named after the basename
    of a file. Duplicate files are not overwritten, instead an increasing
    number is appended before the file extension.

    This interface is subject to change, adding the possibility to filter on files.
```
Example TShark run:
```
zzq@vbox:~/dev/wireshark_build/run$ mkdir extmp
zzq@vbox:~/dev/wireshark_build/run$ ./tshark --export-objects http,extmp -r ~/pcap/http_gnu.pcap
1 0.000000 192.168.1.103 → 209.51.188.148 TCP 66 6507 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
2 0.309763 209.51.188.148 → 192.168.1.103 TCP 66 80 → 6507 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1440 SACK_PERM=1 WS=128
3 0.309827 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1 Ack=1 Win=132352 Len=0
4 0.310076 192.168.1.103 → 209.51.188.148 HTTP 466 GET / HTTP/1.1
...
45 1.922438 192.168.1.103 → 209.51.188.148 HTTP 379 GET /print.min.css HTTP/1.1
46 2.232797 209.51.188.148 → 192.168.1.103 HTTP 1414 HTTP/1.1 200 OK (text/css)
47 2.274466 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1804 Ack=30667 Win=132352 Len=0
48 5.235437 209.51.188.148 → 192.168.1.103 TCP 60 80 → 6507 [FIN, ACK] Seq=30667 Ack=1804 Win=34560 Len=0
49 5.235469 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1804 Ack=30668 Win=132352 Len=0
zzq@vbox:~/dev/wireshark_build/run$ ls extmp/
%2f  heckert_gnu.transp.small.png  hyperbola-i3-thumb.jpg  print.min.css
```
This article traces TShark to explore how object export works. A debugging session:
```
zzq@vbox:~/dev/wireshark_build/run$ gdb ./tshark
...
Reading symbols from ./tshark...
(gdb) b eo_draw
Breakpoint 1 at 0x267db: file /home/zzq/dev/wireshark/ui/cli/tap-exportobject.c, line 103.
(gdb) r --export-objects http,extmp -r ~/pcap/http_gnu.pcap
Starting program: /home/zzq/dev/wireshark_build/run/tshark --export-objects http,extmp -r ~/pcap/http_gnu.pcap
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffee845700 (LWP 9624)]
[Thread 0x7fffee845700 (LWP 9624) exited]
[New Thread 0x7fffee845700 (LWP 9625)]
[Thread 0x7fffee845700 (LWP 9625) exited]
1 0.000000 192.168.1.103 → 209.51.188.148 TCP 66 6507 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
...
49 5.235469 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1804 Ack=30668 Win=132352 Len=0

Thread 1 "tshark" hit Breakpoint 1, eo_draw (tapdata=0x55555592be40) at /home/zzq/dev/wireshark/ui/cli/tap-exportobject.c:103
103 {
(gdb) bt
#0  eo_draw (tapdata=0x55555592be40) at /home/zzq/dev/wireshark/ui/cli/tap-exportobject.c:103
#1  0x00007ffff3fc9754 in draw_tap_listeners (draw_all=1) at /home/zzq/dev/wireshark/epan/tap.c:442
#2  0x0000555555574194 in main (argc=5, argv=0x7fffffffe268) at /home/zzq/dev/wireshark/tshark.c:2310
```
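From here on, "export objects" is abbreviated as eo. Wireshark implements object export on top of its tap mechanism, so you need to understand Wireshark taps before you can follow how eo works.
2 Data structures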
register_eo
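An eo registry entry. A protocol that supports object export registers its eo information, such as its handler function, during initialization (for example, in the epan_init flow). This registration creates an eo registry entry.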
```c
// epan/export_object.c
struct register_eo {
    int proto_id;                        /* protocol id (0-indexed) */
    const char* tap_listen_str;          /* string used in register_tap_listener (NULL to use protocol name) */
    tap_packet_cb eo_func;               /* function to be called for new incoming packets for SRT */
    export_object_gui_reset_cb reset_cb; /* function to parse parameters of optional arguments of tap string */
};

// epan/export_object.h
/** Structure for information about a registered exported object */
typedef struct register_eo register_eo_t;
```
Where:
- proto_id: the protocol id
- tap_listen_str: the name of the eo tap listener. For HTTP it is http_eo
- eo_func: the eo tap handler function. Its prototype is
```c
typedef tap_packet_status (*tap_packet_cb)(void *tapdata, packet_info *pinfo, epan_dissect_t *edt, const void *data);
```
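This function is central. The tap mechanism calls it at the right moment to extract protocol data and hold it in the eo data structures. eo_func usually differs from protocol to protocol. For HTTP it is http_eo_packet.

registered_eo_tables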
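The global eo registry is implemented as a red-black tree (a wmem_tree). Its lookup key is the protocol filter name string, such as http. The key is not http_eo, which is the tap listener name.

```c
// epan/export_object.c
static wmem_tree_t *registered_eo_tables = NULL;
```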
The register_export_object function builds a new eo registry entry from its arguments and adds it to the global registry registered_eo_tables.

eo_opts
The global eo options are stored in a hash table. The key is the protocol name and the value is the destination directory.
```c
// ui/cli/tap-exportobject.c
static GHashTable* eo_opts = NULL;
```
export_object_entry_t
Represents data that a protocol actually extracted, such as a file transferred over HTTP.
```c
// epan/export_object.h
typedef struct _export_object_entry_t {
    guint32 pkt_num;
    gchar *hostname;
    gchar *content_type;
    gchar *filename;
    /* We need to store a 64 bit integer to hold a file length
      (was guint payload_len;)

      XXX - we store the entire object in the program's address space,
      so the *real* maximum object size is size_t; if we were to export
      objects by going through all of the packets containing data from
      the object, one packet at a time, and write the object incrementally,
      we could support objects that don't fit into the address space. */
    gint64 payload_len;
    guint8 *payload_data;
} export_object_entry_t;
```
export_object_list_t
A tap listener structure carries tap context data of type void*. Its real type can vary from one kind of tap to another. For eo, the real type is export_object_list_t:

```c
// epan/export_object.h
typedef struct _export_object_list_t {
    export_object_object_list_add_entry_cb add_entry; //GUI specific handler for adding an object entry
    export_object_object_list_get_entry_cb get_entry; //GUI specific handler for retrieving an object entry
    void* gui_data; //GUI specific data (for UI representation)
} export_object_list_t;
```
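During packet processing, the tap mechanism calls a callback of type tap_packet_cb. Its first argument is the "tap context data" described here, which for eo is an export_object_list_t pointer. In context, that callback is the eo_func member of register_eo. For HTTP it is the http_eo_packet function.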
In this structure:
- add_entry: adds an exported item to a list, namely the entries list of the "export data table" export_object_list_gui_t described below. In TShark this is object_list_add_entry in ui/cli/tap-exportobject.c.
- get_entry: retrieves an exported item from that list. In TShark this is object_list_get_entry in ui/cli/tap-exportobject.c.
- gui_data: a pointer to export_object_list_gui_t

export_object_list_gui_t
Exported data is added to this export data table.
```c
// ui/cli/tap-exportobject.c
typedef struct _export_object_list_gui_t {
    GSList *entries;
    register_eo_t* eo;
} export_object_list_gui_t;
```
Where:
- entries: every exported item of the same protocol is chained onto the entries list
- eo: the eo registry entry registered during initialization
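The point of the add_entry/get_entry indirection is that a tap callback can store entries without knowing which list type sits behind gui_data. The minimal C sketch below illustrates this. The types entry_t, object_list_t and list_gui_t, and the functions list_add_entry and list_get_entry, are hypothetical simplifications. The real TShark code keeps entries in a GLib GSList, while this sketch uses a hand-rolled singly linked list so that it compiles without GLib.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the Wireshark types above. */
typedef struct {
    unsigned pkt_num;
    char filename[64];
} entry_t;

typedef void (*add_entry_cb)(void *gui_data, entry_t *entry);
typedef entry_t *(*get_entry_cb)(void *gui_data, int row);

/* Plays the role of export_object_list_t: the tap context seen by callbacks. */
typedef struct {
    add_entry_cb add_entry;
    get_entry_cb get_entry;
    void *gui_data;            /* points at a list_gui_t */
} object_list_t;

typedef struct node {
    entry_t *entry;
    struct node *next;
} node_t;

/* Plays the role of export_object_list_gui_t (entries only). */
typedef struct {
    node_t *entries;
    node_t *tail;
} list_gui_t;

/* Appends at the tail, so entries stay in packet order. */
static void list_add_entry(void *gui_data, entry_t *entry)
{
    list_gui_t *gui = gui_data;
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->entry = entry;
    n->next = NULL;
    if (gui->tail != NULL)
        gui->tail->next = n;
    else
        gui->entries = n;
    gui->tail = n;
}

/* Returns the entry at position row, or NULL if row is out of range. */
static entry_t *list_get_entry(void *gui_data, int row)
{
    list_gui_t *gui = gui_data;
    if (row < 0)
        return NULL;
    node_t *n = gui->entries;
    while (n != NULL && row-- > 0)
        n = n->next;
    return n != NULL ? n->entry : NULL;
}
```

A protocol callback only ever calls tap->add_entry(tap->gui_data, entry). The same eo_func can therefore presumably serve different front ends, each supplying its own list implementation through these function pointers.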
Data structure relationships
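3 Flow
The eo flow has three main stages:
- Registration: the protocol registers its eo handler function, and the main program (such as TShark) registers a tap listener
- Data extraction: while dissecting, the protocol extracts data and holds it in internal data structures
- Data export: the extracted data is written out, for example to files
The overall flow is shown in the diagram in the "Overall flow" section below. Reading the text alongside the diagram makes it easier to follow.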
Registration
register_export_object
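A protocol calls this function during initialization to register with eo. Its export_packet_func argument is the key part: it is the protocol-specific data export callback. The function creates a new eo registry entry and adds it to the global table. It also registers a tap whose name is the protocol filter name plus an _eo suffix.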
```c
// epan/export_object.c
int
register_export_object(const int proto_id, tap_packet_cb export_packet_func, export_object_gui_reset_cb reset_cb)
{
    register_eo_t *table;
    DISSECTOR_ASSERT(export_packet_func);

    table = wmem_new(wmem_epan_scope(), register_eo_t);

    table->proto_id = proto_id;
    table->tap_listen_str = wmem_strdup_printf(wmem_epan_scope(), "%s_eo", proto_get_protocol_filter_name(proto_id));
    table->eo_func = export_packet_func;
    table->reset_cb = reset_cb;

    if (registered_eo_tables == NULL)
        registered_eo_tables = wmem_tree_new(wmem_epan_scope());

    wmem_tree_insert_string(registered_eo_tables, proto_get_protocol_filter_name(proto_id), table, 0);
    return register_tap(table->tap_listen_str);
}
```
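The return value is a tap handle, the tap id returned by register_tap. The protocol uses it later when it actually extracts data.
Registration in HTTP: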
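The naming and lookup rules of register_export_object can be modeled in a few lines of standalone C. This is a hypothetical sketch, not Wireshark code. A fixed-size array with a linear strcmp scan stands in for the wmem_tree red-black tree, and the returned slot index stands in for the tap id from register_tap(). It shows that the registry key is the filter name (http) while the tap listener name carries an _eo suffix (http_eo).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_EO 8

/* Hypothetical, simplified registry entry: no wmem, no real tap ids. */
typedef struct {
    char filter_name[32];     /* registry key, e.g. "http" */
    char tap_listen_str[40];  /* "<filter_name>_eo", e.g. "http_eo" */
} eo_entry_t;

static eo_entry_t registry[MAX_EO];
static int registry_len = 0;

/* Mirrors the naming in register_export_object(): the entry is keyed by the
 * filter name and the tap is named "<filter>_eo". The returned slot index
 * stands in for the tap id; -1 means the table is full. */
static int sketch_register_eo(const char *filter_name)
{
    assert(filter_name != NULL);
    if (registry_len == MAX_EO)
        return -1;
    eo_entry_t *e = &registry[registry_len];
    snprintf(e->filter_name, sizeof e->filter_name, "%s", filter_name);
    snprintf(e->tap_listen_str, sizeof e->tap_listen_str, "%s_eo", e->filter_name);
    return registry_len++;
}

/* Like get_eo_by_name(): looks up by filter name, not by tap listener name.
 * A linear scan replaces the wmem_tree lookup. */
static const eo_entry_t *sketch_get_eo_by_name(const char *name)
{
    for (int i = 0; i < registry_len; i++)
        if (strcmp(registry[i].filter_name, name) == 0)
            return &registry[i];
    return NULL;
}
```

With this model, looking up "http" succeeds and looking up "http_eo" fails. The same distinction applies to the real registered_eo_tables.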
```c
// epan/dissectors/packet-http.c
void
proto_register_http(void)
{
    ...
    /*
     * Register for tapping
     */
    http_tap = register_tap("http"); /* HTTP statistics tap */
    http_follow_tap = register_tap("http_follow"); /* HTTP Follow tap */
    ...
    register_follow_stream(proto_http, "http_follow", tcp_follow_conv_filter, tcp_follow_index_filter, tcp_follow_address_filter,
                           tcp_port_to_display, follow_tvb_tap_listener);
    http_eo_tap = register_export_object(proto_http, http_eo_packet, NULL);
}
```
Here http_eo_packet is the eo callback for HTTP. Its implementation is shown later.
eo_tap_opt_add
Passes a command-line option to eo, using the syntax <protocol>,<destdir>. The option is stored in the global eo_opts table.
```c
// ui/cli/tap-exportobject.h
/* will be called by main each time a --export-objects option is found */
gboolean eo_tap_opt_add(const char *optarg);

// ui/cli/tap-exportobject.c
gboolean eo_tap_opt_add(const char *option_string)
{
    gchar** splitted;

    if (!eo_opts)
        eo_opts = g_hash_table_new(g_str_hash,g_str_equal);

    splitted = g_strsplit(option_string, ",", 2);

    if ((splitted[0] == NULL) || (splitted[1] == NULL) || (get_eo_by_name(splitted[0]) == NULL))
    {
        fprintf(stderr, "tshark: \"--export-objects\" are specified as: <protocol>,<destdir>\n");
        fprintf(stderr, "tshark: The available export object types for the \"--export-objects\" option are:\n");
        eo_list_object_types();
    }
    else
    {
        gchar* dir = (gchar*)g_hash_table_lookup(eo_opts, splitted[0]);

        /* Since we're saving all objects from a protocol,
            it can only be listed once */
        if (dir == NULL) {
            g_hash_table_insert(eo_opts, splitted[0], splitted[1]);

            g_free(splitted);
            return TRUE;
        }
        else
        ...
    }

    g_strfreev(splitted);
    return FALSE;
}
```
TShark's main function calls this function while parsing the command line, whenever the user passes the --export-objects option:
```c
// tshark.c
int
main(int argc, char *argv[])
{
    ...
        case LONGOPT_EXPORT_OBJECTS:   /* --export-objects */
            if (strcmp("help", optarg) == 0) {
                fprintf(stderr, "tshark: The available export object types for the \"--export-objects\" option are:\n");
                eo_list_object_types();
                exit_status = EXIT_SUCCESS;
                goto clean_exit;
            }
            if (!eo_tap_opt_add(optarg)) {
                exit_status = INVALID_OPTION;
                goto clean_exit;
            }
            ...
}
```
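The option handling above can be sketched in standalone C. In this hypothetical sketch, sketch_opt_add splits only at the first comma, as g_strsplit(option_string, ",", 2) does, so a destination directory may itself contain commas. A small array replaces the eo_opts hash table. The get_eo_by_name() check, which verifies that the protocol is actually registered, is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPTS 4

/* Hypothetical stand-in for the eo_opts hash table: protocol -> destdir. */
static struct { char proto[32]; char dir[256]; } opts[MAX_OPTS];
static int n_opts = 0;

/* Returns the directory registered for proto, or NULL. */
static const char *sketch_opt_dir(const char *proto)
{
    for (int i = 0; i < n_opts; i++)
        if (strcmp(opts[i].proto, proto) == 0)
            return opts[i].dir;
    return NULL;
}

/* Parses "<protocol>,<destdir>", splitting only at the FIRST comma
 * (like g_strsplit(option_string, ",", 2)), so destdir may contain commas.
 * Unlike eo_tap_opt_add(), it does not check that the protocol is registered. */
static bool sketch_opt_add(const char *option)
{
    assert(option != NULL);
    const char *comma = strchr(option, ',');
    if (comma == NULL || n_opts == MAX_OPTS)
        return false;                       /* no destdir, or table full */

    size_t plen = (size_t)(comma - option);
    if (plen == 0 || plen >= sizeof opts[0].proto)
        return false;                       /* empty or over-long protocol */

    char proto[sizeof opts[0].proto];
    memcpy(proto, option, plen);
    proto[plen] = '\0';
    if (sketch_opt_dir(proto) != NULL)
        return false;                       /* a protocol can only be listed once */

    snprintf(opts[n_opts].proto, sizeof opts[n_opts].proto, "%s", proto);
    snprintf(opts[n_opts].dir, sizeof opts[n_opts].dir, "%s", comma + 1);
    n_opts++;
    return true;
}
```

For example, "http,extmp" is accepted, a second "http,..." is rejected, and "tftp,out,dir" maps tftp to the directory "out,dir".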
start_exportobjects
Starts eo processing by calling exportobject_handler on every entry in the global eo options (a hash table).
```c
// ui/cli/tap-exportobject.c
static void
exportobject_handler(gpointer key, gpointer value _U_, gpointer user_data _U_)
{
    GString *error_msg;
    export_object_list_t *tap_data;
    export_object_list_gui_t *object_list;
    register_eo_t* eo;

    eo = get_eo_by_name((const char*)key);
    ...
    tap_data = g_new0(export_object_list_t,1);
    object_list = g_new0(export_object_list_gui_t,1);

    tap_data->add_entry = object_list_add_entry;
    tap_data->get_entry = object_list_get_entry;
    tap_data->gui_data = (void*)object_list;

    object_list->eo = eo;

    /* Data will be gathered via a tap callback */
    error_msg = register_tap_listener(get_eo_tap_listener_name(eo), tap_data, NULL, 0,
        NULL, get_eo_packet_func(eo), eo_draw, NULL);

    if (error_msg) {
        ...
    }
}

void start_exportobjects(void)
{
    if (eo_opts != NULL)
        g_hash_table_foreach(eo_opts, exportobject_handler, NULL);
}
```
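Through exportobject_handler, this function calls register_tap_listener, which binds eo to the tap mechanism. The listener's per-packet callback is the protocol's eo_func (get_eo_packet_func(eo)), and its draw callback is eo_draw. start_exportobjects is called from TShark's main function: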
```c
// tshark.c
int
main(int argc, char *argv[])
{
    ...
    prefs_apply_all();

    /* We can also enable specified taps for export object */
    start_exportobjects();
    ...
}
```
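Data extraction
While packets are processed, the eo/tap linkage set up during registration extracts the protocol data. The protocol dissector and the tap mechanism each have their own job:
- Protocol dissector: dissects the packet. Once the protocol data is ready, it builds eo information from that data and calls tap_queue_packet to add the information to the tap queue.
- Tap mechanism: initializes the tap queue before a packet is dissected and walks the queue afterwards. While walking the queue, it calls the eo handler function that the protocol registered.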
The dissector side, using HTTP as the example:
```c
// epan/dissectors/packet-http.c
static int
dissect_http_message(tvbuff_t *tvb, int offset, packet_info *pinfo,
             proto_tree *tree, http_conv_t *conv_data,
             const char* proto_tag, int proto, gboolean end_of_stream)
{
    ...
    if (datalen > 0) {
        ...
        /* Save values for the Export Object GUI feature if we have
         * an active listener to process it (which happens when
         * the export object window is open). */
        if(have_tap_listener(http_eo_tap)) {
            eo_info = wmem_new(wmem_packet_scope(), http_eo_t);

            eo_info->hostname = conv_data->http_host;
            eo_info->filename = conv_data->request_uri;
            eo_info->content_type = headers.content_type;
            eo_info->payload_len = tvb_captured_length(next_tvb);
            eo_info->payload_data = tvb_get_ptr(next_tvb, 0, eo_info->payload_len);

            tap_queue_packet(http_eo_tap, pinfo, eo_info);
        }
        ...
    }
    ...
}
```
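At this point the HTTP message has been fully dissected. The dissector builds an http_eo_t from the dissected data and adds it to the tap queue, but only if a listener is attached to http_eo_tap.
The tap mechanism is built into the dissection flow, for example in epan_dissect_run_with_taps: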
```c
// epan/epan.c
void
epan_dissect_run_with_taps(epan_dissect_t *edt, int file_type_subtype,
    wtap_rec *rec, tvbuff_t *tvb, frame_data *fd,
    column_info *cinfo)
{
    wmem_enter_packet_scope();
    tap_queue_init(edt);
    dissect_record(edt, file_type_subtype, rec, tvb, fd, cinfo);
    tap_push_tapped_queue(edt);

    /* free all memory allocated */
    wmem_leave_packet_scope();
}
```
Here dissect_record eventually calls the dissect_http_message function shown above, which fills the tap queue when needed. tap_push_tapped_queue then processes the queue and calls the handler (tl->packet()) for each entry:
```c
// epan/tap.c
/* this function is called after a packet has been fully dissected to push the tapped
   data to all extensions that has callbacks registered.
*/
void
tap_push_tapped_queue(epan_dissect_t *edt)
{
    tap_packet_t *tp;
    tap_listener_t *tl;
    guint i;

    /* nothing to do, just return */
    if(!tapping_is_active){
        return;
    }

    tapping_is_active=FALSE;

    /* nothing to do, just return */
    if(!tap_packet_index){
        return;
    }

    /* loop over all tap listeners and call the listener callback
       for all packets that match the filter. */
    for(i=0;i<tap_packet_index;i++){
        for(tl=tap_listener_queue;tl;tl=tl->next){
            tp=&tap_packet_array[i];
            /* Don't tap the packet if it's an "error packet"
             * unless the listener has requested that we do so.
             */
            if (!(tp->flags & TAP_PACKET_IS_ERROR_PACKET) || (tl->flags & TL_REQUIRES_ERROR_PACKETS))
            {
                if(tp->tap_id==tl->tap_id){
                    if(!tl->packet){
                        /* There isn't a per-packet
                         * routine for this tap.
                         */
                        continue;
                    }
                    ...
                    /* So call the per-packet routine. */
                    tap_packet_status status;
                    status = tl->packet(tl->tapdata, tp->pinfo, edt, tp->tap_specific_data);
                    ...
                }
            }
        }
    }
}
```
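The queue-then-push pattern, followed by the final draw pass, can be reduced to a small C model. Everything in it is a hypothetical simplification of epan/tap.c: listener_t, queue_packet, push_queue, draw_all and the counter listener. There are no display filters and no error-packet flags, and a nonzero return from the packet callback stands in for TAP_PACKET_REDRAW.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUED 16

/* Hypothetical, heavily simplified model of the tap machinery in epan/tap.c. */
typedef int (*packet_cb)(void *tapdata, const void *data); /* nonzero = "redraw" */
typedef void (*draw_cb)(void *tapdata);

typedef struct listener {
    int tap_id;            /* which tap this listener is attached to */
    void *tapdata;         /* listener context (an export_object_list_t for eo) */
    packet_cb packet;      /* per-packet callback (eo_func for eo) */
    draw_cb draw;          /* end-of-run callback (eo_draw for eo) */
    int needs_redraw;
    struct listener *next;
} listener_t;

typedef struct { int tap_id; const void *data; } queued_t;

static listener_t *listeners = NULL;
static queued_t queue[MAX_QUEUED];
static size_t queue_len = 0;

static void add_listener(listener_t *l) { l->next = listeners; listeners = l; }

/* Like have_tap_listener(): lets a dissector skip building eo data. */
static int have_listener(int tap_id)
{
    for (listener_t *l = listeners; l; l = l->next)
        if (l->tap_id == tap_id)
            return 1;
    return 0;
}

/* Before dissecting a packet (cf. tap_queue_init). */
static void queue_init(void) { queue_len = 0; }

/* From inside a dissector (cf. tap_queue_packet). */
static void queue_packet(int tap_id, const void *data)
{
    assert(queue_len < MAX_QUEUED);
    queue[queue_len].tap_id = tap_id;
    queue[queue_len].data = data;
    queue_len++;
}

/* After dissection (cf. tap_push_tapped_queue): hand every queued item to
 * each listener attached to the same tap. */
static void push_queue(void)
{
    for (size_t i = 0; i < queue_len; i++)
        for (listener_t *l = listeners; l; l = l->next)
            if (l->tap_id == queue[i].tap_id && l->packet &&
                l->packet(l->tapdata, queue[i].data))
                l->needs_redraw = 1;
    queue_len = 0;
}

/* At the end of the file (cf. draw_tap_listeners(TRUE)). */
static void draw_all(void)
{
    for (listener_t *l = listeners; l; l = l->next) {
        if (l->draw)
            l->draw(l->tapdata);
        l->needs_redraw = 0;
    }
}

/* Example listener: counts queued items and sums their lengths. */
typedef struct { int items; int bytes; int drawn; } counter_t;

static int count_packet(void *tapdata, const void *data)
{
    counter_t *c = tapdata;
    c->items++;
    c->bytes += *(const int *)data;
    return 1;
}

static void count_draw(void *tapdata) { ((counter_t *)tapdata)->drawn++; }
```

The example listener plays the same role as an eo listener. Its packet callback accumulates state for each queued item, and its draw callback runs once at the end, which is the point where eo_draw writes the files.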
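The per-packet handler tl->packet called in tap_push_tapped_queue is the one configured by start_exportobjects. The reason is that exportobject_handler passed get_eo_packet_func(eo) to register_tap_listener as the packet callback.
For HTTP, that handler is http_eo_packet: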
```c
// epan/dissectors/packet-http.c
static tap_packet_status
http_eo_packet(void *tapdata, packet_info *pinfo, epan_dissect_t *edt _U_, const void *data)
{
    export_object_list_t *object_list = (export_object_list_t *)tapdata;
    const http_eo_t *eo_info = (const http_eo_t *)data;
    export_object_entry_t *entry;

    if(eo_info) { /* We have data waiting for us */
        /* These values will be freed when the Export Object window
         * is closed. */
        entry = g_new(export_object_entry_t, 1);

        entry->pkt_num = pinfo->num;
        entry->hostname = g_strdup(eo_info->hostname);
        entry->content_type = g_strdup(eo_info->content_type);
        entry->filename = eo_info->filename ? g_path_get_basename(eo_info->filename) : NULL;
        entry->payload_len = eo_info->payload_len;
        entry->payload_data = (guint8 *)g_memdup(eo_info->payload_data, eo_info->payload_len);

        object_list->add_entry(object_list->gui_data, entry);

        return TAP_PACKET_REDRAW; /* State changed - window should be redrawn */
    } else {
        return TAP_PACKET_DONT_REDRAW; /* State unchanged - no window updates needed */
    }
}
```
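This function builds an export data object (entry) from the HTTP export-object information captured while the packet was dissected. It then appends the entry, through object_list->add_entry, to the entries list of export_object_list_gui_t. The data structure relationships diagram above may help here.
The HTTP export-object information: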
```c
// epan/dissectors/packet-http.c
/* Used for HTTP Export Object feature */
typedef struct _http_eo_t {
    guint32  pkt_num;
    gchar   *hostname;
    gchar   *filename;
    gchar   *content_type;
    guint32  payload_len;
    const guint8 *payload_data;
} http_eo_t;
```
It holds the parsed Host, the request URI (stored as filename) and the Content-Type strings, together with the payload data and its length.
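The copy in http_eo_packet matters. eo_info is allocated with wmem_packet_scope(), and payload_data points into the packet's tvb, so neither survives past the current packet. The standalone C sketch below mimics that deep copy. The names eo_info_t, entry_t, make_entry, dup_range and basename_dup are hypothetical, and basename_dup follows the documented g_path_get_basename() behavior for '/' separators. It also suggests why the request for / shows up as %2f in the ls output earlier: the basename of "/" is "/" itself, and that separator is presumably escaped when eo_draw builds the file name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for http_eo_t: only valid while the packet is processed. */
typedef struct {
    uint32_t pkt_num;
    const char *filename;          /* request URI, e.g. "/print.min.css" */
    uint32_t payload_len;
    const uint8_t *payload_data;   /* points into packet data */
} eo_info_t;

/* Hypothetical stand-in for export_object_entry_t: owns its memory. */
typedef struct {
    uint32_t pkt_num;
    char *filename;
    int64_t payload_len;
    uint8_t *payload_data;
} entry_t;

/* malloc'ed, NUL-terminated copy of the first n bytes of s. */
static char *dup_range(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    assert(out != NULL);
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}

/* Last path component, following the documented g_path_get_basename()
 * rules for '/': trailing slashes are ignored, "" gives ".",
 * and a path made only of slashes gives "/". */
static char *basename_dup(const char *path)
{
    size_t end = strlen(path);
    while (end > 0 && path[end - 1] == '/')
        end--;
    if (end == 0)
        return dup_range(path[0] ? "/" : ".", 1);
    size_t start = end;
    while (start > 0 && path[start - 1] != '/')
        start--;
    return dup_range(path + start, end - start);
}

/* Deep copy in the spirit of http_eo_packet(): nothing in the entry may
 * point back into per-packet memory. */
static entry_t *make_entry(const eo_info_t *info)
{
    entry_t *e = malloc(sizeof *e);
    assert(e != NULL);
    e->pkt_num = info->pkt_num;
    e->filename = info->filename ? basename_dup(info->filename) : NULL;
    e->payload_len = info->payload_len;
    e->payload_data = malloc(info->payload_len ? info->payload_len : 1);
    assert(e->payload_data != NULL);
    if (info->payload_len)
        memcpy(e->payload_data, info->payload_data, info->payload_len);
    return e;
}
```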
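Data export
After the whole pcap file has been dissected (live capture is not covered here), TShark calls draw_tap_listeners. This call eventually writes the previously extracted data to files, as the gdb backtrace above also shows: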
```c
if (draw_taps)
    draw_tap_listeners(TRUE);
```
This function walks all tap listeners and calls the draw callback bound to each one:
```c
// epan/tap.c
/* This function is called when we need to redraw all tap listeners, for example
   when we open/start a new capture or if we need to rescan the packet list.
   It should be called from a low priority thread say once every 3 seconds

   If draw_all is true, redraw all applications regardless if they have
   changed or not.
*/
void
draw_tap_listeners(gboolean draw_all)
{
    tap_listener_t *tl;

    for(tl=tap_listener_queue;tl;tl=tl->next){
        if(tl->needs_redraw || draw_all){
            if(tl->draw){
                tl->draw(tl->tapdata);
            }
        }
        tl->needs_redraw=FALSE;
    }
}
```
For the eo tap, the draw function is eo_draw, which start_exportobjects configured:
```c
// ui/cli/tap-exportobject.c
/* This is just for writing Exported Objects to a file */
static void
eo_draw(void *tapdata)
{
    export_object_list_t *tap_object = (export_object_list_t *)tapdata;
    export_object_list_gui_t *object_list = (export_object_list_gui_t*)tap_object->gui_data;
    GSList *slist = object_list->entries;
    export_object_entry_t *entry;
    gchar* save_in_path = (gchar*)g_hash_table_lookup(eo_opts, proto_get_protocol_filter_name(get_eo_proto_id(object_list->eo)));
    GString *safe_filename = NULL;
    gchar *save_as_fullpath = NULL;
    guint count = 0;

    if (!g_file_test(save_in_path, G_FILE_TEST_IS_DIR)) {
        /* If the destination directory (or its parents) do not exist, create them. */
        if (g_mkdir_with_parents(save_in_path, 0755) == -1) {
            fprintf(stderr, "Failed to create export objects output directory \"%s\": %s\n",
                    save_in_path, g_strerror(errno));
            return;
        }
    }

    while (slist) {
        entry = (export_object_entry_t *)slist->data;
        do {
            g_free(save_as_fullpath);
            if (entry->filename) {
                safe_filename = eo_massage_str(entry->filename,
                    EXPORT_OBJECT_MAXFILELEN, count);
            } else {
                char generic_name[EXPORT_OBJECT_MAXFILELEN+1];
                const char *ext;
                ext = eo_ct2ext(entry->content_type);
                g_snprintf(generic_name, sizeof(generic_name),
                    "object%u%s%s", entry->pkt_num, ext ? "." : "",
                    ext ? ext : "");
                safe_filename = eo_massage_str(generic_name,
                    EXPORT_OBJECT_MAXFILELEN, count);
            }
            save_as_fullpath = g_build_filename(
                save_in_path, safe_filename->str, NULL);
            g_string_free(safe_filename, TRUE);
        } while (g_file_test(save_as_fullpath, G_FILE_TEST_EXISTS) && ++count < prefs.gui_max_export_objects);
        count = 0;
        eo_save_entry(save_as_fullpath, entry);
        g_free(save_as_fullpath);
        save_as_fullpath = NULL;
        slist = slist->next;
    }
}
```
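This function walks the list of export data objects. For each one, it builds a safe output file name (falling back to object<pkt_num> plus an extension derived from the content type when there is no filename), retries with an increasing count while the file already exists, and writes the entry to disk.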
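The two file-name steps in eo_draw are making a name safe and avoiding collisions. They can be sketched as follows. This is hypothetical code, not eo_massage_str() itself. escape_name percent-encodes a set of reserved characters with lowercase hex, which is consistent with the %2f in the ls output. numbered_name inserts "(N)" before the extension. That format is an assumption, because the man page only says an increasing number is appended before the file extension. pick_name reproduces the shape of the do/while loop in eo_draw, using a caller-supplied existence check (demo_exists, a toy) instead of g_file_test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode reserved and control characters with lowercase hex.
 * The reserved set here is an assumption for illustration. */
static void escape_name(const char *in, char *out, size_t outsz)
{
    static const char reserved[] = "<>:\"/\\|?*";
    static const char hex[] = "0123456789abcdef";
    size_t o = 0;
    assert(outsz > 0);
    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        int bad = c < 0x20 || strchr(reserved, c) != NULL;
        if (o + (bad ? 3 : 1) >= outsz)
            break;                          /* truncate, keep room for NUL */
        if (bad) {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0xf];
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
}

/* count == 0: name unchanged; otherwise insert "(count)" before the last
 * extension (assumed format). */
static void numbered_name(const char *name, unsigned count, char *out, size_t outsz)
{
    const char *dot = strrchr(name, '.');
    if (count == 0)
        snprintf(out, outsz, "%s", name);
    else if (dot != NULL && dot != name)
        snprintf(out, outsz, "%.*s(%u)%s", (int)(dot - name), name, count, dot);
    else
        snprintf(out, outsz, "%s(%u)", name, count);
}

typedef int (*exists_fn)(const char *path);

/* Same shape as the do/while loop in eo_draw: bump count while the candidate
 * exists, but stop after max_tries (cf. prefs.gui_max_export_objects). */
static void pick_name(const char *name, exists_fn exists, unsigned max_tries,
                      char *out, size_t outsz)
{
    unsigned count = 0;
    do {
        numbered_name(name, count, out, outsz);
    } while (exists(out) && ++count < max_tries);
}

/* Toy existence check: pretend these files are already on disk. */
static int demo_exists(const char *path)
{
    static const char *taken[] = {"print.min.css", "print.min(1).css"};
    for (size_t i = 0; i < sizeof taken / sizeof taken[0]; i++)
        if (strcmp(taken[i], path) == 0)
            return 1;
    return 0;
}
```

As in eo_draw, if every candidate up to the limit exists, the last candidate is used anyway.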
Overall flow
Legend:
- Solid arrow: ordinary flow, A->B
- Dashed arrow: nested function call and similar, A contains B
- Yellow background: related data structures
References
- Wireshark source: epan/export_object.h
- Wireshark source: ui/cli/tap-exportobject.h
- Wireshark source: epan/dissectors/packet-http.c
- Wireshark source: tshark.c
- Wireshark internals: tap
