1 Overview
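In Wireshark, the Export Objects submenu of the File menu exports data transferred over protocols such as HTTP and TFTP.
Selecting HTTP, for example, opens a dialog that lists the exportable files; each one can be previewed or saved to a file.
TShark supports the same feature through the --export-objects option: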
--export-objects <protocol>,<destdir>
Export all objects within a protocol into directory destdir. The available values for protocol can be listed with --export-objects help.
The objects are directly saved in the given directory. Filenames are dependent on the dissector, but typically it is named after the basename of a file. Duplicate files are not overwritten, instead an increasing number is appended before the file extension.
This interface is subject to change, adding the possibility to filter on files.
Example TShark run:
zzq@vbox:~/dev/wireshark_build/run
$mkdir extmp
zzq@vbox:~/dev/wireshark_build/run
$./tshark --export-objects http,extmp -r ~/pcap/http_gnu.pcap
1 0.000000 192.168.1.103 → 209.51.188.148 TCP 66 6507 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
2 0.309763 209.51.188.148 → 192.168.1.103 TCP 66 80 → 6507 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1440 SACK_PERM=1 WS=128
3 0.309827 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1 Ack=1 Win=132352 Len=0
4 0.310076 192.168.1.103 → 209.51.188.148 HTTP 466 GET / HTTP/1.1
...
45 1.922438 192.168.1.103 → 209.51.188.148 HTTP 379 GET /print.min.css HTTP/1.1
46 2.232797 209.51.188.148 → 192.168.1.103 HTTP 1414 HTTP/1.1 200 OK (text/css)
47 2.274466 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1804 Ack=30667 Win=132352 Len=0
48 5.235437 209.51.188.148 → 192.168.1.103 TCP 60 80 → 6507 [FIN, ACK] Seq=30667 Ack=1804 Win=34560 Len=0
49 5.235469 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1804 Ack=30668 Win=132352 Len=0
zzq@vbox:~/dev/wireshark_build/run
$ls extmp/
%2f heckert_gnu.transp.small.png hyperbola-i3-thumb.jpg print.min.css
This article traces TShark to explore how object export works. A debugging session:
zzq@vbox:~/dev/wireshark_build/run
$gdb ./tshark
...
Reading symbols from ./tshark...
(gdb) b eo_draw
Breakpoint 1 at 0x267db: file /home/zzq/dev/wireshark/ui/cli/tap-exportobject.c, line 103.
(gdb) r --export-objects http,extmp -r ~/pcap/http_gnu.pcap
Starting program: /home/zzq/dev/wireshark_build/run/tshark --export-objects http,extmp -r ~/pcap/http_gnu.pcap
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffee845700 (LWP 9624)]
[Thread 0x7fffee845700 (LWP 9624) exited]
[New Thread 0x7fffee845700 (LWP 9625)]
[Thread 0x7fffee845700 (LWP 9625) exited]
1 0.000000 192.168.1.103 → 209.51.188.148 TCP 66 6507 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
...
49 5.235469 192.168.1.103 → 209.51.188.148 TCP 54 6507 → 80 [ACK] Seq=1804 Ack=30668 Win=132352 Len=0
Thread 1 "tshark" hit Breakpoint 1, eo_draw (tapdata=0x55555592be40) at /home/zzq/dev/wireshark/ui/cli/tap-exportobject.c:103
103 {
(gdb) bt
#0 eo_draw (tapdata=0x55555592be40) at /home/zzq/dev/wireshark/ui/cli/tap-exportobject.c:103
#1 0x00007ffff3fc9754 in draw_tap_listeners (draw_all=1) at /home/zzq/dev/wireshark/epan/tap.c:442
#2 0x0000555555574194 in main (argc=5, argv=0x7fffffffe268) at /home/zzq/dev/wireshark/tshark.c:2310
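From here on, "export object" is abbreviated as eo. Wireshark implements the feature on top of its tap mechanism, so understanding eo requires understanding Wireshark taps first.
2 Data structures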
register_eo
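An eo registry entry. A protocol that supports object export should register its eo details, such as its handler function, during initialization (for example in the epan_init flow). Registration creates an eo registry entry.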
// epan/export_object.c
struct register_eo {
int proto_id; /* protocol id (0-indexed) */
const char* tap_listen_str; /* string used in register_tap_listener (NULL to use protocol name) */
tap_packet_cb eo_func; /* function to be called for new incoming packets for SRT */
export_object_gui_reset_cb reset_cb; /* function to parse parameters of optional arguments of tap string */
};
// epan/export_object.h
/** Structure for information about a registered exported object */
typedef struct register_eo register_eo_t;
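where:
- proto_id: the protocol id
- tap_listen_str: the name of the eo tap listener, e.g. http_eo for HTTP
- eo_func: the eo tap handler, with the prototype
typedef tap_packet_status (*tap_packet_cb)(void *tapdata, packet_info *pinfo, epan_dissect_t *edt, const void *data);
This function is crucial: the tap mechanism calls it at the appropriate time to extract protocol data and stash it in the eo data structures. Each protocol usually has its own eo_func; for HTTP it is http_eo_packet.
registered_eo_tables
The global eo registry, implemented as a red-black tree. The lookup key is the protocol's filter name string, e.g. http. Note that it is not http_eo, which is the tap listener's name string.
// epan/export_object.c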
static wmem_tree_t *registered_eo_tables = NULL;
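The register_export_object function builds a new eo registry entry from its arguments and adds it to the global registry registered_eo_tables.
eo_opts
The global eo options, implemented as a hash table: the key is the protocol name and the value is the destination directory.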
// ui/cli/tap-exportobject.c
static GHashTable* eo_opts = NULL;
export_object_entry_t
Represents data actually extracted by a protocol, such as a file transferred over HTTP.
// epan/export_object.h
typedef struct _export_object_entry_t {
guint32 pkt_num;
gchar *hostname;
gchar *content_type;
gchar *filename;
/* We need to store a 64 bit integer to hold a file length
(was guint payload_len;)
XXX - we store the entire object in the program's address space,
so the *real* maximum object size is size_t; if we were to export
objects by going through all of the packets containing data from
the object, one packet at a time, and write the object incrementally,
we could support objects that don't fit into the address space. */
gint64 payload_len;
guint8 *payload_data;
} export_object_entry_t;
export_object_list_t
The tap listener structure carries tap context data of type void*, whose real type depends on the kind of tap. For eo, the real type is export_object_list_t:
// epan/export_object.h
typedef struct _export_object_list_t {
export_object_object_list_add_entry_cb add_entry; //GUI specific handler for adding an object entry
export_object_object_list_get_entry_cb get_entry; //GUI specific handler for retrieving an object entry
void* gui_data; //GUI specific data (for UI representation)
} export_object_list_t;
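During packet processing, the tap mechanism calls a callback of type tap_packet_cb. Its first argument is the "tap context data" described above, which for eo is an export_object_list_t pointer. In terms of the structures above, the callback is the eo_func member of register_eo; for HTTP it is the http_eo_packet function.
In this structure:
- add_entry: adds an exported object to a list, namely the list inside export_object_list_gui_t (the "export data table" described next). In TShark the actual function is object_list_add_entry in ui/cli/tap-exportobject.c
- get_entry: retrieves an exported object from that list. In TShark the actual function is object_list_get_entry in ui/cli/tap-exportobject.c
- gui_data: a pointer to export_object_list_gui_t
export_object_list_gui_t
The export data table, to which exported objects are added.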
// ui/cli/tap-exportobject.c
typedef struct _export_object_list_gui_t {
GSList *entries;
register_eo_t* eo;
} export_object_list_gui_t;
where:
- entries: the exported objects of one protocol are all linked into the entries list
- eo: the eo registry entry created at registration time
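The split between export_object_list_t (callbacks plus an opaque pointer) and export_object_list_gui_t (the actual storage) can be illustrated with a minimal sketch in plain C. Every demo_* name below is hypothetical and not part of Wireshark; the real TShark code stores entries in a GLib GSList rather than a hand-written list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for export_object_entry_t. */
typedef struct demo_entry {
    unsigned pkt_num;
    const char *filename;
} demo_entry;

typedef struct demo_node {
    demo_entry *entry;
    struct demo_node *next;
} demo_node;

/* Plays the role of export_object_list_gui_t: the UI-side storage. */
typedef struct demo_store {
    demo_node *head;
    demo_node *tail;
} demo_store;

/* Plays the role of export_object_list_t: callbacks plus opaque UI data. */
typedef struct demo_list {
    void (*add_entry)(void *gui_data, demo_entry *entry);
    demo_entry *(*get_entry)(void *gui_data, int row);
    void *gui_data;
} demo_list;

/* Appends an entry to the store; the entry is silently dropped if malloc fails. */
static void demo_add_entry(void *gui_data, demo_entry *entry)
{
    assert(gui_data != NULL && entry != NULL);
    demo_store *store = gui_data;
    demo_node *node = malloc(sizeof *node);
    if (node == NULL)
        return;
    node->entry = entry;
    node->next = NULL;
    if (store->tail)
        store->tail->next = node;
    else
        store->head = node;
    store->tail = node;
}

/* Returns the entry at position row, or NULL if row is out of range. */
static demo_entry *demo_get_entry(void *gui_data, int row)
{
    demo_store *store = gui_data;
    demo_node *node = store->head;
    if (row < 0)
        return NULL;
    while (node && row-- > 0)
        node = node->next;
    return node ? node->entry : NULL;
}

/* Frees the list nodes (not the entries, which the caller owns here). */
static void demo_store_clear(demo_store *store)
{
    demo_node *node = store->head;
    while (node) {
        demo_node *next = node->next;
        free(node);
        node = next;
    }
    store->head = store->tail = NULL;
}
```

The code that produces entries only ever sees demo_list, so a different front end could plug in its own storage by supplying other callbacks and another gui_data pointer.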
Figure: data structure relationships
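3 Workflow
The eo workflow has three main phases:
- Registration: the protocol registers its eo handler, and the main program (such as TShark) registers a tap listener
- Data extraction: the protocol extracts data during dissection and stashes it in internal data structures
- Data export: the extracted data is written to files or similar
The overall flow is shown in the diagram later in this article; reading along with it makes the steps easier to follow.
Registration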
register_export_object
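A protocol should call this function during initialization to register for eo. Its export_packet_func argument is crucial: it is the protocol-specific export callback. The function creates a new eo registry entry, adds it to the global table, and registers a tap.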
// epan/export_object.c
int
register_export_object(const int proto_id, tap_packet_cb export_packet_func, export_object_gui_reset_cb reset_cb)
{
register_eo_t *table;
DISSECTOR_ASSERT(export_packet_func);
table = wmem_new(wmem_epan_scope(), register_eo_t);
table->proto_id = proto_id;
table->tap_listen_str = wmem_strdup_printf(wmem_epan_scope(), "%s_eo", proto_get_protocol_filter_name(proto_id));
table->eo_func = export_packet_func;
table->reset_cb = reset_cb;
if (registered_eo_tables == NULL)
registered_eo_tables = wmem_tree_new(wmem_epan_scope());
wmem_tree_insert_string(registered_eo_tables, proto_get_protocol_filter_name(proto_id), table, 0);
return register_tap(table->tap_listen_str);
}
The return value is the tap id, which is needed later when protocol data is actually extracted.
Registration in HTTP, for example:
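The two names involved in registration are easy to mix up, so here is a minimal sketch of a registry keyed by filter name that derives the tap name by appending _eo. It is only an illustration: the demo_* names are hypothetical, a fixed array stands in for the wmem tree, and the array index stands in for the tap id.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_EO   16
#define DEMO_NAME_LEN 32

typedef int (*demo_packet_cb)(void *tapdata, const void *data);

typedef struct demo_register_eo {
    char filter_name[DEMO_NAME_LEN];        /* lookup key, e.g. "http" */
    char tap_listen_str[DEMO_NAME_LEN + 4]; /* tap name, e.g. "http_eo" */
    demo_packet_cb eo_func;                 /* protocol-specific handler */
} demo_register_eo;

static demo_register_eo demo_registry[DEMO_MAX_EO];
static int demo_registry_len = 0;

/* Registers a protocol; returns its index in the registry, or -1 on failure. */
static int demo_register_export_object(const char *filter_name, demo_packet_cb cb)
{
    assert(filter_name != NULL && cb != NULL);
    if (demo_registry_len == DEMO_MAX_EO || strlen(filter_name) >= DEMO_NAME_LEN)
        return -1;
    demo_register_eo *e = &demo_registry[demo_registry_len];
    snprintf(e->filter_name, sizeof e->filter_name, "%s", filter_name);
    snprintf(e->tap_listen_str, sizeof e->tap_listen_str, "%s_eo", filter_name);
    e->eo_func = cb;
    return demo_registry_len++;
}

/* Looks an entry up by filter name ("http"), not by tap name ("http_eo"). */
static const demo_register_eo *demo_get_eo_by_name(const char *filter_name)
{
    for (int i = 0; i < demo_registry_len; i++)
        if (strcmp(demo_registry[i].filter_name, filter_name) == 0)
            return &demo_registry[i];
    return NULL;
}

/* A do-nothing handler, used only to have something to register. */
static int demo_noop_packet(void *tapdata, const void *data)
{
    (void)tapdata;
    (void)data;
    return 0;
}
```

Registering "http" with demo_noop_packet and then looking up "http" yields an entry whose tap name is "http_eo", while looking up "http_eo" finds nothing.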
// epan/dissectors/packet-http.c
void
proto_register_http(void)
{
...
/*
* Register for tapping
*/
http_tap = register_tap("http"); /* HTTP statistics tap */
http_follow_tap = register_tap("http_follow"); /* HTTP Follow tap */
...
register_follow_stream(proto_http, "http_follow", tcp_follow_conv_filter, tcp_follow_index_filter, tcp_follow_address_filter,
tcp_port_to_display, follow_tvb_tap_listener);
http_eo_tap = register_export_object(proto_http, http_eo_packet, NULL);
}
Here http_eo_packet is the eo callback for the HTTP protocol; its implementation is shown later in this article.
eo_tap_opt_add
Passes a command-line option to eo, using the syntax <protocol>,<destdir>. The option is added to the global eo_opts.
// ui/cli/tap-exportobject.h
/* will be called by main each time a --export-objects option is found */
gboolean eo_tap_opt_add(const char *optarg);
// ui/cli/tap-exportobject.c
gboolean eo_tap_opt_add(const char *option_string)
{
gchar** splitted;
if (!eo_opts)
eo_opts = g_hash_table_new(g_str_hash,g_str_equal);
splitted = g_strsplit(option_string, ",", 2);
if ((splitted[0] == NULL) || (splitted[1] == NULL) || (get_eo_by_name(splitted[0]) == NULL))
{
fprintf(stderr, "tshark: \"--export-objects\" are specified as: <protocol>,<destdir>\n");
fprintf(stderr, "tshark: The available export object types for the \"--export-objects\" option are:\n");
eo_list_object_types();
}
else
{
gchar* dir = (gchar*)g_hash_table_lookup(eo_opts, splitted[0]);
/* Since we're saving all objects from a protocol,
it can only be listed once */
if (dir == NULL) {
g_hash_table_insert(eo_opts, splitted[0], splitted[1]);
g_free(splitted);
return TRUE;
}
else
...
}
g_strfreev(splitted);
return FALSE;
}
This function runs in TShark's main function during command-line parsing, whenever the user specifies the --export-objects option:
// tshark.c
int
main(int argc, char *argv[])
{
...
case LONGOPT_EXPORT_OBJECTS: /* --export-objects */
if (strcmp("help", optarg) == 0) {
fprintf(stderr, "tshark: The available export object types for the \"--export-objects\" option are:\n");
eo_list_object_types();
exit_status = EXIT_SUCCESS;
goto clean_exit;
}
if (!eo_tap_opt_add(optarg)) {
exit_status = INVALID_OPTION;
goto clean_exit;
}
...
}
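The <protocol>,<destdir> syntax handled by eo_tap_opt_add is split at the first comma only (g_strsplit is called with a limit of 2), so a directory name may itself contain commas. A rough equivalent without GLib might look like the following; demo_split_eo_option is a hypothetical helper, and it leaves out the registry lookup that the real function performs on the protocol name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Splits "<protocol>,<destdir>" at the first comma into two caller-supplied
 * buffers. Returns false if there is no comma or a buffer is too small. */
static bool demo_split_eo_option(const char *option,
                                 char *proto, size_t proto_size,
                                 char *dir, size_t dir_size)
{
    assert(option != NULL && proto != NULL && dir != NULL);
    const char *comma = strchr(option, ',');
    if (comma == NULL)
        return false;                     /* no destination directory given */
    size_t proto_len = (size_t)(comma - option);
    size_t dir_len = strlen(comma + 1);
    if (proto_len >= proto_size || dir_len >= dir_size)
        return false;                     /* would not fit with the NUL byte */
    memcpy(proto, option, proto_len);
    proto[proto_len] = '\0';
    memcpy(dir, comma + 1, dir_len + 1);  /* copies the trailing NUL as well */
    return true;
}
```

With the input "http,extmp" the helper yields "http" and "extmp"; with "http,a,b" the directory part is "a,b".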
start_exportobjects
Starts the eo operation by calling the exportobject_handler function on every entry of the global eo options (a hash table).
// ui/cli/tap-exportobject.c
static void
exportobject_handler(gpointer key, gpointer value _U_, gpointer user_data _U_)
{
GString *error_msg;
export_object_list_t *tap_data;
export_object_list_gui_t *object_list;
register_eo_t* eo;
eo = get_eo_by_name((const char*)key);
...
tap_data = g_new0(export_object_list_t,1);
object_list = g_new0(export_object_list_gui_t,1);
tap_data->add_entry = object_list_add_entry;
tap_data->get_entry = object_list_get_entry;
tap_data->gui_data = (void*)object_list;
object_list->eo = eo;
/* Data will be gathered via a tap callback */
error_msg = register_tap_listener(get_eo_tap_listener_name(eo), tap_data, NULL, 0,
NULL, get_eo_packet_func(eo), eo_draw, NULL);
if (error_msg) {
...
}
}
void start_exportobjects(void)
{
if (eo_opts != NULL)
g_hash_table_foreach(eo_opts, exportobject_handler, NULL);
}
Calling this function in turn calls register_tap_listener, which ties eo firmly to the tap mechanism. It runs in TShark's main function:
// tshark.c
int
main(int argc, char *argv[])
{
...
prefs_apply_all();
/* We can also enable specified taps for export object */
start_exportobjects();
...
}
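Data extraction
During packet processing, the eo-tap linkage built at registration time comes into play and extracts the protocol data. The protocol dissector and the tap mechanism each have their own job:
- Protocol dissector: dissects the packet and, once the protocol data is ready, builds the eo information from it and calls tap_queue_packet to add that information to the tap queue.
- Tap mechanism: initializes the tap queue before the packet is dissected and walks the queue afterwards. While processing the queue, it calls the eo handler registered by the protocol.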
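The division of labor described above can be modeled with a toy queue: a dissector enqueues data under a tap id, and after dissection the queue is pushed to every listener with a matching id. This is a simplified sketch with hypothetical demo_* names; it omits filters, error-packet flags and the redraw status that the real tap code handles.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 8

typedef struct demo_listener {
    int tap_id;                                      /* tap this listener is attached to */
    void *tapdata;                                   /* listener-specific context */
    void (*packet)(void *tapdata, const void *data); /* per-packet callback */
    struct demo_listener *next;
} demo_listener;

typedef struct demo_queued {
    int tap_id;
    const void *data;
} demo_queued;

static demo_listener *demo_listeners = NULL;
static demo_queued demo_queue[DEMO_QUEUE_MAX];
static size_t demo_queue_len = 0;

/* Attaches a listener (the role register_tap_listener plays). */
static void demo_add_listener(demo_listener *l)
{
    assert(l != NULL);
    l->next = demo_listeners;
    demo_listeners = l;
}

/* Called before a packet is dissected: start with an empty queue. */
static void demo_queue_init(void)
{
    demo_queue_len = 0;
}

/* Called by a dissector once it has data worth tapping. */
static void demo_queue_packet(int tap_id, const void *data)
{
    if (demo_queue_len < DEMO_QUEUE_MAX) {
        demo_queue[demo_queue_len].tap_id = tap_id;
        demo_queue[demo_queue_len].data = data;
        demo_queue_len++;
    }
}

/* Called after dissection: deliver each queued item to matching listeners. */
static void demo_push_queue(void)
{
    for (size_t i = 0; i < demo_queue_len; i++)
        for (demo_listener *l = demo_listeners; l; l = l->next)
            if (l->tap_id == demo_queue[i].tap_id && l->packet)
                l->packet(l->tapdata, demo_queue[i].data);
    demo_queue_len = 0;
}

/* Example callback: counts how many items were delivered. */
static void demo_count_packet(void *tapdata, const void *data)
{
    (void)data;
    ++*(int *)tapdata;
}
```

A listener attached to tap id 1 receives only the items queued under id 1; items queued under other ids are skipped.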
HTTP is an example of the dissector side:
// epan/dissectors/packet-http.c
static int
dissect_http_message(tvbuff_t *tvb, int offset, packet_info *pinfo,
proto_tree *tree, http_conv_t *conv_data,
const char* proto_tag, int proto, gboolean end_of_stream)
{
...
if (datalen > 0) {
...
/* Save values for the Export Object GUI feature if we have
* an active listener to process it (which happens when
* the export object window is open). */
if(have_tap_listener(http_eo_tap)) {
eo_info = wmem_new(wmem_packet_scope(), http_eo_t);
eo_info->hostname = conv_data->http_host;
eo_info->filename = conv_data->request_uri;
eo_info->content_type = headers.content_type;
eo_info->payload_len = tvb_captured_length(next_tvb);
eo_info->payload_data = tvb_get_ptr(next_tvb, 0, eo_info->payload_len);
tap_queue_packet(http_eo_tap, pinfo, eo_info);
}
...
}
...
}
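At this point the HTTP message has been fully dissected, so an http_eo_t is built from the parsed data and added to the tap queue.
The tap mechanism is part of the dissection flow, as in epan_dissect_run_with_taps: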
// epan/epan.c
void
epan_dissect_run_with_taps(epan_dissect_t *edt, int file_type_subtype,
wtap_rec *rec, tvbuff_t *tvb, frame_data *fd,
column_info *cinfo)
{
wmem_enter_packet_scope();
tap_queue_init(edt);
dissect_record(edt, file_type_subtype, rec, tvb, fd, cinfo);
tap_push_tapped_queue(edt);
/* free all memory allocated */
wmem_leave_packet_scope();
}
Here dissect_record ends up calling dissect_http_message shown above, which fills the tap queue when needed. tap_push_tapped_queue then processes the current tap queue, calling the handler (tl->packet()) for every item in it:
// epan/tap.c
/* this function is called after a packet has been fully dissected to push the tapped
data to all extensions that has callbacks registered.
*/
void
tap_push_tapped_queue(epan_dissect_t *edt)
{
tap_packet_t *tp;
tap_listener_t *tl;
guint i;
/* nothing to do, just return */
if(!tapping_is_active){
return;
}
tapping_is_active=FALSE;
/* nothing to do, just return */
if(!tap_packet_index){
return;
}
/* loop over all tap listeners and call the listener callback
for all packets that match the filter. */
for(i=0;i<tap_packet_index;i++){
for(tl=tap_listener_queue;tl;tl=tl->next){
tp=&tap_packet_array[i];
/* Don't tap the packet if it's an "error packet"
* unless the listener has requested that we do so.
*/
if (!(tp->flags & TAP_PACKET_IS_ERROR_PACKET) || (tl->flags & TL_REQUIRES_ERROR_PACKETS))
{
if(tp->tap_id==tl->tap_id){
if(!tl->packet){
/* There isn't a per-packet
* routine for this tap.
*/
continue;
}
...
/* So call the per-packet routine. */
tap_packet_status status;
status = tl->packet(tl->tapdata, tp->pinfo, edt, tp->tap_specific_data);
...
}
}
}
}
}
The tap listener's per-packet handler, packet, is the one set up in start_exportobjects above, which registered the callback through register_tap_listener.
For HTTP, this function is http_eo_packet:
// epan/dissectors/packet-http.c
static tap_packet_status
http_eo_packet(void *tapdata, packet_info *pinfo, epan_dissect_t *edt _U_, const void *data)
{
export_object_list_t *object_list = (export_object_list_t *)tapdata;
const http_eo_t *eo_info = (const http_eo_t *)data;
export_object_entry_t *entry;
if(eo_info) { /* We have data waiting for us */
/* These values will be freed when the Export Object window
* is closed. */
entry = g_new(export_object_entry_t, 1);
entry->pkt_num = pinfo->num;
entry->hostname = g_strdup(eo_info->hostname);
entry->content_type = g_strdup(eo_info->content_type);
entry->filename = eo_info->filename ? g_path_get_basename(eo_info->filename) : NULL;
entry->payload_len = eo_info->payload_len;
entry->payload_data = (guint8 *)g_memdup(eo_info->payload_data, eo_info->payload_len);
object_list->add_entry(object_list->gui_data, entry);
return TAP_PACKET_REDRAW; /* State changed - window should be redrawn */
} else {
return TAP_PACKET_DONT_REDRAW; /* State unchanged - no window updates needed */
}
}
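This function builds an exported object (entry) from the HTTP export object information obtained during dissection and adds it to the list of exported objects through add_entry. The data structure relationship figure above may help here.
The HTTP export object information: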
// epan/dissectors/packet-http.c
/* Used for HTTP Export Object feature */
typedef struct _http_eo_t {
guint32 pkt_num;
gchar *hostname;
gchar *filename;
gchar *content_type;
guint32 payload_len;
const guint8 *payload_data;
} http_eo_t;
As shown, it holds the parsed Host and Content-Type strings, the filename, and the payload data with its length.
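Note that http_eo_t is allocated in packet scope and only points at data owned elsewhere, while http_eo_packet duplicates every field (g_strdup, g_memdup) before storing the entry. Presumably this is because the entry must outlive the packet it came from. A minimal sketch of that borrow-then-copy pattern in plain C, with hypothetical demo_* types and no GLib:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Borrowed view, valid only while the current packet is being processed. */
typedef struct demo_eo_info {
    const char *filename;
    const unsigned char *payload_data;
    size_t payload_len;
} demo_eo_info;

/* Owned copy that survives until the export step. */
typedef struct demo_entry {
    char *filename;
    unsigned char *payload_data;
    size_t payload_len;
} demo_entry;

/* Releases an entry and everything it owns; NULL is accepted. */
static void demo_entry_free(demo_entry *entry)
{
    if (entry == NULL)
        return;
    free(entry->filename);
    free(entry->payload_data);
    free(entry);
}

/* Deep-copies the borrowed info; returns NULL if any allocation fails. */
static demo_entry *demo_entry_from_info(const demo_eo_info *info)
{
    assert(info != NULL);
    demo_entry *entry = calloc(1, sizeof *entry);
    if (entry == NULL)
        return NULL;
    if (info->filename != NULL) {
        size_t n = strlen(info->filename) + 1;
        entry->filename = malloc(n);
        if (entry->filename == NULL) {
            demo_entry_free(entry);
            return NULL;
        }
        memcpy(entry->filename, info->filename, n);
    }
    if (info->payload_len > 0) {
        entry->payload_data = malloc(info->payload_len);
        if (entry->payload_data == NULL) {
            demo_entry_free(entry);
            return NULL;
        }
        memcpy(entry->payload_data, info->payload_data, info->payload_len);
        entry->payload_len = info->payload_len;
    }
    return entry;
}
```

After the copy, changes to the original buffers no longer affect the entry, which is what allows the source memory to be released once the packet has been processed.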
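Data export
After the whole pcap file has been dissected (live capture is out of scope for this article), TShark calls draw_tap_listeners, which ultimately writes the previously extracted data to files.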
if (draw_taps)
draw_tap_listeners(TRUE);
This function walks all tap listeners and calls the draw callback bound to each one:
// epan/tap.c
/* This function is called when we need to redraw all tap listeners, for example
when we open/start a new capture or if we need to rescan the packet list.
It should be called from a low priority thread say once every 3 seconds
If draw_all is true, redraw all applications regardless if they have
changed or not.
*/
void
draw_tap_listeners(gboolean draw_all)
{
tap_listener_t *tl;
for(tl=tap_listener_queue;tl;tl=tl->next){
if(tl->needs_redraw || draw_all){
if(tl->draw){
tl->draw(tl->tapdata);
}
}
tl->needs_redraw=FALSE;
}
}
For the eo tap, this draw function is eo_draw, which was set up in start_exportobjects:
// ui/cli/tap-exportobject.c
/* This is just for writing Exported Objects to a file */
static void
eo_draw(void *tapdata)
{
export_object_list_t *tap_object = (export_object_list_t *)tapdata;
export_object_list_gui_t *object_list = (export_object_list_gui_t*)tap_object->gui_data;
GSList *slist = object_list->entries;
export_object_entry_t *entry;
gchar* save_in_path = (gchar*)g_hash_table_lookup(eo_opts, proto_get_protocol_filter_name(get_eo_proto_id(object_list->eo)));
GString *safe_filename = NULL;
gchar *save_as_fullpath = NULL;
guint count = 0;
if (!g_file_test(save_in_path, G_FILE_TEST_IS_DIR)) {
/* If the destination directory (or its parents) do not exist, create them. */
if (g_mkdir_with_parents(save_in_path, 0755) == -1) {
fprintf(stderr, "Failed to create export objects output directory \"%s\": %s\n",
save_in_path, g_strerror(errno));
return;
}
}
while (slist) {
entry = (export_object_entry_t *)slist->data;
do {
g_free(save_as_fullpath);
if (entry->filename) {
safe_filename = eo_massage_str(entry->filename,
EXPORT_OBJECT_MAXFILELEN, count);
} else {
char generic_name[EXPORT_OBJECT_MAXFILELEN+1];
const char *ext;
ext = eo_ct2ext(entry->content_type);
g_snprintf(generic_name, sizeof(generic_name),
"object%u%s%s", entry->pkt_num, ext ? "." : "", ext ? ext : "");
safe_filename = eo_massage_str(generic_name,
EXPORT_OBJECT_MAXFILELEN, count);
}
save_as_fullpath = g_build_filename(save_in_path, safe_filename->str, NULL);
g_string_free(safe_filename, TRUE);
} while (g_file_test(save_as_fullpath, G_FILE_TEST_EXISTS) && ++count < prefs.gui_max_export_objects);
count = 0;
eo_save_entry(save_as_fullpath, entry);
g_free(save_as_fullpath);
save_as_fullpath = NULL;
slist = slist->next;
}
}
This function walks the list of exported objects and, for each one, builds the output filename and writes the data to a file.
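The retry loop in eo_draw relies on eo_massage_str to produce a different name for each value of count, in line with the documented behavior that an increasing number is appended before the file extension. The sketch below shows one way to get that behavior. The "(N)" format and all demo_* names are assumptions made for illustration, not taken from the Wireshark source, and unlike eo_draw the sketch gives up rather than writing once the retry limit is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes name into out; for count > 0, inserts "(count)" before the last '.'
 * (or at the end if there is no extension). Returns false if out is too small. */
static bool demo_numbered_name(const char *name, unsigned count,
                               char *out, size_t out_size)
{
    assert(name != NULL && out != NULL);
    const char *dot = strrchr(name, '.');
    int n;
    if (count == 0)
        n = snprintf(out, out_size, "%s", name);
    else if (dot == NULL)
        n = snprintf(out, out_size, "%s(%u)", name, count);
    else
        n = snprintf(out, out_size, "%.*s(%u)%s",
                     (int)(dot - name), name, count, dot);
    return n >= 0 && (size_t)n < out_size;
}

/* Tries name, then numbered variants, until exists() reports a free name.
 * exists is a caller-supplied predicate, e.g. a file existence check. */
static bool demo_pick_name(const char *name, unsigned max_tries,
                           bool (*exists)(const char *),
                           char *out, size_t out_size)
{
    for (unsigned count = 0; count < max_tries; count++) {
        if (!demo_numbered_name(name, count, out, out_size))
            return false;
        if (!exists(out))
            return true;
    }
    return false;
}

/* Example predicate: pretends two names are already taken. */
static bool demo_taken(const char *candidate)
{
    return strcmp(candidate, "a.css") == 0 || strcmp(candidate, "a(1).css") == 0;
}
```

With demo_taken as the predicate, asking for "a.css" settles on "a(2).css", since both the plain name and the first numbered variant are reported as existing.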
Overall flow
Legend:
- Solid arrow: ordinary flow, A->B
- Dashed arrow: nested function calls and the like, A contains B
- Yellow background: related data structures
References
- Wireshark source: epan/export_object.h
- Wireshark source: ui/cli/tap-exportobject.h
- Wireshark source: epan/dissectors/packet-http.c
- Wireshark source: tshark.c
- Wireshark internals: tap