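CGM will host its 53rd online salon on Wednesday, May 15, 2019. We have invited Dr. Qixiang Chen of BenchSci to give a talk titled "The quest for the ideal antibody, AI driven product knowledge extraction from scientific publications". The YouTube live stream starts at 6 p.m. Pacific Time. Click "Read the original" at the end of this post to go to the CGM website and watch the live stream. Friends in China, please keep an eye on the CGM website as well: the recording will be posted there after the live stream ends!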

Speaker bio:

[Image: CGM Online Salon Preview: Can't Find the Ideal Antibody? AI Can Help - Figure 2]

Dr. David Qixiang Chen received his PhD from the Institute of Medical Science at the University of Toronto in 2018 and is currently CTO and head of AI at BenchSci in Canada. His main research areas include:
▪ Neuroscience research with applied neuroimaging and machine learning.
▪ Neuroimaging in neuropathic pain.
▪ Anatomical visualization of small fiber tracts for pre-surgical planning.
▪ High angular resolution diffusion magnetic resonance imaging and tractography.
▪ Cortical/subcortical segmentation and structural analysis.

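Abstract (translated by the editor):

Antibodies are an important part of the immune system and are widely used reagents in biomedical experiments. Misuse of antibodies caused by a lack of data is responsible for nearly 50% of failed experiments, costing the drug discovery industry a great deal of time and money.

People often search scientific publications for the best evidence on how to use antibodies and other scientific products, but existing search tools (PubMed, Google Scholar) are not designed for product search. By combining text mining, bioinformatics, and machine learning, we decoded the context of antibody experiments from open- and closed-source literature.

Scientists like to evaluate their own experimental results by comparing them with images reported in the literature. We identified the correct product from among four million antibodies, across nine million publications and three hundred thousand contexts, and associated them with thirty-seven million protein names, thereby linking antibody experiment images to their contexts. We ran the computation on Spark and served search on Elasticsearch, used deep neural networks to judge product/context usage relationships, and used a CNN to identify technique subpanels in figures to fine-tune data accuracy.

BenchSci's mission is to connect scientific ideas with outcomes. We accelerate the pace of new discoveries by removing obstacles in the scientific iteration cycle. Machine learning has proven indispensable: a small team could scale up its data processing only through deep learning.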

Abstract (provided by the speaker):

The antibody, an important part of the immune system, is a widely used reagent in biomedical experiments. Misuse of antibodies, often due to insufficient data, is responsible for up to 50% of failed experiments, and incurs enormous costs in time and money for drug discovery.

The best evidence for the use of antibodies and other scientific products is found in scientific publications. Existing publication search tools (PubMed, Google Scholar) are not designed for product search. We decoded antibody experimental contexts from open- and closed-source publications with a combination of text mining, bioinformatics, and machine learning.

At the end of the day, scientists prefer to judge experimental outcomes by inspecting publication images. We linked antibody contexts to their figure images by identifying the correct product from among 4M antibodies, within 9M publications, across 300K contexts, and associating them with over 37M protein aliases. This complex task was computed using Spark, and search was served on Elasticsearch. Deep neural nets were used to judge product/context usage relationships (embeddings, LSTM with attention), and to identify technique subpanels (CNN) in figures to fine-tune data accuracy.

The mission of BenchSci is to close the gap between idea and outcome in science. We accelerate the pace of discoveries by removing roadblocks in the scientific iteration cycle. ML has proven to be indispensable: scaling data processing with a small team could only have been achieved through the use of deep learning.

References:

Baker, M., 2015. Reproducibility crisis: Blame it on the antibodies. Nature News, 521(7552), p.274.
LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), p.436.

Follow us:

[Image: CGM Online Salon Preview: Can't Find the Ideal Antibody? AI Can Help - Figure 3]
CGM website: cgmonline.co
YouTube channel: Chinese Genomics Meet-up
WeChat group: please add beckyhao13 as a WeChat friend and include the note "CGM"