Genome privacy leakage from omics data
Published: 2018-06-11  Views: 5902

Speaker: Hu Zhiqiang (胡智強)

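Speaker bio: Dr. Hu Zhiqiang (胡智強) is currently a postdoctoral researcher at UC Berkeley in the United States. He received his PhD in 2006 from the Department of Bioinformatics and Biostatistics. In 2018 he published the 3,000 rice pan-genome paper in Nature as co-first author.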

Time: 2018-06-13, 12:55-15:30

Venue: 東中院, Room 1-304

Contact: Wei Chaochun (韋朝春), ccwei@sjtu.edu.cn

 

Abstract:

Sharing genomes without personal identifiers is common practice. However, recent studies have revealed the risk of re-identifying people from their genomes or from attached quasi-identifiers such as sex, birthdate and zip code. The additional availability of an individual’s RNA-seq data has implications for privacy, because it may be linked to the genome, potentially allowing the person’s privacy to be breached. For example, sex and ethnicity may be inferred directly from a genome, and the genome study may provide a zip code. The genome could then be linked to RNA-seq data from a diabetes study with attached birthdates and incomes. Combined, these quasi-identifiers may uniquely identify the person, and the diabetes study then reveals the person’s disease status. RNA-seq reads contain genetic variants and can therefore be linked directly to the genome. To avoid this risk, some researchers now release gene expression, isoform expression and exon read count data instead of raw reads.
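The way combined quasi-identifiers narrow down a person can be illustrated with a toy sketch. Everything below is invented for illustration: the records, the field names, the `registry` of identified people, and the `candidates` helper are hypothetical and do not come from any real study or from the speaker's work.

```python
# Toy illustration only: all records, field names and values are invented.
genome_record = {"sex": "F", "ethnicity": "B", "zip": "94704"}   # inferred from / attached to the genome
rnaseq_record = {"birthdate": "1970-01-02", "diabetic": True}    # attached to the linked RNA-seq sample

# Hypothetical identified records an attacker might hold (e.g. a public registry).
registry = [
    {"name": "Person 1", "sex": "F", "ethnicity": "B", "zip": "94704", "birthdate": "1970-01-02"},
    {"name": "Person 2", "sex": "F", "ethnicity": "B", "zip": "94704", "birthdate": "1982-05-17"},
    {"name": "Person 3", "sex": "M", "ethnicity": "B", "zip": "94704", "birthdate": "1970-01-02"},
    {"name": "Person 4", "sex": "F", "ethnicity": "A", "zip": "94720", "birthdate": "1970-01-02"},
]


def candidates(known, people):
    """Return the registry entries consistent with every known quasi-identifier."""
    return [p for p in people if all(p.get(k) == v for k, v in known.items())]


# Quasi-identifiers from the genome study alone leave more than one candidate.
genome_only = candidates(genome_record, registry)

# Adding the birthdate from the linked RNA-seq study leaves a single candidate,
# whose disease status is then exposed by the RNA-seq study.
combined = candidates({**genome_record, "birthdate": rnaseq_record["birthdate"]}, registry)

print(len(genome_only), len(combined), combined[0]["name"], rnaseq_record["diabetic"])
```

In this toy pool, neither dataset identifies the person on its own; the breach comes from joining them.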

However, gene expression can also be linked to the genome through expression QTLs. Using a Bayesian framework, we found that it is feasible to predict genomic variants from relative isoform expression. Based on GTEx splicing QTL data, using relative isoform expression from 30 genes, we could identify the target genome within a pool of hundreds of individuals with >96% accuracy. With more splicing QTLs, it is possible to identify the target genome of an RNA-seq dataset among millions of individuals. Researchers have proposed eliminating the risk of gene-expression-based linking attacks by adding noise to gene expression values, based on the observation that only a few genes enable linkage. However, we found that there are many more such genes than previously reported, and that expression data enables re-identification of the target genome from a pool containing billions of genomes. Our results imply that mitigating the linking risk by adding noise would severely compromise the biological value of the data, since the data will no longer be biologically meaningful when over half of the gene expression values are modified. Our study also implies that other kinds of “omic” data, including DNA modification, protein and metabolite levels, may also leak genome privacy.
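The general idea of a QTL-based linking attack can be sketched as follows. This is not the speaker's actual framework: it assumes a deliberately simple model in which each gene's expression is Gaussian around a mean that shifts linearly with the genotype at one QTL, and a uniform prior over a small pool. The constants `MU`, `BETA`, `SIGMA`, the pool, and all function names are hypothetical.

```python
import math

N_GENES = 4
POOL = 3 ** N_GENES  # 81 hypothetical individuals, all with distinct genotype vectors

# Assumed per-gene QTL model: expression ~ Normal(MU + BETA * genotype, SIGMA).
MU, BETA, SIGMA = 0.0, 1.0, 0.2


def genotype(i):
    """Toy genotype vector (0, 1 or 2 alternate alleles per QTL) for individual i."""
    return [(i // 3 ** j) % 3 for j in range(N_GENES)]


def log_likelihood(expr, geno):
    """Log-probability of the observed expression given one candidate genome."""
    ll = 0.0
    for x, g in zip(expr, geno):
        z = (x - (MU + BETA * g)) / SIGMA
        ll += -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))
    return ll


def posterior(expr):
    """Posterior over the pool under a uniform prior (normalised likelihoods)."""
    lls = [log_likelihood(expr, genotype(i)) for i in range(POOL)]
    top = max(lls)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - top) for ll in lls]
    total = sum(weights)
    return [w / total for w in weights]


TARGET = 50
noise = [0.1, -0.1, 0.1, -0.1]  # fixed perturbation so the demo is deterministic
observed = [MU + BETA * g + e for g, e in zip(genotype(TARGET), noise)]

post = posterior(observed)
best = max(range(POOL), key=post.__getitem__)
print(best, round(post[best], 4))
```

Each additional informative gene multiplies the evidence against the wrong candidates, which is why a modest number of QTL-linked genes can single out one genome in a large pool under a model like this one.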
