The Range of Big Data

Published: May 28, 2021

Writer Profile

  • Keisuke Kataoka

    School of Medicine Professor, Division of Hematology

    Specialization / Hematology, Cancer Genetics

In recent years, the volume of data in medical sciences and healthcare has been increasing rapidly. The term "big data" first appeared in medical sciences in the 2008 special feature of Nature titled "Science in the Petabyte Era." Particularly in genomic medicine, with the spread of next-generation sequencers, more than an exabyte of data is generated annually, far surpassing the data volumes in astronomy, Twitter, or YouTube, which have traditionally handled big data.

Currently, while practicing clinical hematology at this university, I am also affiliated with the National Cancer Center Research Institute, where I conduct genetic analysis research on cancer, focusing on blood cancers. In genomic medicine, cancer is the field that has benefited most from next-generation sequencers: many large-scale studies have identified genetic abnormalities that act as cancer drivers and the molecular pathways in which they accumulate. Furthermore, drugs targeting these abnormalities (molecular targeted drugs) have been developed in a short period, and in several cases they have actually improved patient outcomes.

While society has entered an era in which it can enjoy the benefits of big data analysis in medical sciences and healthcare, the limitations of the third AI boom, centered on machine learning and deep learning, are also becoming clear as that boom begins to wane. Fundamentally, big data analysis is a retrospective observational study and is therefore susceptible to various biases. Furthermore, the quality of individual data varies widely, a mixture of wheat and chaff, which makes data selection crucial.

In practice, the analysis centers on frequencies and correlations, and causal relationships can rarely be stated definitively. Therefore, the importance of conventional interventional studies through clinical trials and of research into disease mechanisms remains unchanged, and these approaches complement big data analysis.

In Japan, the importance of information science, including big data analysis, has long been emphasized in the fields of medical sciences and healthcare, but understanding of its essence remains insufficient. Currently, large-scale national projects such as the Action Plan for Whole Genome Analysis are underway. To promote the utilization of this big data and maximize its effectiveness, it is vital to share an understanding of the possibilities and limitations of big data, that is, its range.

*Affiliations and titles are as of the time of publication.