The field of genomics has been thriving during the past decade. Since the first successful sequencing of the human genome in 2003, within the framework of the Human Genome Project (1990-2003), genomic data of various organisms have been collected.
To successfully compile and store this data for further use, big data solutions are being implemented.
To understand what genomics is and what purpose it serves, it is important to understand what a genome is.
A genome is “the complete set of genes or genetic material present in a cell or organism”. In other words, a genome is an organism's entire DNA sequence. DNA is written in a four-letter code (A, C, G, T). The order of these letters varies from person to person and from organism to organism, making each of us unique.
Genomes are stored in cells. Each cell in the human body contains an identical copy of nearly 3 billion DNA base pairs. Together, these base pairs form the genome.
Genomes are referred to as ‘the set of instructions needed to create an organism’. They are like an instruction manual that makes and runs the organism.
The human organism is very complicated, and so is the human genome. The science of genomics emerged to study the entire set of nucleic acid sequences in humans.
The purpose of genomics is to understand, interpret and make use of DNA sequencing in order to come up with solutions to problems that affect humanity and the planet in general.
Genomics is an interdisciplinary science in the field of biology. It studies the complete set of the genetic material of an organism. And here lies the primary difference between genetics and genomics.
Genetics is also a science in the field of biology and is considered one of its main branches, but it is concerned with the inheritance of traits and how they vary within a population. While genetics focuses on the behaviour of genes, genomics is concerned with the genome of the organism, which, as mentioned above, is the entire DNA sequence. This means that genomics studies all genes and their interactions, whereas genetics typically deals with studies that involve a single gene.
Genetics, as a branch of biology, has much deeper roots. This field of science was introduced by Gregor Mendel in the 1860s. The term genomics was coined more than a century later, in 1986, by geneticist Tom Roderick. However, it had existed as a concept since the 1970s. Fred Sanger was the first to sequence the complete genome of a virus, and thus launched the practice of sequencing and genome mapping.
In 1980, as a lot of information was being collected, bioinformatics (an interdisciplinary science between biology and computer science) and data storage started to develop. This set the ground for the bond that was to develop between big data and genomics.
Genomics deals with large amounts of data that result from sequencing, mapping, assembling and analyzing genomes.
In the last century, it took genomics researchers a long time to compile even a modest amount of data. Now, however, genome sequencing with modern techniques and tools generates a lot of data. And although this data is extremely helpful and leads to major advancements, researchers are having trouble storing it.
Being able to easily access data sets that store the entire genome information of human beings has transformed the industry of genomics.
Here’s how data science and innovative solutions have changed genomics. The Human Genome Project ran for 13 years. It started in the last decade of the 20th century and came to an end in 2003. During this period, USD 3 billion was spent on the research, and thousands of researchers were involved in the process. The result was the first-ever sequenced human genome.
Now, a person’s genome can be mapped 30 times in just 40 hours. Importantly, this repeated mapping greatly reduces scanning errors. And the cost is USD 1000.
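To put those figures side by side, here is a minimal sketch of the arithmetic. The numbers are the ones quoted above; the variable names and the average-year conversion are assumptions made for illustration.

```python
# Figures quoted in the article; names are illustrative only.
HGP_COST_USD = 3_000_000_000   # Human Genome Project budget
HGP_YEARS = 13                 # duration of the project
MODERN_COST_USD = 1_000        # cost quoted for mapping a genome today
MODERN_HOURS = 40              # time quoted for mapping a genome today

HOURS_PER_YEAR = 365.25 * 24   # assumption: average calendar year

# How many times cheaper and faster the modern process is.
cost_ratio = HGP_COST_USD / MODERN_COST_USD
time_ratio = (HGP_YEARS * HOURS_PER_YEAR) / MODERN_HOURS

print(f"Cost fell by a factor of about {cost_ratio:,.0f}")
print(f"Time fell by a factor of about {time_ratio:,.0f}")
```

On these numbers, the cost dropped roughly three million-fold and the time by a factor in the low thousands.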
Big data has helped genomics make a big leap and advance rapidly in research on DNA sequences.
Big data contributes to the rapid development of genomics in several ways.
Big data in genomics sets the ground for personalized healthcare, something that until recently was considered to be far off in the future.
Personalized healthcare is a result of personalized genomics. Researchers and medical professionals collect each person’s data, analyze it and compare it with the genomic data of other organisms. The result of this ‘cooperation’ between data storage and genomics is an individual approach to every person.
Being able to organize and study large amounts of biological data brings forth fascinating opportunities in the field of genomics:
Firstly, this allows researchers to annotate genomes, i.e. to interpret them in order to mark their features in a sequence of DNA.
Secondly, being able to collect genomic data of different organisms in separate databases allows researchers to compare them and establish relationships – see how they are alike or different.
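As a toy illustration of both ideas, the sketch below marks where a short motif occurs in a DNA string (a very simplified stand-in for annotation) and measures how alike two equal-length sequences are. The function names and sample sequences are made up for this example; real annotation and comparison pipelines rely on far more sophisticated alignment tools.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Continue one position later so overlapping hits are found too.
        start = sequence.find(motif, start + 1)
    return positions


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Two made-up sequences that differ at a single position.
sample_a = "ATGCGTACGTTAGC"
sample_b = "ATGCGTACCTTAGC"

print(find_motif(sample_a, "CGT"))           # positions of the motif in sample_a
print(percent_identity(sample_a, sample_b))  # similarity in percent
```

Position-by-position comparison only works for sequences of equal length. Comparing whole genomes requires alignment, which is one reason the computational load grows so quickly.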
Although big data has solved a lot of problems in many industries, integrating it into any field brings numerous issues, and genomics is no exception.
The amount of genomic data generated on a regular basis is staggering. To get an idea of how much data is produced, imagine all of your genes transformed into sentences. This makes a genome that is a book long – a book filled with tens of thousands of words.
As the amount of data grows, challenges become more and more obvious.
The primary challenge concerning genomic data and its growth is how it is going to be collected, stored and analyzed. Notwithstanding all the technical advancements, new technological solutions capable of dealing with truly massive amounts of data are yet to be introduced.
The genome of one human being occupies 100 gigabytes of space. As sequencing a genome no longer takes 13 years and is now a matter of hours, it becomes obvious that more space is going to be needed. The storage needed for genome sequencing will grow beyond gigabytes – it is going to occupy not even petabytes, but exabytes. If you didn’t know, one exabyte equals 1 billion gigabytes.
It has been estimated that by 2025, researchers collecting genomic data will need 40 exabytes of storage. This capacity will be needed for human genomic data alone.
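To make that scale concrete, here is a small sketch of the arithmetic, using the figures quoted above (100 gigabytes per genome, 1 billion gigabytes per exabyte, 40 exabytes projected). The helper names are hypothetical, and decimal storage units are assumed.

```python
GB_PER_EXABYTE = 1_000_000_000   # decimal units: 1 EB = 10**9 GB
GB_PER_GENOME = 100              # per-genome footprint quoted in the article
PROJECTED_EXABYTES = 40          # 2025 estimate quoted in the article


def genomes_that_fit(exabytes: float, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Number of genomes that fit into the given storage capacity."""
    return exabytes * GB_PER_EXABYTE / gb_per_genome


def storage_needed_eb(genome_count: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Storage in exabytes needed to hold genome_count genomes."""
    return genome_count * gb_per_genome / GB_PER_EXABYTE


print(f"{genomes_that_fit(PROJECTED_EXABYTES):,.0f} genomes fit in 40 EB")
print(f"{storage_needed_eb(10_000_000):.1f} EB needed for 10 million genomes")
```

At 100 gigabytes each, 40 exabytes corresponds to about 400 million genomes, and every 10 million genomes adds another exabyte.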
The main solutions anticipated for these technical problems are the creation of genomic archives and the use of cloud computing services. According to many professionals, the latter is more realistic.
In 2014, Google launched Google Genomics (later renamed Cloud Life Sciences), a computing service that allows researchers to transfer DNA data to its servers, conduct experiments and process biomedical data at scale.
Another big concern is the security of genomic data.
People’s genomic data is the most private information about their past, present and future. This is why it must be handled with care and kept with the highest level of confidentiality.
To protect the data from falling into the hands of third parties or being hacked, most companies dealing with genomic data use encryption.
In 2018, MyHeritage revealed that hackers had breached 92,283,889 user accounts in an incident dating back to 2017. Although the hackers did not reach the actual genetic information, the incident was a wake-up call for many companies. Many of them have since introduced blockchain to secure sensitive data.
Aside from security, the confidentiality of data is another challenge that genomic data collectors should keep at the centre of attention. Although companies use anonymized data, if the information falls into the hands of third parties, a person’s identity can be revealed using their genomic data.
Although the ability to store big data has positively impacted the field of genomics, it has also come with several challenges, as no one expected DNA sequencing to become so ‘easy’ in such a short time. Challenges aside, genomic data collectors and researchers in the field are more than certain that the biggest changes in healthcare are yet to come.