CATALOG DNA

Mode
3 min read · Jun 27, 2021

Storing data in DNA

Overview:

Think about how much time we spend online. The amount of data generated by our online activity is mind-boggling, and it is only going to increase. According to CATALOG, the world will generate 160 zettabytes of data in 2025. That is equivalent to 160 billion terabytes, or 160 trillion gigabytes. As the world generates data on this scale, it needs a new technology capable of storing all of this information. CATALOG is building the world’s first DNA-based platform for massive digital storage and computation.

Problem:

As mentioned above, the world is producing more data than we are capable of storing. Conventional storage media such as flash drives and hard drives do not have the longevity, data density, or cost efficiency to meet global demand. Data centers also use more electricity annually than the entire country of Austria.

Solution/Technology:

You can store roughly 1 exabyte (1 billion gigabytes) of information in an amount of DNA about the size of a sugar cube. For context, 1 exabyte is roughly what a standard data center, about the size of a football field, can store.
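A quick check of the unit arithmetic, using decimal (SI) prefixes. One zettabyte is a billion terabytes, so CATALOG's 160 zettabyte figure from the overview works out to 160 billion terabytes. The sugar-cube count at the end is only a back-of-the-envelope extrapolation from the article's "1 exabyte per sugar cube" estimate, not a figure CATALOG has published.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
EB = 10**18
ZB = 10**21

# CATALOG's projection for data generated in 2025.
projected_2025 = 160 * ZB

in_terabytes = projected_2025 // TB   # 160 billion TB
in_gigabytes = projected_2025 // GB   # 160 trillion GB
in_exabytes = projected_2025 // EB    # 160 thousand EB

# One exabyte really is one billion gigabytes.
gigabytes_per_exabyte = EB // GB

# Assumption: roughly 1 EB fits in a sugar-cube-sized amount of DNA,
# so the whole projection would need about this many "sugar cubes".
sugar_cubes = in_exabytes

print(f"{in_terabytes:,} TB")
print(f"{in_gigabytes:,} GB")
print(f"{sugar_cubes:,} sugar-cube-sized volumes of DNA")
```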

Example: You take a picture. That picture is saved as a digital file such as a .jpeg. The file is turned into a string of binary, which is then broken into blocks. Each block is translated into “words” drawn from a pool of pre-synthesized DNA sequences that represent it. These building blocks sit in separate wells; a robot combines the right building blocks, and an enzyme is added to join the pieces of DNA together. Finally, the robot transfers all of the assembled identifiers so they can be pooled together in one tube.

CATALOG has created its own dictionary of sorts, in which each identifier has an inherent order. To retrieve the picture, you take the DNA out of storage and run it through a sequencing machine, and a computer reads the result back into binary and then into the .jpeg.
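To make the write and read path concrete, here is a toy sketch of the idea in Python. It is not CATALOG's actual scheme, and everything in it is an assumption made for illustration: the component sequences, the three "layers" of components, and the rule that an identifier is added to the tube when its bit is 1. It only shows how an ordered dictionary of identifiers lets an unordered pool of DNA be read back into the original bytes.

```python
import random
from itertools import product

# Hypothetical pre-synthesized building blocks, grouped into three layers.
# The real sequences and layer counts are not given in the article.
LAYERS = [
    ["ACGT", "TGCA", "GATC", "CTAG"],
    ["AACC", "GGTT", "CCAA", "TTGG"],
    ["ATAT", "CGCG", "TATA", "GCGC"],
]

# An identifier is one component from each layer, joined end to end.
# Listing them in a fixed order gives every identifier a position (rank).
IDENTIFIERS = ["".join(parts) for parts in product(*LAYERS)]
RANK = {ident: i for i, ident in enumerate(IDENTIFIERS)}
CAPACITY_BITS = len(IDENTIFIERS)  # 64 bits in this toy dictionary


def to_bits(data: bytes) -> list[int]:
    """Turn bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits: list[int]) -> bytes:
    """Turn a list of bits (length a multiple of 8) back into bytes."""
    return bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


def encode(data: bytes) -> list[str]:
    """Assemble the identifiers whose bit is 1 and pool them in one 'tube'."""
    bits = to_bits(data)
    if len(bits) > CAPACITY_BITS:
        raise ValueError("message too large for this toy dictionary")
    tube = [IDENTIFIERS[i] for i, bit in enumerate(bits) if bit]
    random.shuffle(tube)  # molecules in a tube have no physical order
    return tube


def decode(tube: list[str], n_bytes: int) -> bytes:
    """Simulate sequencing: look up each identifier's rank and set that bit."""
    bits = [0] * (n_bytes * 8)
    for ident in tube:
        bits[RANK[ident]] = 1
    return from_bits(bits)


message = b"\xff\xd8\xff\xe0"  # the first four bytes of a typical .jpeg file
tube = encode(message)
restored = decode(tube, len(message))
print(restored == message)
```

Because each identifier carries its own position, the shuffled tube still decodes correctly. Only pre-made components are combined, so no new DNA has to be synthesized base by base for each file.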

Financing:

September 2020: $10MM Series A led by Horizon Ventures

July 2018: $4MM Seed-2 round led by NEA

July 2017: $5MM Seed round led by NEA

(Source: https://www.crunchbase.com/organization/catalog-technologies/company_financials)

Summary:

The idea of storing information in DNA has been around for a while. The bottleneck has been the cost of DNA synthesis. Applying Mike Maples Jr.’s framework of backcasting, we can see that the technology inflection of creating a unique alphabet and leveraging pre-synthesized DNA has brought us to a point where DNA storage is now possible in a cost-efficient way. In addition, DNA has benefits as a storage medium: it is stable and durable, and it reduces the carbon emissions that data centers produce. DNA storage is also “decentralized,” which makes it safer from a cybersecurity standpoint than data centers that are centralized and owned by 2 or 3 companies. The core business makes sense, and you essentially get a free option on any new discoveries that come from this new storage and computation method.

Questions:

  1. How will retrieval be done efficiently once at scale?
  2. How will the liquid be stored?
  3. The video hints that new applications or discoveries could come from this innovation. Dig deeper on this.


Mode

Explorations into the Life Sciences; OKC/Boston