Big Data Search Engine Helps Cancer Patients
- Match cancer patients to cord-blood donors to retrieve stem cells for treatment.
- Create a modern, Internet-based matching system that is fast and easy to use.
- Integrate graph search and big data capabilities to rapidly compute data.
- CordMatch, an Internet-based donor matching system.
- Neo4j, a high-performance, enterprise-level graph database.
- A unique algorithm for single and double cord searches.
- Enables blood banks, transplant centers and registries to communicate.
- Customizable search options simplify the matching of cord blood units.
- Uses a combination of innovative frameworks and a big data graph.
An innovative big data search platform is helping in the fight against cancer. A tool called CordMatch, by Berlin-based Cytolon, uses big data techniques and a unique matching algorithm to quickly find cord blood matches for cancer patients in need of a stem cell transplant.
Blood from the umbilical cord, or cord blood, is rich in stem cells that can be used to treat diseases such as leukemia, a cancer of the blood cells. The process for matching cord blood units to patients used to take hours. Working with Cytolon, an innovator in stem cell matching, CSC developed an Internet-based system that matches donor cord blood with patients quickly and easily.
Finding a match
Patients suffering from diseases such as leukemia can derive great benefits from stem cells found in cord blood, which have significant advantages over cells isolated from bone marrow. For cancer patients, finding the optimal match is essential for receiving a stem cell transplant that can help build up their immune system.
All humans have unique genetic codes that are structured in a complex system, with millions of relationships between those codes. Every code is connected to another code by a relationship with attributes. Because of this complexity, matching donor cord blood to a patient was a lengthy process with earlier technologies.
Cytolon asked CSC to create a modern, Internet-based matching system that was fast and easy to use. CSC technologists developed a solution that uses a big data-style graph database that matches DNA from a patient’s blood with DNA from the donated cord blood.
“CordMatch is Web-based and uses modern technologies with a graph-based database, which makes the search faster than other applications. We can search databases all over the world and quickly find the right solution for a patient,” says Harald Diehl, CSC’s lead architect on the project.
Diehl says CordMatch is like Facebook in that it uses semantic technology, which encodes meanings separately from data and content files. “Like Facebook, it is ‘who knows who,’ so we can match all corresponding codes to one code.”
Big data meets genetic codes
To search and compare millions of cell codes with a patient’s data within a few seconds, CSC’s team chose Neo4j, a high-performance, enterprise-level graph database. Graph databases store and represent data as graph structures made up of nodes, edges and properties. CordMatch uses a graph database to house more than 2 million codes with 60 million direct relationships between them.
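To make the terms nodes, edges and properties concrete, here is a minimal in-memory sketch in Python. It is not Neo4j and not CordMatch code; the class name `PropertyGraph`, the edge types `RELATED_TO` and `BELONGS_TO`, and the placeholder node ids are all assumptions chosen for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}                    # node id -> property dict
        self.outgoing = defaultdict(list)  # node id -> [(edge type, target id, properties)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, edge_type, target, **properties):
        # Both endpoints must exist before they can be related.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.outgoing[source].append((edge_type, target, properties))

    def neighbors(self, node_id, edge_type=None):
        """Return ids of directly connected nodes, optionally filtered by edge type."""
        return [
            target
            for (etype, target, _props) in self.outgoing[node_id]
            if edge_type is None or etype == edge_type
        ]


# Placeholder data (hypothetical, not real genetic codes).
graph = PropertyGraph()
graph.add_node("code-1", kind="code")
graph.add_node("code-2", kind="code")
graph.add_node("code-3", kind="code")
graph.add_edge("code-1", "RELATED_TO", "code-2", weight=1)
graph.add_edge("code-1", "BELONGS_TO", "code-3")

direct = graph.neighbors("code-1")
related_only = graph.neighbors("code-1", edge_type="RELATED_TO")
```

Because each node keeps its own list of outgoing edges, finding a code's direct relationships does not require scanning every stored relationship, which is the general reason graph stores suit relationship-heavy data.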
The human leukocyte antigen (HLA) system is controlled by genes, and major HLA antigens are essential elements in the immune system. All known HLA codes in the world are stored inside the main graph, and the edges between the HLA nodes are used to identify the relationships and the hierarchy between them. To analyze the codes, CSC developed a unique algorithm for single and double cord searches that leverages existing algorithms to optimize search speed and accuracy.
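One way to picture how hierarchy edges can relate two codes is a breadth-first walk from each code up to its broader groups. The sketch below is an assumption-laden illustration, not CSC's algorithm: the `BROADER` mapping and the function names are hypothetical, and the sample codes are only there to show the shape of the data.

```python
from collections import deque

# Hypothetical hierarchy edges: each code points to the broader group it belongs to.
BROADER = {
    "A*02:01": ["A*02"],
    "A*02:05": ["A*02"],
    "A*02": ["A"],
    "A*24:02": ["A*24"],
    "A*24": ["A"],
}


def ancestors(code, broader=BROADER):
    """Collect every broader code reachable from `code` by breadth-first traversal."""
    seen = set()
    queue = deque([code])
    while queue:
        current = queue.popleft()
        for parent in broader.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def common_ancestors(code_a, code_b, broader=BROADER):
    """Return the broader codes that both inputs share in the hierarchy."""
    return ancestors(code_a, broader) & ancestors(code_b, broader)


same_group = common_ancestors("A*02:01", "A*02:05")
different_group = common_ancestors("A*02:01", "A*24:02")
```

In this toy hierarchy the first pair shares both a narrow and a broad group, while the second pair shares only the broad one, which is the kind of distinction hierarchy edges make available to a search.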
“The biggest challenge was solving the complexity of the data and the high-level implementation. So we used the most modern frameworks and the most innovative data graph,” says Simon Hanika, a senior application developer for CSC. CordMatch is the first application that uses a combination of innovative frameworks and a big data graph to rapidly compute complex relationships between genetic codes.
CordMatch compares all HLA values, and at least 6 of the 10 values must agree for a cord blood unit to count as a match. New code variants are added to the system every day as they become available from sources such as the World Health Organization.
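The 6-of-10 rule can be sketched as a simple threshold check. This is a simplified illustration only: it assumes the 10 values are compared position by position with exact equality, which the article does not state, and the names `count_matches` and `is_successful_match` as well as the placeholder values are hypothetical.

```python
REQUIRED_MATCHES = 6   # from the article: 6 of 10 values must match
VALUES_COMPARED = 10


def count_matches(patient_values, unit_values):
    """Count positions where patient and cord blood unit carry the same value."""
    if len(patient_values) != VALUES_COMPARED or len(unit_values) != VALUES_COMPARED:
        raise ValueError(f"expected {VALUES_COMPARED} values on each side")
    return sum(1 for p, u in zip(patient_values, unit_values) if p == u)


def is_successful_match(patient_values, unit_values):
    """Apply the threshold: enough agreeing values means the unit is a candidate."""
    return count_matches(patient_values, unit_values) >= REQUIRED_MATCHES


# Placeholder values (assumption: stand-ins, not real HLA typing results).
patient = [f"v{i}" for i in range(10)]
unit_six_of_ten = patient[:6] + ["x"] * 4
unit_five_of_ten = patient[:5] + ["x"] * 5

accepted = is_successful_match(patient, unit_six_of_ten)
rejected = is_successful_match(patient, unit_five_of_ten)
```

A production system would likely replace the exact-equality test with a lookup of related codes in the graph, so that equivalent or closely related values also count toward the threshold.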
One word: Success
To develop CordMatch, Cytolon sought a company that had applications expertise, industry experience and global reach. For Thomas Klein, Cytolon’s founder and CEO, CSC proved to be a perfect match. “We were looking for a partner who understood our culture, understood our cutting-edge product, and who was able to transfer IT into the medical area. And we needed a partner in all the countries we are working with our customers, and CSC is more or less in all countries.”
For Klein, CSC also provided the quick reaction time needed to bring such an innovative solution to market quickly. “The biggest difference [from] other companies we have worked with is that the people at CSC react very fast — they respond directly, and we like that direct communication.”
Klein describes Cytolon’s relationship with CSC in one word: “success.” And, he says, “It’s the first time on earth that a personalized medicine platform works in a real existing market, thanks to a very good partnership.”