- A new open-source toolkit turns complex phenotypic data into actionable insights, accelerating the discovery of rare disease diagnoses and aids cohort building for cross-disease analytics.
- Pheno-Ranker helps researchers classify patients and find similar cases with just a few clicks, by processing diverse phenotypic data formats and applying semantic similarity algorithms.
- Developed at CNAG with international partners, and now published in BMC Bioinformatics, it is freely available to the global scientific community.
Understanding phenotypic data — the observable characteristics we see in individuals, such as diseases, symptoms, treatments, and even lifestyle factors — is fundamental for advancing clinical research. These data provide crucial insights into how diseases manifest and progress, enabling scientists and clinicians to make more accurate diagnoses, classify patients based on their unique traits, and develop personalised treatment plans. However, analysing and comparing phenotypic information across different patients and cohorts is a complex and time-consuming task.
Addressing this challenge,
the Biomedical Genomics Group at the Centro Nacional de Análisis Genómico (CNAG), led by its director Dr Ivo Gut, has made a significant breakthrough with the launch of Pheno-Ranker. This
novel, open-source toolkit is designed specifically for the scientific community to facilitate the comparison and analysis of phenotypic data. The project was developed by CNAG Biomedical Informatics researchers Manuel Rueda and Ivo Leist, in collaboration with
Pfizer and the University of Granada’s Junta de Andalucía
Centre for Genomics and Oncological Research, Granada (Spain) in the course of the 3TR project (
3tr-imi.eu). The study has been published in
BMC Bioinformatics, making this resource freely accessible to researchers worldwide.
Pheno-Ranker simplifies and streamlines the process of making sense of complex phenotypic profiles. The toolkit is flexible and easy to use: it accepts input files in widely used formats such as JSON, TSV and CSV, which can include free text or structured terms from ontologies such as the Human Phenotype Ontology (HPO). It processes these data through semantic similarity algorithms, allowing researchers to quantify the similarity between individuals and between patients and reference cohorts. The output is presented in clear, ranked lists of similarities, which greatly simplifies the identification of relevant cases, potential diagnoses, and clinically meaningful patterns.
Pheno-Ranker offers two complementary modes of analysis that help researchers explore phenotypic data more effectively. In cohort mode, the toolkit allows scientists to analyse groups of individuals as a whole. This helps to identify patterns, classify patient subgroups, or detect potential outliers within a cohort. For example, when studying patients with a specific disease, cohort mode can reveal subgroups who share certain features or respond differently to treatments, enabling more precise clinical insights. In patient mode, the focus shifts to the individual level. Researchers can compare a single patient's phenotypic profile against an entire cohort, a function especially useful in rare disease research. This mode supports the identification of similar cases and can assist in generating potential diagnoses based on phenotypic similarities. The toolkit’s powerful web-based interface streamlines data calculations and supports direct visual analytics.
According to one of the authors of the study and Bioinformatician at the Biomedical Genomics Group at CNAG: “One of the goals of our lab is to provide solutions to real-world challenges that others face when working with phenotypic and clinical data. A common issue is the need to cluster patients by similarity or identify the closest match for a given patient. That’s why we developed Pheno-Ranker — to allow users, for instance, to find the closest matches for rare disease (RD) patients in resources like OMIM or Orphanet, which together catalog thousands of rare diseases. Unlike a general GenAI or web search, Pheno-Ranker provides statistically meaningful results, making it a more reliable tool for clinical interpretation and decision-making.”
By being open-source, Pheno-Ranker promotes wide adoption and collaboration within the research community. This fosters innovation and accelerates discoveries in clinical genomics and rare disease research worldwide. With Pheno-Ranker, CNAG is contributing a powerful new tool to help translate phenotypic data into actionable insights, ultimately improving patient diagnosis and care.
USEFUL LINKS
- Links to the online documentation, including a Google Colab tutorial, and the tool’s source code are available on the Project home page: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker.
- AI-powered Podcast: https://cnag-biomedical-informatics.github.io/pheno-ranker/what-is-pheno-ranker/#listen-to-the-paper-audio-edition.
- Web App User Interface: https://pheno-ranker.cnag.eu
REFERENCE ARTICLE
Leist, Ivo C., et al. «Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond». BMC Bioinformatics, vol. 25, n.o 1, diciembre de 2024, p. 373. BioMed Central, https://doi.org/10.1186/s12859-024-05993-2.