What does mCodeGPT do?

Knowledge Bases and Ontologies in Context

Knowledge Bases (KBs) and ontologies encode structured domain knowledge that can be queried and reasoned with. General-purpose KBs, like Wikidata, contain broad contextual knowledge across various domains, while domain-specific KBs, such as Cancer Ontology (mCode STU3), are curated repositories detailing specific areas like cancer.

Large Language Models for Knowledge Extraction

Although most human knowledge is communicated in natural language (e.g., clinical reports), that knowledge has historically been largely opaque to machines. Recent progress in Natural Language Processing (NLP), including Large Language Models (LLMs), shows promise for extracting knowledge from text, but these models have limitations, including poor generalization, sensitivity to negation, and hallucination. Grounding extraction in a domain-specific ontology can mitigate these limitations.

Integrating LLMs in KB Construction and Validation

Rather than merely presenting the raw outputs from LLMs to the end-users, we have devised an integrated approach that combines KBs with LLMs. This integration allows for a joint extraction of both entities and relations, ensuring a holistic understanding of the text. The use of ontologies provides a robust framework for logical reasoning, serving as a potent mechanism for validation prior to assimilating these facts into the KB.
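As a rough illustration of ontology-based validation before facts enter a KB, the sketch below checks candidate triples against a toy ontology. It is not mCodeGPT's implementation: the class names, relation names, and the `Triple`, `validate`, and `filter_triples` helpers are all hypothetical, and a real ontology would be loaded from a published resource rather than hard-coded.

```python
from dataclasses import dataclass

# Toy ontology (hypothetical): the class of each known entity, and the
# (subject class, object class) pair each relation is allowed to connect.
ENTITY_CLASSES = {
    "breast carcinoma": "CancerCondition",
    "tamoxifen": "Medication",
    "ER positive": "TumorMarker",
}
RELATION_SIGNATURES = {
    "treated_with": ("CancerCondition", "Medication"),
    "has_marker": ("CancerCondition", "TumorMarker"),
}


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def validate(triple):
    """Return (ok, reason) for one candidate triple."""
    signature = RELATION_SIGNATURES.get(triple.relation)
    if signature is None:
        return False, "unknown relation"
    subject_class = ENTITY_CLASSES.get(triple.subject)
    object_class = ENTITY_CLASSES.get(triple.obj)
    if subject_class is None or object_class is None:
        return False, "ungrounded entity"
    if (subject_class, object_class) != signature:
        return False, "domain/range mismatch"
    return True, "ok"


def filter_triples(candidates):
    """Split candidates into accepted triples and (triple, reason) rejections."""
    accepted, rejected = [], []
    for triple in candidates:
        ok, reason = validate(triple)
        if ok:
            accepted.append(triple)
        else:
            rejected.append((triple, reason))
    return accepted, rejected


# Candidates as an LLM might propose them (made up for this example).
candidates = [
    Triple("breast carcinoma", "treated_with", "tamoxifen"),
    Triple("tamoxifen", "treated_with", "breast carcinoma"),     # reversed roles
    Triple("breast carcinoma", "has_marker", "HER2 amplified"),  # not in ontology
]
accepted, rejected = filter_triples(candidates)
```

Only the first candidate passes. The other two are returned with a reason, so they can be reviewed or re-prompted instead of being silently added to the KB.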

Automated Approach for Schema Population

Using hierarchical extraction of semantics to populate custom schemas and ontology models from text, mCodeGPT generates structured instances from clinical narratives, combining the flexibility of LLMs with the reliability of publicly available KBs and ontologies. It is designed to handle complex schemas with nested structures and leverages a wide array of ontologies and lexical grounders for extraction tasks.
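The idea of filling a nested schema hierarchically can be sketched as a recursive walk: extract a span for each field, recurse into that span for nested fields, and ground leaf values against a lexicon. This is a minimal sketch under stated assumptions, not mCodeGPT's actual pipeline. `SCHEMA`, `GROUNDER`, the placeholder `EX:` identifiers, and `fake_llm` (keyword rules standing in for a real model call) are all hypothetical.

```python
import re

# Hypothetical nested schema: a leaf field maps to None, a nested object to a dict.
SCHEMA = {
    "primary_cancer": {"site": None, "histology": None},
    "stage": None,
}

# Hypothetical lexical grounder: surface form -> placeholder identifier.
GROUNDER = {"breast": "EX:0001", "ductal carcinoma": "EX:0002"}


def fake_llm(field, text):
    """Stand-in for an LLM call: simple keyword rules, one per field."""
    patterns = {
        "primary_cancer": r"diagnosed with ([^.;]+)",
        "site": r"of the (\w+)",
        "histology": r"^(?:invasive )?([a-z ]+?) of the",
        "stage": r"stage (\w+)",
    }
    match = re.search(patterns[field], text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


def extract(schema, text, ask=fake_llm):
    """Recursively populate `schema` from `text`, grounding leaf values."""
    result = {}
    for field, sub_schema in schema.items():
        span = ask(field, text)
        if span is None:
            result[field] = None
        elif isinstance(sub_schema, dict):
            # Nested object: recurse on the narrower span, not the full text.
            result[field] = extract(sub_schema, span, ask)
        else:
            # Leaf: keep the text and attach an identifier if the grounder has one.
            result[field] = {"text": span, "id": GROUNDER.get(span.lower())}
    return result


note = "Patient was diagnosed with invasive ductal carcinoma of the breast; stage IIA."
record = extract(SCHEMA, note)
```

Recursing on the extracted span rather than the whole note keeps each sub-question narrow. Leaves with no grounder entry (here, the stage) keep their text with an `id` of `None`, which marks them as ungrounded.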

Unlock the Power of Cancer Data Standardization

Ready to enhance your research outcomes with comprehensive and accurate cancer data standardization? Contact mCodeGPT to discover how our innovative solutions can revolutionize your approach to cancer-related data analysis. 


This project is partially supported by NCI U01CA274576, CPRIT RR180012, and UTHealth internal funding.