Revolutionizing Rare Disease Awareness: Building CureNet, an AI-Powered Medical GPT

diksha nasa
Jan 15, 2025
9 min read

Introduction

CureNet is a cutting-edge AI-driven medical GPT designed to improve access to accurate and reliable information about rare diseases. By utilizing a Retrieval-Augmented Generation (RAG) framework and dynamic learning, CureNet ensures users have access to up-to-date, source-verified medical information. Its innovative features, including real-time Google search integration, robust content chunking, semantic similarity scoring, and fallback mechanisms, address critical challenges in healthcare information accessibility.

GitHub Link: https://github.com/dikshanasa/CureNet

Understanding the Problem Space 🚧

💡 User Pain Points

Through user research and competitive analysis, key challenges in rare disease information were identified:

📑 Fragmented Information: Patients and caregivers spend hours scouring multiple websites and research papers, often finding incomplete or outdated information.

⚠️ Low Trust in Online Sources: Unregulated blogs and forums blur the lines between credible data and misinformation, making it harder for users to make informed decisions.

🔍 Limited Awareness of Treatment Options: Many rare diseases lack FDA-approved treatments, and access to emerging therapies or clinical trials requires specialized knowledge.

⏳ Delayed Diagnoses: Complex medical jargon and conflicting advice prolong diagnostic journeys, leaving patients uncertain about the next steps.

🌐 Accessibility Gaps: Existing tools often neglect inclusive design, making them less usable for individuals with diverse needs or non-technical backgrounds.

MVP: Delivering Immediate Value with Precision 🛠️

The CureNet MVP was crafted with a sharp focus on core functionality that could immediately address the most pressing needs of users searching for rare disease information. The goal was clear: create a solution that brings tangible value to the user without the distraction of feature overload, ensuring a rapid feedback loop to validate the approach and refine future iterations.

MVP Goals 🎯

Deliver Reliable and Accurate Medical Information
- Integrate real-time Google search to ensure up-to-date and credible medical references.
- Leverage RAG (Retrieval-Augmented Generation) to reduce misinformation and keep responses citation-backed.
Instill Trust and Transparency
- Incorporate source citations from trusted medical entities like Mayo Clinic, Cleveland Clinic, and NIDDK with every response.
- Display confidence scores to communicate the reliability of the response, allowing users to make informed decisions.
Maximize User Experience
- Build a user-friendly chat interface that simplifies interaction.
- Reduce response latency through efficient content chunking and semantic relevance checks.

MVP Features 📊

User Query & Response
- Simple queries answering disease overviews, symptoms, and general treatment options.
- Instant source links for user verification and trust-building.
Dynamic Retrieval System
- Real-time scraping of top Google search results, pulling relevant medical content.
- Content chunking to ensure relevant information is extracted for every query.
Confidence Scoring & Citations
- A confidence score is displayed with every answer to help users gauge the response's accuracy.
- Citations from at least two trusted medical sources are included for full transparency.
Fallback Mechanism
- In cases where chunk-based answers fall short, the system reverts to a broader search or analysis for additional context.
- Ensures users are never left without guidance.

Success Metrics 📈

Accuracy: Aiming for a confidence score of at least 70% for every query answered.
User Satisfaction: Collecting feedback to ensure that responses are clear, helpful, and provide enough actionable information.
Response Time: Keeping the average response time under 30 seconds.
Scalability: Ensuring the system can handle multiple user queries simultaneously without compromising performance.
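
The first and third targets above can be checked mechanically against a log of test queries. The following is a minimal sketch, not CureNet's actual evaluation code: the `QueryResult` shape and the `evaluateMetrics` helper are hypothetical, and only the 70% and 30-second thresholds come from the metrics listed here.

```typescript
// Hypothetical record of one answered query; field names are illustrative.
interface QueryResult {
  confidence: number; // 0..1, the score shown to the user
  latencyMs: number; // end-to-end response time in milliseconds
}

interface MetricsReport {
  averageLatencyMs: number;
  lowConfidenceCount: number;
  meetsLatencyTarget: boolean;
  meetsConfidenceTarget: boolean;
}

// Thresholds from the success metrics: at least 70% confidence per query,
// average response time under 30 seconds.
const MIN_CONFIDENCE = 0.7;
const MAX_AVERAGE_LATENCY_MS = 30000;

function evaluateMetrics(results: QueryResult[]): MetricsReport {
  if (results.length === 0) {
    throw new Error("evaluateMetrics needs at least one result");
  }
  const totalLatency = results.reduce((sum, r) => sum + r.latencyMs, 0);
  const averageLatencyMs = totalLatency / results.length;
  const lowConfidenceCount = results.filter(
    (r) => r.confidence < MIN_CONFIDENCE
  ).length;
  return {
    averageLatencyMs,
    lowConfidenceCount,
    meetsLatencyTarget: averageLatencyMs < MAX_AVERAGE_LATENCY_MS,
    meetsConfidenceTarget: lowConfidenceCount === 0,
  };
}
```

Running a check like this after each test batch makes it easy to see when a change pushes latency or confidence past a target.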

Roadmap: Developing CureNet Step-by-Step 🚀

CureNet is designed to provide accessible, accurate, and up-to-date information about rare diseases, addressing critical gaps in healthcare information accessibility. This roadmap outlines the structured approach taken to develop and refine the platform, from initial concept to the creation of the MVP, while focusing on accuracy, user trust, and scalability.

Phase 1: Conceptualization & Planning 📝

Objective: Define the problem space and identify the core features required to address user pain points.

Key Steps:

Problem Definition
- Identified key challenges in accessing rare disease information, such as fragmented data and information overload.
- Noted that many research websites and databases don’t prioritize SEO, making it difficult for users to find reliable sources, potentially leading to misinformation.
Feature Prioritization
- Focused on integrating real-time Google search for dynamic retrieval of up-to-date information.
- Chose to implement Retrieval-Augmented Generation (RAG) to provide citation-backed, relevant, and accurate responses.
- Prioritized transparency by including confidence scores and citations from trusted sources.
Technical Planning
- Opted for a simple web interface to facilitate easy user interaction.
- Chose technologies like Node.js and TensorFlow.js for backend processing and semantic relevance scoring.

Phase 2: MVP Development 🛠️

Objective: Develop a functional prototype that delivers valuable, real-time medical information with credible source verification.

Key Steps:

Backend Development
- Integrated Google search scraping to retrieve up-to-date medical content.
- Built a processing pipeline for content chunking and summarization to provide concise, relevant information.
- Implemented TensorFlow.js to score the semantic relevance of content in response to user queries.
RAG Pipeline
- Developed a system that combines the retrieved content with generative AI to provide coherent and contextually relevant responses.
- Designed fallback mechanisms to ensure meaningful answers even when data is sparse or unclear.
Frontend Development
- Designed a simple chat interface that allows users to easily interact with the system.
- Displayed confidence scores and source links alongside responses to enhance transparency and trustworthiness.
Testing & Validation
- Conducted testing to ensure the accuracy of the responses and the relevance of the sources cited.
- Measured key performance metrics such as response time, confidence scores, and source consistency.

Outcome:

Delivered a working MVP capable of answering basic rare disease queries with verified sources and transparency features.
Achieved an average response time of ~28 seconds for the initial queries tested.

Phase 3: Refinement & Optimization 🔧

Objective: Improve system performance, response accuracy, and scalability for future use cases.

Key Steps:

Performance Optimization
- Enhanced content chunking and summarization processes to reduce response latency.
- Implemented caching for frequently queried diseases to improve speed.
Accuracy Improvement
- Fine-tuned semantic similarity scoring to better match user queries with relevant content.
- Enhanced fallback mechanisms to handle ambiguous queries more effectively.
UI/UX Enhancements
- Refined the user interface to improve the chat experience.
- Enhanced the display of source citations and confidence scores based on user feedback from initial testing.
Scalability Testing
- Conducted stress testing to ensure stable performance under concurrent user queries.
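
The caching step above could look roughly like the sketch below. It is an assumption about the design, not the code in the repository: the `AnswerCache` class, the time-to-live expiry, and the query normalization are all illustrative choices. Expiry matters here because a cached answer should not outlive the freshness that real-time retrieval is meant to provide.

```typescript
// Minimal in-memory cache with time-based expiry.
// Class name, TTL handling, and key normalization are illustrative assumptions.
class AnswerCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Normalize so "Fabry disease" and "  fabry   DISEASE " share one entry.
  private static normalize(query: string): string {
    return query.trim().toLowerCase().replace(/\s+/g, " ");
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(query: string, now: number = Date.now()): T | undefined {
    const key = AnswerCache.normalize(query);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // stale: drop it so fresh content is fetched
      return undefined;
    }
    return entry.value;
  }

  set(query: string, value: T, now: number = Date.now()): void {
    this.entries.set(AnswerCache.normalize(query), {
      value,
      expiresAt: now + this.ttlMs,
    });
  }
}
```

The optional `now` parameter exists only to make expiry easy to test without waiting on the clock.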

Outcome:

Improved accuracy with an average confidence score of 73%.
Reduced response time variability, ensuring consistency in results.

Phase 4: Future Roadmap 🌟

While the MVP is fully functional, there are several potential enhancements that can be pursued to expand the platform's capabilities:

Multi-language Support: Add multi-language support to ensure accessibility for a wider user base.
Symptom Analysis: Introduce a symptom checker to guide users toward relevant diseases or conditions.
Performance Tuning: Continue to optimize performance, reduce latency, and improve the semantic relevance of responses.

CureNet Workflow: A Seamless Journey from Query to Response 🔄

The CureNet platform offers a streamlined experience from the moment a user initiates a query to receiving the most accurate, transparent, and contextually relevant medical information. Below is an exhaustive breakdown of how the workflow is structured to optimize both user experience and data accuracy:

1. User Input: Natural Language Query 📝

The user begins by entering a query into the chat interface, such as:

“What are the symptoms of Duchenne Muscular Dystrophy?”
“Are there any clinical trials for Fabry disease?”

The system is designed to handle natural language queries, providing flexibility for users to phrase their questions in ways that feel intuitive and conversational.

2. Pre-Processing and Query Understanding 🤖

Once the query is submitted, it undergoes pre-processing:

Tokenization and Parsing: The system breaks down the sentence into smaller components (e.g., entities, keywords, and topics). This allows CureNet to understand the core request and context, like identifying the disease name or symptom type.
Query Categorization: CureNet identifies the specific area of focus (e.g., disease overview, symptoms, clinical trials) so that the search results are tailored to that topic.
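
As a rough illustration of these two steps, the sketch below tokenizes a query and maps it to a category with simple keyword rules. This is a deliberately simple stand-in: the category names mirror the examples in this post, but the keyword lists and the `tokenize` and `categorizeQuery` helpers are hypothetical, and the real system may use a more sophisticated approach.

```typescript
// Categories mentioned in this post; the keyword lists are illustrative assumptions.
type QueryCategory = "symptoms" | "clinical_trials" | "treatment" | "overview";

const CATEGORY_KEYWORDS: Array<[QueryCategory, string[]]> = [
  ["clinical_trials", ["trial", "trials", "study", "studies"]],
  ["symptoms", ["symptom", "symptoms", "signs"]],
  ["treatment", ["treatment", "treatments", "therapy", "therapies"]],
];

// Lowercase the query and split it into alphanumeric word tokens.
function tokenize(query: string): string[] {
  return query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Return the first category whose keywords appear in the query,
// falling back to a general disease overview.
function categorizeQuery(query: string): QueryCategory {
  const tokens = new Set(tokenize(query));
  for (const [category, keywords] of CATEGORY_KEYWORDS) {
    if (keywords.some((keyword) => tokens.has(keyword))) return category;
  }
  return "overview";
}
```

With these rules, the two example queries from step 1 land in the `symptoms` and `clinical_trials` categories respectively, and a question such as "What is Fabry disease?" falls through to `overview`.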

3. Dynamic Retrieval: Real-Time Web Scraping 🌐

After categorizing the query, CureNet retrieves relevant content from credible medical sources:

Google Search Integration: The system performs a real-time Google search through a custom search engine configured to return only trusted medical websites, such as Mayo Clinic, Cleveland Clinic, and NIDDK, and then scrapes the relevant data from the top results.
Reputable Source Filtering: It filters out unreliable sites, ensuring only well-established, peer-reviewed sources are pulled into the system.
Content Extraction: Relevant text from the search results is extracted and processed to be used in the next step.
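
The search and filtering steps might be sketched as follows. This assumes the custom search engine is queried through Google's Custom Search JSON API (the `key`, `cx`, `q`, and `num` parameters belong to that API); whether CureNet uses this exact endpoint is an assumption. The domain allowlist and the `buildSearchUrl`, `isTrustedUrl`, and `filterTrusted` helpers are hypothetical names for illustration.

```typescript
// Allowlist based on the sources named in this post; the exact domains are an assumption.
const TRUSTED_DOMAINS = ["mayoclinic.org", "clevelandclinic.org", "niddk.nih.gov"];

interface SearchResult {
  title: string;
  link: string;
}

// Build a request URL for the Custom Search JSON API.
// apiKey and engineId (the "cx" parameter) are placeholders supplied by the caller.
function buildSearchUrl(query: string, apiKey: string, engineId: string): string {
  const params: Array<[string, string]> = [
    ["key", apiKey],
    ["cx", engineId],
    ["q", query],
    ["num", "5"], // only the top few results are scraped
  ];
  const queryString = params
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://www.googleapis.com/customsearch/v1?${queryString}`;
}

// True when the link's hostname is an allowlisted domain or one of its subdomains.
function isTrustedUrl(link: string): boolean {
  let host: string;
  try {
    host = new URL(link).hostname.toLowerCase();
  } catch {
    return false; // not a valid absolute URL
  }
  return TRUSTED_DOMAINS.some(
    (domain) => host === domain || host.endsWith(`.${domain}`)
  );
}

// Second line of defense: drop any result that is not on the allowlist.
function filterTrusted(results: SearchResult[]): SearchResult[] {
  return results.filter((result) => isTrustedUrl(result.link));
}
```

Matching on the full hostname suffix (with a leading dot) rather than a substring keeps look-alike domains such as `fakemayoclinic.org` out of the results.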

4. Chunking and Relevance Matching ⚙️

Once the raw data is retrieved, the content undergoes chunking and relevance analysis:

Content Chunking: The extracted content is broken into smaller, digestible chunks, such as individual sentences or paragraphs. This allows for precise analysis of the data and ensures that the most pertinent information is accessible.
Semantic Similarity Scoring: Using TensorFlow.js and the Universal Sentence Encoder, each chunk is scored for semantic relevance to the original query. This helps identify the most accurate pieces of information related to the user's question.
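
A minimal sketch of chunking and scoring is shown below. In CureNet the vectors come from the Universal Sentence Encoder running on TensorFlow.js; to keep the example self-contained, the encoder is represented by a hypothetical `embed` function passed in by the caller. The sentence splitter and the choice of cosine similarity as the scoring function are likewise assumptions made for illustration.

```typescript
// Split extracted text into sentence-level chunks.
// A simple punctuation-based splitter; abbreviations such as "e.g." will
// split early, so real medical text may need a sturdier approach.
function chunkBySentence(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical stand-in for a sentence encoder: text in, vector out.
type Embedder = (text: string) => number[];

interface ScoredChunk {
  text: string;
  score: number;
}

// Score every chunk against the query using the supplied embedder.
function scoreChunks(query: string, chunks: string[], embed: Embedder): ScoredChunk[] {
  const queryVector = embed(query);
  return chunks.map((text) => ({
    text,
    score: cosineSimilarity(queryVector, embed(text)),
  }));
}
```

Sentence-level chunks keep each score tied to a single claim, at the cost of losing some surrounding context; paragraph-level chunks make the opposite trade.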

5. Response Generation and Summarization ✍️

The relevant content chunks are then summarized:

Top Chunks Selection: The highest-scoring chunks (the most relevant pieces of content) are selected.
Summarization: CureNet automatically generates a concise, user-friendly response based on the top chunks. This response is designed to answer the user's query in a clear and understandable way, while covering key aspects of the information requested.
Confidence Score Calculation: For every response, CureNet calculates a confidence score based on the semantic similarity and consistency of the retrieved chunks. The confidence score indicates how certain the system is about the response’s accuracy.
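
One way to turn "semantic similarity and consistency" into a number is sketched below. The formula is an assumption, not CureNet's actual calculation: it takes the mean similarity of the selected chunks and subtracts their standard deviation, so that a set of chunks that agree with each other scores higher than a set with one strong match and several weak ones. The `selectTopChunks` and `confidenceScore` names are hypothetical.

```typescript
interface ScoredChunk {
  text: string;
  score: number; // semantic similarity to the query, expected in 0..1
}

// Return the k highest-scoring chunks without mutating the input array.
function selectTopChunks(chunks: ScoredChunk[], k: number): ScoredChunk[] {
  return [...chunks].sort((a, b) => b.score - a.score).slice(0, k);
}

// Illustrative confidence: mean similarity minus the spread of the scores,
// clamped to the range 0..1. An empty selection yields 0.
function confidenceScore(topChunks: ScoredChunk[]): number {
  if (topChunks.length === 0) return 0;
  const scores = topChunks.map((chunk) => chunk.score);
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) ** 2, 0) / scores.length;
  const spread = Math.sqrt(variance);
  return Math.min(1, Math.max(0, mean - spread));
}
```

Multiplying the result by 100 gives a percentage that can be shown next to the answer.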

6. Source Citation and Transparency 🔗

Transparency is crucial in healthcare, and CureNet ensures that every response is backed by credible sources:

Citation Linking: Every response includes citations from at least two trusted medical sources, providing direct links for further reading. This ensures users can verify the information.
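
The two-source rule can be enforced in code before a response is shown. The sketch below is illustrative only: counting distinct hostnames (so that two pages from the same site count once) is an assumption about how "at least two sources" is interpreted, and the `Source` shape and helper names are hypothetical.

```typescript
interface Source {
  title: string;
  url: string; // must be a valid absolute URL
}

// Keep one citation per hostname so two links to the same site do not count twice.
function dedupeByHost(sources: Source[]): Source[] {
  const seen = new Set<string>();
  const unique: Source[] = [];
  for (const source of sources) {
    const host = new URL(source.url).hostname.replace(/^www\./, "");
    if (!seen.has(host)) {
      seen.add(host);
      unique.push(source);
    }
  }
  return unique;
}

// Return numbered citation lines, or null when fewer than minSources
// distinct sites back the answer.
function formatCitations(sources: Source[], minSources: number = 2): string[] | null {
  const unique = dedupeByHost(sources);
  if (unique.length < minSources) return null;
  return unique.map((source, i) => `[${i + 1}] ${source.title} - ${source.url}`);
}
```

A `null` result is a natural signal to hand the query over to the fallback mechanism rather than show an under-sourced answer.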

7. Fallback Mechanism: Dealing with Ambiguities ⚠️

In cases where the system is unable to retrieve enough data or if the summarized chunks do not fully answer the query, the fallback mechanism is activated:

Expanded Search: CureNet performs an additional search or checks broader contexts to fill any information gaps.
Fallback Responses: If the content still lacks sufficient detail, users are provided with a general response or a recommendation for next steps (e.g., consulting a healthcare professional or exploring additional sources).
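
The fallback chain described above can be sketched as an ordered list of strategies that are tried until one returns a sufficiently confident answer. This is a simplified model under stated assumptions: the `Strategy` type, the `answerWithFallback` function, the wording of the generic message, and the reuse of the 70% target as the acceptance threshold are all hypothetical.

```typescript
interface Answer {
  text: string;
  confidence: number; // 0..1
}

// A strategy tries to answer the query and returns null when it finds nothing usable.
type Strategy = (query: string) => Promise<Answer | null>;

// Last resort: point the user to a next step instead of leaving them without guidance.
const GENERIC_FALLBACK: Answer = {
  text:
    "I could not find enough reliable information to answer this. " +
    "Please consult a healthcare professional or explore additional sources.",
  confidence: 0,
};

// Try each strategy in order (for example: chunk-based answer first, then an
// expanded search) and return the first answer that clears the threshold.
async function answerWithFallback(
  query: string,
  strategies: Strategy[],
  minConfidence: number = 0.7
): Promise<Answer> {
  for (const strategy of strategies) {
    const answer = await strategy(query);
    if (answer !== null && answer.confidence >= minConfidence) {
      return answer;
    }
  }
  return GENERIC_FALLBACK;
}
```

Because strategies run sequentially, the more expensive expanded search is only paid for when the primary path falls short.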

8. Final Answer Delivery 🚚

Once the response is generated and citations are attached, the final answer is displayed to the user:

Clear and Concise Answer: The response is presented in a conversational tone, ensuring ease of understanding for both patients and caregivers.
Verification Links: Links to the referenced medical sources are prominently displayed, enabling users to verify the information independently.

Workflow Benefits

This seamless workflow serves several key purposes:

Precision and Relevance: Ensures users receive the most accurate and relevant information based on their specific query.
Transparency and Trust: Provides citations to promote trust in the system.
Real-Time Updates: The dynamic search ensures the most up-to-date information is provided to users, crucial in the ever-evolving medical field.
User-Centered Design: A simple chat interface, clear responses, and a transparent feedback loop prioritize the user experience.

The CureNet workflow, from the initial user query to response delivery, is designed to be efficient, transparent, and user-centric, ensuring users can easily navigate the complexities of rare disease information. This process not only demonstrates technical prowess but also highlights a keen focus on user satisfaction and trust—critical aspects for any healthcare product manager to consider.

Key Learnings from the Journey 📚

Building CureNet has been a deeply rewarding experience, filled with important lessons that shaped the product and the process. Here are some of the key takeaways:

1. Understanding User Needs is Everything: Focusing on the real struggles of patients and caregivers was critical. They often face the challenge of navigating fragmented and hard-to-find medical information. By understanding these pain points, I could ensure the product focused on providing reliable, easy-to-understand content that directly addressed their needs.

2. Clarity is Crucial in Complex Subjects: Medical information can be overwhelming, but it’s essential to present it in a way that’s simple to digest. Balancing accuracy with clarity was a constant challenge, but it taught me how important it is to make complex topics more accessible without sacrificing the quality of the information.

3. Continuous Improvement Makes a Big Difference: Building an MVP and refining it based on real user feedback was an eye-opening process. It showed me how valuable it is to iterate quickly and adjust the product as needed. Listening to users, testing assumptions, and refining features led to better results in less time.

4. Performance and Scalability Matter: As the product grew, I faced technical challenges, particularly around performance and scalability. Ensuring that the system could handle multiple queries while delivering fast, reliable answers was a key learning. Optimizing the backend and improving response times made a noticeable difference in the product’s quality.

5. Transparency Builds Trust: Including confidence scores and citations from trusted sources was one of the best decisions. It reinforced the importance of being transparent with users, allowing them to feel more confident in the information provided. In healthcare, trust is everything, and being open about where the data comes from helps users make better-informed decisions.

This journey has taught me how critical it is to combine technical solutions with a deep understanding of the user experience. Each step, from gathering feedback to tackling technical challenges, has been a valuable part of the learning process.
