“Here’s what really breaks my heart about this technological gap: the moment when a newly diagnosed rare cancer patient sits down to find answers and community, only to be met with silence from both directions.”
-Ben Green, Crossroad Media
There’s a troubling gap in how both search engines, such as Google and Bing, and large language models (LLMs), such as Gemini and Copilot, handle queries related to rare cancers. Despite all the advances in artificial intelligence (AI) and search, patients with rare cancers still struggle to find the information they desperately need. The problem isn’t just frustrating; it’s potentially dangerous. [1, 2, 3]
The core issue? Neither search engines nor LLMs were built with rare diseases in mind. They’re optimized for the masses, designed to deliver quick answers to the questions millions of people ask every day. But, by definition, rare cancers don’t fit that model.
When There’s Simply Not Enough Data
Think about how AI learns. These systems need massive amounts of information to identify patterns and deliver accurate results. But rare cancers don’t generate that kind of data footprint.
The logic is simple. When a cancer affects only a tiny fraction of the population, there’s naturally less published research, fewer clinical trials, and minimal patient-generated content online. AI models trained on these limited datasets can struggle to capture the full spectrum of rare disease presentations, leading to inaccurate search results and unreliable answers to specific prompts. [4, 5, 6]
This data poverty hits LLMs particularly hard. While search engines can at least point you to what little information exists, LLMs are trained on text corpora that inherently underrepresent rare diseases. When you ask an LLM about a rare cancer, it’s drawing on a training set that may contain only a handful of relevant documents. Worse, it may conflate your rare cancer with more common conditions that share similar terminology.
Stacked Against the Uncommon
These data limitations are compounded by algorithmic bias in how search engines and LLMs rank and prioritize information. Search engines lean heavily on popularity signals: link-based measures such as PageRank, along with click-through rates, help determine what surfaces at the top of results. Content about common cancers generates massive traffic and earns countless backlinks; rare cancer information barely registers on that scale. Some search engines even prune low-frequency terms from their indexes to stay efficient, which means the precise medical terminology you need to find information about a rare cancer might not make it into the index at all. [1, 7, 8]
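To make the index-pruning point concrete, here is a deliberately simplified Python sketch. It assumes a pruning rule that drops any term appearing in fewer than a set number of documents; the toy pages, the `min_df` threshold, and the function names are all hypothetical, and real search engines do not publish their pruning rules in this detail. With the cutoff in place, a query for a term that appears on only one page, like "cholangiocarcinoma", comes back empty even though a relevant page exists.

```python
# Toy corpus (hypothetical): three pages about a common cancer, one about a rare one.
DOCS = {
    "page1": "breast cancer symptoms and screening",
    "page2": "breast cancer treatment options",
    "page3": "breast cancer survival rates",
    "page4": "cholangiocarcinoma bile duct cancer treatment",
}


def build_pruned_index(docs: dict[str, str], min_df: int) -> dict[str, set[str]]:
    """Build an inverted index (term -> set of page ids), dropping any term that
    appears in fewer than `min_df` pages. This is a hypothetical, simplified
    pruning rule for illustration, not any real search engine's method."""
    postings: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings.setdefault(term, set()).add(doc_id)
    # A term's document frequency is the size of its posting list.
    return {term: ids for term, ids in postings.items() if len(ids) >= min_df}


def search(index: dict[str, set[str]], term: str) -> set[str]:
    """Return the ids of pages containing `term`, or an empty set if the term
    is missing from the (possibly pruned) index."""
    return index.get(term.lower(), set())


if __name__ == "__main__":
    full = build_pruned_index(DOCS, min_df=1)
    pruned = build_pruned_index(DOCS, min_df=2)
    print(search(full, "cholangiocarcinoma"))    # {'page4'}
    print(search(pruned, "cholangiocarcinoma"))  # set()
    print(sorted(search(pruned, "cancer")))      # ['page1', 'page2', 'page3', 'page4']
```

Nothing about the rare-cancer page changed between the two runs; only the threshold did. That is the trade-off in miniature: a smaller, faster index at the cost of the long-tail vocabulary rare cancer patients depend on.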
LLMs face a parallel problem. Their training favors the most common language patterns, so when you ask about a rare cancer, the model may fall back on its much stronger knowledge of common cancers, potentially giving you generic or misleading information. Now layer on the patient’s perspective: you’re dealing with complex, unfamiliar medical terminology, and you might not even know the right words to search for. Early signs of rare cancers frequently mimic common conditions, so a symptom search returns pages about common diseases while burying the rare cancer you might actually have. Rare diseases also present differently from patient to patient, producing inconsistent data that algorithms struggle to match to a pattern. [9, 10, 11, 12, 13]
The Double Isolation of a Rare Cancer Diagnosis
Here’s what really breaks my heart about this technological gap: the moment when a newly diagnosed rare cancer patient sits down to find answers and community, only to be met with silence from both directions.
You turn to Google first. You type in your diagnosis, maybe a name you’ve never heard before, one your oncologist had to write down for you. The results are sparse. A few medical journal abstracts you can’t access. A clinical trial listing from 2018 that’s no longer recruiting. Maybe a Reddit thread with three comments from 2019. Nothing current. Nothing that sounds like your situation. Nothing that tells you what to expect next week, next month, or next year.
So you try ChatGPT or Claude, hoping AI can fill in the gaps. You ask what to expect, how others have coped, and what questions you should be asking your doctor. The response sounds confident but generic—survival rates that might be for a different cancer entirely, treatment descriptions that feel oddly vague, suggestions that don’t quite match what your oncologist said. You have no way to know if you’re getting information about your specific rare cancer or if the AI is extrapolating from more common diseases.
The compounding effect is isolating in a way that’s hard to overstate. At the exact moment you most need community, validation, and specific guidance, both our most powerful information technologies fail you. You can’t find the blog post from someone who’s been where you are. You can’t find the patient forum where people discuss the side effects of the specific drug you’re about to start. You can’t even find enough information to know whether what you’re experiencing is normal.
This isn’t just an information gap—it’s an empathy gap. When technology can instantly connect you to millions of people’s experiences with common cancers but leaves you in the dark about your rare one, you’re left feeling like you’re the only person this has ever happened to. And that psychological burden, on top of everything else, shouldn’t be part of the diagnosis.
The Trust Gap
Even when information about rare cancers exists online, search engines struggle to surface the most reliable sources. Patients end up overwhelmed by irrelevant results or, worse, questionable information when they need trusted guidance most. [14]
With LLMs, the trust issue takes a different form. These models can sound authoritative even when they’re uncertain or working from limited data. For rare cancers, where the AI has seen minimal training examples, there’s a real risk of confident-sounding but inaccurate responses. Patients can’t easily distinguish between “the AI knows this well” and “the AI is doing its best with very little information.”
There Are Paths Forward
The good news? Innovation is happening in this space, though it needs to accelerate. Specialized search tools like FindZebra are tackling the problem head-on by focusing exclusively on curated databases of rare diseases. These tools cut through the noise that general search engines can’t filter out.
Newer AI approaches are designed to work with smaller datasets by incorporating prior medical knowledge and data augmentation techniques, which can improve both pattern recognition and diagnostic accuracy for rare diseases. Advanced natural language processing is also showing promise: these systems can better interpret how patients actually describe their symptoms and point them toward more relevant information. [3, 15, 16, 17, 18]
For LLMs specifically, we need models that can acknowledge uncertainty, that can distinguish between “I have strong information about this” and “I’m extrapolating from limited data.” We need systems that can connect patients to the sparse-but-valuable human-created content that does exist, rather than generating plausible-sounding but generic responses.
We’re not there yet, but the recognition that rare disease patients deserve better search and AI experiences is finally driving meaningful change in how these tools are built. Organizations like TargetCancer Foundation (TCF) are working to close this gap, not only by funding research into rare cancers such as cholangiocarcinoma and cancer of unknown primary, but also by building the kind of trustworthy, accessible content that both patients and AI systems can rely on. Through initiatives like the TCF-001 TRACK clinical trial and comprehensive patient-empowerment resources, TCF is creating the very information infrastructure that search engines and LLMs need to serve rare cancer patients better. The question is whether these systems can learn fast enough for the patients who need answers now.
About the Author
Ben Green is the founder of Crossroad Media, where he leads precision digital strategy for health, life sciences, and biotechnology organizations. His work at the intersection of AI-driven discoverability and outcomes, traditional SEO, and content strategy focuses on making rare disease information easier to find, easier to trust, and harder for algorithms to overlook. He serves as a consultant to TargetCancer Foundation.
[1] https://www.sciencedirect.com/science/article/abs/pii/S1386505613000166
[2] https://www.sciencedirect.com/science/article/pii/S0169260724001020
[3] https://pmc.ncbi.nlm.nih.gov/articles/PMC3932942/
[4] https://www.nature.com/articles/s41467-025-59478-8
[5] https://www.technologyreview.com/2025/01/21/1110192/why-its-so-hard-to-use-ai-to-diagnose-cancer/
[6] https://www.conquer.org/news/understanding-rare-cancers-what-are-they-and-why-do-we-need-fund-them
[7] https://ppc.land/google-reports-65-surge-in-visual-searches-as-ai-mode-drives-multimodal-adoption/
[8] https://pmc.ncbi.nlm.nih.gov/articles/PMC10280215/
[9] https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000269
[10] https://pmc.ncbi.nlm.nih.gov/articles/PMC10883845/
[11] https://pmc.ncbi.nlm.nih.gov/articles/PMC4101532/
[12] https://www.tgen.org/patients/center-for-rare-childhood-disorders/stories/how-symptoms-of-rare-diseases-can-mimic-common-conditions/
[13] https://www.hilarispublisher.com/open-access/unusual-neurological-presentations-case-reports-shedding-light-on-rare-conditions-98810.html
[14] https://pmc.ncbi.nlm.nih.gov/articles/PMC11735183/
[15] https://www.tandfonline.com/doi/full/10.4161/rdis.25001
[16] https://medium.com/medtrack/from-health-information-data-overload-to-efficiency-how-medtracks-ai-powered-search-will-ad8c46965716
[17] https://www.ucsf.edu/news/2024/12/429136/can-ai-improve-diagnosis-rare-diseases
[18] https://www.youtube.com/watch?v=3uleV4mik_M
