When it comes to cancer, one of the best predictors of recovery is how early the disease is diagnosed, so it is a pity that many people turn to their search engine long before they turn to their doctor when they have symptoms suggestive of serious lung disease.
Of course, the vast majority of persistent coughs are bronchitis rather than lung cancer, but Microsoft researchers have found that, by using machine learning to weigh a number of factors, they could identify up to 40% of the users making concerning searches who would actually go on to receive a cancer diagnosis.
The researchers used a massive anonymized database of 5 million searchers. They started from users who made searches such as “I have just been diagnosed with lung cancer” and then followed up with behavior that indicated a recent diagnosis, such as multiple queries about treatment options and side effects. Working backwards from these users, they found that a combination of criteria could predict who would be diagnosed with cancer up to a year in advance.
Some searches, such as those related to hoarseness and cigarettes, were obvious red flags. But Microsoft’s machine learning models also picked up relevant pointers to the disease from factors deduced from the data, such as age, gender, frequent long-distance travel and location, with location indicating areas of economic deprivation, high smoking levels, or older houses with higher radon levels.
“Here, we are not just looking at the text of the queries; we also consider the locations that people are in when they issue these queries and we tie that back to contextual risk factors linked to those locations,” says study co-author Ryen White, chief technology officer for health intelligence at Microsoft Health in Redmond, Washington.
Using their model, they could identify between 1.5% and 40% of future patients a year in advance. The share depended on how many false positives they were willing to tolerate, which ranged from a low 1 in 1,000 (for the higher figure) to a very low 1 in 100,000 (for the lower one).
“People tend to whisper their health concerns into search engines on a regular basis,” said co-author Eric Horvitz, technical fellow and managing director of Microsoft’s research lab in Redmond. “This kind of data can serve as a complement to more formal clinical information.”
The study has just been published in JAMA Oncology and extends research that team members published last June on the feasibility of using the text of questions people ask search engines to predict diagnoses of pancreatic cancer. The approach is not expected to find any immediate application, but it may inform future screening systems that can catch cancers earlier in their progression.
With Microsoft committed to solving cancer computationally, however, may I suggest that a pop-up from Cortana recommending a visit to your doctor would not go amiss, and may actually save a few lives.
Read the full report at Microsoft here.