The healthcare sector is brimming with data that holds the potential to revolutionize medical practices. However, mining this data is no easy task. To overcome this challenge, Truveta, a healthcare technology startup, has developed a large AI-powered model that can sift through medical texts from more than 20,000 clinics and 700 hospitals.
Truveta’s model is designed to extract patient diagnoses, medications, lab results, and other data from sources like physicians’ notes and insurance claims. These sources often contain messy, unstructured text filled with abbreviations, jargon, and misspellings. Despite these challenges, the model achieves an accuracy rate of over 90%, according to the company.
The Truveta Language Model was introduced in a recent preprint publication and further detailed in a white paper and blog post. The model is trained on vast quantities of medical texts from the company’s 28 health system partners, which represent 16% of patient care in the U.S. Additionally, the company updates its datasets daily.
Jay Nanduri, Truveta’s Chief Technology Officer, explained that the sheer volume of data processed each day, combined with the need to make it available to researchers in a timely fashion, makes this a highly complex and significant data problem.
Healthcare and life sciences customers use Truveta’s platform to study events like adverse reactions to medicines or patient seizure frequency. Cancer researchers might use the platform to detect disease progression and the need for a change in treatment.
The model “normalizes” the unstructured data, enabling the system to understand that texts like “Acute COVID-19” and “COVID19 _ acute infection” mean the same thing. Truveta has access to 3.1 billion patient encounters and 2.4 billion medication orders thanks to its partnerships with major health systems.
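To make the idea of normalization concrete, here is a minimal Python sketch that maps the two example strings to a single concept. It is a toy, rule-based illustration only: Truveta uses a trained language model, and the filler-word list, the lookup table, and the function names `concept_key` and `normalize` are all hypothetical, invented for this example.

```python
import re

# Hypothetical list of words that add no meaning in this toy example.
FILLER_TOKENS = {"infection"}

# Hypothetical lookup from a set of tokens to one canonical label.
# A real system would map to a standard medical vocabulary instead.
CANONICAL_CONCEPTS = {
    frozenset({"acute", "covid19"}): "Acute COVID-19",
}


def concept_key(text: str) -> frozenset:
    """Reduce free text to an order-independent set of cleaned tokens."""
    lowered = text.lower()
    # Join variants such as "covid-19", "covid 19", "covid_19" into "covid19".
    joined = re.sub(r"covid[\s_-]*19", "covid19", lowered)
    # Keep only alphanumeric runs, which drops stray punctuation like "_".
    tokens = re.findall(r"[a-z0-9]+", joined)
    return frozenset(t for t in tokens if t not in FILLER_TOKENS)


def normalize(text: str):
    """Return the canonical label for the text, or None if it is unknown."""
    return CANONICAL_CONCEPTS.get(concept_key(text))


if __name__ == "__main__":
    for raw in ["Acute COVID-19", "COVID19 _ acute infection"]:
        print(repr(raw), "->", normalize(raw))
```

Hand-written rules like these break down quickly on real clinical text, with its abbreviations and misspellings, which is the gap a model trained on medical data is meant to close.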
Truveta’s model is distinct from GPT-4, the generative large language model from Microsoft-backed OpenAI, which produces content based on prompts. While GPT-4 can provide valuable insights, Truveta’s model is trained specifically on medical datasets, which sets it apart and is intended to yield more accurate and relevant information in healthcare contexts.
In addition to its AI model, Truveta partners with other companies to build applications on top of its system. Users can create generative or extractive tools using Truveta’s data, as well as discriminative tools, such as models for predicting cancer.
Truveta’s collaborators include Pfizer, which uses the platform to monitor the safety of COVID-19 vaccines and therapies, and Seattle-based Alpine Immune Sciences, which employs Truveta to match patients to clinical trials.
The Truveta Language Model was built and trained over two years, alongside other technology efforts at the company to ensure privacy and standardize data across multiple health systems.
As Truveta continues to expand its network of health systems and refine its AI model, the company is poised to become a significant player in the healthcare sector, offering accurate and efficient data analysis for researchers and medical professionals alike.