Pyxis Learns Lipid Structure: De Novo ID for Lipidomics
Lipidomics is producing some of the most interesting biology of the last decade.
Lipid measurements are revealing early predictors of Alzheimer's [5], refining cardiovascular risk stratification [4], and mapping how individual lipid species drive cancer biology [6]. The same pattern runs through oncology, neurology, and immunology [7, 8, 9, 10].
Yet, in practice, reference library coverage sets the ceiling on which species can be identified, and, for a large portion of the lipid space, experimental reference spectra simply do not exist. As a result, the scope of discoverable biology remains constrained.
Most lipids can't be identified
Lipid identification has a lot of room to grow. LIPID MAPS catalogs over 60,000 lipid species [1], but public spectral libraries contain experimental MS2 for only a small fraction of them. Even within existing libraries, the tools disagree. A 2024 cross-platform comparison found that MS-DIAL and Lipostar, run on the same dataset with default settings, agreed on only 14% of lipid identifications. Including MS2 data raised agreement to 36% [3].
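To make an agreement figure like this concrete, here is a minimal sketch of one way to score overlap between two tools' identification lists: shared identifications divided by all identifications reported by either tool. This is an assumption for illustration; the cited comparison [3] may define agreement differently, and the lipid names below are made up, not taken from that study.

```python
def agreement_percent(ids_a, ids_b):
    """Percent of identifications shared by both tools, out of all
    identifications reported by either tool (Jaccard overlap)."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Hypothetical outputs from two tools run on the same dataset.
tool_a = {"PC 34:1", "PC 36:2", "TG 52:2", "SM 34:1", "PE 36:2"}
tool_b = {"PC 34:1", "TG 52:2", "TG 54:3", "Cer 42:1", "LPC 18:0"}

# 2 shared identifications out of 8 distinct ones.
print(agreement_percent(tool_a, tool_b))  # 25.0
```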
Part of why this has persisted is that reference data is genuinely hard to generate. Building libraries requires synthesizing or purchasing lipid standards, which is expensive and slow. And the combinatorial space is large enough that comprehensive coverage through standards alone isn't realistic.
Many tools sidestep the library problem with rules-based identification, matching spectra against predefined fragmentation patterns. But these rules rely on the presence or absence of diagnostic ions without accounting for their relative intensities, making it difficult to distinguish true matches from coincidental fragment overlaps.
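A small sketch shows why presence-or-absence rules are fragile. The rule below is hypothetical and is not taken from any particular tool: it checks for two fragments commonly associated with phosphatidylcholines in positive mode (m/z 184.0733 and 104.1070) within an assumed tolerance. Both toy spectra pass the rule, even though in the second one the diagnostic ions sit at noise level.

```python
# Each spectrum is a list of (m/z, intensity) pairs. All values are illustrative.
PC_RULE = [184.0733, 104.1070]  # hypothetical diagnostic-ion rule


def has_peak(spectrum, target_mz, tol=0.01):
    """True if any peak lies within tol of target_mz, regardless of intensity."""
    return any(abs(mz - target_mz) <= tol for mz, _ in spectrum)


def rule_match(spectrum, diagnostic_mzs, tol=0.01):
    """Presence/absence rule: every diagnostic ion must appear."""
    return all(has_peak(spectrum, mz, tol) for mz in diagnostic_mzs)


def relative_intensity(spectrum, target_mz, tol=0.01):
    """Intensity of the matching peak as a fraction of the base peak."""
    base = max(intensity for _, intensity in spectrum)
    hits = [i for mz, i in spectrum if abs(mz - target_mz) <= tol]
    return max(hits) / base if hits else 0.0


strong = [(184.0733, 1000.0), (104.1070, 120.0), (478.33, 60.0)]
weak = [(184.0735, 8.0), (104.1068, 5.0), (313.27, 1000.0)]

for name, spectrum in [("strong", strong), ("weak", weak)]:
    print(name, rule_match(spectrum, PC_RULE),
          relative_intensity(spectrum, 184.0733))
```

The rule returns the same verdict for both spectra; only the relative intensity (1.0 versus 0.008 of the base peak) separates a convincing match from a likely coincidence.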
The first fully de novo lipid ID model
Today, we're expanding our Large Spectral Model into the first commercially available de novo lipid identification model, capable of identifying lipids no reference library has ever contained.
The model learns the relationship between lipid structure and fragmentation behavior directly from spectral data. Given a new MS2 spectrum, it generates the lipid structure most likely to have produced it, without consulting a library. It's the same model behind our metabolite ID work, extended to a new chemical domain.
Expanding a well-characterized lipidome by 40%
We tested Pyxis on the raw data from AdipoAtlas, a comprehensive lipidomics study of human white adipose tissue across lean and obese individuals [11]. The original authors identified over 500 unique lipid species, using three independent lipidomics software tools with manual curation, making AdipoAtlas one of the most thoroughly annotated public lipidomics datasets available.
Pyxis covered every lipid class in the original study and identified 222 lipid species not in the original report, a 40% expansion of the characterized lipidome from the same raw files.
New lipids are only worth finding if they carry signal. We ran differential abundance analysis on the 222 Pyxis-only species, comparing obese and lean individuals in both subcutaneous (SAT) and visceral (VAT) adipose tissue. Many showed clear differences between groups, with fold changes above 2x and p-values below 0.05.
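For readers who want to run this kind of comparison themselves, here is a minimal sketch of a two-group differential abundance check for a single lipid. It is an assumption-laden illustration: the statistical test used for the analysis above is not specified here, so the sketch substitutes an exact two-sided permutation test on the difference in means, and the abundance values are invented rather than drawn from AdipoAtlas.

```python
from itertools import combinations
from statistics import mean


def fold_change(group_a, group_b):
    """Ratio of mean abundance in group_a to mean abundance in group_b."""
    return mean(group_a) / mean(group_b)


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Enumerates every way of relabeling the pooled samples, so it is only
    practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical normalized abundances for one lipid, five samples per group.
obese = [2.1, 2.4, 1.9, 2.6, 2.2]
lean = [1.0, 0.9, 1.1, 0.8, 1.2]

fc = fold_change(obese, lean)
p = permutation_p_value(obese, lean)
print(round(fc, 2), round(p, 4))
```

With these made-up values the groups are fully separated, so only 2 of the 252 possible relabelings are as extreme as the observed split. A real analysis across hundreds of species would also need a multiple-testing correction.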
Recovering a known biomarker
One of these IDs is PC 32:3, a known biomarker highly relevant to the biology of obesity that was not reported in the original AdipoAtlas study. PC 32:3 is a lower-abundance phosphatidylcholine, easy to miss against dominant PC species, but it has been a fixture of commercial targeted lipidomics panels for years, with a steady accumulation of evidence tying it to insulin resistance and metabolic disease.
Prior work has linked PC 32:3 to insulin resistance in cultured adipocytes [12], ranked its ratio to related PC species among the most predictive metabolite ratios for insulin resistance across cohorts [13], and shown that higher LPCAT3 activity consumes PC 32:3 and related species to produce the polyunsaturated phospholipids that characterize obese, dysfunctional membranes [14].
In the AdipoAtlas data, PC 32:3 shows clear differential abundance between obese and lean groups across both SAT and VAT, consistent with the prior literature and surfaced here by a model that didn't need to have seen it before.
Per-sample abundance distributions for PC 32:3 in SAT and VAT, split by obese (orange) and lean (blue) groups. PC 32:3 was identified only by Pyxis and not reported in the original AdipoAtlas study, yet shows clear differential abundance between metabolic phenotypes, consistent with prior evidence linking this lipid to insulin resistance and obesity.
Getting started
The C18 AdipoAtlas data is loaded in Pyxis as a demo. Sign up today at app.matterworks.ai/sign-up to explore the data behind this post.
If you want to run your own data, simply sign up and upload. We’ve also put together a biological interpretation guide (link) with some tips on using the results you see in Pyxis.
See what Pyxis finds in your data
Upload a dataset you've already analyzed. You'll see similar results, faster, and you may even learn something new.
Sign up for Pyxis
References
[1] LIPID MAPS: update to databases and tools for the lipidomics community. Nucleic Acids Research, 2024, 52(D1), D1677.
[2] Kind, T. et al. LipidBlast in silico tandem mass spectrometry database for lipid identification. Nature Methods, 2013, 10, 755–758.
[3] von Gerichten, J. et al. Challenges in Lipidomics Biomarker Identification: Avoiding the Pitfalls and Improving Reproducibility. Metabolites, 2024, 14(8), 461.
[4] Hilvo, M. et al. Development and validation of a ceramide- and phospholipid-based cardiovascular risk estimation score for coronary artery disease patients. European Heart Journal, 2020, 41(3), 371–380.
[5] Mapstone, M. et al. Plasma phospholipids identify antecedent memory impairment in older adults. Nature Medicine, 2014, 20, 415–418.
[6] Ogretmen, B. Sphingolipid metabolism in cancer signalling and therapy. Nature Reviews Cancer, 2018, 18, 33–50.
[7] Wang, N. et al. Lipid metabolism drives dietary effects on T cell ferroptosis and immunity. Nature, 2026.
[8] Liu, L. et al. NCBP2 drives colorectal cancer growth and metastasis through LIPG-mediated lipid droplet accumulation. Communications Biology, 2026.
[9] Damiza-Detmer, A. et al. Lipid alterations and endothelial dysfunction are associated with multiple sclerosis pathophysiology. Scientific Reports, 2026.
[10] Lu, J. et al. Lipidomic profiling identifies key pathways and a 5-lipid panel with high diagnostic efficacy for ischemic stroke. Scientific Reports, 2026.
[11] Lange, M. et al. AdipoAtlas: A reference lipidome for human white adipose tissue. Cell Reports Medicine, 2021.
[12] Böhm, A. et al. Metabolic Signatures of Cultured Human Adipocytes from Metabolically Healthy versus Unhealthy Obese Individuals. PLOS ONE, 2014.
[13] Molnos, S. et al. Metabolite ratios as potential biomarkers for type 2 diabetes: a DIRECT study. Diabetologia, 2017.
[14] He, M. et al. Inhibiting Phosphatidylcholine Remodeling in Adipose Tissue Increases Insulin Sensitivity. Diabetes, 2023.