Proprietary Data Becomes Weapon in LLM Competition
The companies that process the world’s payments have spent decades building a record of how money moves across merchants, geographies and account types.
Every transaction carries a timestamp, a merchant identity, an amount and an account history. Unlike the text and images that train most artificial intelligence (AI) systems, that data is structured and tied directly to real economic outcomes.
Now, payment processors are training AI models on it.
Mastercard announced in March that it is building a generative AI foundation model trained on billions of anonymized payment transactions. The company is positioning the model as an insights engine for payments and commerce, with applications in cybersecurity, loyalty programs, personalization, portfolio optimization and data analytics.
Two weeks later, Plaid introduced a transaction foundation model, which the company is building to power what it calls “intelligent finance.”
Payments Networks Move to Own the AI Layer
A foundation model is a large-scale AI system trained on broad data that can be adapted to a range of downstream tasks. Popular generative AI systems are built on large language models trained on unstructured data such as text, video and photos.
According to Mastercard, its model is a large tabular model, a deep-learning network trained on structured transaction datasets. The company plans to scale training to hundreds of billions of transactions and add merchant location data, fraud data, authorization data, chargeback data and loyalty program data over time.
Plaid said its model was built to address a specific problem in transaction data: the same merchant can appear in dozens of string variations across institutions, and the same string can mean different things depending on context. Plaid trained its model on large-scale, anonymized transaction data to resolve that ambiguity, creating what the company describes as a shared backbone for tasks such as entity recognition, merchant normalization, categorization and risk signaling.
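To make the ambiguity concrete, the sketch below shows a naive, rule-based normalizer of the kind a learned model would aim to replace. It is not Plaid's method: the descriptor strings, the `ALIASES` table and the `normalize_merchant` function are all hypothetical, invented here for illustration.

```python
import re

# Hypothetical raw descriptors, made up for this example (not real Plaid data).
RAW_DESCRIPTORS = [
    "STARBUCKS #1234 SEATTLE WA",
    "Starbucks Store 0042",
    "SQ *STARBUCKS COFFEE",
    "AMZN Mktp US*2K4LD93",
    "Amazon.com*1A2B3C",
]

# Hypothetical hand-maintained alias table mapping known tokens to one canonical name.
ALIASES = {
    "starbucks": "Starbucks",
    "amzn": "Amazon",
    "amazon": "Amazon",
}


def normalize_merchant(descriptor: str) -> str:
    """Return a canonical merchant name, or "UNKNOWN" if no alias matches."""
    # Lowercase the descriptor and keep only runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", descriptor.lower())
    for token in tokens:
        if token in ALIASES:
            return ALIASES[token]
    return "UNKNOWN"


for raw in RAW_DESCRIPTORS:
    print(f"{raw!r} -> {normalize_merchant(raw)}")
```

A lookup table like this has to be extended by hand for every new variation, and it cannot tell apart identical strings that mean different things in different contexts, which is the gap Plaid says its model addresses.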
One-of-a-Kind Data
Mastercard processes billions of transactions annually across a global network, while Plaid’s network covers thousands of financial institutions across the United States. That gives each company a data moat and a network advantage that no LLM creator has.
According to Mastercard, its model could reduce the need to build, train and maintain thousands of separate AI models for different markets, use cases and customers.
Pahal Patangia, head of global industry business development for payments at Nvidia, said in a statement reported by PYMNTS that financial services needs specialized AI models capable of capturing the full complexity and scale of global commerce in real time.
Cybersecurity is one of the first areas where Mastercard is applying the model.
According to the company, existing cybersecurity AI models rely on data scientists to manually add features that help identify patterns such as spikes in purchase activity. The new model analyzes the same data with limited human input, learning independently which characteristics matter.
In testing, Mastercard said the model outperformed standard industry machine learning techniques and was better able to identify legitimate but infrequent transactions, such as a wedding ring purchase, which tend to trigger false positives in existing systems.
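As a rough illustration of what a manually added feature can look like, the sketch below computes a simple spike score: how far the latest purchase sits above an account's past average, measured in standard deviations. The function names, the sample amounts and the threshold of 3 are assumptions made for this example, not details disclosed by Mastercard.

```python
from statistics import mean, pstdev


def spike_score(history, latest):
    """Hand-crafted feature: standard deviations between `latest` and the past average."""
    # With fewer than two past amounts there is no meaningful spread to compare against.
    if len(history) < 2:
        return 0.0
    mu = mean(history)
    sigma = pstdev(history)
    # Identical past amounts give zero spread; avoid dividing by zero.
    if sigma == 0:
        return 0.0
    return (latest - mu) / sigma


def is_flagged(history, latest, threshold=3.0):
    """Flag the purchase when its spike score exceeds a fixed, assumed threshold."""
    return spike_score(history, latest) > threshold


# Hypothetical account: routine purchases followed by one large, legitimate purchase.
past_amounts = [42.0, 55.0, 38.0, 61.0, 47.0]
ring_purchase = 4200.0

print(is_flagged(past_amounts, 50.0))
print(is_flagged(past_amounts, ring_purchase))
```

A fixed rule of this kind flags the large one-off purchase even though it is legitimate, which is the sort of false positive Mastercard says its model handles better.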
Plaid disclosed performance gains across products drawing on the new model. According to the FinTech, income classification improved by 48%, loan payment detection improved by 14% and bank fee classification improved by 22%. Plaid also noted that the model captures economic signals that allow it to disambiguate merchants operating across multiple verticals and handle edge cases that simpler systems cannot.
Morgan Stanley noted in a November 2025 analysis of AI and financial information services that proprietary financial datasets are difficult to replicate and that re-creating decades of verified historical data with consistent identifiers is both technically challenging and prohibitively expensive for any outside player.
CFOs Are Watching From a Distance
The executives who would deploy domain-specific financial AI are still setting limits on how far they will let it run.
According to PYMNTS Intelligence research published in December, all 60 CFOs surveyed report using some advanced form of AI for at least one finance task, but deployment stays concentrated in structured, rules-based functions where outcomes are measurable and the stakes of an error are contained.
A separate PYMNTS Intelligence study found that 45% of CFOs use AI to monitor working capital and cash flows, while adoption lags in forecasting and cross-system coordination due to data integration challenges and trust concerns. CFOs show higher willingness to expand AI’s role in analytical tasks but draw back when cross-system coordination or external risk is present.
The same research found that 52% of CFOs would accept AI-generated recommendations on liquidity and payment timing, but fewer than one in three would automate month-end close orchestration or multi-system workflow coordination.
The post Proprietary Data Becomes Weapon in LLM Competition appeared first on PYMNTS.com.