In wealth management, we’re always looking for better ways to bridge the distance between a client’s financial plan and their actual household portfolio. In the past week, the narrative around artificial intelligence (AI) in wealth management shifted from “chatbots” to “agents” with the arrival of specialized “plugins.” For the financial advisor, the promise is intoxicating: automated rebalancing, instant tax-loss harvesting and personalized proposals.
I’ve spent more than a decade building and using risk factor models and portfolio optimization systems. When you spend that much time in the plumbing and mathematics of these workflows, you start to develop a healthy respect for how easily small errors can compound.
As I’ve experimented with these new AI tools, I’ve noticed a pattern I’m calling the Logic-Layer Gap. I don't have all the answers on where this technology is headed, but after looking under the hood of the GitHub repositories corresponding to the recent announcements, I have a few observations.
Markdown vs. Math
When I pulled the repository for these new financial services plugins,1 I wasn’t terribly surprised to find they aren't exactly deterministic math engines. They are what Anthropic calls "skills", sophisticated markdown files that provide the AI with instructions on how to simulate a fiduciary workflow.
It’s an amazing orchestration of processes, but it’s still a workflow wrapper around a probabilistic engine. While it can act like a portfolio manager, it doesn't necessarily handle the sequential, hardened logic required for technical portfolio construction.
Bucketing
In my own tests, I’ve found instructional agents often struggle with what researchers call “multi-hop reasoning.”2 For example, the AI might calculate a rebalance by always treating short-term and long-term capital gains as independent buckets. It applied the respective tax rate to each bucket but, in my case, overlooked the crucial cross-category netting step, whereby a net long-term loss can offset a net short-term gain, or vice versa.
I’m not sure if this is a training data limitation or a reasoning gap, but it highlights why a human gut-check is still so valuable. Without a deterministic calculator, the AI is effectively pattern-matching the tax code. Or even worse, the latest interpretation of the tax code on Reddit.
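The netting step the agent skipped is entirely deterministic and fits in a few lines of code. Here is a minimal sketch in Python; the function name and the dollar amounts are illustrative, not taken from the plugin:

```python
def net_capital_gains(short_term: float, long_term: float) -> dict:
    """Apply cross-category netting to already-netted category totals.

    Gains and losses are first netted within each holding-period bucket;
    a net loss in one bucket then offsets a net gain in the other, which
    is the step a bucket-by-bucket calculation misses.
    """
    if short_term < 0 and long_term > 0:
        offset = min(-short_term, long_term)
        short_term += offset
        long_term -= offset
    elif long_term < 0 and short_term > 0:
        offset = min(-long_term, short_term)
        long_term += offset
        short_term -= offset
    return {"short_term": short_term, "long_term": long_term}

# A $4,000 net long-term loss should reduce a $10,000 net short-term
# gain to $6,000 before any rate is applied.
result = net_capital_gains(short_term=10_000, long_term=-4_000)
```

A rules engine like this is trivially unit-testable, which is exactly the property a probabilistic pattern-matcher lacks.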
Heuristics
I’ve also observed that as problems become more complex, AI models often revert to “heuristics”: simple rules of thumb.3
In a tax-aware rebalancing context, you might see the AI prioritize a “greedy” liquidation: selling the largest loser to harvest a quick win. But wealth management optimization is a set of multivariate tradeoffs, and the best move is often path- or risk-dependent. A smaller loss taken today might be the more risk-aware portfolio decision. The plugin provides a fascinating glimpse into the future of automated workflows, but the point where the math meets the mandate still requires the deterministic oversight of a human practitioner.
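To make the greedy-versus-tradeoff distinction concrete, here is a toy comparison. The lots, the tracking-error figures and the penalty parameter are all hypothetical; a production optimizer would solve this as a proper multivariate problem rather than a scored ranking:

```python
# Hypothetical candidate lots: (ticker, unrealized_loss, tracking_error_impact)
lots = [
    ("A", -9_000, 0.90),  # biggest loss, but selling it badly distorts the target allocation
    ("B", -4_000, 0.10),  # smaller loss, minimal drift from the model portfolio
]

def greedy_pick(candidates):
    """Heuristic: harvest the single largest loss, ignoring risk entirely."""
    return min(candidates, key=lambda lot: lot[1])

def tradeoff_pick(candidates, risk_penalty=10_000):
    """Score each sale as loss harvested minus a penalty for added tracking error."""
    return max(candidates, key=lambda lot: -lot[1] - risk_penalty * lot[2])
```

The greedy rule grabs lot A; once risk enters the objective, lot B wins (a 4,000 harvest at score 3,000 beats a 9,000 harvest whose risk penalty cancels it out). Same inputs, different answer, and only one of them respects the mandate.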
Granularity
Finally, there is the issue of data abstraction. AI models are designed to find the most coherent narrative, which often leads them to round or summarize complex data sets to make them more digestible.4
It’s much easier for a markdown-based skill to discuss portfolio-level drift than to parse the cost basis, acquisition date and wash-sale implications of 500 individual tax lots. Yet, the real value in tax-aware portfolio management is found in those specific details. The AI is a brilliant summarizer, but we must ensure the granular lot-level context doesn’t get lost in the summary.
A Collaborative Path Forward
I use large language models every day. They help me pressure-test ideas and accelerate development across our products. I'm not a skeptic of this technology. I'm a practitioner of it.
But that daily use is exactly what makes the Logic-Layer Gap so visible to me. I can see where the seams are because I'm running my hands along them constantly. The AI plugin is extraordinary at orchestrating a workflow, drafting a narrative, or helping me reason through a complex problem. It is not yet a substitute for a deterministic engine that can reliably parse and solve highly complex, tax-aware household wealth optimization problems.
The risk isn't that advisors will adopt AI. They should. The risk is that the packaging is so polished that the absence of a hardened calculation layer goes unnoticed until a client's tax bill reveals it.
So here's what I'd advocate: treat the AI as the orchestration layer, not the calculation layer. Let it draft, summarize and route. But demand that the math underneath is deterministic, auditable and testable. If your vendor can't show you the engine under the hood, you don't have a rebalancer—you have a ghost in the machine making probabilistic guesses with your client's capital.
The nuance has always been where we provide our greatest value. That hasn't changed. The stakes of ignoring it just got higher.
Important Disclosures & Definitions
1 Anthropic PBC (2026). Financial Services Plugins Repository (Skills and MCP). GitHub.
2 Internal Quantitative Research (2026). Comparative Analysis of Deterministic Netting Logic vs. Probabilistic LLM Netting Calculations.
3 Apple Machine Learning Research (2025). GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.
4 Association for Computational Linguistics (ACL) (2025). Numerical Abstraction and Context Compression in Transformer-Based Architectures. ACL Anthology.
AAI001109 03/03/2027