As fund managers, AI models are mediocre — new research finds that they recommend trendy stocks, ignore diversification, and don’t generate excess returns.
In a working paper published by the U.S. National Bureau of Economic Research, a trio of researchers test the portfolio-building prowess of AI tools for retail investors, tracking the performance of AI-managed portfolios over the past year. Both actively managed portfolios and portfolios built on “buy and hold” recommendations were included.
The tested models include various versions of ChatGPT, Claude, Gemini and Grok.
Overall, their performance was underwhelming.
Among other things, the researchers find that “AI primarily recommends stocks based on how much media attention firms receive.”
Indeed, when the researchers asked the models to justify their stock picks, they found that the volume of a company’s mentions in corporate news was a major driver of the recommendations.
In particular, the models recommended “undiversified portfolios that positively load on momentum, large companies, and low book-to-market firms.”
Specifically, AI largely recommended portfolios that were heavily concentrated in large tech stocks and ignored many other sectors of the economy, the paper noted.
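One common way to put a number on that kind of concentration is the Herfindahl-Hirschman index, the sum of squared portfolio weights. The sketch below is illustrative only: the weights are invented, and the paper's own measure of diversification is not described in this article.

```python
def concentration(weights):
    """Herfindahl-Hirschman index: sum of squared portfolio weights.

    Returns 1.0 for a single holding and 1/n for n equal holdings.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalise so the weights sum to 1 before squaring.
    return sum((w / total) ** 2 for w in weights)


# Hypothetical sector weights, invented for illustration only.
tech_heavy = [0.70, 0.10, 0.10, 0.10]   # most of the money in one sector
balanced = [0.25, 0.25, 0.25, 0.25]     # spread evenly over four sectors

print(concentration(tech_heavy))
print(concentration(balanced))
```

The higher the index, the less diversified the portfolio; its reciprocal can be read as the "effective" number of equally weighted holdings.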
“When we ask AI to form portfolios but leave it otherwise unconstrained, AI chooses high beta stocks,” the paper said.
And when the models were asked for low-beta portfolios, AI “persistently recommends portfolios” at the higher end of the defined range and sometimes even violates the constraints, the paper found.
“This mirrors similar risk-seeking behaviour documented in the academic literature for investment managers who receive bonus compensation and have limited liability,” the paper noted.
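For readers unfamiliar with the term, beta measures how strongly a portfolio's returns move with the market's. A minimal sketch of the textbook estimate, covariance divided by market variance, follows. The return figures are made up for illustration and are not data from the paper, and the paper's own estimation method may differ.

```python
from statistics import mean


def beta(portfolio_returns, market_returns):
    """Estimate beta as cov(portfolio, market) / var(market)."""
    if len(portfolio_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    if len(market_returns) < 2:
        raise ValueError("need at least two observations")
    p_mean = mean(portfolio_returns)
    m_mean = mean(market_returns)
    # The 1/(n-1) factors in covariance and variance cancel in the ratio.
    cov = sum((p - p_mean) * (m - m_mean)
              for p, m in zip(portfolio_returns, market_returns))
    var = sum((m - m_mean) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


# Hypothetical monthly returns, invented for illustration only.
market = [0.01, -0.02, 0.03, 0.00]
portfolio = [0.02, -0.04, 0.06, 0.00]  # moves twice as much as the market

print(round(beta(portfolio, market), 2))  # a beta of about 2
```

A beta above 1 means the portfolio tends to amplify market swings; a low-beta mandate asks for the opposite.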
Moreover, these portfolios didn’t generate statistically significant abnormal returns, the researchers found. While the AI models did beat the S&P 500 index, they didn’t earn excess returns once trading costs were taken into account.
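The gap between beating an index and earning an excess return after costs comes down to simple arithmetic. The sketch below uses hypothetical numbers, not the paper's figures, and a deliberately simplified cost model (turnover times a proportional cost). It also ignores the risk adjustment that "abnormal returns" typically imply.

```python
def net_excess_return(portfolio_return, benchmark_return,
                      turnover, cost_per_unit_traded):
    """Return over the benchmark after subtracting trading costs.

    turnover: value traded over the period as a multiple of portfolio value
    cost_per_unit_traded: proportional cost (commissions, spreads) per unit traded
    """
    gross_excess = portfolio_return - benchmark_return
    trading_cost = turnover * cost_per_unit_traded
    return gross_excess - trading_cost


# Hypothetical figures, invented for illustration only:
# the portfolio returns 12%, the index 10%, the portfolio is turned over
# four times in the period, and each unit traded costs 0.6%.
net = net_excess_return(0.12, 0.10, turnover=4.0, cost_per_unit_traded=0.006)

print(f"{net:.3%}")  # negative: costs more than erase the 2-point lead
```

In this made-up case, a two-percentage-point lead over the index turns into a small shortfall once 2.4 points of trading costs are deducted.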
The paper stressed that the research examined how the models might work for a retail investor rather than for a professional investor, who might produce different results.
“This is a significant distinction because typical households are well-known to make systematic mistakes and rely on the advice they receive,” it noted.
Based on their initial results, the researchers conclude, “it appears that more oversight is needed to assure that people do not misuse this powerful source of information and experience welfare losses.”