Fraud and Deception Detection: Text-Based Analysis

Among the many factors we consider as fundamental investors are assessments of a company’s strategy, products, supply chain, employees, financing, operating environment, competition, management, adaptability, and so on. We conduct these assessments to increase our understanding, yes, but also to increase our trust in the data and the people whose activities the data measure. If we cannot trust the data and the people who created them, then we will not invest. In short, we must trust management.

Our fraud and deception detection methods are only okay.

But by what repeatable method can we evaluate the trustworthiness of companies and their people? Usually the answer is some combination of financial statement analysis and “trust your gut.” Here is the problem with that:

1. Time and Resource Constraints

Companies communicate information through words more than numbers. For example, from 2009 to 2019, the annual reports of the Dow Jones Industrial Average’s component companies tallied just over 31.8 million words and numbers combined, according to AIM Consulting. Numbers made up only 13.5% of the total.

Now, JP Morgan’s 2012 annual report is 237,894 words. Let’s say an average reader can read and comprehend about 125 words per minute. At this rate, it would take a research analyst approximately 31 hours and 43 minutes to thoroughly read the report. The average mutual fund research analyst in the United States makes around $70,000 per year, according to WallStreetMojo. So that one JP Morgan report costs a firm more than $1,100 to assess. If we are already invested in JP Morgan, we’d perform much of this work just to ensure our trust in the company.
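The arithmetic behind those figures can be sketched in a few lines of Python. The word count, reading speed, and salary come from the paragraph above. The 2,000 working hours per year is our assumption, not a sourced figure, and the cost estimate moves with it.

```python
# Figures taken from the article.
WORDS_IN_REPORT = 237_894
WORDS_PER_MINUTE = 125
ANNUAL_SALARY_USD = 70_000

# Assumption (not from the article): 50 weeks x 40 hours of work per year.
WORKING_HOURS_PER_YEAR = 2_000


def reading_hours(words: int, words_per_minute: int) -> float:
    """Hours needed to read `words` at a constant reading speed."""
    return words / words_per_minute / 60


def reading_cost(words: int, words_per_minute: int,
                 annual_salary: float, hours_per_year: float) -> float:
    """Salary cost of the reading time at the implied hourly rate."""
    hourly_rate = annual_salary / hours_per_year
    return reading_hours(words, words_per_minute) * hourly_rate


hours = reading_hours(WORDS_IN_REPORT, WORDS_PER_MINUTE)
whole_hours, minutes = divmod(round(hours * 60), 60)
cost = reading_cost(WORDS_IN_REPORT, WORDS_PER_MINUTE,
                    ANNUAL_SALARY_USD, WORKING_HOURS_PER_YEAR)

print(f"Reading time: {whole_hours} h {minutes} min")
print(f"Estimated cost: ${cost:,.0f}")
```

Under these assumptions the sketch reproduces the 31 hours and 43 minutes and a cost a little above $1,100. Assuming a 2,080-hour work year instead would put the cost slightly below that mark.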

Moreover, quantitative data is always publicly released with a significant time lag. Since a company’s performance is usually disclosed quarterly and annually, the average time lag for such data is slightly less than 90 days. And once the data becomes public, whatever advantage it offers is quickly traded away. Most investment research teams lack the resources to assess every company in their universe or portfolio in near real time, or just after a quarterly or annual report is released.

Conclusion: What is that old line? Oh, yeah: Time is money.

2. Trusting our gut does not work.

Despite the pan-cultural fiction to the contrary, research demonstrates we cannot detect deception through body language or gut instinct. In fact, a meta-analysis of our deception-spotting abilities found a global success rate just 4 percentage points better than chance. We might believe that as finance pros we are exceptional. We would be wrong.

In 2017, we measured deception detection skills among finance professionals. It was the first time our industry’s lie detection prowess had ever been put to the test. In short: ouch! Our overall success rate is actually worse than that of the general population: We did not score 54%, we earned an even-worse-than-a-coin-toss 49.4%.

But maybe our strengths are in our own sector. Put us in a finance setting, say on an earnings call, and we’ll do much better, right? Nope, not really. In investment settings, we could detect deception just 51.8% of the time.

There is more bad news here (sorry): Finance pros have a strong truth bias. We tend to trust other finance pros way more than we should. Our research found that we only catch a lie in finance 39.4% of the time. So that 51.8% accuracy rate is due to our tendency to believe our fellow finance pros.

One other tidbit: When assessing statements outside of our domain, we have a strong 64.9% deceptiveness bias. Again, this speaks to our industry’s innate sense of exceptionalism. In an earlier study, our researchers found that we believe we are told 2.14 lies per day outside of work settings, and just 1.62 lies per day in work settings. This again speaks to the truth bias within finance.

Finally, we believe we can detect lies within finance at a 68% accuracy rate, not the actual 51.8% measured. Folks, this is the very definition of overconfidence bias and is delusion by another name.

Conclusion: We cannot trust our guts.

3. Auditors’ techniques audit numbers.

But what about auditors? Can they accurately evaluate company truthfulness and save us both time and money? Yes, company reports are audited. But auditors can only conduct their analyses through a micro-sampling of transaction data. Worse still, auditors’ techniques, like ours, are largely focused on that very small 13.5% of information that is captured numerically. That leaves out the 86.5% of content that is text-based.

Further, because financial statement analysis — our industry’s fraud detection technique — is one step removed from what the auditors see, it is hardly reliable. Indeed, financial statement analyses are just table stakes: Ours probably won’t differ much from those of our competitors. Just looking at the same numbers as everybody else is unlikely to prevent fraud or generate alpha.

And what about private markets? The investment research community has spent an awful lot of time looking for investment opportunities in that space in recent years. But while private market data are sometimes audited, they lack the additional enforcement mechanism of public market participants’ due-diligence and trading activities, which can sometimes signal fraud and deception.

Conclusion: There has to be another tool to help us fight deception.

Scientifically Based Text Analyses to the Rescue

Starting with James W. Pennebaker’s pioneering work, researchers have applied natural language processing (NLP) to analyze verbal content and estimate a transcript’s or written document’s credibility. Computers extract language features from the text, such as word frequencies, psycholinguistic details, or negative financial terms, in effect dusting for language fingerprints. How do these automated techniques perform? Their success rates are between 64% and 80%.

In personal interactions, as we noted, people can detect lies approximately 54% of the time. But their performance worsens when assessing the veracity of text. Research published in 2021 found that people have about a 50% or coin-flip chance to identify deception in text. A computer-based algorithm, however, had a 69% chance.

But surely adding people to the mix improves the accuracy? Not at all. Our overconfidence as investors sabotages our ability to catch deception even in human-machine hybrid models. The same researchers explored how human subjects evaluated computer judgments of deception that they could then overrule or tweak. When humans could overrule, the computer’s accuracy dropped to a mere 51%. When human subjects could tweak the computer judgments in a narrow range around the algorithms’ evaluation, the hybrid success rate fell to 67%.

Computers can give investment pros a huge advantage in evaluating the truthfulness of company communications, but deception detection methods are not one size fits all.

One computer-driven text-based analysis, published in 2011, had the ability to predict negative stock price performance for companies whose 10-Ks included a higher percentage of negative words. By scanning documents for words and phrases associated with the tone of financial communications, this method searched for elements that may indicate deception, fraud, or poor future financial performance.
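A word-list tone measure of this kind can be sketched as follows. This is a minimal illustration, not the method of the 2011 study: the word list below is a tiny hypothetical sample we made up for the example, and the tokenizer is deliberately crude.

```python
import re

# Hypothetical sample list for illustration only; it is NOT the
# dictionary used in the 2011 study.
NEGATIVE_WORDS = frozenset({
    "loss", "losses", "impairment", "litigation",
    "decline", "adverse", "restated",
})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def negative_word_pct(text: str,
                      negative_words: frozenset = NEGATIVE_WORDS) -> float:
    """Percentage of tokens that appear in the negative word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in negative_words)
    return 100.0 * hits / len(tokens)


sample = "The company reported a loss and an impairment charge."
print(f"{negative_word_pct(sample):.1f}% negative words")
```

Because the score depends only on which words appear, a filer can lower it simply by avoiding the listed words.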

Of course, those businesses whose stock prices were hurt by this technique adapted. They removed the offending words from their communications altogether. Some executives even hired speech coaches to avoid ever uttering them. So word-list analyses have lost some of their luster.

Where Do We Go from Here?

It may be tempting to dismiss all text-based analyses. But that would be a mistake. After all, we have not thrown away financial statement analysis, right? No, instead we should seek out and apply the text-based analyses that work. That means methods that are not easily spoofed, that assess how language is used — its structure, for example — not what language is used.

With these issues in mind, we developed Deception And Truth Analysis (D.A.T.A.) with Orbit Financial. Based on a 10-year investigation of those deception technologies that work in and out of sample — hint: not reading body language — D.A.T.A. examines more than 30 language fingerprints in five separate scientifically proven algorithms to determine how these speech elements and language fingerprints interact with one another.

The process is similar to that of a standard stock screener. That screener identifies the performance fingerprints we want and then applies these quantitative fingerprints to screen an entire universe of stocks and produce a list on which we can unleash our financial analysis. D.A.T.A. works in the same way.

A key language fingerprint is the use of articles like a, an, and the, for example. An excess of these is more associated with deceptive than truthful speech. But article frequency is only one component: How the articles are used is what really matters. And since articles are directly connected to nouns, D.A.T.A. is hard to outmaneuver. A potential dissembler would have to alter how they communicate, changing how they use their nouns and how often they use them. This is not an easy task and even if successful would only counteract a single D.A.T.A. language fingerprint.
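To make the article fingerprint concrete, here is a rough sketch of how article frequency and the words that follow each article might be extracted. It is not D.A.T.A.'s algorithm, which is not described here. Both function names are hypothetical, and taking the next token as a stand-in for the attached noun is a crude assumption, since that token may be an adjective.

```python
import re

ARTICLES = frozenset({"a", "an", "the"})


def _tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def article_rate(text: str) -> float:
    """Share of tokens that are articles (0.0 for empty text)."""
    tokens = _tokens(text)
    if not tokens:
        return 0.0
    return sum(token in ARTICLES for token in tokens) / len(tokens)


def article_contexts(text: str) -> list[tuple[str, str]]:
    """Pairs of (article, following token); a rough proxy for usage."""
    tokens = _tokens(text)
    return [(first, second)
            for first, second in zip(tokens, tokens[1:])
            if first in ARTICLES]


sample = "The board approved a plan."
print(article_rate(sample))
print(article_contexts(sample))
```

A production system would likely use a part-of-speech tagger to tie each article to its noun phrase rather than to the next token.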

The other key findings from recent D.A.T.A. tests include the following:

Time and Resource Savings: D.A.T.A. assesses over 70,400 words per second, or the equivalent of a 286-page book. That is a 99.997% time savings over people and a cost savings of more than 90%.
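The time savings figure follows from comparing the two reading speeds. The sketch below uses the 70,400 words per second quoted above and the 125 words per minute human reading speed assumed earlier in this article. The JP Morgan word count is also taken from earlier in the article.

```python
MACHINE_WORDS_PER_SECOND = 70_400   # from the article
HUMAN_WORDS_PER_MINUTE = 125        # reading speed assumed earlier in the article
JPM_2012_REPORT_WORDS = 237_894     # from the article


def time_savings_pct(machine_wps: float, human_wpm: float) -> float:
    """Percentage of human reading time saved by the machine."""
    human_wps = human_wpm / 60
    return 100 * (1 - human_wps / machine_wps)


def machine_seconds(words: int, machine_wps: float) -> float:
    """Seconds the machine needs to process `words`."""
    return words / machine_wps


savings = time_savings_pct(MACHINE_WORDS_PER_SECOND, HUMAN_WORDS_PER_MINUTE)
print(f"Time savings: {savings:.3f}%")
print(f"JP Morgan 2012 report: "
      f"{machine_seconds(JPM_2012_REPORT_WORDS, MACHINE_WORDS_PER_SECOND):.1f} s")
```

At these speeds, the report that would take an analyst almost 32 hours to read is processed in a few seconds.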

Deception Accuracy: Each of the five algorithms is measured at deception detection accuracy rates far above what people can achieve in text-based analyses. Moreover, the five-algorithm combination makes D.A.T.A. difficult to work around. We estimate its accuracy exceeds 70%.

Fraud Prevention: D.A.T.A. could identify the 10 largest corporate scandals of all time — think Satyam, Enron — with an average lead time in excess of six years.

Outperformance: In one D.A.T.A. test, we measured the deceptiveness of each component of the Dow Jones Industrial Average each year. In the following year, we bought all but the five most deceptive Dow companies. From 2009 through 2019, we repeated the exercise at the start of each year. This strategy resulted in an average annual excess return of 1.04% despite the sometimes nine-month lag in implementing the strategy.
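One year of that exercise can be sketched as a simple screen. Everything below is illustrative: the tickers, scores, and returns are invented. Equal weighting and an equal-weighted benchmark of all components are our assumptions, since the weighting scheme is not specified here and the Dow itself is price-weighted.

```python
def screen_least_deceptive(scores: dict[str, float],
                           exclude_n: int = 5) -> list[str]:
    """Drop the `exclude_n` highest-scoring (most deceptive) tickers."""
    # Sort ascending by score; ticker breaks ties deterministically.
    ranked = sorted(scores, key=lambda ticker: (scores[ticker], ticker))
    keep_count = max(len(ranked) - exclude_n, 0)
    return sorted(ranked[:keep_count])


def equal_weight_return(returns: dict[str, float],
                        holdings: list[str]) -> float:
    """Average return of the holdings (equal weights are an assumption)."""
    if not holdings:
        raise ValueError("no holdings left after screening")
    return sum(returns[ticker] for ticker in holdings) / len(holdings)


def excess_return(scores: dict[str, float],
                  next_year_returns: dict[str, float],
                  exclude_n: int = 5) -> float:
    """Screened portfolio return minus the all-components benchmark."""
    holdings = screen_least_deceptive(scores, exclude_n)
    benchmark = equal_weight_return(next_year_returns,
                                    list(next_year_returns))
    return equal_weight_return(next_year_returns, holdings) - benchmark


# Invented data for three hypothetical tickers, excluding one.
scores = {"AAA": 0.1, "BBB": 0.9, "CCC": 0.5}
returns = {"AAA": 0.10, "BBB": -0.20, "CCC": 0.04}
print(screen_least_deceptive(scores, exclude_n=1))
print(f"{excess_return(scores, returns, exclude_n=1):.2%}")
```

Repeating this at the start of each year and averaging the yearly results would give an average annual excess return comparable in form to the figure reported above.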

The writing is on the wall. Text-based analysis that leverages computer technology to detect fraud and deception results in significant savings in both time and resources. Future articles in this series will detail more D.A.T.A. test results and the fundamental analysis wins that this kind of technology makes possible.
