Fintech AI Review Volume #4
Just have ChatGPT do it? Toy examples vs. real-world products in financial services...
“Just have ChatGPT do it!” It’s almost become a meme. In fact, while I was writing this newsletter, a friend suggested I do just that instead of spending so much time writing. In case you’re wondering, I rejected the idea. The suggestion, however, made me think about the difference between experimentation and building for scale in a consequential and highly regulated industry.
Toy examples are easy. Building something usable, cost-effective, and valuable for real-world use at scale requires serious knowledge and effort. In financial services, where accuracy and compliance are top priorities, this is even more true. The news linked below demonstrates this in a couple of ways.
An interesting post on the Ntropy blog shows that GPT-4, when thoroughly prompted, is quite good at transaction classification, but that a purpose-built solution can match its accuracy at 100x lower cost and latency, though building one requires significant expertise. The CFPB released a report on chatbots in consumer finance, which have a lot of potential to improve customer service if done well but could also provide a poor experience if used only as a cheap, drop-in replacement for human agents. Intuit is apparently using its vast financial dataset and codified knowledge of accounting standards and the tax code to develop purpose-built applications with generative AI.
The most useful innovation will occur at the intersection of deep technical expertise, large quantities of highly accurate and relevant training data, and a well-informed, highly focused orientation toward solving particular problems.
As always, please share your thoughts, ideas, comments, and any interesting content. Happy reading!
Latest News & Commentary
Ilia Zintchenko, CTO of Ntropy, a financial data enrichment platform, wrote a detailed and fascinating article on various methods of extracting meaning from financial transaction data. If you’re in the world of payments or lending, you’re likely familiar with the difficult-to-parse and highly unstandardized nature of bank transaction descriptions. Categorizing these transactions is intuitively useful for many applications, such as underwriting a loan, authorizing a payment transaction, or understanding the expenses of a business. However, categorization is a truly hard problem, in large part because categories depend on context (consider how the early directory-based web search engines failed, in part because no single ontology works for every user in every context). Ilia explains the difficulty and tradeoffs of multiple approaches to solving this problem, including human expert labeling, large language models, small language models, and rules-based systems. Most interestingly, the post shares detailed benchmarking of Ntropy’s transaction tagging against human tagging and multiple GPT-4-based approaches, comparing accuracy, cost, and latency. The code to run the benchmarks is even available on GitHub. It’s really nice to see Ntropy share both theory and practice in such detail.
One takeaway for those interested in generative AI for financial services applications: GPT-4 with a long and well-crafted prompt is capable of impressive accuracy. However, because OpenAI charges by the token, long prompts can be quite expensive. Similar or perhaps greater accuracy can be achieved with a custom solution, as Ilia demonstrates, at orders of magnitude lower cost and lower latency. Of course, this requires technical skill, accurate training data, and real investment.
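To make the per-token point concrete, here is a minimal back-of-the-envelope sketch. The prices and token counts are hypothetical numbers picked for illustration; they are not OpenAI's quoted rates and not figures from Ntropy's benchmark. The function name `cost_per_call_usd` is likewise just an illustrative helper. The only thing the sketch assumes about billing is that cost scales linearly with the number of tokens.

```python
# Illustrative per-token cost arithmetic. All prices and token counts below
# are assumptions chosen for the example, not quoted rates or benchmark figures.

def cost_per_call_usd(prompt_tokens, completion_tokens,
                      usd_per_1k_prompt, usd_per_1k_completion):
    """Cost of one API call when billing is linear in token count."""
    prompt_cost = prompt_tokens / 1000 * usd_per_1k_prompt
    completion_cost = completion_tokens / 1000 * usd_per_1k_completion
    return prompt_cost + completion_cost

# Hypothetical prices (USD per 1,000 tokens).
PRICE_PROMPT = 0.03
PRICE_COMPLETION = 0.06

# Hypothetical workloads: a long, detailed prompt vs. a short one,
# each returning a short category label.
long_prompt = cost_per_call_usd(2000, 20, PRICE_PROMPT, PRICE_COMPLETION)
short_prompt = cost_per_call_usd(200, 20, PRICE_PROMPT, PRICE_COMPLETION)

TRANSACTIONS = 1_000_000
for name, per_call in [("long prompt", long_prompt), ("short prompt", short_prompt)]:
    print(f"{name}: ${per_call:.4f} per call, "
          f"${per_call * TRANSACTIONS:,.0f} per million transactions")
```

Under these assumed numbers, the long prompt costs about six cents per transaction, or roughly $61,000 per million transactions, and nearly all of that comes from the prompt rather than the short answer. That is why prompt length matters so much once you move from a toy example to production volume.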
Fraud is an unfortunate reality in financial services. If you’re in the business of lending money, someone out there will likely try to steal it, and the criminals are often smart and highly motivated. It takes time, money, and technology to stay a step ahead of the bad guys. Fortunately, there are companies like SentiLink, profiled in this Forbes piece by Jeff Kauflin. The article tells the story of how its founders, who are both incredibly smart and nice people, started the company after identifying surprising and novel fraud patterns and have now scaled it to over 300 customers. It details how the company uses its AI models to detect synthetic ID fraud, as well as how it uses human fraud analysts to review cases and spot previously unencountered fraud techniques, which they can then build into their algorithms.
The CFPB released a new report on the use of chatbot technology in consumer finance. Chatbots are now in widespread use: the report finds that all of the top 10 U.S. banks have some form of chatbot, and that 37% of the country’s population interacted with a bank’s chatbot in 2022. Some of these bots are quite simple, using rule-based systems and question-and-answer hierarchies, whereas others use more recent innovations such as large language models, often supplied by third-party vendors such as Kasisto. The report uses information from the CFPB complaint database to identify several risks posed by the use of chatbots. These include an inability to solve more complex problems, sub-optimal advice, wasted customer time when a dispute can’t be resolved, and the potential disclosure of personally identifiable information without authorization. Of particular concern is the tendency of LLMs to provide incorrect information in a highly confident tone. Of course, many of these issues exist with human customer support agents as well! The warnings are valid and timely, though as with many uses of technology, the thing to be regulated should be the activity itself, not the technology used to accomplish it. Specifically, banks need to comply with all existing laws, and it shouldn’t really matter whether a violation is committed by a human or non-human agent of a bank.
In this press release, Intuit announced the creation of what it calls GenOS, an in-house platform for building product experiences powered by generative AI. This appears to be a set of capabilities for Intuit’s developers and not (yet, perhaps) available to the outside world. The release doesn’t go into a lot of detail on the specific product use cases, but it does make sense that Intuit is in a very good position to develop vertical-specific AI, powered by its vast treasure trove of consumer and business financial data. If done right, there’s a real opportunity to help streamline the process of understanding and managing one’s finances. For example, things like understanding your personal credit, managing your taxes, or doing bookkeeping for a business would conceivably be helped by a specialized, well-trained LLM agent that also had access to reliable and accurate data, as well as an overlay of law, accounting standards, and tax code. Making these activities easier, less stressful, and less error-prone will be compelling to a large population, and it will be interesting to see how Intuit’s GenOS materializes in its customer-facing products.
In a passionate, logical, and well-argued essay, Marc Andreessen outlines his optimistic vision for why AI has the potential to drastically improve the human condition and why the doomers and opportunists spreading panic about AI are misguided and wrong. If you’re reading this, you’ve probably already read Marc’s piece, but if you haven’t, you should. The case he makes is incredibly convincing, listing the many opportunities for AI to solve the world’s problems and debunking some of the most common fears with real historical data. He contends that the real risk is the U.S. not sufficiently pursuing AI and ceding AI dominance to the Communist Party of the PRC, with its dark and dystopian vision. It’s compelling, inspiring, and the most important essay Marc Andreessen has ever written.