Spreadsheets in the age of LLMs

It seems that with the advent of large language models, the viability of many software application businesses has been called into question, as evidenced by the sell-off of stocks such as ADBE, CRM, and MSFT. I am not trying to predict the future of software businesses (although I am very bullish, especially at these prices), but rather to think about how LLMs will affect the workflow of one of the most widely used pieces of software: the spreadsheet.

I am not concerned with a particular brand of software, so my analysis should apply equally to Microsoft Excel, Google Sheets, or Apple Numbers. What interests me is how these applications might be replaced by, or integrated with, LLMs. What will the workflow of data analysis look like in the future?

It might be instructive to examine a common workflow for financial analysts: modeling the future earnings of an asset (equity or bond). This seems like the perfect problem for an LLM to crunch through. The LLM can read financial statements, document assumptions, and write a Python script that generates a discounted cash flow report as a nicely formatted markdown file. The user could ask the LLM to change input assumptions, such as the growth and discount rates, and regenerate the report as needed. In principle, this workflow does not require a spreadsheet at all.
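To make the script-only workflow concrete, here is a minimal sketch of the kind of program an LLM might produce. The `Assumptions` fields, the five-year horizon, and the Gordon growth terminal value are illustrative assumptions, not a prescribed model; the point is that every input lives in one documented place and the markdown report is regenerated from it.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    # Hypothetical inputs an analyst (or an LLM) would document
    base_cash_flow: float   # most recent annual free cash flow
    growth_rate: float      # annual growth during the projection period
    discount_rate: float    # required rate of return
    terminal_growth: float  # perpetual growth after the projection period
    years: int = 5          # length of the explicit projection period


def dcf(a: Assumptions) -> tuple[list[tuple[int, float, float]], float, float]:
    """Return (year, cash flow, present value) rows, terminal PV, and total value."""
    if a.discount_rate <= a.terminal_growth:
        raise ValueError("discount rate must exceed terminal growth rate")
    rows = []
    cf = a.base_cash_flow
    for year in range(1, a.years + 1):
        cf *= 1 + a.growth_rate
        pv = cf / (1 + a.discount_rate) ** year
        rows.append((year, cf, pv))
    # Gordon growth terminal value at the end of the final year, discounted to today
    last_cf = rows[-1][1]
    terminal = last_cf * (1 + a.terminal_growth) / (a.discount_rate - a.terminal_growth)
    terminal_pv = terminal / (1 + a.discount_rate) ** a.years
    total = sum(pv for _, _, pv in rows) + terminal_pv
    return rows, terminal_pv, total


def to_markdown(a: Assumptions) -> str:
    """Render the assumptions and the valuation as a markdown report."""
    rows, terminal_pv, total = dcf(a)
    lines = [
        "# Discounted cash flow",
        "",
        f"Growth rate: {a.growth_rate:.1%}, discount rate: {a.discount_rate:.1%}, "
        f"terminal growth: {a.terminal_growth:.1%}",
        "",
        "| Year | Cash flow | Present value |",
        "|---:|---:|---:|",
    ]
    lines += [f"| {y} | {cf:,.2f} | {pv:,.2f} |" for y, cf, pv in rows]
    lines += ["", f"Terminal value (PV): {terminal_pv:,.2f}", f"Total value: {total:,.2f}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown(Assumptions(100.0, 0.05, 0.10, 0.02)))
```

Changing an assumption is then a one-liner such as `dataclasses.replace(a, growth_rate=0.07)`, and the report can be written to disk with `pathlib.Path("dcf.md").write_text(...)`.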

Nonetheless, I think there is still value in implementing the model as a spreadsheet rather than a Python script. The primary reason is that spreadsheets are easier to understand for those without a programming background. LLMs are not a replacement for understanding, for they do not have any sense of understanding themselves. Many problems are also more naturally expressed as tables with embedded logic. None of these reasons have changed since the pre-LLM era.

So what effect will LLMs have on spreadsheet workflows? Although spreadsheets are easier to understand than code, they share the same asymmetry: writing one is easier than reading one. With Python libraries that can manipulate XLSX files, LLMs can help with both authoring these documents and making sense of them. But an LLM's text output is no substitute for understanding, and much of that understanding comes from the visualization of the sheet itself: the user interface. The biggest gap I see here is the non-obvious flow of data between cells. Because spreadsheets are mostly WYSIWYG, showing computed values rather than the formulas behind them, it is hard to tell at a glance how the cells relate to each other.
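The flow of data between cells is recoverable from the file itself, which is what makes it a UI problem rather than a data problem. openpyxl, for example, returns a formula cell's `value` as the formula string (starting with `=`) unless `load_workbook` is called with `data_only=True`. The sketch below assumes a dict mapping sheet-qualified addresses to raw cell contents has already been built that way, and derives precedents, dependents, and everything downstream of an input. The regex is a deliberate simplification, not a full formula parser.

```python
import re
from collections import defaultdict

# A1-style references such as B3, $C$10, Inputs!B1 or 'My Sheet'!A1. The
# lookarounds keep function names like LOG10( from matching. This is a sketch:
# ranges such as A1:A9 are not expanded into their cells, and text inside
# string literals is not skipped.
REF = re.compile(
    r"(?<![\w$])"
    r"(?:(?P<sheet>'[^']+'|[A-Za-z_]\w*)!)?"
    r"\$?(?P<col>[A-Z]{1,3})\$?(?P<row>[0-9]+)"
    r"(?![\w(])"
)


def parse_refs(formula: str, current_sheet: str) -> set[str]:
    """Return the sheet-qualified cells a formula reads from."""
    refs = set()
    for m in REF.finditer(formula):
        sheet = (m.group("sheet") or current_sheet).strip("'")
        refs.add(f"{sheet}!{m.group('col')}{m.group('row')}")
    return refs


def build_graph(cells: dict[str, object]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Map each formula cell to its precedents, and each cell to its dependents."""
    precedents: dict[str, set[str]] = {}
    dependents: dict[str, set[str]] = defaultdict(set)
    for addr, value in cells.items():
        if isinstance(value, str) and value.startswith("="):
            sheet = addr.rsplit("!", 1)[0]
            precedents[addr] = parse_refs(value, sheet)
            for ref in precedents[addr]:
                dependents[ref].add(addr)
    return precedents, dict(dependents)


def downstream(dependents: dict[str, set[str]], start: str) -> set[str]:
    """Every cell whose value changes, directly or indirectly, when `start` changes."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        for nxt in dependents.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

A UI could use `downstream` to highlight every cell affected by an edit to an input, which is exactly the relationship a WYSIWYG grid hides.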

I propose a new UI design that separates a table's styling from its core logic, similar to markdown source versus its rendered preview. The idea is to enable syntax highlighting for different data types without having to click into each cell. Styles can be edited in a separate mode, where the user focuses only on appearance. But the user's first and foremost concern is whether the series of inputs produces a “valid” spreadsheet program.
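One way to picture the "logic" mode is as a highlighter over raw cell contents, independent of any user-applied styles. This is a hypothetical sketch: the `Kind` categories and the `PALETTE` colors are assumptions, loosely borrowing the financial-modeling habit of blue for hard-coded inputs and black for formulas.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    EMPTY = "empty"
    INPUT = "input"      # literal value typed in by the user
    FORMULA = "formula"  # computed from other cells
    LABEL = "label"      # free text describing the model


# Hypothetical palette for the logic view. These colors belong to this mode only;
# presentation styles (fonts, borders, number formats) would be edited elsewhere.
PALETTE = {
    Kind.INPUT: "blue",
    Kind.FORMULA: "black",
    Kind.LABEL: "gray",
    Kind.EMPTY: None,
}


def classify(value: object) -> Kind:
    """Classify raw cell content, as stored in the file, by its role in the model."""
    if value is None or value == "":
        return Kind.EMPTY
    if isinstance(value, str):
        return Kind.FORMULA if value.startswith("=") else Kind.LABEL
    return Kind.INPUT  # numbers, dates, booleans


def highlight(grid: list[list[object]]) -> list[list[Optional[str]]]:
    """Return one logic-view color per cell, without clicking into any cell."""
    return [[PALETTE[classify(v)] for v in row] for row in grid]
```

Because the classification depends only on cell contents, it can never drift out of sync with the model the way manually applied colors can.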

Some orthogonal ideas for improving comprehensibility:

  • Jump to the definition of a referenced cell
    • Note: Google Sheets highlights the cells referenced in a formula, but jumping to the referenced cell itself would make things easier, especially since it can live in a different sheet
  • Preview a cell's formula and evaluated value on hover
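Both ideas can be prototyped on top of the formula/value split that XLSX files already store: openpyxl's `load_workbook(path)` yields formulas, while `load_workbook(path, data_only=True)` yields the values last cached by the spreadsheet application (None if no application has recalculated the file). The sketch below assumes those two views have been flattened into dicts keyed by sheet-qualified address; `Target`, `resolve`, and `hover_text` are hypothetical names.

```python
import re
from dataclasses import dataclass

# One A1-style reference, optionally qualified with a sheet name. Quoted sheet
# names escape an apostrophe by doubling it, e.g. 'Bob''s Sheet'!A1.
SINGLE_REF = re.compile(
    r"^(?:(?P<sheet>'(?:[^']|'')+'|[A-Za-z_]\w*)!)?"
    r"\$?(?P<col>[A-Z]{1,3})\$?(?P<row>[0-9]+)$"
)


@dataclass(frozen=True)
class Target:
    sheet: str
    col: str
    row: int

    @property
    def address(self) -> str:
        return f"{self.sheet}!{self.col}{self.row}"


def resolve(ref: str, current_sheet: str) -> Target:
    """Jump to definition: find the cell that a reference under the cursor points at."""
    m = SINGLE_REF.match(ref.strip())
    if m is None:
        raise ValueError(f"not a single-cell reference: {ref!r}")
    sheet = m.group("sheet")
    if sheet is None:
        sheet = current_sheet  # unqualified references stay on the current sheet
    elif sheet.startswith("'"):
        sheet = sheet[1:-1].replace("''", "'")
    return Target(sheet, m.group("col"), int(m.group("row")))


def hover_text(target: Target, contents: dict[str, object], values: dict[str, object]) -> str:
    """Hover preview: show a formula together with its evaluated value."""
    raw = contents.get(target.address)
    if isinstance(raw, str) and raw.startswith("="):
        return f"{target.address}: {raw} -> {values.get(target.address)}"
    return f"{target.address}: {raw}"
```

Resolving the sheet name explicitly is what lets the jump cross sheets, which highlighting within the current grid cannot do.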