Finance for AI: The AI Cost Formula


In 2018, I joined Microsoft finance after 7 great years with Procter & Gamble. After gaining a lot of background in IT, product, and sales team finance at the world’s largest packaged goods company, I wanted a change of pace. So I joined the world’s largest software company. Specifically, I joined the finance group for Azure (Microsoft’s cloud business), specializing in our Cloud AI Services. It has been an awesome year and a half of learning and business building. This blog is a summary of some of my key learnings along the way.

AI Financial Models

The AI space is changing rapidly, and it can be challenging for finance managers to model out the exact cost and payout of a new AI project. In this nascent space, there are no standard NPV templates that can get to a reasonable number. The result: you end up with a massive, SWAGgy number with 100% variability north or south.


The first and most confounding cost component of an AI financial model is the cost of data. Data can be free or really expensive. It can have huge local gravity, or move freely across oceans. You may need only a few GBs or you may need 100,000 labeled images. Regardless, it is always a necessary component and getting to the bottom of the data need and data cost is important to a good AI financial model.

What is labeled data

Most AI models today are supervised prediction engines: using data as inputs (images, text, recordings), they generate prediction outputs (classifications, translations, transcriptions). The process of building a prediction engine using data is called training.

Modeling data cost

To get started with data cost modeling, it is important to ask the questions behind each component of the data cost formula: labeling + storage + cleaning.

  • Cost of labeling — The bulk of the cost of data is in the labeling, as it requires a lot of human time. You can in-house this work, but most data is purchased pre-labeled from a third-party vendor.
  • Cost of storage — This is cheap per unit, but can add up given the large amount of data needed. On Azure, you can buy blob storage for $0.00081/GB per month.
  • Cost of data cleaning — Significant data scientist time will be spent cleaning up and preparing data for model building, testing, and validating. This is measured not in dollars spent but in months of time between data purchase and model building.
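The three components above can be sketched as a simple model. Everything below is an illustrative assumption except the blob storage rate, which uses the $0.00081/GB/month figure quoted here; the per-image labeling price and the loaded data scientist cost are placeholders you would replace with vendor quotes.

```python
LABEL_COST_PER_IMAGE = 0.05      # assumed vendor price per labeled image
STORAGE_PER_GB_MONTH = 0.00081   # Azure blob storage rate cited in the text

def data_cost(images, gb_stored, months_stored, cleaning_months, ds_month_cost):
    """Return (dollar cost, calendar months of delay) for acquiring data."""
    labeling = images * LABEL_COST_PER_IMAGE
    storage = gb_stored * STORAGE_PER_GB_MONTH * months_stored
    # Cleaning is mostly a time cost: data scientist months spent before
    # model building can even begin.
    cleaning = cleaning_months * ds_month_cost
    return labeling + storage + cleaning, cleaning_months

# 100,000 labeled images, 500 GB held for a year, 3 months of cleaning
cost, delay = data_cost(images=100_000, gb_stored=500, months_stored=12,
                        cleaning_months=3, ds_month_cost=15_000)
```

Note how labeling and people time dwarf storage: storage runs a few dollars while labeling and cleaning run tens of thousands.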

Future techniques

Data acquisition is likely at its peak expense per unit given the low supply and high demand. There are a few techniques which will make data acquisition cheaper in the future, which will enable more development.

  1. Unsupervised learning — Over the last year, there has been an explosion of “pre-trained models” which leverage unsupervised learning (no data labels needed) vs. supervised learning (data labels needed). Large transformer models such as GPT-2 and BERT are two key examples of this unsupervised pre-training. Here, large models have been pre-trained on a huge corpus of data (Reddit, Wikipedia). Without labels, the unsupervised model learns how to categorize groups of inputs. After enough examples, GPT-2, by training on web text, learns how to fill in the blank in the quote: “when they go low, we go ___”. This is the basis of auto-reply in Gmail, but will also be used for many other applications in language understanding.


The second key question to ask about the cost of building an AI system is related to the cost of compute. Overall, the cost of computing will be the largest component of AI project cost over time. Data acquisition and training costs are often the first concern as they are the first components of the model building, but doing something meaningful with that data often requires a lot of spending on compute.

What is compute cost

Compute cost is the price of renting many hours of server time on the cloud. In traditional machine learning, millions of calculations are needed to convert data inputs into prediction outputs. To speed these calculations up, they are often run in parallel on many premium servers in the cloud. While the hardware, software, and time per calculation varies, the consistent trend is that compute power (and quantity) is paramount.
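A minimal sketch of that rental math: compute cost is wall-clock hours times the number of accelerators times an hourly rate, summed over every training run. The $3/GPU-hour price and the run counts below are assumptions for illustration, not cloud quotes.

```python
def training_compute_cost(wall_clock_hours, num_gpus, price_per_gpu_hour):
    """Cost of one training run on rented cloud GPUs."""
    return wall_clock_hours * num_gpus * price_per_gpu_hour

# A single 72-hour run on 8 GPUs at an assumed $3/GPU-hour
run = training_compute_cost(72, 8, 3.0)

# A project pays for many runs: hyperparameter sweeps, retraining, and
# experiments. Twenty such runs is why compute dominates cost over time.
project = sum(training_compute_cost(72, 8, 3.0) for _ in range(20))
```

One run here is 576 GPU-hours; twenty of them make compute the largest bucket in the project budget, which is the pattern described above.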

The rise of GPU and other accelerator hardware

Because of their computing capacity, the current trend in AI investment is to use more and more GPUs to build large deep learning models. All major AI companies have access to large amounts of GPU hardware in large server data centers. These data centers can be very expensive, as the GPU is a premium chip and requires large amounts of power (3–5X or more power draw than standard CPU-based computing).

Managing Compute Cost

To offset the continually rising cost of GPU compute, there are two trends in cost reduction. First, many companies have released specialized hardware designed for their process or job. CPUs are great for general-purpose computing (running a laptop), but for an increasingly demanding task, there is a cost benefit to specialized chips dedicated to that task (ex. graphics chips for video). Specialization drives up utilization of all the resources on the chip (memory/networking) and maximizes the speed of the process.
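The specialization argument can be made concrete with back-of-envelope math: a chip that rents for more per hour can still be cheaper per unit of work if it finishes the work fast enough. All throughputs and hourly prices below are invented for illustration.

```python
def cost_per_million_inferences(throughput_per_sec, price_per_hour):
    """Dollar cost to serve one million predictions on a given chip."""
    hours_needed = 1_000_000 / throughput_per_sec / 3600
    return hours_needed * price_per_hour

# Assumed figures: the GPU rents for 6x more per hour but runs the
# workload 40x faster, so its cost per job is far lower.
cpu = cost_per_million_inferences(throughput_per_sec=50,   price_per_hour=0.50)
gpu = cost_per_million_inferences(throughput_per_sec=2000, price_per_hour=3.00)
```

This is the finance view of specialization: what matters is cost per job completed, not the sticker price per hour.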


The final, and most critical, component of an AI investment is the cost of headcount. A prerequisite to delivering impactful AI projects is having the right talent and sufficient staffing. While this investment will not likely be the largest cost in the AI investment, it is critical because of the timing and difficulty of acquiring rare AI talent. Here are a few characteristics of the AI talent market that make this cost pool challenging:

  • Skills are specialized. The skills needed are hard to train, and the field is splitting into subgroups. Someone who is great at ML is not necessarily great at deep learning. Someone who is talented at developing vision systems may have no ability with language systems (even though they are in the same category of perceptive AI).
  • The teams are multi-disciplinary, requiring skills from a number of disciplines: mathematical theory, coding, research, data science, engineering, and hardware development.

Headcount Risk

Funding HC is often the smallest cost bucket of the project, but it is the most critical for decision making: cost per HC can be very high, and staffing poses the largest timing risk. Compute is ubiquitous and data can be purchased as a one-time expense, but hiring and onboarding for specialized work may take a long time.

“Finalizing” a finance model

When building any finance model, it is smart to include a risk adjustment (30–50%) and go back to update assumptions as actuals come in. In AI this is even more important, as the space is changing so rapidly. Within months, a new technique or free data source may pop up, completely revolutionizing the space or dropping one of your largest cost buckets to zero. On multiple occasions, I have encountered engineering breakthroughs which improved the speed of a model by 10X within the first 6 months of “locking” the financials. This improvement in efficiency is the payout from investing in experts, but also a key benefit of working with software-based solutions. You can’t get that amount of payout improvement from physical assets (servers, manufactured goods, ride-share rides) or even the human processes that these AI models will be replacing.
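Pulling the three buckets together with the 30–50% risk adjustment recommended above gives a simple, hedged version of the full AI cost formula. The dollar figures below are placeholder assumptions, not benchmarks.

```python
def risk_adjusted_cost(data, compute, headcount, risk=0.4):
    """Total AI project cost with a contingency multiplier (0.3–0.5 typical)."""
    base = data + compute + headcount
    return base * (1 + risk)

# Placeholder inputs: $50k data, $200k compute, $600k headcount
low  = risk_adjusted_cost(50_000, 200_000, 600_000, risk=0.3)
high = risk_adjusted_cost(50_000, 200_000, 600_000, risk=0.5)
# Revisit the inputs as actuals land: a 10X engineering speedup can
# collapse the compute bucket mid-project.
```

Presenting the estimate as a low/high band, rather than a single number, is one way to be honest about the 100% variability described at the top of this post.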

Miami University Alum. Microsoft - Finance & Accounting.
