Finance for AI: The AI Cost Formula


In 2018, I joined Microsoft finance after 7 great years with Procter & Gamble. After gaining a lot of background in IT, product, and sales team finance at the world’s largest packaged goods company, I wanted a change of pace. So I joined the world’s largest software company. Specifically, I joined the finance group for Azure (Microsoft’s cloud business), specializing in our Cloud AI Services. It has been an awesome year and a half of learning and business building. This blog is a summary of some of my key learnings along the way.

AI Financial Models

The AI space is changing rapidly, and it can be challenging for finance managers to model out the exact cost and payout from a new AI project. In this nascent space, there are no standard NPV templates which can get to a reasonable number. The result: you end up with a massive SWAGgy number with 100% variability north or south.

On the cost side, this usually looks like a large underestimation of how expensive it will be to build an algorithm/tool/process. On the payout side, the tendency is toward wild inflation of the total addressable market and ROI. Neither is wrong in principle, but both are inaccurate in practice. AI will be a massive marketplace (Gartner reports AI market will be $3.9T by 2022) and digital technologies have the potential to be developed cheaply using public/open-sourced data. In theory this will generate very high ROI projects at low cost.

“I think the analogs are the agricultural revolution, the industrial revolution, and the computer revolution and I think the AI revolution will be bigger than any of those three or bigger than all three of them put together.” - Sam Altman (OpenAI), on the potential value of AGI

In reality, AI investments can be surprisingly large, and the payout will likely be slow, like a flywheel that takes 3 years to start spinning. Additionally, the cost profile of AI product investment is very front-loaded, as there is a lot of initial data and up-front development cost.

This is neither good nor bad, but an exciting challenge for finance managers who need to pin down project cost and payout estimates. To help, I will focus this post on how to model out cost for AI projects using a framework which captures essential costs and allows for flexibility. While it will not provide you the perfect answer, following this logic has enabled me to ask the right questions, get to better ROI estimates, and learn a lot in the process.

The AI Formula


The first and most confounding cost component of an AI financial model is the cost of data. Data can be free or really expensive. It can have huge local gravity, or move freely across oceans. You may need only a few GBs or you may need 100,000 labeled images. Regardless, it is always a necessary component and getting to the bottom of the data need and data cost is important to a good AI financial model.

“In deep learning, there’s no data like more data. The more examples of a given phenomenon a network is exposed to, the more accurately it can pick out patterns and identify things in the real world.”

- Kai-Fu Lee, former president of Google China and author of AI Superpowers: China, Silicon Valley, and the New World Order.

Data is the new oil in the 4th industrial revolution. This is why we are seeing some of the best AI models from the companies which have a lot of access to user data (Google, Baidu, Facebook). However, availability of data is not often the cost driver. Refining that oil into a LABELED data product is where the true cost and value can be derived.

What is labeled data

Most AI models today are supervised prediction engines: they take data as inputs (images, text, recordings) and generate prediction outputs (classifications, translations, transcriptions). The process of building a prediction engine using data is called training.

Supervised training requires a large amount of “labeled” data. These are images, recordings, or other inputs which are tagged (“labeled”) with the correct answer. A model that predicts whether an image shows a cat may require 100,000 images of cats, each tagged with the text “cat”. Training on those labels, the model learns to predict whether any unlabeled image it is presented with is a cat.
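To make the distinction concrete, here is a minimal sketch of what labeled and unlabeled examples might look like in code. The `Example` class, the file names, and the `labeled_only` helper are all hypothetical and used only for illustration; real training pipelines store labels in many different ways.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One training input. The label stays None until a human tags it."""
    source: str                  # hypothetical image file name
    label: Optional[str] = None  # the "correct answer", e.g. "cat"


def labeled_only(examples: List[Example]) -> List[Example]:
    """Keep only the examples that can be used for supervised training."""
    return [ex for ex in examples if ex.label is not None]


examples = [
    Example("img_0001.jpg", "cat"),
    Example("img_0002.jpg", "cat"),
    Example("img_0003.jpg"),  # gathered, but not yet labeled
]

usable = labeled_only(examples)
print(f"{len(usable)} of {len(examples)} examples are ready for training")
```

Gathering the third image cost almost nothing, but it adds no value to a supervised model until someone fills in its label.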

While gathering 100,000 images of cats is relatively cheap or free, the process of tagging these images with the word “CAT” is like filling out a 100,000-line spreadsheet by hand. Labeling is even more challenging for voice recordings. Building a voice translator requires someone sitting at a microphone and saying a word (e.g., “le chat”) over and over for 10 minutes while typing the translated word (“cat”) to tie a label to a phrase. So while many say that data is the fuel of the 4th industrial revolution, it is more accurate to say that labeled data is that fuel.

Modeling data cost

To get started with data cost modeling it is important to ask the necessary questions to complete the following formula:

Data cost = (units of data acquired * cost per unit) + (hours of hand labeling * cost per hour) + cost of storing data + cost of data wrangling/cleaning

  • Cost of data acquired: Sources of data are often cheap or free, but new-to-the-world data can be very expensive. While Google’s recent BERT model was pre-trained largely on English Wikipedia text (available for free), there is not yet a good free source for labeled translation or video data. Such data can be very expensive per unit.
  • Cost of labeling: The bulk of the cost of data is in the labeling, as it requires a lot of human time. You can do this work in-house, but data is mostly purchased pre-labeled from a 3rd party vendor.
  • Cost of storage: This is cheap per unit, but can add up given the large amount of data needed. On Azure, you can buy blob storage for $0.00081/GB per month.
  • Cost of data cleaning: Significant data scientist time will be spent cleaning up and preparing data for model building, testing, and validating. This is not measured in $s spent but in months of time between data purchase and model building.
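The four components above can be sketched as a small calculator. This is one reading of the formula, assuming acquisition cost is units times cost per unit and labeling cost is hours times hourly rate. The function name `data_cost` and every input value except the $0.00081/GB storage price are hypothetical placeholders, not benchmarks.

```python
def data_cost(units, cost_per_unit, labeling_hours, cost_per_labeling_hour,
              storage_gb, storage_price_per_gb_month, months_stored,
              cleaning_cost=0.0):
    """Return the data cost broken out by component, plus the total.

    cleaning_cost defaults to 0 because cleaning is usually tracked as
    elapsed months of data scientist time rather than dollars spent.
    """
    acquisition = units * cost_per_unit
    labeling = labeling_hours * cost_per_labeling_hour
    storage = storage_gb * storage_price_per_gb_month * months_stored
    total = acquisition + labeling + storage + cleaning_cost
    return {
        "acquisition": acquisition,
        "labeling": labeling,
        "storage": storage,
        "cleaning": cleaning_cost,
        "total": total,
    }


# Hypothetical inputs: free images, 250 hours of labeling at $20/hour,
# and 100 GB stored for 12 months at the blob storage price quoted above.
costs = data_cost(
    units=100_000, cost_per_unit=0.0,
    labeling_hours=250, cost_per_labeling_hour=20.0,
    storage_gb=100, storage_price_per_gb_month=0.00081, months_stored=12,
)
for name, value in costs.items():
    print(f"{name:12s} ${value:,.2f}")
```

Even with made-up inputs, the shape of the result matches the point above: labeling dominates, and storage is close to a rounding error.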

Future techniques

Data acquisition is likely at its peak expense per unit given the low supply and high demand. There are a few techniques which will make data acquisition cheaper in the future, which will enable more development.

  1. Using AI for AI: To reduce the cost of labeling, lots of recent work has gone into building automated ML platforms that can learn the labels of similar pictures and remove this tedious labeling task. Imagine an auto-fill feature that recognized that most of the pictures on your 100,000 line item list were cats and pre-labeled them. This has tremendous potential to reduce model workflow time and cost (democratizing AI) as well as to reduce the reliance on questionable 3rd party data labeling companies.
  2. Unsupervised learning: Over the last year, there has been an explosion of “pre-trained models” which leverage unsupervised learning (no data label needed) vs. supervised learning (data label needed). Large transformer models (GPT-2 and BERT) are two key examples of this unsupervised pre-trained model. Here, large models have been pre-trained on a huge corpus of data (Reddit, Wikipedia). Without labels, the unsupervised model learns how to categorize groups of inputs. After seeing enough examples while training on web text, GPT-2 learns how to fill in the blank in the quote: “when they go low, we go ___”. This is the basis of auto-reply on Gmail, but will also be used for many other applications in language understanding.

The universal benefit of these two future techniques is that they are much cheaper than hand labeling data. As these techniques improve, the cost of AI will decrease and the amount of AI integration in all products — new and old — will increase. It is important to note that these two new techniques are constrained by the second cost component of AI… COMPUTE.


The second key question to ask about the cost of building an AI system is related to the cost of compute. Overall, the cost of computing will be the largest component of AI project cost over time. Data acquisition and training costs are often the first concern as they are the first components of the model building, but doing something meaningful with that data often requires a lot of spending on compute.

What is compute cost

Compute cost is the price of renting many hours of server time on the cloud. In traditional machine learning, millions of calculations are needed to convert data inputs into prediction outputs. To speed these calculations up, they are often run in parallel on many premium servers in the cloud. While the hardware, software, and time per calculation varies, the consistent trend is that compute power (and quantity) is paramount.

GPU compute Example

Today, most AI models are trained using a special type of compute called a GPU (graphics processing unit). Traditionally used for video games and computer displays, this hardware has become a machine learning staple. The reason is the GPU's ability to do parallel processing and batch processing and to hold large amounts of data in memory. This enables the large number of calculations that deep learning models need across huge amounts of labeled data.

The rise of GPU and other accelerator hardware

Because of their computing capacity, the current trend in AI investment is to use more and more GPUs to build large deep learning models. All major AI companies have access to large amounts of GPU hardware in large server data centers. These data centers can be very expensive as the GPU is a premium chip and requires large amounts of power (3–5X or more power draw than standard CPU based computing).

In addition to the premium nature of GPU hardware, the cost of compute can be very large due to the sheer volume of processing needs. As the amount of data available grows exponentially and current deep learning model architectures can absorb more and more data per model, there is exponential growth in GPU compute usage. The benefit is that the AI model will get more accurate by “throwing more compute at a problem”. If there is a strong ROI to increased accuracy, the upper limit on the compute available in the cloud is incredibly large. For example, the world’s largest supercomputer has 125 petaflops of computing capacity. For reference, the computer (Deep Blue) that beat the first human grandmaster in chess had 0.0000113 petaflops, and a Samsung Galaxy has 0.000142 petaflops of compute. At $3.00 per GPU hour, however, the cost of running machines at that scale would be upward of $400M per year.
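The arithmetic behind a number like this is simple enough to sketch. The $3.00 per GPU hour and the $400M per year come from the text; the implied GPU count is not stated anywhere in this post and is only back-solved here, assuming every GPU runs around the clock for a full year. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming 24/7 operation


def annual_gpu_cost(num_gpus, price_per_gpu_hour, hours=HOURS_PER_YEAR):
    """Yearly rental cost for a fleet of GPUs billed by the hour."""
    return num_gpus * price_per_gpu_hour * hours


def gpus_for_budget(annual_budget, price_per_gpu_hour, hours=HOURS_PER_YEAR):
    """How many always-on GPUs a yearly budget would pay for."""
    return annual_budget / (price_per_gpu_hour * hours)


one_gpu = annual_gpu_cost(1, 3.00)
implied_gpus = gpus_for_budget(400_000_000, 3.00)

print(f"One GPU at $3.00/hour, all year: ${one_gpu:,.0f}")
print(f"GPUs implied by a $400M/year bill: {implied_gpus:,.0f}")
```

A single always-on GPU costs about $26K per year at this rate, so a $400M annual bill corresponds to a fleet on the order of fifteen thousand GPUs.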

Managing Compute Cost

To offset the continually rising cost of GPU compute, there are two trends in cost reduction. First, many companies have released specialized hardware designed for their process/job. CPUs are great for general purpose computing (running a laptop), but for increasingly demanding tasks, there is a cost benefit to specialized chips dedicated to those tasks (e.g., graphics chips for video). Specialization drives up utilization of all the resources in the chip (memory/networking) and maximizes the speed of processes.

A good example of specialization is Google’s TPU (Tensor Processing Unit), which has replaced CPUs and GPUs in a number of AI-related tasks within Google. These chips are specifically designed for the needs of machine learning at Google. Other companies are now moving toward this model as there are significant gains in cost that can be derived.

The second form of compute cost management is reducing waste. Due to large amounts of available compute and new-to-the-world models, there can be a huge amount of wasted resources in training a deep learning model. With large models and expensive compute, the total cost of your model hinges heavily on the utilization % of the compute asset. For example, if the servers need to be on 24/7 but sit idle whenever AI jobs are not running, there is tremendous wasted cost.
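A minimal sketch of why utilization matters so much, assuming you pay for every billed hour whether or not a job is running. The $3.00 hourly price reuses the figure quoted earlier; the 40% utilization is a hypothetical input, and the function names are made up for illustration.

```python
def cost_per_useful_hour(price_per_hour, utilization):
    """Effective price of each hour in which the machine does real work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return price_per_hour / utilization


def wasted_spend(price_per_hour, hours_billed, utilization):
    """Dollars paid for hours in which the machine sat idle."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be in the range [0, 1]")
    return price_per_hour * hours_billed * (1 - utilization)


# Hypothetical: one GPU billed 24/7 for a year, busy 40% of the time.
effective = cost_per_useful_hour(3.00, 0.40)
wasted = wasted_spend(3.00, 8760, 0.40)

print(f"Effective cost per useful hour: ${effective:.2f}")
print(f"Idle spend over the year: ${wasted:,.0f}")
```

At 40% utilization the effective price of a working hour is 2.5 times the list price, which is why scheduling and utilization show up in the cost model alongside the hourly rate.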

A large portion of the AI architecture field is dedicated to solving this problem by driving up the efficiency of assets. Some ways to do this include scheduling AI training jobs to run 24/7 so there are no wasted hours, and leveraging the spare hours when other machines are off (Google can train on compute while users are asleep and not searching). However, the largest possible reduction in compute cost comes from model-size efficiency: eliminating model complexity, or reducing the number of computations needed to deliver the same result. A great example of this is the BERT transformer model released by Google in 2018. Within 1 year, the cost to train using this model had dropped from $100K to $10K and then down to $1K (see the more accurate and smaller ALBERT model). This efficiency came from engineering work to optimize software and hardware for a specific task.

These types of efficiencies are a game changer because they allow us to continue increasing accuracy and performance on the same budget, enabling growth despite constraints and improving ROI over time. As a side benefit, this improvement is consistently a reason for a finance manager to have confidence in the short term investment given that there is often a very large amount of future cost savings opportunity through software or utilization efficiency. If the project has a low ROI in the current infrastructure, there is still upside in the eventuality that models will get compressed and cheaper.


The final, and most critical, component of an AI investment is the cost of headcount. A prerequisite to delivering impactful AI projects is having the right talent and sufficient staffing. While this investment will not likely be the largest cost in the AI investment, it is critical because of the timing and difficulty of acquiring rare AI talent. Here are a few characteristics of the AI talent market that make this cost pool challenging:

  • Talent is scarce: There are few big markets for AI talent: US, Canada, China, India. Skills are limited and the talent supply is low (based on the Stanford AI Index 2019). Top computer science schools include Carnegie Mellon, Stanford, MIT, University of Toronto, and Tsinghua University.
  • Skills are specialized: The skills needed are hard to train and the fields are splitting into groups. Someone who is great at ML is not necessarily great at deep learning. Someone who is talented at developing vision systems may have no ability with language systems (even though they are in the same category of perceptive AI).
  • The teams are multi-disciplinary: They require skills from a number of disciplines: mathematical theory, coding, research, data science, engineering, and hardware development.

These characteristics make the supply of rare talent limited and fragmented, while also increasing the demand as each project needs more than just one type of headcount. Additionally, the specializations of each of these team members may be hard to find or limited to certain regions and educational institutions.

Team example: an AI project may need a project manager to coordinate, systems engineer to manage capacity, vision model researcher to develop the mathematical model, data scientist/engineer to visualize data, and ML developer to build models which will run at scale.

AI talent pipelines currently cannot keep up with the large amount of demand growth. This drives the cost per HC up significantly, a trend that will continue until AI training becomes more ubiquitous and more people enter the field. Today the top PhD talent is centered around a few global schools, but talented engineers can come from anywhere. Additionally, a large corpus of skills training is now available open source and free from edX, Stanford, and YouTube channels.

When planning for AI HC, there will likely be a number of HC needed to kick off the work and to manage it on an ongoing basis. This investment will look like a number of dedicated researchers up front, followed by investments in data science and PMs to manage the pipeline of data, data engineers to process the data, ML experts to build and run models, and technical PM types to lead the team and drive deliverables. When investing in these projects, it is key to ensure there is an appropriate balance of technical talent and business management talent. Overly technical projects often stall out on business projections and proving out the ROI of their basic research, while less technical teams often do not demonstrate the product success needed in a space where the state of the art improves every 6 months.

Headcount Risk

Funding HC is often the least expensive overall cost of the project, but it is the most critical portion for decision making as the cost per HC can be very high and staffing poses the largest timing risk. Compute is ubiquitous and data can be purchased as a one-time expense, but the hiring process and onboarding for specific work may take a long time.

For example: given a few days and a dozen computers, you can scrape the internet for all of Reddit's comments to build an NLP model. However, hiring a person with the talent needed to do this may take up to 3 months. Oddly, during this same hiring time, Reddit could release a public data stream that makes this person’s work unnecessary.

“Finalizing” a finance model

When building any finance model, it is smart to include a risk adjustment (30–50%), and go back to update assumptions as actuals come in. In AI it is even more important as the space is changing so rapidly. Within months, a new technique or free data source may pop up, completely revolutionizing the space or dropping one of your largest cost buckets to zero. On multiple occasions, I have encountered engineering breakthroughs which improved the speed of a model by 10X within the first 6 months of “locking” the financials. This improvement and efficiency is the payout from investing in experts, but also a key benefit of working with software-based solutions. You can’t get that amount of payout improvement from physical assets (servers, manufactured goods, ride-share rides) or even the human processes that these AI models will be replacing.
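As a rough illustration, the three cost components from this post can be rolled up with the risk adjustment applied as a range. This sketch assumes the 30 to 50% adjustment is an uplift on total cost, which is only one way to apply it; the function name and all dollar inputs are hypothetical.

```python
def risk_adjusted_cost(data, compute, headcount,
                       risk_low=0.30, risk_high=0.50):
    """Return (base, low, high) project cost estimates.

    base is the sum of the three components; low and high apply the
    risk adjustment as a percentage uplift on that base.
    """
    base = data + compute + headcount
    return base, base * (1 + risk_low), base * (1 + risk_high)


# Hypothetical project: $50K data, $200K compute, $750K headcount.
base, low, high = risk_adjusted_cost(50_000, 200_000, 750_000)

print(f"Base estimate:       ${base:,.0f}")
print(f"Risk-adjusted range: ${low:,.0f} to ${high:,.0f}")
```

Keeping the components as separate inputs makes it easy to rerun the estimate when actuals come in or when one cost bucket changes sharply.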

Across all 3 AI components, the consistent trend is that the scale of inputs is growing exponentially. Compute capability has been growing exponentially following Moore’s law for 50 years, delivering a doubling of computing capability roughly every 2 years (2X the number of transistors which fit on a chip). Data is also growing exponentially via smartphones, connected devices, and increased connectivity. As demand for AI inputs increases (more labeled data), there will be an initial increase in cost, but soon there will be downward pressure on price per unit due to increased supply (e.g., ubiquitous compute on more efficient servers). While the number of researchers and AI practitioners is not growing exponentially, it is expanding very fast. Over the next few years, the amount of data, computing capacity, and cheaper modeling techniques available to this growing number of data scientists and ML practitioners will become astoundingly large.

This is what makes financial modeling for AI so exciting. Each new project has a few new-to-the-world aspects, and the opportunity to interview experts provides a ton of personal learning and growth. What these teams can achieve with the proper resources and business management support will be earth-changing. Following the framework laid out here, I believe we can ask the right questions, make the right investments, and hopefully play a part in those large scale impacts.

David Hall

Miami University Alum. Microsoft - Finance & Accounting.
