Conversations about the potential of A.I. have reached near-hysterical levels. But for predictions of an A.I.-powered future to come true, thousands of new A.I.-focused companies first need to identify use cases, then build, train and test the A.I. models required to address them. Not all will work, and fewer still will ‘win’ their market sector. If you’re starting such an A.I. company, how do you access the high level of computing power you’re going to need? And how can you access it cost-effectively – in a way that reassures investors that the costs won’t clobber your runway? A gift to the aspiring A.I. entrepreneur/novice CTO, this is a dummies’ guide: how best to access A.I.-grade computing power.
If you’re interested in A.I. then you’ll already know that there is presently a severe global shortage of high-performance graphics cards – cards which, it turned out, are as excellent for A.I. applications as they are for their originally intended graphics-processing purpose. The familiar consequences of high demand and limited supply mean that prices for these cards are currently as high as supply is low, driven largely by demand from organisations investing in A.I.-related projects and/or datacentres.
The high cost of acquiring these cards affects the price of both – and there really are only two – ways of accessing the kind of computing power needed for A.I. applications.
Option 1: lease the power from a cloud computing provider
The cost of a high-powered, high-performance HGX box – arguably the minimum you might need simply for machine learning, never mind generative A.I. – is currently something like $400,000. Now, it’s true that vast sums of money seem to be available for investment in A.I. But an investment of that size – in just one part of the infrastructure you will need, remember – is often best made by cloud providers rather than an individual company.
This is even more likely to be the case if you’re in the early stages of experimenting with an idea or use case. Yes, your operational costs will be higher, but you will be able to avoid the capital costs of investing in your own hardware while you prove your concept.
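The trade-off between higher operational costs and avoided capital costs can be sketched as a back-of-the-envelope break-even calculation. Every figure below is an illustrative assumption for the sake of the sketch – the $400,000 server price from above, and a hypothetical cloud rate – not a quote from any provider:

```python
# Rough break-even sketch: renting cloud GPU capacity vs. buying a server.
# All figures are illustrative assumptions, not vendor quotes.

SERVER_COST = 400_000    # assumed purchase price of an 8-GPU A.I. server ($)
CLOUD_RATE = 30.0        # assumed hourly rate for a comparable cloud instance ($)
HOURS_PER_MONTH = 730    # average hours in a calendar month

breakeven_hours = SERVER_COST / CLOUD_RATE
breakeven_months = breakeven_hours / HOURS_PER_MONTH

print(f"Cloud rental matches the purchase price after "
      f"~{breakeven_hours:,.0f} hours (~{breakeven_months:.0f} months of 24/7 use)")
```

Under these assumptions it takes well over a year of round-the-clock use before renting costs more than buying – and a proof-of-concept project rarely runs its hardware round the clock.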
However, while many cloud computing providers have already invested in the powerful boxes needed for A.I., it’s still a big investment even for many cloud providers. The provider(s) you approach may not have what you need – but you may find them interested in talking to you nonetheless. For such providers, it’s an opportunity to put together a business case to invest in the necessary hardware that will give them access to the A.I. market (including you). Cloud providers have investors too, after all.
Don’t forget to talk with your I.T. partners
An alternative approach to finding a suitably tooled-up cloud provider yourself is to talk to your system integrator. They may have existing relationships or be able to make a recommendation – or, again, they may be looking for an opportunity to start building those relationships. Certainly, a partner can help you look at your infrastructure and should be able to say, okay, you're going to need x, y and z, and that's going to cost you this.
I find that smaller integrators tend to offer more tightly defined specialisms and/or specific expertise. If you go to the big integrators, you may find yourself working through an account manager, who might be able to put you in touch with pre-sales specialists with the knowledge you need. But I certainly recommend considering smaller consultants; ASUS’s partner network, for example, includes many smaller consultancies and even individual consultants, who are often uniquely experienced in a particular niche.
While not exactly the definition of an IT partner, you might even ask Nvidia itself to point you in the right direction; or AMD, which has its own solutions.
Option 2: own your own hardware
Perhaps you’ve proved the concept, and you’re confident that your A.I. use case has legs. Is it time to acquire and operate your own, suitable hardware? Well, it might be. And, if you are set on doing so, here’s what I suggest.
- Look for a server designed for multiple GPUs. This is important because it will also have been designed with the proper airflow to cool all that processing power.
- Consider scalability requirements from the start. You may be able to get into the game without filling all the GPU and CPU slots – possibly for less than $50,000 – then upgrade and expand that server as the need arises, scaling it up as you scale.
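The phased approach in the second point can be sketched in cost terms. The chassis and per-GPU prices below are assumptions chosen purely to illustrate the shape of the curve; real quotes will differ:

```python
# Illustrative phased build-out of a multi-GPU server chassis.
# CHASSIS_BASE and GPU_PRICE are assumed figures, not real quotes.

CHASSIS_BASE = 20_000   # assumed cost of chassis, CPUs, memory and storage ($)
GPU_PRICE = 7_500       # assumed cost per accelerator card ($)

def build_cost(gpus: int) -> int:
    """Total outlay for a chassis populated with `gpus` accelerators."""
    return CHASSIS_BASE + gpus * GPU_PRICE

for gpus in (2, 4, 8):
    print(f"{gpus} GPUs: ${build_cost(gpus):,}")
```

The point of the sketch: a partially populated chassis gets you started at a fraction of the fully loaded price, and each later expansion is an incremental cost rather than a second capital purchase.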
But owning your own hardware is only likely to be the right choice if you're very sure that A.I. is the way to go. If you're still exploring possibilities, cloud is by far the most cost-effective solution.
Bandwidth, data and storage considerations
Bear in mind that, if your computing power is in the cloud, the bandwidth available to get your data there will be a factor. If your server is on-premises, you can transfer data at far higher speeds – provided you deploy a high-speed, NVMe-based storage system and high-speed switches.
Think, too, about the kind of workloads you're running, and make sure you have access to the data you’re going to need for them. Access the data, understand it and organise it; it has a bearing on your technology choices. If you have some really big sets of data that need to be analysed constantly and/or often, consider putting a lot of storage on the system itself and living with lower bandwidth to it: you will be transferring files to the system, performing the analysis and getting results back. Locally, on the other hand, you can now achieve 400 gigabits per second per NIC – and, in an ASUS system for example, you can have 10 of these.
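To make the bandwidth trade-off concrete, here is a back-of-the-envelope comparison of ideal transfer times for a dataset over a typical internet uplink versus the 400 Gb/s local NIC mentioned above. The dataset size and uplink speed are assumptions, and real-world throughput will always fall short of line rate:

```python
# Idealised transfer times for a training dataset over different links.
# Dataset size and the 1 Gb/s uplink figure are assumptions; actual
# throughput will be lower than line rate.

DATASET_TB = 10  # assumed dataset size in terabytes

def transfer_hours(gbps: float, terabytes: float = DATASET_TB) -> float:
    """Hours to move `terabytes` of data over a `gbps` link at line rate."""
    bits = terabytes * 1e12 * 8
    return bits / (gbps * 1e9) / 3600

for label, gbps in (("1 Gb/s internet uplink", 1),
                    ("400 Gb/s local NIC", 400)):
    print(f"{label}: {transfer_hours(gbps):.2f} hours")
```

Roughly a day over the uplink versus a few minutes locally – which is why, for datasets that move often, on-system storage and fast local networking matter more than raw cloud connectivity.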
Wise choices save time and money
We’re still at the start of the A.I. boom and everyone wants first-mover advantage. Getting out of the blocks quickly feels like an imperative; your investors may be high on expectations and low on patience. But moving quickly brings tripwires into play, including a propensity to make what turn out to be the wrong technology choices – costing you time rather than saving it. And there’s no getting away from the fact that access to A.I.-grade computing power is still expensive, however you do it; the wrong choices may cost you money as well as time.
Take the time to assess (and be realistic about) your actual needs, and choose your route to A.I. computing power wisely. Broadly, cloud providers offering the right degree of flexibility are the way to go for most people. While initiatives such as the USA’s CHIPS Act 2023 should, ultimately, increase the global supply of semiconductors, the time it takes to build new foundries or scale up existing fabrication plants means that the supply of A.I.-grade server power will continue to lag demand until at least Q2 2024, keeping the price of access to A.I.-level computing power high for some time yet. Cloud power is most likely to offer you the flexibility and performance you need until that situation changes.
But praise for vendor initiatives shouldn’t let procurement and I.T. managers in the market for new servers off the hook, either. Perhaps the best way to audit us, the vendors, is to be demanding: prioritise sustainability and make ever-more-sustainable choices. Whether you already have formal sustainability goals or they’re about to come your way, the computing products made by ASUS and other manufacturers are of huge environmental significance and more than worthy of your attention. Ask us the right questions!
Find out more
Find out more about ASUS Servers and Workstations and the Energy efficiency in the data centre report from ASUS.