The decision to fine-tune locally or rent the cloud usually gets argued on principle when it should be settled with a spreadsheet. The maths is not complicated. A capable local rig has a known upfront cost and runs at almost nothing afterwards, while cloud GPUs charge by the hour every single time. Where you land depends entirely on how often you actually train and infer, and most engineers underestimate how quickly recurring rental adds up.

Quick Answer

A dual-24GB GPU rig holds a 70B Q4 model's roughly 43 to 48GB footprint across its combined VRAM, which makes it the practical entry point for weighing local hardware against ongoing cloud rental. Moderate, regular usage typically pays back the hardware in 6 to 12 months, after which local inference and fine-tuning run at near-zero marginal cost beyond electricity. Infrequent one-off jobs still favour the cloud.

The VRAM threshold that defines the rig

Everything starts with fitting the model in memory. A 70B model quantised to Q4 occupies roughly 43 to 48GB, which is more than any single consumer 24GB card can hold. Two 24GB GPUs together give you 48GB of combined VRAM, which is the sensible consumer-grade entry point for keeping a model this size in memory rather than spilling it to system RAM at a heavy speed penalty. At the upper end of that footprint range the fit is tight, so there is little headroom to spare.
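The fit check above can be sketched in a few lines of Python. This is a rough estimate, not a measurement: the 4.5 effective bits per weight and the 4GB allowance for cache and runtime buffers are assumptions chosen for illustration, and the function names are hypothetical. It also treats combined VRAM as one pool, which ignores how layers actually divide between cards.

```python
def estimate_footprint_gb(params_billion: float,
                          bits_per_weight: float,
                          overhead_gb: float) -> float:
    """Quantised weights plus a flat overhead allowance, in GB."""
    # Billions of parameters times bytes per parameter gives GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(footprint_gb: float, card_vram_gb: float, num_cards: int) -> bool:
    """True if the footprint fits in the combined VRAM of identical cards."""
    return footprint_gb <= card_vram_gb * num_cards


# Assumed inputs: 70B parameters, ~4.5 effective bits per weight, 4GB overhead.
footprint = estimate_footprint_gb(70, 4.5, 4.0)

print(f"Estimated footprint: {footprint:.1f} GB")
print("Fits one 24GB card: ", fits(footprint, 24, 1))
print("Fits two 24GB cards:", fits(footprint, 24, 2))
```

With these assumed inputs the estimate lands near the low end of the 43 to 48GB range, which is too large for one card and within the 48GB of two.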

It is worth being honest about the catch: splitting a model across two cards adds complexity, since the work has to be sharded across the GPUs and that can lengthen training times compared to a single large-memory accelerator. A QLoRA fine-tune of a 70B model uses around 46GB, which fits a dual-24GB setup with careful optimisation. For engineers speccing this kind of build, the AI PC range at Evetech covers the multi-GPU workstations that make it possible.

What cloud rental actually costs

Cloud is cheap per job, and that is exactly what makes it deceptive. A 70B QLoRA fine-tune runs roughly 8 to 12 hours on a high-end data-centre GPU, which costs in the low tens of dollars per run at typical rates, and takes longer on older or cheaper cards. For a single experiment that is trivially affordable, and if you fine-tune twice a year the cloud is unquestionably the right answer.
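The per-run arithmetic is simply hours multiplied by the hourly rate. A minimal sketch follows, where the 2.50 dollar hourly rate is a hypothetical placeholder rather than a quoted price from any provider:

```python
def run_cost_usd(hours: float, hourly_rate_usd: float) -> float:
    """Metered cost of one fine-tuning run."""
    return hours * hourly_rate_usd


# Hypothetical rate for a high-end data-centre GPU; substitute your provider's.
ASSUMED_RATE_USD = 2.50

low = run_cost_usd(8, ASSUMED_RATE_USD)    # shortest run in the 8 to 12 hour range
high = run_cost_usd(12, ASSUMED_RATE_USD)  # longest run in that range

print(f"One run: ${low:.2f} to ${high:.2f}")
```

At that assumed rate a single run costs 20 to 30 dollars, consistent with the low tens of dollars described above.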

The cost shape changes when usage becomes regular. Every fine-tune, every batch of inference, every re-run after a data fix is another metered charge. Run that loop weekly and the dollars add up quickly, and for a South African team the exchange rate makes each of those charges land harder in rand terms. The cloud's strength, paying only for what you use, becomes its weakness once you use it constantly.
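To see how a weekly loop accumulates, and how the exchange rate moves the rand total, here is a small sketch. The 25 dollar cost per run and both exchange rates are hypothetical placeholders, not current figures.

```python
WEEKS_PER_YEAR = 52


def annual_cost_zar(runs_per_week: float,
                    cost_per_run_usd: float,
                    zar_per_usd: float) -> float:
    """Yearly cloud spend in rand for a steady weekly cadence."""
    annual_usd = runs_per_week * WEEKS_PER_YEAR * cost_per_run_usd
    return annual_usd * zar_per_usd


# Hypothetical inputs: one 25 dollar run per week at two assumed exchange rates.
for rate in (18.0, 20.0):
    total = annual_cost_zar(runs_per_week=1, cost_per_run_usd=25, zar_per_usd=rate)
    print(f"At R{rate:.2f}/USD: R{total:,.0f} per year")
```

The same workload costs more in rand whenever the currency weakens, even though the dollar price has not changed.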

Where the break-even actually falls

The widely cited figure is a 6 to 12 month payback for moderate, regular usage. A consumer dual-GPU workstation lands in the few-thousand-dollar range to build, and that one-time cost is weighed against the recurring rental it replaces. If your monthly cloud spend on training and inference is steady, the rig pays for itself inside that window and then keeps running at the cost of electricity alone. The more you train, the faster it crosses over.
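The payback calculation is the upfront cost divided by the net monthly saving. A minimal sketch, assuming hypothetical figures of 4,000 dollars for the rig, 500 dollars of monthly cloud spend and 40 dollars of monthly electricity:

```python
import math


def payback_months(rig_cost: float,
                   monthly_cloud_spend: float,
                   monthly_electricity: float) -> float:
    """Months until the rig's upfront cost is recovered by avoided rental."""
    monthly_saving = monthly_cloud_spend - monthly_electricity
    if monthly_saving <= 0:
        # The rig never pays back if running it costs as much as renting.
        return math.inf
    return rig_cost / monthly_saving


# All three inputs are assumptions for illustration.
months = payback_months(rig_cost=4000, monthly_cloud_spend=500, monthly_electricity=40)
print(f"Payback: {months:.1f} months")
```

With these placeholder numbers the payback falls inside the 6 to 12 month window. Doubling the monthly cloud spend roughly halves it, which is the "more you train, faster it crosses over" effect.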

Beyond the headline cost

Money is not the only variable. Local hardware keeps proprietary training data entirely on your own machines, which matters when the dataset is sensitive or contractually restricted. It also removes the friction of spinning instances up and down, so the experimentation loop is immediate: you train when you want, for as long as you want, without watching a meter.

The cloud wins on flexibility at the top end. If you occasionally need far more compute than your rig holds, for a much larger model or a bigger batch, renting that for a day is far cheaper than buying hardware you will use twice. The mature setup for many teams is a hybrid: a local rig for the constant day-to-day fine-tuning and inference, with cloud reserved for the rare heavy job that exceeds it. The best-selling PCs at Evetech show the range of machines that can anchor that local side.

The hidden costs people forget on both sides

A fair comparison counts more than the obvious numbers. On the local side, the GPU price is only the start: a dual-GPU rig needs a power supply big enough to feed both cards, a case and cooling that can shed the heat, and a motherboard with the right slots and lanes. Electricity is an ongoing cost too, and two high-end GPUs under sustained load draw meaningfully, which adds up over a year of regular training. None of this changes the conclusion for heavy users, but it should be in the sum.
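The electricity line can be estimated from power draw, hours of use and tariff. The sketch below assumes a 700W whole-system draw under load, 20 hours of training per week and a tariff of 0.20 currency units per kWh. All three are hypothetical, so substitute your own measurements and local tariff.

```python
WEEKS_PER_YEAR = 52


def annual_electricity_cost(system_watts: float,
                            hours_per_week: float,
                            tariff_per_kwh: float) -> float:
    """Yearly electricity cost of running the rig under load."""
    kwh_per_year = system_watts / 1000 * hours_per_week * WEEKS_PER_YEAR
    return kwh_per_year * tariff_per_kwh


# Assumed inputs: 700W draw, 20 hours a week, 0.20 per kWh.
cost = annual_electricity_cost(system_watts=700, hours_per_week=20, tariff_per_kwh=0.20)
print(f"Estimated electricity: {cost:.2f} per year")
```

The result belongs in the running-cost column next to any one-off spend on the power supply, cooling and motherboard.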

The cloud has its own overlooked costs. Storing datasets and model checkpoints between runs is billed separately from compute, and moving large datasets in and out incurs transfer charges and time. Spinning instances up, configuring environments and waiting for availability all consume engineer hours that never appear on the invoice. For a team that trains constantly, those frictions are a real recurring tax that a local rig simply does not have once it is set up.

Why the rig keeps paying after break-even

The most underrated part of the local case is what happens after the hardware is paid off. The rig does not just stop costing money; it keeps producing value at the price of electricity. Beyond the fine-tuning it was bought for, the same machine runs local inference for development, hosts a private model for the team to test against, and handles experimentation that you would otherwise have metered in the cloud. One purchase covers many workloads.

That is the compounding benefit cloud rental never delivers. Every cloud hour is a fresh charge regardless of how many you have already paid for, while a local rig's average cost per hour falls toward the price of electricity the more you use it. For a team whose AI work is growing rather than shrinking, that trajectory is the strongest argument of all: the busier you get, the more the owned hardware outperforms the rented alternative.
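One way to picture that trajectory is the average cost per hour of use: the rig's purchase price is spread over every hour it runs, while the cloud rate stays flat. The figures below (a 4,000 dollar rig, 0.15 dollars of electricity per hour, a 2.50 dollar cloud rate) are hypothetical.

```python
def average_cost_per_hour(rig_cost: float,
                          electricity_per_hour: float,
                          hours_used: float) -> float:
    """Purchase price spread over hours used, plus electricity for each hour."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return rig_cost / hours_used + electricity_per_hour


ASSUMED_CLOUD_RATE = 2.50  # hypothetical flat hourly rental rate

for hours in (100, 500, 2000, 5000):
    local = average_cost_per_hour(4000, 0.15, hours)
    print(f"{hours:>5} h  local {local:6.2f}/h  cloud {ASSUMED_CLOUD_RATE:.2f}/h")
```

The local figure keeps falling as hours accumulate and approaches the electricity cost, while the cloud figure never moves.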

How to decide for your own workload

Be honest about frequency. Tally your real monthly hours of training and inference, price what that costs in the cloud, and compare it to a rig amortised over a year. If you train often enough that the cloud bill is a recurring line item you notice, local hardware almost certainly wins on a 6 to 12 month horizon. If you fine-tune occasionally and unpredictably, the cloud's pay-per-use model is the cheaper and simpler choice. There is no universal answer, only your usage pattern against the numbers.
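That comparison can be written down directly. The sketch below prices the monthly hours at a cloud rate and sets the result against the rig amortised over 12 months plus electricity. The function name and every figure are hypothetical, and it deliberately ignores the softer factors such as data privacy and setup time.

```python
def recommend(monthly_hours: float,
              cloud_rate_per_hour: float,
              rig_cost: float,
              monthly_electricity: float,
              amortise_months: int = 12) -> str:
    """Return 'local' or 'cloud' by comparing monthly cost on each side."""
    cloud_monthly = monthly_hours * cloud_rate_per_hour
    local_monthly = rig_cost / amortise_months + monthly_electricity
    return "local" if cloud_monthly > local_monthly else "cloud"


# Assumed inputs: 2.50/h cloud rate, 4,000 rig, 40 per month in electricity.
print(recommend(200, 2.50, 4000, 40))  # heavy, regular usage
print(recommend(20, 2.50, 4000, 40))   # occasional usage
```

Lengthening `amortise_months` tilts the result toward local hardware, so it is worth running the comparison at both a one-year and a longer horizon.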

Frequently Asked Questions

How much VRAM does a 70B Q4 model need?

Roughly 43 to 48GB. That exceeds any single consumer 24GB card, so a dual-24GB rig with 48GB combined is the practical consumer entry point for holding the model in memory rather than offloading to slower system RAM.

How long until a local rig pays for itself?

For moderate, regular usage the common estimate is 6 to 12 months against equivalent cloud rental. After that, local fine-tuning and inference run at near-zero marginal cost beyond electricity, and heavier use shortens the payback.

Is the cloud ever the better choice?

Yes. For infrequent, one-off fine-tunes, a single run costs only the low tens of dollars, so buying hardware makes no sense. The cloud also wins when you occasionally need far more compute than your rig can hold.

Does splitting a model across two GPUs hurt performance?

It adds complexity. The model has to be sharded across both cards, which can lengthen training compared to a single large-memory GPU. A 70B QLoRA fine-tune uses around 46GB and fits a dual-24GB setup with careful optimisation.

Why does local hardware appeal to South African teams specifically?

Cloud GPUs are billed in dollars, so the exchange rate makes every recurring charge land harder in rand. A one-time local purchase converts that fluctuating monthly cost into a fixed asset and keeps proprietary data on your own hardware.

Run the numbers and the rig wins? Spec a multi-GPU workstation from the AI PC range at Evetech and stop paying the cloud meter for work you do every week.