The few routes I've looked at recently are:
96 GB VRAM via 4x Tesla P40 graphics cards (24 GB each) on a server-grade motherboard, plus 500 GB of regular RAM. Comes out to about $4,000 once you add water cooling.
128 GB Unified RAM in a Mac M4 Ultra setup, about $10,000. Slower at inference than the 96 GB of VRAM from the older-generation graphics cards, and very expensive.
32 GB VRAM via a 5090 (or whatever is available). US pricing is $3,000 or more. Fastest at inference, but not as much room to hold the larger models (see the memory sketch after this list).
glhf.chat: run almost any open-source model that's available and pay for usage, roughly $0.01 to $0.10 per prompt/answer. I signed up during the beta, before it was paid, and they gave me $10 in credit once they started charging. In the long run this doesn't appeal to me, because I want to run an agent, local coding, local RAG, or fine-tuning and pay only for electricity (see the break-even sketch below).
128 GB Unified RAM (LPDDR5x) in the NVIDIA DGX Spark, $3,000 to $4,000:
https://www.hardware-corner.net/nvidias-dgx-spark-digits-specs-20250319/
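To make "room to hold the larger models" concrete, here's a minimal sketch of the rule of thumb I use: the weights take roughly parameters x bits-per-weight / 8 bytes, plus some headroom for the KV cache and activations. The 15% overhead and the example model sizes are assumptions for illustration, not measured numbers.

```python
# Back-of-the-envelope model-fit check. The rule of thumb and the 15%
# overhead for KV cache/activations are my assumptions, not vendor specs.

def fits(params_b: float, bits: float, mem_gb: float, overhead: float = 1.15) -> bool:
    """True if a params_b-billion-parameter model at `bits` per weight
    plausibly fits in mem_gb of (V)RAM, with ~15% overhead."""
    need_gb = params_b * bits / 8 * overhead  # billions of params * bytes/param ~= GB
    return need_gb <= mem_gb

for mem_gb, label in [(32, "5090"), (96, "4x P40"), (128, "DGX Spark / Mac")]:
    fitting = [p for p in (8, 32, 70, 123) if fits(p, 4, mem_gb)]
    print(f"{label} ({mem_gb} GB) at 4-bit fits:", ", ".join(f"{p}B" for p in fitting))
```

By this estimate a 4-bit 70B model needs about 40 GB, which rules out the 32 GB card but fits comfortably in the 96 GB and 128 GB options.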
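And on the "only pay for electricity" point, a hedged break-even sketch. Every number here (hardware price, $0.15/kWh, 500 W draw, 30 s per answer) is an assumption; only the $0.01 to $0.10 per-prompt range comes from the glhf.chat item above.

```python
# Hedged back-of-the-envelope: when does owning hardware beat per-prompt billing?
HARDWARE_COST = 4000.0     # e.g., the 4x P40 build (assumed)
KWH_PRICE = 0.15           # USD per kWh (assumed)
WATTS = 500.0              # assumed average draw while generating
SECONDS_PER_PROMPT = 30.0  # assumed time per prompt/answer

# Electricity cost of one answer: kW * hours * price-per-kWh
electricity_per_prompt = WATTS / 1000 * SECONDS_PER_PROMPT / 3600 * KWH_PRICE

for api_cost in (0.01, 0.10):
    breakeven = HARDWARE_COST / (api_cost - electricity_per_prompt)
    print(f"API at ${api_cost:.2f}/prompt -> break even after ~{breakeven:,.0f} prompts")
```

At the low end of that range you'd need hundreds of thousands of prompts before the hardware pays for itself on dollars alone, but an agent loop, a RAG pipeline, or a fine-tuning run issues prompts in bulk, and local hardware also buys privacy and the freedom to tinker.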
These are some pretty exciting times.