I've been using llama.cpp on Apple Silicon for months now, and my brother, Chimezie, has been nudging me to give MLX a go. I finally set aside time today to get started, with the eventual goal of adding support for MLX model loading and usage to OgbujiPT. I've been warned that it's rough around the edges, but it's been stimulating to play with. I thought I'd capture some of my notes, including a few pitfalls I ran into, which might help anyone else trying to get into MLX in its current state.
As a quick bit of background: MLX is very interesting because, honestly, Apple has the most coherently engineered consumer and small-business-level hardware for AI workloads, in Apple Silicon and its unified memory. The news lately is all about Apple's AI fumbles, but I suspect their clever plan is to empower a community of developers to take the arrows in their backs and build things out for them. The MLX