Running an LLM on Your Mac - The Missing Guide - Part 2
In case you missed part 1, make sure you start there first to get the LLM running via the command line. That covers getting the core environment set up and the Mistral 7B model downloaded and ready for use.
For part 2 we'll be digging into swift-chat, a Swift-based app that lets you load a model and run prompts against it.
Getting Started
Make sure you're running the latest beta macOS (I'm currently running macOS Sequoia 15.1 beta) and Xcode (I'm using 16.1 beta 2, 16B5014f).
Part 2 - Setting up Swift-Chat as an LLM UI
Let's get started in our working directory again. In the Terminal, navigate to the top-level working folder you created in part 1.
Get swift-chat from GitHub. Note that the preview branch is required, so clone it with something like `git clone -b preview https://github.com/huggingface/swift-chat`.
Launch the Xcode 16.1 beta - I'm using beta 2 (16B5014f) at the time of writing this guide.
In Xcode, open swift-chat/SwiftChat.xcodeproj. To make sure it opens in the Xcode beta, don't just double-click the file in the Finder; instead, navigate to the project file from Xcode's 'Open' menu.
Now let's get the project set up. Set your dev team and change the bundle ID if you need to (for a third-party project I usually just append my initials to the end to keep it unique in case other team members are testing out the same things).
‼️⚠️💥Critical Fail Step - Don't Skip
We need to set up Xcode the same way we did for the command line: the Hugging Face token has to be available as an environment variable. At the top of the Xcode window, select the SwiftChat scheme and edit it.
In the scheme's Run settings, open the Arguments tab and add an entry under Environment Variables. Make sure the name is exactly HUGGING_FACE_HUB_TOKEN and paste in your token value.
To make sure the token is coming through correctly, here's some quick debug code you can add to SwiftChatApp.swift to print the token for verification at run time.
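A minimal sketch is below - it assumes the app's main App struct is where you add an init() (the exact shape of SwiftChatApp.swift may differ), and it only prints a masked prefix so the full token never lands in your console:

```swift
import SwiftUI

// Sketch of SwiftChatApp.swift with a token check added in init().
// In the real project, just add the init() to the existing struct.
@main
struct SwiftChatApp: App {
    init() {
        // Reads the variable Xcode injects from Edit Scheme > Run > Arguments.
        let token = ProcessInfo.processInfo.environment["HUGGING_FACE_HUB_TOKEN"]
        if let token, !token.isEmpty {
            print("✅ HF token found: \(token.prefix(6))… (\(token.count) chars)")
        } else {
            print("⚠️ HUGGING_FACE_HUB_TOKEN is not set - check the scheme's environment variables.")
        }
    }

    var body: some Scene {
        WindowGroup { Text("…") } // placeholder; keep your existing app body
    }
}
```

Run once and watch the Xcode console: if you see the warning, the scheme variable isn't being injected.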
Run the project on My Mac. When the app window appears, click the model button, navigate to the Downloads/apple-llm-ai/swift-transformers/Examples/Mistral7B directory (or wherever you put yours), and pick the model file. It'll take a while to load the model depending on the speed of your Mac; on my Mac it took about a minute.
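If you're curious where that minute goes, here's a rough sketch of the kind of Core ML load that happens under the hood. This is illustrative, not swift-chat's actual code - it assumes the model file is a Core ML package (as in the swift-transformers examples), and the file name and compute-unit choice are placeholders:

```swift
import CoreML

// Hypothetical load step: the first open of an .mlpackage compiles it
// to an .mlmodelc, which dominates the initial load time on big models.
let packageURL = URL(fileURLWithPath: "Mistral7B.mlpackage") // placeholder name
let start = Date()

let compiledURL = try MLModel.compileModel(at: packageURL)

let config = MLModelConfiguration()
config.computeUnits = .all // let Core ML pick CPU, GPU, or Neural Engine

let model = try MLModel(contentsOf: compiledURL, configuration: config)
print("Model loaded in \(Date().timeIntervalSince(start)) seconds")
```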
When you launch the app in the future it will automatically reload the last model, so expect a similar wait each time it first fires up.
And then chat away…
You can adjust the length of the response from the default of 20 tokens with the Maximum Length setting in the left panel. Click the bar to slide the value - not the most intuitive of controls, but it works. Conceptually, that setting caps the generation loop, as in the sketch below.
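Here's a minimal sketch of what a token-generation loop looks like; the names (generate, nextToken, eosToken) are illustrative, not swift-chat's API:

```swift
// Maximum Length caps how many new tokens the loop may produce;
// generation can also stop early on the end-of-sequence token.
func generate(prompt: [Int],
              maxNewTokens: Int,
              eosToken: Int,
              nextToken: ([Int]) -> Int) -> [Int] {
    var tokens = prompt
    for _ in 0..<maxNewTokens {
        let next = nextToken(tokens) // model picks the next token ID
        if next == eosToken { break }
        tokens.append(next)
    }
    return tokens
}
```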
You'll notice that you get the exact same response each time with the default values. That's because, with sampling off, decoding is greedy: the model deterministically picks the single most likely next token at every step. Select the 'Sample' checkbox and try changing the Temperature to 0.8 or 0.9 to start seeing some variation in the results.
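A conceptual sketch of the difference (again illustrative, not swift-chat's actual code): greedy decoding always returns the argmax, while sampling draws from a temperature-scaled softmax, so lower temperatures stay close to greedy and higher ones get more adventurous:

```swift
import Foundation

func pickToken(logits: [Double], sample: Bool, temperature: Double) -> Int {
    guard sample else {
        // Greedy: deterministic, so the same prompt yields the same reply.
        return logits.indices.max { logits[$0] < logits[$1] }!
    }
    // Temperature scaling: < 1 sharpens the distribution, > 1 flattens it.
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max()!
    let exps = scaled.map { exp($0 - maxLogit) } // numerically stable softmax
    let total = exps.reduce(0, +)
    var r = Double.random(in: 0..<total)
    for (i, e) in exps.enumerated() {
        r -= e
        if r < 0 { return i }
    }
    return exps.count - 1
}

// With sample: false this always returns index 0; with sample: true and
// temperature 0.9, the lower-scoring tokens win some of the time.
let choice = pickToken(logits: [2.0, 1.5, 0.3], sample: true, temperature: 0.9)
```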
Next Steps
Enjoy your local LLM, and experiment with speed and the type of results you can get.
We'll wrap up this guide in part 3, where we'll cover how to bring swift-chat over to your iPad and possibly the iPhone.
Sources
Feel free to dive into the source material if you really want to get under the hood.