
Optimizing Matrix Multiplication in Swift for LLM Training on Apple Silicon

By Matt Gallagher

21d ago · 26 min read

Summary

This article explores optimizing handwritten matrix multiplication code in Swift for training Large Language Models on Apple Silicon. It covers 10 different implementations ranging from plain C and Swift to Metal, focusing on performance improvements from Gflop/s to Tflop/s. The author provides insight into key optimization steps for mathematical code in Swift and explains the capabilities of different Apple Silicon units including CPU, SIMD, AMX, and GPU. This is the first part in a series about training neural networks in Swift on Apple Silicon.

Key quotes

The aim is to give some insight into the key steps for optimizing mathematics code in Swift.
I also hope that these examples will offer a sense of scale about the capabilities of the different units on Apple Silicon – CPU, SIMD, AMX and GPU.
10 implementations of handwritten matrix multiplication: from plain C and Swift through to Metal
