ONNX Runtime May Silently Convert Models to FP16 on Apple MPS Backend: Causes and Solutions

By Two_hands · 5mo ago · 24 min read · Insight

Summary

The article describes an issue in ONNX Runtime: when a model runs on Apple's MPS (Metal Performance Shaders) backend, it may be silently converted to FP16 (half precision), so its outputs differ from those produced on the CPU. The author found the discrepancy while benchmarking the EyesOff model, where the MPS results did not match the CPU results. The article analyzes the problem, explains why the silent conversion happens, and gives solutions for preventing it, including the code configuration and settings needed to enforce FP32 precision.
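
To see why a silent FP16 downcast changes results, the sketch below emulates both formats in plain Python: the standard library's `struct` module can round a float to binary16 (`'e'`) or binary32 (`'f'`). It shows the rounding of one value and a running sum that is rounded after every addition. This is an illustration under stated assumptions, not a model of what MPS or ONNX Runtime actually do: real FP16 kernels often accumulate in higher precision, so the drift here is a worst case. The helper names `round_to` and `accumulate` are hypothetical.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float (binary64) to a narrower IEEE 754 format and back.

    fmt is a struct code: 'e' for binary16 (FP16), 'f' for binary32 (FP32).
    """
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, x))[0]


def accumulate(values, fmt: str) -> float:
    """Sum values, rounding the running total to `fmt` after every addition.

    This deliberately models a naive low-precision accumulator (an assumption;
    many real kernels accumulate in FP32 even when inputs are FP16).
    """
    total = 0.0
    for v in values:
        total = round_to(fmt, total + round_to(fmt, v))
    return total


# One value stored at each precision.
x = 1234.5678
print("fp32:", round_to("f", x), " fp16:", round_to("e", x))

# 10,000 small contributions whose exact sum is 100.
contributions = [0.01] * 10_000
print("fp32 sum:", accumulate(contributions, "f"))
print("fp16 sum:", accumulate(contributions, "e"))
```

FP16 carries only 11 significant bits, so once the running total reaches 32 the gap between neighboring FP16 values is 0.03125 and each 0.01 increment rounds away to nothing: the FP16 sum stalls at 32, while the FP32 total stays close to 100.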

Key quotes

I noticed that the metrics from the model ran on ONNX on MPS had a different output to those on ONNX CPU and PyTorch CPU and MPS.
When I say ORT and MPS, I mean the ONNX Runtime with the MPS execution provider.
The issue is that ONNX Runtime with MPS may silently convert your model to FP16, which can lead to different outputs and potentially affect model accuracy.
This silent conversion happens because MPS backend may automatically optimize for performance by using half-precision floating point (FP16) instead of single-precision (FP32).
To prevent this silent conversion, you need to explicitly configure ONNX Runtime to use FP32 precision when running on MPS.
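
The quotes describe comparing ONNX Runtime on MPS against CPU runs and finding the outputs disagree. A minimal way to tell a precision downcast apart from an unrelated bug is to check the two outputs against two tolerance bands. In the sketch below the tolerance values are assumed rules of thumb, not ONNX Runtime settings; `allclose` reimplements the comparison rule that `numpy.allclose` documents, and the "accelerated" output is simulated by rounding the CPU output to FP16 rather than produced by a real backend.

```python
import struct
from typing import Sequence


def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 binary16 via struct's 'e' format.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def allclose(a: Sequence[float], b: Sequence[float], rtol: float, atol: float) -> bool:
    """Elementwise |a - b| <= atol + rtol * |b|, the rule numpy.allclose uses."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


# Assumed tolerance bands (rules of thumb, tune for your model).
TIGHT = dict(rtol=1e-5, atol=1e-6)  # FP32 vs FP32 on different backends
LOOSE = dict(rtol=1e-2, atol=1e-3)  # roughly what an FP16 round-trip stays within


def diagnose(reference: Sequence[float], candidate: Sequence[float]) -> str:
    if allclose(candidate, reference, **TIGHT):
        return "matches at FP32 tolerance"
    if allclose(candidate, reference, **LOOSE):
        return "only matches at FP16 tolerance: check for a silent downcast"
    return "mismatch beyond FP16 rounding: look for a different bug"


# Hypothetical logits from a CPU run (reference) and a simulated FP16 backend.
cpu_out = [2.718281828, -0.333333333, 0.123456789, 7.654321]
accel_out = [to_fp16(v) for v in cpu_out]

print(f"max abs diff: {max_abs_diff(cpu_out, accel_out):.2e}")
print(diagnose(cpu_out, cpu_out))
print(diagnose(cpu_out, accel_out))
```

In a real test bench, `cpu_out` and `accel_out` would be the flattened outputs of the same input run through each execution provider.
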
Snippet from the RSS feed
Having trained the EyesOff model, I began evaluating the model and its run time. I was looking into the ONNX format and using it to run the model efficiently. I setup a little test bench in which I ran the model using PyTorch and ONNX with ONNX Runtime (O
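
The snippet mentions a small test bench for measuring the model's run time. A minimal latency harness might look like the following; `fake_model` is a hypothetical stand-in for an ONNX Runtime `session.run(...)` call or a PyTorch forward pass, and the warm-up and iteration counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5, iters: int = 50) -> Dict[str, float]:
    """Call fn() repeatedly and return latency statistics in milliseconds."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()  # untimed warm-up calls absorb one-time setup costs
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


def fake_model():
    # Hypothetical stand-in for a real inference call.
    return sum(i * i for i in range(10_000))


print(benchmark(fake_model))
```

The warm-up calls keep one-time costs (lazy initialization, kernel compilation, cache population) out of the measurements, and the median is less sensitive to scheduling noise than the mean. When timing PyTorch on MPS, work is queued asynchronously, so the timed function should call `torch.mps.synchronize()` before returning; otherwise the timer may stop before the GPU has finished.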