Implementing Highly Efficient Transpose Kernel for Hopper Architecture with Mojo

timmyd

11mo ago · 5 min read · en

Summary

The article explains how to implement a highly efficient transpose kernel for the Hopper architecture using Mojo, achieving a bandwidth of 2775.49 GB/s. It compares the performance with pure CUDA on the same hardware, showing similar results.

Key quotes

The best kernel achieves a bandwidth of 2775.49 GB/s, i.e. 84.1056%.

Mojo can achieve CUDA-like performance on the same task.

You may comp
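
The first quote gives a bandwidth and a percentage but does not say what the percentage is measured against. A minimal sketch, assuming the percentage is achieved bandwidth divided by the hardware's peak memory bandwidth (an assumption, not something stated in the text here), recovers the peak figure the two numbers imply. The variable names are illustrative only.

```python
# Figures taken from the quote above.
achieved_gbps = 2775.49       # bandwidth of the best kernel, in GB/s
utilization = 84.1056 / 100   # percentage reported alongside it

# Assumption: utilization = achieved bandwidth / peak bandwidth.
# Rearranging gives the peak bandwidth the two figures imply.
implied_peak_gbps = achieved_gbps / utilization

print(f"implied peak bandwidth: {implied_peak_gbps:.1f} GB/s")
```

Under that assumption the two figures are consistent with a peak of roughly 3300 GB/s, i.e. about 3.3 TB/s.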

Snippet from the RSS feed

In this blogpost I will step by step show you how to implement a highly efficient transpose kernel for the Hopper architecture using Mojo. The best kernel archive...
