All Topics

Technology

Art

Computing Hessian Inverse Products for Deep Neural Networks to Speed Up Gradient Descent

rahimiali

4mo ago· 2 min readenCode

75/100

Toasty

Bagelometer↗

Crackles when you bite it. Shows the baker did the work.

Score75TypeanalysisSentimentneutral

Summary

This article presents a GitHub repository that demonstrates how to compute the inverse of the Hessian matrix for deep neural networks and multiply it with a vector. The method enables solving the equation Hx = v for x, where H is the Hessian and v is a vector, which is more efficient than naive approaches. The technique builds on Pearlmutter's work for Hessian-vector products but focuses on Hessian-inverse-products, with the goal of using this as a preconditioner to accelerate stochastic gradient descent optimization in machine learning.

Key quotes

· 5 pulled

This package shows how to multiply the inverse of the Hessian of a deep network with a vector.

The hope is to soon use this as a preconditioner to speed up stochastic gradient descent.

Pearlmutter showed a clever way to compute the Hessian-vector-product for a deep net.

By contrast, the paper and code in this repo shows how to compute the Hessian-inverse-product, the product of the inverse of the Hessian of a deep net with a vector.

Solving this system naively requires a number o

Snippet from the RSS feed

The Hessian of tall-skinny networks is easy to invert - a-rahimi/hessian

You might also wanna read

MerLean-Prover: A Recursive Agent Harness for Lean 4 Theorem Proving Outperforms Baselines

MerLean-Prover is an end-to-end Lean4 theorem prover that replaces 'sorry' declarations with kernel-checkable proofs using three agent types

arxiv.org·8h ago

DeepSeek-V4: Hybrid Sparse-Attention Architecture Enables Efficient Million-Token Context Inference

DeepSeek-V4 introduces a hybrid sparse-attention architecture combined with on-policy distillation across domain specialists, enabling 1M-to

artgor.medium.com·14h ago

Rotary GPU: Enabling Large Mixture-of-Experts Models on Consumer Laptop GPUs with Limited Memory

This paper presents Rotary GPU, an exploratory approach to running large Mixture-of-Experts (MoE) language models on consumer-grade hardware

arxiv.org·1d ago

LinkedIn cuts GPU training hours by 65% with Generative Recommender system optimizations

LinkedIn has developed a Generative Recommender (GR) system that models user activity as token sequences, offering richer long-context perso

startuphub.ai·3d ago

Rank-Aware Decomposition Technique Reduces Computation in Recommender Systems by 87.5%

This paper presents a rank-aware decomposition technique for deep ranking models in industrial recommender systems. The key insight is that

arxiv.org·4d ago

Hands-on evaluation of MiniMax M2.7 via API on ML and coding workflows

The author evaluates MiniMax M2.7 by using it through Claude Code on three real-world ML and coding workflows: scaffolding a Kaggle competit

andlukyane.com·12d ago