Computing Hessian Inverse Products for Deep Neural Networks to Speed Up Gradient Descent
By
rahimiali
Crackles when you bite it. Shows the baker did the work.
Summary
This article presents a GitHub repository that demonstrates how to compute the inverse of the Hessian matrix for deep neural networks and multiply it with a vector. The method enables solving the equation Hx = v for x, where H is the Hessian and v is a vector, which is more efficient than naive approaches. The technique builds on Pearlmutter's work for Hessian-vector products but focuses on Hessian-inverse-products, with the goal of using this as a preconditioner to accelerate stochastic gradient descent optimization in machine learning.
Key quotes
· 5 pulledThis package shows how to multiply the inverse of the Hessian of a deep network with a vector.
The hope is to soon use this as a preconditioner to speed up stochastic gradient descent.
Pearlmutter showed a clever way to compute the Hessian-vector-product for a deep net.
By contrast, the paper and code in this repo shows how to compute the Hessian-inverse-product, the product of the inverse of the Hessian of a deep net with a vector.
Solving this system naively requires a number o
You might also wanna read
MerLean-Prover: A Recursive Agent Harness for Lean 4 Theorem Proving Outperforms Baselines
MerLean-Prover is an end-to-end Lean4 theorem prover that replaces 'sorry' declarations with kernel-checkable proofs using three agent types
DeepSeek-V4: Hybrid Sparse-Attention Architecture Enables Efficient Million-Token Context Inference
DeepSeek-V4 introduces a hybrid sparse-attention architecture combined with on-policy distillation across domain specialists, enabling 1M-to
Rotary GPU: Enabling Large Mixture-of-Experts Models on Consumer Laptop GPUs with Limited Memory
This paper presents Rotary GPU, an exploratory approach to running large Mixture-of-Experts (MoE) language models on consumer-grade hardware
LinkedIn cuts GPU training hours by 65% with Generative Recommender system optimizations
LinkedIn has developed a Generative Recommender (GR) system that models user activity as token sequences, offering richer long-context perso
Rank-Aware Decomposition Technique Reduces Computation in Recommender Systems by 87.5%
This paper presents a rank-aware decomposition technique for deep ranking models in industrial recommender systems. The key insight is that
Hands-on evaluation of MiniMax M2.7 via API on ML and coding workflows
The author evaluates MiniMax M2.7 by using it through Claude Code on three real-world ML and coding workflows: scaffolding a Kaggle competit
