All Topics
All Topics
Technology
Technology
Design
Design
Programming
Programming
Science
Science
News
News
Gaming
Gaming
Entertainment
Entertainment
Business
Business
Finance
Finance
Sports
Sports
Health
Health
Food
Food
Travel
Travel
Art
Art
Music
Music
Books
Books
Education
Education
Politics
Politics
Personal
Personal
No algorithm. No AI slop. No ads. Just RSS. Pro-human. Indie writers. Real journalism. Open web. Chronological. Hand toasted.

Per-Instance TSP Solver Using PPO Achieves 1.66% Gap on d1291 Without Pre-training

By

jivaprime

5mo ago· 1 min readenNews

Summary

A developer presents a novel approach to solving the Traveling Salesman Problem (TSP) using deep learning without pre-training. Instead of relying on large datasets, the solver uses Proximal Policy Optimization (PPO) to learn from scratch for each specific TSP instance. The core hypothesis is that optimal solutions consist mostly of 'minimum edges' (nearest neighbors) with difficulty arising from a small number of 'exception edges' outside local scope. The approach achieved a 1.66% gap on the TSPLIB d1291 benchmark in about 5.6 hours on a single A100 GPU.

Key quotes

· 5 pulled
Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets.
I wanted to see if a solver could learn 'on the fly' for a specific instance without any priors from other problems.
I built a solver using PPO that learns from scratch per instance.
My hypothesis was that while optimal solutions are mostly composed of 'minimum edges' (nearest neighbors), the actual difficulty comes from a small number of 'exception edges' outside of that local scope.
It achieved a 1.66% gap on TSPLIB d1291 in about 5.6 hours on a single A100.
Snippet from the RSS feed
OP here.

You might also wanna read