arXiv Feeds

Latest computer science updates from arXiv.org.

11 feeds

HalalBench: A Multilingual OCR Benchmark for Food Packaging Ingredient Extraction

See No Evil: Semantic Context-Aware Privacy Risk Detection for AR

FreqFormer: Hierarchical Frequency-Domain Attention with Adaptive Spectral Routing for Long-Sequence Video Diffusion Transformers

DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models

PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging

WeatherSeg: Weather-Robust Image Segmentation using Teacher-Student Dual Learning and Classifier-Updating Attention

SGP-SAM: Self-Gated Prompting for Transferring 3D Segment Anything Models to Lesion Segmentation

Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis

DGHMesh: A Large-scale Dual-radar mmWave Dataset and Generalization-focused Benchmark for Human Mesh Reconstruction

MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling

Lost in the Vibrations: Vision Language Models Fail the Dynamic Gauges Test

2D Pre-Training for 3D Pose Estimation

Intervention-Aware Multiscale Representation Learning from Imaging Phenomics and Perturbation Transcriptomics

WebSerial Vision Training for Microcontrollers: A Browser-Based Companion to On-Device CNN Training

ParkingScenes: A Structured Dataset for End-to-End Autonomous Parking in Simulation Scenes

AgentRVOS for MeViS-Text Track of 5th PVUW Challenge: 3rd Method

OAMVOS: 2nd Report for 5th PVUW MOSE Track

Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation

AeSlides: Incentivizing Aesthetic Layout in LLM-Based Slide Generation via Verifiable Rewards
