Systematic Analysis Reveals Widespread Information Leakage in Preprint Archives
By oldfuture
Summary
This research paper presents a systematic security analysis of preprint archives such as arXiv and finds significant information leakage risks. The study analyzed 1.2 TB of source data from 100,000 arXiv submissions using the LaTeXpOsEd framework, which combines pattern matching, logical filtering, and large language models. The analysis uncovered thousands of PII leaks, GPS-tagged files, exposed cloud credentials, confidential author communications, and conference submission credentials. The researchers urge the research community and repository operators to act immediately to close these security gaps, and they release their methods in support of open science.
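To make the "pattern matching plus logical filtering" idea in the summary concrete, here is a minimal sketch in Python. It is not the LaTeXpOsEd implementation: the pattern set, the comment extractor, the placeholder filter, and the names `PATTERNS`, `latex_comments`, and `scan` are all assumptions chosen for illustration, and the LLM stage is left out entirely. It only shows how leftover LaTeX comments could be scanned for a few leak types of the kind the paper reports.

```python
import re

# Hypothetical patterns for illustration only; not the paper's rule set.
PATTERNS = {
    # Common shape of an AWS access key ID: "AKIA" plus 16 uppercase letters/digits.
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "shared_drive_link": re.compile(
        r"https?://(?:drive\.google\.com|www\.dropbox\.com)/\S+"
    ),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

# First '%' on a line that is not escaped as '\%'.
# Simplification: an escaped backslash followed by '%' is not handled.
COMMENT = re.compile(r"(?<!\\)%(.*)$")


def latex_comments(source):
    """Yield (line_number, comment_text) for each LaTeX comment in source."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = COMMENT.search(line)
        if match:
            yield lineno, match.group(1).strip()


def scan(source):
    """Return (line_number, label, value) for each suspicious comment match."""
    findings = []
    for lineno, text in latex_comments(source):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                value = match.group(0)
                # Assumed logical filter: skip obvious placeholder values.
                if "example" in value.lower():
                    continue
                findings.append((lineno, label, value))
    return findings
```

Calling `scan` on the text of a `.tex` file returns one tuple per hit. Restricting the search to comments is a design choice made here to keep the sketch short; a real audit would also need to look at file metadata (for example EXIF data in figures) and at auxiliary files shipped with the submission.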
Key quotes
In the absence of sanitization, submissions may disclose sensitive information that adversaries can harvest using open-source intelligence.
Our analysis uncovered thousands of PII leaks, GPS-tagged EXIF files, publicly available Google Drive and Dropbox folders, editable private SharePoint links, exposed GitHub and Google credentials, and cloud API keys.
We also uncovered confidential author communications, internal disagreements, and conference submission credentials, exposing information that poses serious reputational risks to both researchers and institutions.
We urge the research community and repository operators to take immediate action to close these hidden security gaps.