Even 'Uncensored' Models Can't Say What They Want

Security signal: Claude Code source code leaked via npm source maps. ~1,900 files, 512K+ lines of TypeScript exposed including internal "Tengu" codename a...

AI Behavior signal: GPT-5.5 ships with a verbatim system-prompt rule — confirmed by @ChatGPTapp itself — forbidding any mention of "goblins, gremlins, raccoo...

AI Capability signal: One Anthropic engineer with zero security training asked it to find remote code execution bugs overnight and woke up to a complete workin...

Interpretability signal: Goodfire's Adversarial Parameter Decomposition (VPD) breaks a 67M-parameter LM's weight matrices into ~10,000 rank-one subcomponents, rec...

AI Behavior signal: OpenAI pulls back the curtain. "Where the Goblins Came From" is their own account of the system-prompt rule banning goblins, gremlins, ra...

Platform Risk signal: Anthropic abruptly shut down an entire organization (60+ users) over an unspecified TOU violation, with appeals routed through a Google F...

EpsteinBench: We Brought Epstein's Voice Back. We Got More Than We Wanted.

Abliteration vs Heretic vs Obliteratus: one trick, three layers of tooling

AI Behavior signal: "All the SOTA models are really bad at deleting code." They leave behind throw Error(...), deprecation copy, and stale tests.

Business signal: "AI inference margins are a race to the bottom." Anthropic: -94% gross margin in 2024. MiniMax: -25%.

Security signal: 26 LLM routers were found injecting malicious tool calls and exfiltrating credentials. One incident drained a client wallet for $500k, an...
