Appears on
Articles (3)
Optimizing Code Efficiency: Avoiding Premature Pessimization
Insight
FP8 is ~100 tflops faster when the kernel name has "cutlass" in it
News
Investigating Monitoring and Control of Thinking Processes in Large Reasoning Models
The article explores how large reasoning models monitor and control their thinking processes, focusing on models that segment computations using specific tokens. It delves into the internal dynamics of the 'thinking phase' within these models.
Insight
royeisen.github.io · 10mo ago

