Ongoing projects, with their implementation notes, architecture decisions, and empirical findings from training and deployment.
Explicit Gradient Regularization within GRPO
This research explores explicit gradient regularization applied during GRPO training of large language models, with the aim of improving convergence and reducing catastrophic forgetting.
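As a toy illustration only, not the project's actual method: a gradient-norm penalty ‖∇L‖² can be added to a loss and differentiated in closed form when the base loss is linear least squares. The function name `regularized_grad` and the quadratic stand-in loss are hypothetical; a real GRPO setup would compute this via double backpropagation through the policy loss.

```python
import numpy as np

def regularized_grad(X, y, w, lam=0.01):
    """Gradient of L(w) + lam * ||grad L(w)||^2 for L = 0.5 * ||Xw - y||^2.

    Toy stand-in for explicit gradient regularization: penalizing the
    gradient norm discourages sharp regions of the loss surface. For this
    quadratic loss, grad L = X^T (Xw - y), and the penalty's gradient is
    2 * lam * X^T X * (grad L), so no autograd machinery is needed.
    """
    g = X.T @ (X @ w - y)                      # grad of the base loss
    return g + 2.0 * lam * (X.T @ (X @ g))     # grad of base loss + penalty
```

In a deep-learning framework the same effect is obtained by building the gradient with `create_graph=True` and backpropagating through its squared norm.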
Replicating RL's policy sharpening at inference
An inference-time sampling technique that uses a power distribution and MCMC to approximate the policy-sharpening effect that RL fine-tuning has on the model's output distribution.
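A minimal sketch of the idea, assuming a discrete vocabulary of token log-probabilities and a uniform proposal distribution; the function name `sample_power_mh`, the Metropolis-Hastings choice, and all hyperparameters are illustrative, not the project's actual sampler:

```python
import math
import random

def sample_power_mh(logprobs, beta=2.0, steps=200, seed=0):
    """Metropolis-Hastings sampling from p(x)^beta over a discrete vocab.

    Raising the policy to a power beta > 1 sharpens it toward its modes,
    mimicking the low-entropy policies that RL fine-tuning tends to
    produce. Proposals are uniform over the vocabulary; acceptance uses
    the power-tilted log-probability ratio, so p(x)^beta never needs to
    be normalized explicitly.
    """
    rng = random.Random(seed)
    n = len(logprobs)
    x = rng.randrange(n)                          # arbitrary initial token
    for _ in range(steps):
        y = rng.randrange(n)                      # uniform proposal
        log_accept = beta * (logprobs[y] - logprobs[x])
        if math.log(rng.random() + 1e-12) < log_accept:
            x = y
    return x
```

Because only log-probability differences enter the acceptance test, the chain targets the sharpened distribution without computing its partition function.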
Performing SDFT on domain specific knowledge
Investigating the efficacy of SDFT on out-of-distribution (OOD) tasks, specifically corporate-specific information, and comparing it against Experiential RL.
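One way to sketch the distillation term at the core of SDFT-style training, assuming a KL divergence between a frozen teacher (the base model, scoring its own rewrite of each demonstration) and the student; `sdft_loss` is a hypothetical helper, not the project's code:

```python
import numpy as np

def sdft_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over next-token distributions.

    In SDFT-style training the frozen base model's distribution serves
    as the target, which limits drift from the original policy while the
    student absorbs the domain-specific content. Inputs are raw logits
    of shape (batch, vocab); returns the mean KL across the batch.
    """
    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    t = log_softmax(teacher_logits / temperature)
    s = log_softmax(student_logits / temperature)
    return float((np.exp(t) * (t - s)).sum(axis=-1).mean())
```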
Learning to write context into memory at inference time
An alternative form of memory compression that uses test-time gradient descent to continually bake context into the model's weights.
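A toy sketch of the mechanism, assuming a linear associative memory trained by a few gradient steps at inference time; `write_to_memory` and its hyperparameters are illustrative, not the project's architecture:

```python
import numpy as np

def write_to_memory(W, keys, values, lr=0.5, steps=20):
    """'Bake' context into a linear memory via test-time gradient descent.

    W maps key vectors to value vectors. At inference, a few gradient
    steps on the reconstruction loss 0.5 * ||W k - v||^2 over the
    context pairs compress that context into the weights themselves,
    rather than holding it in a growing KV cache.
    """
    for _ in range(steps):
        for k, v in zip(keys, values):
            err = W @ k - v                 # prediction error for this pair
            W -= lr * np.outer(err, k)      # gradient of 0.5 * ||err||^2 wrt W
    return W
```

With (near-)orthogonal keys the per-pair updates barely interfere, so a handful of steps suffices to make the memory reproduce each stored value.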