Discussion about this post

David Holmer

I would guess they are training something like a LoRA, with the search being a gradient-free optimization in place of the typical iterative gradient-based update.

That would make sense to do at fit time. It would have negligible inference-time impact, since it works with the base model, and it would be efficient to store per user/customer. The base model effectively contains many priors and objectives from pre-training; this would likely apply an application-specific adjustment to the balance between them.
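To make the guess concrete, here is a minimal toy sketch of what I mean, not a claim about how they actually do it. It assumes a single frozen linear layer `W`, a low-rank update `A @ B`, and plain accept-if-better random search as the gradient-free optimizer. All names (`fit_lora_random_search`, `rank`, `sigma`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_lora_random_search(W, X, Y, rank=2, steps=500, sigma=0.1, seed=0):
    """Fit a low-rank update A @ B to a frozen weight W without gradients.

    Assumption: simple hill-climbing random search stands in for whatever
    gradient-free search is really used. W itself is never modified.
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = W.shape
    A = np.zeros((d_out, rank))              # zero init: adapter starts as a no-op
    B = rng.normal(scale=0.1, size=(rank, d_in))

    def loss(A, B):
        pred = X @ (W + A @ B).T             # base weight plus low-rank delta
        return float(np.mean((pred - Y) ** 2))

    best = loss(A, B)                        # equals the base model's loss
    for _ in range(steps):
        # Propose a random perturbation and keep it only if the loss improves.
        A_try = A + sigma * rng.normal(size=A.shape)
        B_try = B + sigma * rng.normal(size=B.shape)
        trial = loss(A_try, B_try)
        if trial < best:
            A, B, best = A_try, B_try, trial
    return A, B, best

# Synthetic "application-specific" task: the target is W plus a rank-2 shift.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
delta = 0.5 * rng.normal(size=(8, 2)) @ rng.normal(size=(2, 8))
X = rng.normal(size=(64, 8))
Y = X @ (W + delta).T

base_loss = float(np.mean((X @ W.T - Y) ** 2))
A, B, tuned_loss = fit_lora_random_search(W, X, Y)

# Folding the adapter into the weight gives a matrix of the same shape as W,
# so the forward pass costs the same as the base model's.
W_merged = W + A @ B
adapter_params = A.size + B.size             # what would be stored per customer
```

In this toy setup the per-customer artifact is just `A` and `B` (32 numbers against 64 for the full matrix), and `W_merged` has the same shape as `W`, which is why I would expect no extra inference cost once the update is folded in.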

I wonder if some more testing might show a pattern in the inference-time delta between thinking and base models that would help narrow down the possibilities.
