mrseeker

patreon


April update

I wish I could have written this sooner, but posts take quite some time to write. So many people have been pinging me, to the point where my wife has started complaining, that I spend more time talking to people than actually doing work.

Either way, I came across two papers that I found interesting, and I want to put my thoughts on paper, with a few comments on each. I think both are worth investigating, and thanks to the generous donation from Vast.AI (thank you), I might be able to do some work on them.

Blockchain for GPT models

This is an exciting idea that popped up in my feed. I have been looking into it for two weeks now, and I find it pretty interesting for KoboldAI. It overlaps with what I have been investigating for the ModronAI network, but their proposal is a more fleshed-out concept, and they have a working example.

Unlike their approach, I am focusing on a "private" blockchain. You don't need proof of work (PoW); instead, you need to be invited to the network. I don't want to fall into the same trap as the original developers, who overlooked how the economics work: they have too few clients and too many servers. The way their network is currently set up will make it unstable in the long run. I am looking for a way to prevent too much inflation and to push the incentive towards running bigger models.
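As a rough illustration of the two properties described above (invite-only membership instead of PoW, and a reward budget that favours bigger models), here is a minimal Python sketch. Everything in it is hypothetical: the names `PermissionedLedger`, `epoch_rewards` and `EPOCH_EMISSION`, and the rule that each epoch's rewards are split by tokens served times model size. It is not how ModronAI or the paper's network actually works.

```python
import hashlib
import json
from dataclasses import dataclass, field

EPOCH_EMISSION = 1_000.0  # hypothetical: fixed number of reward tokens minted per epoch


def _digest(body: dict) -> str:
    # Hash a block body deterministically (sorted keys) with SHA-256.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class PermissionedLedger:
    """Hash-chained ledger where only invited members may append blocks (no PoW)."""
    members: set = field(default_factory=set)
    blocks: list = field(default_factory=list)

    def invite(self, inviter: str, newcomer: str) -> None:
        # The first invite bootstraps the network; after that, only members can invite.
        if self.members and inviter not in self.members:
            raise PermissionError(f"{inviter} is not a member")
        self.members.add(newcomer)

    def append(self, author: str, payload: dict) -> str:
        # Membership replaces proof of work as the right to write a block.
        if author not in self.members:
            raise PermissionError(f"{author} may not write to the ledger")
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {"author": author, "payload": payload, "prev": prev_hash}
        block_hash = _digest(body)
        self.blocks.append({**body, "hash": block_hash})
        return block_hash

    def verify(self) -> bool:
        # Recompute every hash and check each block points at its predecessor.
        prev = "0" * 64
        for block in self.blocks:
            body = {k: block[k] for k in ("author", "payload", "prev")}
            if block["prev"] != prev or block["hash"] != _digest(body):
                return False
            prev = block["hash"]
        return True


def epoch_rewards(work: dict) -> dict:
    """Split a fixed emission by (tokens served x model size in billions of parameters)."""
    weights = {node: tokens * params_b for node, (tokens, params_b) in work.items()}
    total = sum(weights.values())
    if total == 0:
        return {node: 0.0 for node in work}
    return {node: EPOCH_EMISSION * w / total for node, w in weights.items()}
```

For example, after `ledger.invite("genesis", "alice")` and `ledger.invite("alice", "bob")`, calling `epoch_rewards({"alice": (10_000, 13), "bob": (10_000, 2.7)})` gives Alice, who serves the same number of tokens with a larger model, the bigger share. Because the emission per epoch is fixed, new supply stays the same no matter how many servers join, which is one simple way to keep inflation in check.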

New language models

This one is a variation on the previous paper, but it focuses mainly on a different idea: if you train a model for longer, you may get considerably more out of it. That means a 60B model trained for longer might outclass the 175B GPT-3 models. The downside is that it needs more training data, and it would most likely be introduced through the ModronAI network.
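To make the "train for longer" trade-off concrete, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that training costs roughly 6 × parameters × tokens FLOPs, and that GPT-3 175B was trained on about 300B tokens. This is only illustrative arithmetic, not a result from the paper: at the same compute budget, a 60B model can be trained on roughly 875B tokens, which is why the approach needs more data.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the C ~= 6 * N * D rule of thumb."""
    return 6 * params * tokens


def tokens_for_budget(params: float, flops: float) -> float:
    """Number of training tokens a model of `params` parameters fits in a compute budget."""
    return flops / (6 * params)


# GPT-3 175B was trained on roughly 300B tokens.
gpt3_budget = training_flops(175e9, 300e9)

# Spend the same budget on a smaller 60B model: it gets to see far more tokens.
tokens_60b = tokens_for_budget(60e9, gpt3_budget)

print(f"GPT-3 training budget: {gpt3_budget:.2e} FLOPs")
print(f"60B model on the same budget: {tokens_60b / 1e9:.0f}B tokens")
```

The smaller model is also cheaper to serve afterwards, since inference cost per token grows with parameter count, which fits the goal of better models with a smaller footprint.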

Future work

I will look into each of them, build a small 150M-parameter model as a proof of concept (PoC), and see how it performs compared to its counterparts. If this works the way I believe it will, expect better models with a smaller footprint in the future. For now, I think I will stick with the small models and optimize them first.

