Hi guys, as you know, alongside the core Pornspective application I've got another stream of work on AI tagging. During the summer I created a basic machine learning model, trained on a selection of around 1000 images. I chose something called YoloV5 as the basis of my model, primarily because YoloV5 is pretty quick to train (around 3 hours).
I saw some good initial results from the model when using it in the conventional way through Python. There were some hilarious stray results, but in the main I was very impressed. I then gave it a much larger test by creating a small Pornspective ML application in .NET. The idea is that this application grabs the video thumbnails (same as the core Pornspective) and then runs these images through the YoloV5 engine for object recognition. The results I get back per image are a list of things it thinks it has found, each with a confidence rating plus coordinates so the location can be highlighted on the image. So for example it might return "DoublePen" with a 96% confidence and the location where it thinks it is. If we then combine this with other results from the same video, i.e. it's spotted "DoublePen" with > 95% confidence several times in one section of the video, can we assume "DoublePen" is happening? Well hopefully, that's the whole idea!
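To make the "several high-confidence hits in one section" idea a bit more concrete, here's a rough Python sketch. The `Detection` shape, the `suggest_tags` name, and the `min_hits` and `window` values are placeholders for illustration and not what the prototype actually does; only the 95% threshold comes from the example above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object found in one extracted frame (hypothetical shape)."""
    frame: int         # index of the thumbnail within the video
    label: str         # e.g. "DoublePen"
    confidence: float  # 0.0 to 1.0
    box: tuple         # (x, y, width, height) for highlighting on the image


def suggest_tags(detections, min_confidence=0.95, min_hits=3, window=5):
    """Return labels seen above `min_confidence` in at least `min_hits`
    different frames that all fall within `window` consecutive frames."""
    frames_by_label = defaultdict(set)
    for det in detections:
        if det.confidence > min_confidence:
            frames_by_label[det.label].add(det.frame)

    suggested = []
    for label, frames in frames_by_label.items():
        ordered = sorted(frames)
        # Look for min_hits frames clustered closely enough together.
        for i in range(len(ordered) - min_hits + 1):
            if ordered[i + min_hits - 1] - ordered[i] < window:
                suggested.append(label)
                break
    return sorted(suggested)


detections = [
    Detection(10, "DoublePen", 0.96, (120, 80, 200, 150)),
    Detection(11, "DoublePen", 0.97, (118, 82, 205, 149)),
    Detection(13, "DoublePen", 0.98, (121, 79, 198, 152)),
    Detection(30, "Sofa", 0.99, (0, 200, 640, 280)),       # only seen once
    Detection(12, "DoublePen", 0.60, (119, 81, 201, 150)),  # too uncertain
]
print(suggest_tags(detections))
```

A one-off hit like "Sofa" gets ignored here, which is the point of combining results across a section rather than trusting any single frame.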
Don't worry, I'm not about to introduce such automation without the user having some control. My thinking is that this is run in the background and makes suggestions which the user can agree with; as confidence grows, the user might decide to allow the suggestions to be applied automatically. Ultimately any tags picked up via AI would be flagged as AI tagged, so the user will be able to use the AI tagging suggestions or ignore them based on how well they think it works. This type of stuff is very hard to gauge at the moment and will no doubt get better over time, so I think the application needs to work along that principle.
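As a sketch of how that could hang together, assuming a simple suggested/accepted/rejected status on each tag: every name below (`Tag`, `Status`, `propose_ai_tag`, the `auto_accept` setting and its threshold) is hypothetical and only there to illustrate the principle, not the actual Pornspective design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    SUGGESTED = "suggested"  # waiting for the user to agree or not
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Tag:
    name: str
    ai_tagged: bool = False            # always True for tags found by the model
    status: Status = Status.ACCEPTED   # manual tags are accepted by definition
    confidence: Optional[float] = None


def propose_ai_tag(name, confidence, auto_accept=False, auto_threshold=0.95):
    """Create an AI tag. It stays a suggestion unless the user has opted in
    to automatic tagging and the confidence clears the threshold."""
    accepted = auto_accept and confidence >= auto_threshold
    return Tag(
        name=name,
        ai_tagged=True,  # the flag is kept even after acceptance
        status=Status.ACCEPTED if accepted else Status.SUGGESTED,
        confidence=confidence,
    )


def review(tag, agree):
    """Record the user's decision on a suggested tag."""
    tag.status = Status.ACCEPTED if agree else Status.REJECTED
    return tag


cautious = propose_ai_tag("DoublePen", 0.96)
trusting = propose_ai_tag("DoublePen", 0.96, auto_accept=True)
print(cautious.status.value, trusting.status.value)
```

Because `ai_tagged` stays set either way, AI tags can still be filtered out or ignored later if the user decides they don't trust them.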
Okay, where I left it in the summer was a .NET application calling my Python code to test the model. This is not an ideal solution for Pornspective as it means a complex install of Python, especially on Windows. So I paused the work until Microsoft had caught up or, if I'm being honest, until someone had cracked how to do it via the Microsoft ML framework. You have to keep in mind this is real cutting-edge technology, so examples are very rare.
I'm pleased to say though that over the last couple of weeks I've hit a milestone on this. I've managed to get the entire model working in pure .NET, meaning I can easily package it with Pornspective. It's still very much at the prototyping stage, and today I'm building a new model in YoloV4 which should be much more accurate. It may take several days just to train the model. I also want to test the model engine on an NVIDIA GPU, as currently it takes around 3 seconds to process a single image. Still not bad though: at that rate a 20 min video with 40 frames extracted is done in a couple of mins. From what I read, an NVIDIA GPU will reduce that 3 seconds to milliseconds, meaning 20 seconds per video maybe.
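For anyone who wants to check the sums, here is the back-of-envelope version in Python. The 40 frames, the 3 seconds per image and the 20 second estimate are the figures above; the 0.5 seconds per frame is just what that 20 second guess works out to, not a measurement.

```python
def video_seconds(frames, seconds_per_frame):
    """Total processing time for one video, ignoring frame extraction."""
    return frames * seconds_per_frame


FRAMES_PER_VIDEO = 40

cpu_total = video_seconds(FRAMES_PER_VIDEO, 3.0)  # current speed
gpu_budget_per_frame = 20 / FRAMES_PER_VIDEO      # implied by a 20 s video

print(f"CPU: {cpu_total:.0f} s per video ({cpu_total / 60:.0f} min)")
print(f"20 s per video allows {gpu_budget_per_frame} s per frame")
```

So a 20 second video would allow half a second per frame, which leaves plenty of headroom if the GPU really does get each image down to milliseconds.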
So there is still lots to do to make this happen, and of course I'm hoping to hit the goal of 100 patrons. However, if I do get something working over Christmas I will be looking for some beta testers, so give me a shout if you'd like to take part!