codensuch

Insight into overlay text segmentation, and dedicated server update

Added 2021-11-29 04:03:28 +0000 UTC

I'll get the latter out of the way first. The dedicated server is about halfway built. I just got TensorFlow built on it and am in the process of building and hooking up the rest of the software. Vacation time is coming up, so I'll be able to dedicate much more time to this project. I expect to have something working before the end of the year.

In other news, I've been experimenting with overlay text segmentation. It has come up frequently in recent conversations with several Patreon members and one of my testers, so I spent some time experimenting to see what's possible.

Overlay text segmentation is an extremely difficult computer vision (CV) problem. Unlike speech bubbles, which contain text against a solid background, overlay text can sit on top of any background, often an extremely noisy one like this:

Almost all reliable OCR systems today work best with black text on a white background. Any noise in the background significantly degrades OCR quality and confidence. There are many methods for dealing with specific types of overlay text, but I need a solution that works well for ALL overlays. So you can see how this becomes a very tricky problem.
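To illustrate why background noise hurts, here is a small sketch using Otsu's global threshold, a classic binarization step commonly applied before OCR. This is an illustrative assumption, not this project's pipeline, and the synthetic "text" and "art" images are made up. On the clean image the threshold separates text from background perfectly; on the noisy one, many background pixels fall below the threshold and get misread as text.

```python
import numpy as np


def otsu_threshold(gray):
    """Global threshold t maximising Otsu's between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # weight of the class <= t
    mu = np.cumsum(prob * np.arange(256))     # unnormalised mean of that class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0      # t where one class is empty
    return int(np.argmax(between))


def binarize(gray):
    """Mark pixels at or below the Otsu threshold as (dark) text."""
    return gray <= otsu_threshold(gray)


rng = np.random.default_rng(0)
text = np.zeros((32, 32), dtype=bool)
text[12:20, 12:20] = True                     # stand-in for a block of glyph pixels

clean = np.where(text, 0, 255).astype(np.uint8)               # black text on white
noisy = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)   # random "art"
noisy[text] = 0                                               # same text on top of it

for name, img in (("clean", clean), ("noisy", noisy)):
    false_pos = int((binarize(img) & ~text).sum())
    print(name, "background pixels misread as text:", false_pos)
```

A real OCR engine is more sophisticated than a single global threshold, but the failure mode is the same: once the background stops being uniform, "dark" no longer means "text".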

Let's briefly discuss the two approaches that I might take to solve this:

1. Algorithmic CV with some sort of heuristic approach.

We need to find a pattern common to most manga overlay text and apply classic computer vision algorithms to try to extract the text. For example, most overlay text is typeset with a border. If we can trace that border, we can extract the text. This is not always reliable because of background uncertainty. There is noise inside the characters in the image above. It is not part of the text, but it hugs the characters so closely that it is difficult even for a human to remove manually.

2. Train a neural net to do it.

There are two ways we could tackle this with neural nets. One is to train a full OCR system: gather every single Hiragana, Katakana, and Kanji character, in all their possible variations across thousands of fonts, and train a neural net to recognize them when they are overlaid on random backgrounds. The other is to train a segmentation network using isolated text as the ground truth: overlay it on a random set of backgrounds and train against that (similar to what this project is doing). The problem with both approaches is that the dataset is extremely difficult to create, since both require intricate, manual, per-character labeling. We can certainly try to build an auto-labeler. For example, I could write a program to generate the dataset automatically: it would generate Japanese sentences and overlay them on a random set of manga backgrounds in varying formats.
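Here's a rough sketch of the compositing step such an auto-labeler might perform, assuming the Japanese text has already been rendered to a boolean glyph mask (font rendering is left out). The function names, the black-glyph/white-outline style, and the random stand-in background are all hypothetical; a real generator would draw from actual manga backgrounds and vary fonts, sizes, and outline styles.

```python
import numpy as np


def dilate(mask, radius=1):
    """Binary dilation with a 3x3 (8-neighbour) structuring element, applied `radius` times."""
    out = mask.copy()
    h, w = mask.shape
    for _ in range(radius):
        padded = np.pad(out, 1)
        grown = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


def make_training_pair(text_mask, background, rng, outline=1):
    """Paste a pre-rendered text mask at a random spot on a background.

    Glyphs are drawn black with a white outline (an assumed overlay style);
    the returned ground truth marks only the glyph pixels.
    """
    h, w = text_mask.shape
    bh, bw = background.shape
    top = int(rng.integers(0, bh - h + 1))
    left = int(rng.integers(0, bw - w + 1))

    image = background.copy()
    region = image[top:top + h, left:left + w]   # a view into `image`
    border = dilate(text_mask, outline) & ~text_mask
    region[border] = 255                          # white outline around the glyphs
    region[text_mask] = 0                         # black glyph strokes

    truth = np.zeros(background.shape, dtype=bool)
    truth[top:top + h, left:left + w] = text_mask
    return image, truth


# Usage: a random stand-in background and a crude cross as a placeholder glyph.
rng = np.random.default_rng(0)
background = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
glyph = np.zeros((9, 9), dtype=bool)              # 1-pixel margin leaves room for the outline
glyph[4, 1:8] = True
glyph[1:8, 4] = True
image, truth = make_training_pair(glyph, background, rng)
```

Because the program places the glyphs itself, the ground-truth mask comes for free, which is the whole point of generating the data rather than labeling it by hand.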

In my opinion, the neural net approach is the long-term winner. Over time it will far outperform any classic CV approach, and it is far more powerful for generalizing text extraction across all manga. The classic CV approach can work well, but the long tail of work required to generalize it to all manga is going to be a nightmare.

So what am I going to do? I currently have neither the time nor the aptitude to create such a neural net, so I'm going to stick with the classic CV approach for now. Here are some results from my experiments:

Not bad, right? The output is good enough for cleaning but still too noisy for OCR. All of these results came from a single CV stack applied to every one of these texts. Keep in mind that it's still largely experimental, so there is plenty of room for improvement.
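For a sense of what the border idea from approach 1 looks like in code, here is a minimal sketch. It is not the CV stack behind the results above; the thresholds, the 0.9 ratio, and the function names are hypothetical. It assumes a grayscale page where overlay glyphs are dark strokes wrapped in a light outline, and keeps a dark connected component only if the one-pixel ring around it is mostly light.

```python
from collections import deque

import numpy as np


def dilate(mask):
    """One step of binary dilation with a 3x3 (8-neighbour) structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def label_components(mask):
    """4-connected component labelling via breadth-first search."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or labels[y, x]:
                continue
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


def extract_outlined_text(gray, dark=80, light=200, min_light_ratio=0.9):
    """Keep dark components whose surrounding one-pixel ring is mostly light."""
    is_dark = gray < dark
    is_light = gray > light
    labels, count = label_components(is_dark)
    text = np.zeros_like(is_dark)
    for i in range(1, count + 1):
        component = labels == i
        ring = dilate(component) & ~component
        if ring.any() and is_light[ring].mean() >= min_light_ratio:
            text |= component
    return text


page = np.full((24, 24), 128, dtype=np.uint8)   # mid-grey "art"
page[4:9, 4:9] = 255                             # white outline...
page[5:8, 5:8] = 0                               # ...around a dark glyph
page[15:18, 15:18] = 0                           # dark blob with no outline
mask = extract_outlined_text(page)
print(mask[5:8, 5:8].all(), mask[15:18, 15:18].any())  # True False
```

This also shows why the heuristic breaks down on real pages: noise that hugs the characters, like in the example earlier, lands in that ring and drags the light ratio below the cutoff, or fuses with the glyph into a single component.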

Hopefully this gives you some insight into the difficulty of the overlay text problem. I'm fairly confident that it is a solvable problem; I just need more time and resources to make it happen.

That's all for now. I apologize for the lack of updates over the last two months; life got a bit nuts. Vacation time is coming up, so I'll have much more time to dedicate to this project.

More updates to come, stay tuned!