In my last post, I mentioned that I had set out 4 upcoming milestones, the first of which was a UI overhaul that puts the emphasis on the human translator. Over the last few weeks, I ended up doing milestone 2 first...
I was on a roll training new machine learning models, so I decided to see it through and hold off on the editor UI. I experimented with brand new models that did some amazing things. The two that I want to show off today are the Japanese text detector and the retouch/inpainting model.
There are several existing projects online that attempt this. I've tested their models and they don't work well. The two primary issues are that they perform poorly on overlay text, and that training data for them is difficult to create. To address these issues, some innovation was required.
In one of my previous posts, I documented my experiments on extracting overlay text through image processing. I was able to use the techniques I developed there to auto-generate masks for a hand-picked dataset. I netted around 400 pages rich in overlay text out of my 4000-page OPM dataset. I then went in manually and fixed everything that needed touching up. This sped up the data labeling process at least a few times over.
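To make the "auto-generate, then fix by hand" workflow more concrete, here is a minimal sketch of how draft masks could be produced and used to pick candidate pages. It is not the actual pipeline: the plain darkness threshold, the dilation step, the `min_coverage` cutoff, and every function name here are assumptions made for illustration, standing in for the real image-processing techniques from the earlier post.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)
Mask = List[List[int]]   # 1 where a pixel is treated as text, else 0


def threshold_mask(image: Image, dark_below: int = 64) -> Mask:
    """Mark every pixel darker than `dark_below` as candidate text.

    Assumption: a plain threshold stands in for the real extraction step.
    """
    return [[1 if px < dark_below else 0 for px in row] for row in image]


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Grow the mask by `radius` pixels so it also covers stroke edges."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def coverage(mask: Mask) -> float:
    """Fraction of pixels marked as text."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


def select_pages(pages: Dict[str, Image], min_coverage: float = 0.02) -> Dict[str, Mask]:
    """Return {name: draft mask} for pages whose mask covers enough area.

    The draft masks are only a starting point for manual touch-ups.
    """
    selected = {}
    for name, image in pages.items():
        mask = dilate(threshold_mask(image))
        if coverage(mask) >= min_coverage:
            selected[name] = mask
    return selected
```

The point of the design is that the automatic pass only has to be roughly right: correcting a draft mask by hand is much faster than drawing one from scratch.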
I also used an improved neural net architecture with this dataset, which produced results that vastly outperform the existing models I tested on overlay text.

I've also been experimenting with inpainting models that can clean overlay text. This effectively replaces human editing for retouches and redraws. I tested various models, including ones based on Stable Diffusion (a diffusion-based image generator in the same family as DALL-E 2). I ultimately settled on a model that is very fast and produces great results. I expect it to improve as more data is added.


The OCR model remains unchanged, but OCR confidence has increased substantially thanks to the new text detector model. You should see much more text being translated. That said, I'm going to de-prioritize refining the OCR model. I think the value of better OCR diminishes with the coming updates to the editor. The reason will be apparent very soon.
I've also trained and deployed models behind the scenes that I will save for a future post. They are quite exciting and will be more apparent with the new editor UI.
You can try out all of the above in the current public alpha-test. I'll likely announce a new alpha-test in the coming months.

At this point, I believe I have solved all of the puzzle pieces required for lightning-fast manga translation. Going forward, it's about expanding the dataset beyond OPM and adding quality-of-life features to the editor to make the translation process seamless.
I am now resuming my milestone 1 work on the UI overhaul. I will update soon. Stay tuned!