codensuch

Demystifying common translation issues

Added 2023-02-28 09:32:38 +0000 UTC

I thought it might be fun to address some of the common "issues" that you may see during the alpha tests. I am aware of almost all of them, but I may not address them in a timely manner. Hopefully this post demystifies how I tackle these issues based on my development style.

Inconsistent font size in bubbles

Background: Professional publications tend to stick with a preset font sizing scheme regardless of the original Japanese font size, which is more of an American style. Scanlators, on the other hand, tend to follow the Japanese style. In the end it's very subjective, more art than science.

What I do: The app currently just maximizes the font size within the bubble, with automatic margins. In my opinion, this is good enough for now for automated end-to-end translation. Ultimately I want the editor to be in charge of all font-related settings, so font sizing is primarily an editor feature.
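To make "maximize the font within the bubble" concrete, here is a minimal Python sketch. It is not the app's actual code. It assumes a rectangular bubble, a crude text model where every character is 0.6 x the font size wide and every line is 1.2 x the font size tall, and a 10% margin on each side. All names and constants (max_font_size, margin_ratio, char_w, line_h) are made up for illustration.

```python
def wrap_words(words, max_chars):
    """Greedy wrap: pack words into lines of at most max_chars characters."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        # A word longer than max_chars still gets its own line; fits() rejects it.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def fits(text, font_size, box_w, box_h, char_w=0.6, line_h=1.2):
    """True if the wrapped text fits the box under the assumed text model."""
    max_chars = int(box_w // (font_size * char_w))
    if max_chars < 1:
        return False
    lines = wrap_words(text.split(), max_chars)
    widest = max((len(line) for line in lines), default=0)
    too_wide = widest * font_size * char_w > box_w
    too_tall = len(lines) * font_size * line_h > box_h
    return not (too_wide or too_tall)


def max_font_size(text, bubble_w, bubble_h, margin_ratio=0.1, lo=4, hi=200):
    """Binary-search the largest integer font size that fits inside the margins.

    Returns None when even the smallest size does not fit.
    """
    box_w = bubble_w * (1 - 2 * margin_ratio)
    box_h = bubble_h * (1 - 2 * margin_ratio)
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(text, mid, box_w, box_h):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best
```

Binary search works here because, under this model, any text that fits at one size also fits at every smaller size. A real typesetter would measure glyphs with the actual font and follow the bubble's real shape rather than a rectangle.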

Bubble/text cleared but not translated

Background: This means the text was detected but was not translated, either because the OCR confidence was too low or because the bubble geometry was unusable.

What I do: OCR confidence can be improved through better image processing and more ML training, and improving bubble geometry handling is a never-ending effort. Over time you will see both gradually improve.

Double translation text

Background: This happens most often with bubbles that look like overlay text, or overlay text that looks like a bubble. Many translucent bubbles are like this. The AI classifies them twice, so they end up being translated twice.

What I do: This can be easily fixed, either by refining the ML dataset or heuristically. I'm currently working on a big feature that implicitly fixes this.
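As an illustration of what a heuristic fix could look like, the sketch below drops any detection that heavily overlaps a higher-scoring one, regardless of class. This is plain class-agnostic non-maximum suppression, not necessarily what the app does. The Detection type, the "bubble"/"overlay" labels, and the 0.5 overlap threshold are all assumptions for the example.

```python
from typing import NamedTuple, Tuple


class Detection(NamedTuple):
    label: str                          # hypothetical class name, e.g. "bubble" or "overlay"
    score: float                        # detector confidence
    box: Tuple[int, int, int, int]      # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def dedupe(detections, threshold=0.5):
    """Keep the highest-scoring detection among any group of overlapping ones."""
    kept = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if all(iou(det.box, k.box) < threshold for k in kept):
            kept.append(det)
    return kept
```

With this in place, a translucent bubble detected once as "bubble" and once as "overlay" over nearly the same area would be translated only once.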

Awkward typesetting in certain bubbles

Background: This again tends to happen in translucent bubbles, where the background is not solid.

What I do: This can be fixed by improving the heuristics of the bubble geometry detector. This is an ongoing and gradual effort.

Wrong text color/outline

Background: Sometimes the background color is used as the text color, or an outline is added where there shouldn't be one.

What I do: This will gradually get better with improved OCR text preprocessing, as well as an improved AI text detector. I'm also exploring training a "one-shot font analyzer" neural network that may make this problem trivial.

Tiny unreadable text

Background: This is caused by translations that have no whitespace to enable line breaking. For example, if the translation is "Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaahhhhhhhhh", there is no appropriate place to hyphenate or break the line, so the machine typesetting ends up shrinking the text and putting it on a single line.

What I do: There is already a list of lettering rules. This is just another lettering rule to add.
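A rule like that could be as small as the sketch below. It inserts a zero-width space (U+200B) every max_run characters inside any long unbroken token, which only helps if the typesetter treats U+200B as a line-break opportunity. That behavior, the function name, and the default of 12 are assumptions for the example, not the app's actual rule.

```python
ZWSP = "\u200b"  # zero-width space: invisible, but marks a place where a line may break


def add_break_points(text, max_run=12):
    """Insert a break opportunity every max_run characters inside long tokens."""
    out = []
    for token in text.split(" "):
        if len(token) > max_run:
            chunks = [token[i:i + max_run] for i in range(0, len(token), max_run)]
            token = ZWSP.join(chunks)
        out.append(token)
    return " ".join(out)
```

Ordinary words pass through untouched, while a long scream gains break points, so the typesetter can wrap it onto several lines instead of shrinking it to fit on one.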

Weirdly redrawn areas

Background: Before text is erased and redrawn, the detected text area is "enlarged" to make sure it covers all of the text and to improve the quality of the redraw. The current enlargement is a fixed size, which means small text gets the same enlargement as large text, so more is erased around small text than should be.

What I do: This can be solved by detecting the font size and scaling the enlargement accordingly, and/or by adding more data to the retouch model's training set.
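Here is a rough sketch of scaling the enlargement with font size, assuming the "enlargement" is a morphological dilation of a binary text mask. The post does not say how the app implements it, so that is an assumption, as are the 0.15 ratio and the function names. The dilation is written in plain Python on a list-of-lists mask to stay self-contained; a real pipeline would use an image library.

```python
def dilation_radius(font_px, ratio=0.15, min_radius=1):
    """Enlargement radius in pixels, proportional to the detected font size."""
    return max(min_radius, round(font_px * ratio))


def dilate(mask, radius):
    """Dilate a binary mask (list of rows of 0/1) with a square of the given radius."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                # Mark every pixel within `radius` of a text pixel, clipped to the image.
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out
```

With a fixed radius, 12 px text and 40 px text would lose the same border of surrounding artwork. With the scaled radius, the small text gets a proportionally smaller erase area.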

Conclusion

A lot of the issues above are simple to fix. Others are either naturally solved by adding more training data to the dataset, or are part of the quality-of-life and ongoing updates to existing features. There are many things that I would like to add or fix "if I had more time". But I always prioritize getting the core architecture right so I can easily scale and add fixes/features like the above.

In short, getting the high-level design and features right is king; the details can be fine-tuned later.