Text boxes have been a bit of a looming elephant in the room for the action RPG series. It's a topic that's necessary to cover for the genre, and one that takes up a lot of time even at a basic level, but it's also something you can spend close to infinite time on, trying to make it as modular as possible. It's honestly addictive to do so. It's one of those systems that's fun to build and really easy to keep adding little tweaks to.
It was very tempting to just drop in one of the many excellent dialogue engines that have been written by others, and direct some traffic to other cool creators in the process. Realistically, that's the pragmatic solution I would most likely take myself if I were actually building this kind of game. Don't reinvent the wheel, and all that.
But there are many good lessons to be learned in building a simple text box system yourself. While it steps us away from the meat of the gameplay for longer than I'd like (and is also a topic I've covered before in other contexts), I think it's worth doing from scratch.

My goal is to strike a balance: showing you how to make effective text boxes that can be extended and made more modular, without taking a hundred videos to do so. That means forgoing any features or modularity we simply don't need, and instead explaining in the videos where and how things could be expanded without actually doing so. This isn't just for the sake of the video series, either; this is what you should do in your own projects. Developers have a tendency to try to build a system that can handle every possible edge case, instead of first deciding which cases are actually important. The more cases you can exclude, the better.
For example, do we need to be able to include portraits in our text boxes? Do we need to be able to position the text boxes anywhere on the screen, or do they always appear in the same spot? Do we want to be able to vary the background? The fonts? Do we need dynamic text effects?
You have to answer these questions ahead of time because they decide the foundations of your system. If you need to dynamically animate your text and create shaky or wavy effects, that involves drawing one letter at a time in (potentially) different local positions, rather than drawing a percentage of one long string in a single draw. This is why it's so easy to get stuck in a vortex with dialogue systems. There's SO much you can add! But think about your game, think about what you need. Think about what adds significant value.
Cut. Everything. Else.
We'll be cutting a lot, but we'll be spending a little time now and again talking about where you can add these sorts of things in.
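To make the difference between those two foundations a bit more concrete, here's a rough sketch in Python of "draw a percentage of one long string" versus "draw one letter at a time in its own position". This isn't the code from the series; the function names, the fixed character width, and the sine-wave offset are all assumptions made up for illustration, and the functions only work out what would be drawn rather than drawing anything.

```python
import math


def visible_text(text: str, progress: float) -> str:
    """Approach 1: reveal a fraction of one long string.

    The result can be handed to a single draw call.
    `progress` runs from 0.0 (nothing shown) to 1.0 (everything shown).
    """
    progress = max(0.0, min(1.0, progress))
    return text[: int(len(text) * progress)]


def wavy_glyphs(text, progress, time, char_width=8, amplitude=2.0, frequency=0.5):
    """Approach 2: one entry per visible character, each with its own offset.

    Returns a list of (character, x, y) tuples, one draw call per letter.
    Assumes a fixed-width font (`char_width` is a hypothetical value);
    a proportional font would need a running x position instead.
    """
    progress = max(0.0, min(1.0, progress))
    count = int(len(text) * progress)
    glyphs = []
    for i, ch in enumerate(text[:count]):
        x = i * char_width
        # Each letter sits at a different phase of the wave.
        y = amplitude * math.sin(time + i * frequency)
        glyphs.append((ch, x, y))
    return glyphs


if __name__ == "__main__":
    line = "Hello, world"
    print(visible_text(line, 0.5))
    for glyph in wavy_glyphs(line, 0.5, time=0.0):
        print(glyph)
```

The first function stays cheap and simple, which is the point: if you never need per-letter effects, you never pay for the second one. If you do need them, the per-letter list is the foundation everything else hangs off, so it's a decision worth making up front.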
-S