Why is the [mgba_rumble] Libretro core noticeably more performant than regular mGBA? [Discussion] [Suggestion]

This will be a somewhat long write-up, and I apologize in advance for the rambling, but I think the subject merits a post: to express my curiosity, share my findings, and make a petition of sorts to the MuOS devs.

I recently got an RG34XX SP exclusively for playing GBA titles (mostly Rom Hacks) because of its perfect integer-scaling screen and flashed the latest MuOS 2508.4, but found the H700 processor lacking headroom when running more demanding Rom Hacks. My findings on running a few Pokémon hacks are here, but TL;DR:

  • Demanding GBA Rom Hacks like Pokémon Unbound/Odyssey and FireRed Rocket Edition WILL visually AND audibly stutter with the mGBA core on the default [Ondemand] governor (yes, even on the title screen for the first two, and during regular traversal for the last).
  • Switching to [Schedutil] or [Conservative] (no need for [Performance]) locks them to a smooth 60fps for the most part (title screen and normal traversal). You get ~85–100fps when fast-forwarding in these demanding Rom Hacks, which is not a lot of headroom and can get eaten up when the visuals get hectic, not to mention the device runs warm (especially when charging). At roughly 1.4–1.7× normal speed, that is not much of a fast-forward at all, even for speeding through battle animations and dialogue.
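When comparing governors like this, it helps to confirm which one is actually active. On Linux the standard cpufreq sysfs interface exposes it. A minimal Python sketch, assuming the device exposes the usual `/sys/devices/system/cpu/cpu0/cpufreq` files and that you can run Python on it (for example over SSH); neither is confirmed for muOS, whose own menus remain the normal way to switch governors:

```python
from pathlib import Path

# Standard Linux cpufreq sysfs directory for CPU 0 (assumed present on the device).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_governor_info(cpufreq_dir: Path = CPUFREQ_DIR) -> dict:
    """Return the active governor, the available governors and the current freq (kHz)."""
    def read(name: str) -> str:
        return (cpufreq_dir / name).read_text().strip()

    return {
        "governor": read("scaling_governor"),
        "available": read("scaling_available_governors").split(),
        "cur_khz": int(read("scaling_cur_freq")),
    }


def set_governor(name: str, cpufreq_dir: Path = CPUFREQ_DIR) -> None:
    """Switch governor (needs root); refuses names the kernel does not offer."""
    available = (cpufreq_dir / "scaling_available_governors").read_text().split()
    if name not in available:
        raise ValueError(f"{name!r} not in {available}")
    (cpufreq_dir / "scaling_governor").write_text(name + "\n")
```

Calling `read_governor_info()` while a game is running shows whether the clock is sitting at max or bouncing around.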

Frustrated, I later came across this older Reddit post https://www.reddit.com/r/ANBERNIC/comments/1fdedje/benchmarking_gba_retroarch_cores_rg35xxsp/ and was intrigued by the [mgba_rumble] core, since it is listed on the MuOS GBA Supported Emulators page, even with the same version number as regular mGBA.

Curious, I went on to test the same titles using the same in-game scenes and power settings.

First, my findings:

For some reason, [mgba_rumble_libretro.so] seems to perform much better than the regular mGBA core on these Rom Hacks: 160fps vs 100fps while fast-forwarding through the same area in FireRed Rocket Edition.
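To put the frame rates quoted in this thread in terms of fast-forward speed, divide by the GBA's native rate. A small arithmetic sketch, rounding the GBA's ~59.73 Hz refresh to 60:

```python
# GBA native refresh is ~59.73 Hz; rounded to 60 here for simplicity.
GBA_FPS = 60


def ff_multiplier(fps: float, native: float = GBA_FPS) -> float:
    """How many times faster than real time a given emulated frame rate is."""
    return fps / native


# Fast-forward frame rates quoted in this thread.
for label, fps in [("mGBA, low end", 85), ("mGBA, high end", 100), ("mgba_rumble", 160)]:
    print(f"{label}: {fps} fps -> {ff_multiplier(fps):.2f}x real time")

# Relative throughput of the two cores in the same Rocket Edition scene.
print(f"mgba_rumble vs mGBA: {160 / 100:.2f}x")
```

This prints 1.42x, 1.67x and 2.67x real time, and a 1.60x throughput advantage for [mgba_rumble] in that scene.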

Even on the default [Ondemand] governor, there is no stuttering or audio crackling in places where the regular mGBA core with the default governor repeatedly struggles. (Easy to test: the stuttering on the Pkm. Odyssey title screen and during Rocket Edition traversal is gone; the Unbound title screen crackling still exists but is much more minor. Same scenes, same power profile, tested back and forth multiple times.)

For a device like the RG34XX SP that's designed pretty much exclusively for playing GBA games, it makes the whole experience with the console vastly better.

Titles I previously had to tinker with to make playable (I find GBA audio crackling extremely unpleasant) are now great out of the box, with a much more usable fast-forward.

This applies to normal, less demanding GBA titles as well. The device draws less power since it reaches 60fps more efficiently, which presumably means less heat and longer battery life. It really frees up the device's potential in several ways.

I was delightfully impressed. Now for a bit of a rant:

One thing that worries me is that the [mgba_rumble] core currently does not ship with the MuOS install (at least not with 2508.4 Loose Goose, the first MuOS build I tried).

For something that enhanced my experience with the device and OS so much, I had to:

First, dig deep into the docs and old Reddit posts just to learn it existed; then dig through the menus (Explore->Content Option->Core->Core Downloader->Nintendo Game Boy Advance); then quit out of the menu and find it among the options. There is no documentation specific to the [mgba_rumble] core at all.

All of the above leads to one suggestion:

Given how much difference it made, it should at the very least ship with the OS image itself. Over 99% of people who use these devices (especially the RG34XX and SP devices made for GBA) with MuOS will never learn it exists or get the chance to try it. And if more people get to test it and compatibility turns out not to be a concern, it really should be the default GBA core.

A few questions for the MuOS devs to satisfy my curiosity, as [mgba_rumble] seems to be a MuOS exclusive (at least I couldn't find it mentioned anywhere else):

Why is the performance of the two so different? What is [mgba_rumble] (regular mGBA does support rumble, as far as I can tell), and where did it come from?

For others who see this post and also play more demanding GBA titles and Rom Hacks, or who just want to save some heat and battery life, please give [mgba_rumble] a try; I am really curious about your findings.

The simple and quick answer is that mGBA-rumble is an old, outdated fork of regular mGBA that added rumble support.

It has since been merged into main mGBA, which is why mGBA now supports rumble.

A recent update to mGBA slightly regressed performance, but I'm sure another core update will fix these regressions. These changes are out of our hands: we are not the mGBA devs, we just package the latest RetroArch cores.

When the core is updated, you’ll be able to use the core downloader to install an updated core without having to update muOS itself.

Regardless, you should be able to reach full speed in mGBA by turning on threaded video in RetroArch settings. Make sure you change the option when no content is loaded: open RetroArch from Apps in the muOS main menu, load the mGBA core, turn on threaded video, then save a core override.
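For anyone who would rather edit files directly, a RetroArch core override is a small text file of `key = "value"` lines, typically saved as `config/<core name>/<core name>.cfg` under RetroArch's config directory, and threaded video corresponds to the `video_threaded` key. A minimal Python sketch, assuming Python access on the device; the helper name is hypothetical and the muOS location of the config directory is not confirmed here, so the in-menu route above remains the supported way:

```python
import re
from pathlib import Path

# Matches the key at the start of a RetroArch-style `key = "value"` line.
KEY_RE = re.compile(r'^\s*([A-Za-z0-9_]+)\s*=')


def set_cfg_option(cfg_path: Path, key: str, value: str) -> None:
    """Set `key = "value"` in a RetroArch-style .cfg file, creating it if needed."""
    lines = cfg_path.read_text().splitlines() if cfg_path.exists() else []
    new_line = f'{key} = "{value}"'
    for i, line in enumerate(lines):
        match = KEY_RE.match(line)
        if match and match.group(1) == key:
            lines[i] = new_line  # overwrite the existing entry in place
            break
    else:
        lines.append(new_line)  # key not present yet, so append it
    cfg_path.parent.mkdir(parents=True, exist_ok=True)
    cfg_path.write_text("\n".join(lines) + "\n")
```

Usage, with a placeholder path: `set_cfg_option(Path("/path/to/retroarch/config/mGBA/mGBA.cfg"), "video_threaded", "true")`.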

Here is a capture from my device of the Unbound title screen.

I couldn't hear any crackling. I'm 99% sure it's a config issue somewhere.

@Delibirb77

Thanks for the clarification! So that's where the Rumble core is from. I suppose slight performance fluctuations between releases are unavoidable as the software gets updated.

Turning on the 'threaded video' setting is quite an improvement over the default, as you may observe below. Are there any side effects to turning it on, or should I just leave it on permanently for my RG34XX SP?

@duncanyoyo1
Thanks for testing!

I know the crackling/stuttering problem can easily be solved by adjusting any one of the settings below, or a combination of them. However, I did have performance problems with GBA on the default settings, as things stand right now:

In the name of science, I made five 40-second audio captures, one for each of the five scenarios:

AUX Capture 1: Default core, [Ondemand] Governor, threaded video [OFF]

Audio gets really bad past the 20s mark.

AUX Capture 2: Default core, [Schedutil] Governor, threaded video [OFF]

Pretty much perfect

AUX Capture 3: Default core, [Ondemand] Governor, threaded video [ON]

Much better than capture 1; a single slight stutter at 00:12.

AUX Capture 4: Rumble core, [Ondemand] Governor, threaded video [OFF]

Much better than capture 1; a single stutter at 00:33.

AUX Capture 5: Default core, [Conservative] Governor, threaded video [OFF]

Pretty much perfect

One curious thing, consistent with my initial findings in the first post, is that the Schedutil and Conservative governors behave very similarly, even though the latter supposedly uses less power than the default Ondemand. My guess is that, unlike Conservative, Ondemand clocks the CPU way down as soon as load decreases, the fluctuating clocks can't keep up with load spikes, and mGBA really doesn't like that...?

Slight performance regressions between core versions and the hard performance cap of the H700 aside, I suppose the performance issues I was running into are more a problem of the default Ondemand governor under demanding mGBA workloads than anything else, lol.

My capture was with the ondemand governor. That isn't the issue.

Using schedutil is essentially the same as performance: it pretty much keeps the CPU at max frequency the whole time.

Conservative is pretty much a slower ondemand, which in practice means it also works essentially the same as performance: any load will make it spike to max frequency, and it is slow to decrease.

On this hardware, some GBA titles need threaded video to avoid performance issues. It's especially noticeable with rumble-enabled titles that use the Game Boy Player intro.

Also, on this hardware, switching frequency comes with a latency penalty as the CPU reclocks itself. That can cause stutters and audio hitches.
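The governor behaviour described above (Ondemand dropping clocks as soon as load dips, Conservative being slow to decrease, and every reclock costing latency) can be illustrated with a toy simulation. This is a sketch under heavy assumptions: the frequency table, thresholds, step size and workload are invented, the rules are simplified versions of the Linux ondemand and conservative algorithms, and nothing here was measured on an H700.

```python
# Toy model only: all numbers below are made up for illustration.
FREQS = [480, 720, 1008, 1200, 1416, 1512]  # MHz, hypothetical frequency table
MIN_F, MAX_F = FREQS[0], FREQS[-1]
UP, DOWN = 0.80, 0.20        # load thresholds
STEP = 0.05 * MAX_F          # conservative step: 5% of max frequency


def pick(target: float) -> int:
    """Lowest available frequency at or above the target (capped at max)."""
    return next((f for f in FREQS if f >= target), MAX_F)


def ondemand_next(load: float) -> int:
    # Jump straight to max above the threshold, otherwise scale with load.
    if load > UP:
        return MAX_F
    return pick(MIN_F + load * (MAX_F - MIN_F))


def conservative_next(requested: float, load: float) -> float:
    # Nudge the requested frequency one step at a time.
    if load > UP:
        return min(requested + STEP, MAX_F)
    if load < DOWN:
        return max(requested - STEP, MIN_F)
    return requested


def simulate(governor: str, demand: list) -> tuple:
    """Return (frequency changes, ticks where demand exceeded the clock)."""
    cur = requested = MAX_F  # both start at max, e.g. right after loading a game
    changes = missed = 0
    for need in demand:
        if need > cur:
            missed += 1  # not enough clock this tick: a potential stutter
        load = min(1.0, need / cur)
        if governor == "ondemand":
            new = ondemand_next(load)
        else:
            requested = conservative_next(requested, load)
            new = pick(requested)
        if new != cur:
            changes += 1  # each reclock would cost latency on real hardware
        cur = new
    return changes, missed


# Steady ~600 MHz of work with a 1400 MHz spike every 10th tick.
WORKLOAD = ([600] * 9 + [1400]) * 5
for gov in ("ondemand", "conservative"):
    print(gov, simulate(gov, WORKLOAD))
```

With these invented numbers, the ondemand model bounces between 1008 and 1200 MHz on every steady tick and is caught at 1008 MHz by each spike, printing `(50, 5)`, while the conservative model never leaves max and prints `(0, 0)`. Other parameters give other numbers; only the overall shape is the point.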

What’s the best way to stay on top of when cores get updated?

Your best bet is to keep an eye on the libretro/mgba GitHub.
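One low-effort way to watch the libretro/mgba repository is GitHub's public REST API, which lists recent commits on the default branch. A minimal Python sketch, assuming network access and the standard unauthenticated commits endpoint (which GitHub rate-limits); the helper names are hypothetical. Note that a newer upstream commit does not mean muOS has adopted it, since muOS may pin a specific tested commit:

```python
import json
import urllib.request

# Public GitHub REST endpoint listing recent commits on the default branch.
API_URL = "https://api.github.com/repos/libretro/mgba/commits?per_page={n}"


def fetch_commits(n: int = 10) -> list:
    """Download the n most recent commits (unauthenticated requests are rate-limited)."""
    req = urllib.request.Request(
        API_URL.format(n=n), headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize(commits: list) -> list:
    """One line per commit: short SHA, date and first line of the message."""
    lines = []
    for c in commits:
        date = c["commit"]["author"]["date"][:10]
        title = c["commit"]["message"].splitlines()[0]
        lines.append(f'{c["sha"][:7]} {date} {title}')
    return lines


def commits_ahead_of(commits: list, pinned: str) -> int:
    """How many fetched commits are newer than the pinned one (-1 if not found)."""
    for i, c in enumerate(commits):
        if c["sha"].startswith(pinned):
            return i
    return -1
```

Usage: `print("\n".join(summarize(fetch_commits())))`, or `commits_ahead_of(fetch_commits(100), "affc86e")` to see how far upstream has moved past a given commit (GitHub caps `per_page` at 100, so an older commit returns -1).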

Having said that, there is actually already an updated version of the core that has been out since 14 October (c758314). However, in the Loose Goose release notes, it seems the MuOS devs have reverted to an older (less performant) version of mGBA:

Reverted mGBA Libretro core to affc86e branch

I would recommend you try updating the core using the RetroArch Online Updater and see if that resolves any issues. This issue is well documented in this GitHub issue and in the libretro/mGBA dev's response in this pull request, which shows that the 14 October update was intended to resolve this slowdown.

This issue has been around for the last few Goose releases now. I posted a thread back in September, and my solution in the end was to switch to the gpSP core.

I absolutely love how you’ve come to that exact conclusion instead of understanding why we would have chosen said commit for building that specific mGBA version.

The simple reason is that during many hours, days, and weeks of testing, going back and forth between many different mGBA versions, this revision was the best of the bunch. Just because something is newer doesn't necessarily mean it will perform better.

In this case the most recent commit didn’t “fix” anything and generally made things worse overall.

My statement was not meant to come across as an attack, and I apologise if it was hostile. The dev response from @duncanyoyo1 said that you guys package the latest RA cores, but I know for mGBA this is not the case.

In my experience on my 40XXH and 34XXSP, the latest core has fixed slowdown issues for me in games such as Duel Masters: Shadow of the Code and Metal Slug Advance. If the optimal solution is to use the older branch with threaded video enabled, perhaps that could be the default configuration?

The latest core actually runs worse and has audio issues with games that use different sample rates.

That is why we are running the specific commit we are.

I've built and tested the newest commit, as have other testers, and there was noticeable slowdown in a lot of titles, as well as audio issues.

Also, if you are using the RetroArch Online Updater on muOS, it uses our cores repo by default, which serves the specific commit that was tested to work well (affc86e). It is not the latest git commit, and the RetroArch Buildbot doesn't even build aarch64 cores; the ones there are from 2020.

So unless you built the core yourself, I don't know how you would have the latest commit to confirm that it does indeed work as you say.

If we update the core through RetroArch, will we get the latest update or a 2020 build?