Author Topic: RFS - 2.5D raycasting game making goodness!  (Read 879 times)
els
Level 0
« on: August 29, 2021, 09:02:17 PM »

Retro FPS Studio (RFS) is my current attempt at a DOOM/Hexen-like game maker, still early in development:



Why I hope it'll turn out to be cool:

  • True software rendering via quirky sparse raycasting. No OpenGL, DirectX, or the like. While the final upscale is optionally 3D accelerated, you can also run that in software.
  • Janky fixed point maths.
  • Polygonal room layouts like DOOM, not just a Wolfenstein grid.
  • Despite being CPU reliant, it's acceptably fast even on Linux phones, a Raspberry Pi 3+, or other somewhat low-spec hardware via its Linux ARM64 build. There's even a Windows x86 legacy build that works as far back as Windows XP SP3. No actual DOS-era machines though, sorry, it's not that fast. (I wish it was!)
  • Advanced 2.5D features: supports dynamic lights with falloff like late Realms of the Haunting, Lands of Lore 2, or other releases from right around the switch to the 3D-gfx era. Also does vertical shearing and everything. No slopes however; that is a lovely retro map-making restriction.
  • Renders at a 16-bit color depth per pixel with no need for the palettes that 8 bits per pixel would require, while still retaining plenty of cozy retro color banding, with quantized light to appear even more color reduced. If you also posterize your textures, it looks very oldschool indeed.
  • General MIDI 1/2 .mid file support gives that familiar retro vibe for music. Since the synthesizer and soundfont are baked in, it sounds the same on all platforms with no reliance on the OS's actual synths.
  • Will have an integrated map editor, if I don't abandon this project lol.
  • Some info on engine architecture and internals.

List of tech used: C/gcc/MinGW, SDL2 (window creation, audio, input, final upscale & viewport display), midi-parser/TinySoundFont (midi playback), dr_mp3/dr_flac/dr_wav/stb_vorbis (audio codecs), stb_image (image/texture codecs), various crypto snippets/OpenSSL/libcurl (updater), miniaudio (optional alternate audio backend), Lua 5.4 (high level program logic & UI code), miniz/physfs (VFS, zip archive handling). All statically linked to avoid runtime dependencies.

Currently working on some renderer details like decals, sky, and additive geometry so I have those out of the way, then I'm heading on to the map editor. Wish me luck!
« Last Edit: December 03, 2021, 04:02:53 PM by els »

just a coding gal
Schrompf
Level 9
C++ professional, game dev sparetime
« Reply #1 on: August 29, 2021, 10:27:34 PM »

This is pretty, and it evokes all the feelings. Restricted level geometry can bring out the best in one's creativity :-)

Go ahead!

Snake World, multiplayer worm eats stuff and grows DevLog
sambloom
Level 0
« Reply #2 on: August 31, 2021, 12:17:09 AM »

Interesting, this could turn out to be a really neat sandbox to play around in during the evenings Smiley Y'know, to scratch that itch
els
Level 0
« Reply #3 on: November 11, 2021, 04:18:40 AM »

Small update: I've been busy with infrastructure code still and haven't headed into much editor UI yet (so the program is still very useless), but I now have these things:

  • A fully system-independent MIDI implementation via TinySoundFont and an integrated .sf2 file. RFS continues to just output digital audio to the operating system, and to that end it renders all the music internally (see the sketch after this list). This is cool because it avoids any potential MIDI issues on a user's system, and it sounds the same everywhere.
  • An upcoming 32-bit legacy build for Windows that, after some work, actually runs on Windows XP SP3! So if your PC is from around 2002 onwards, chances are it will run even if your hardware was never upgraded and you stuck with XP. Since it can run without 3D acceleration, even a Windows XP VM with no GPU passthrough or guest extensions works.
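
To illustrate the MIDI bit: this is roughly what driving TinySoundFont looks like. Just a minimal sketch with made-up file and buffer names, not RFS's actual code; the synth renders plain digital audio that then goes out through the normal audio path:

Code:
/* Minimal TinySoundFont sketch -- made-up names, not the real RFS code. */
#define TSF_IMPLEMENTATION
#include "tsf.h"

static short buffer[4096 * 2];  /* interleaved stereo samples */

int main(void) {
    tsf *synth = tsf_load_filename("baked_in.sf2");  /* hypothetical path */
    if (!synth)
        return 1;
    tsf_set_output(synth, TSF_STEREO_INTERLEAVED, 44100, 0.0f);
    tsf_note_on(synth, 0, 60, 1.0f);           /* preset 0, middle C */
    tsf_render_short(synth, buffer, 4096, 0);  /* render 4096 stereo frames */
    tsf_note_off(synth, 0, 60);
    tsf_close(synth);
    return 0;  /* `buffer` would then be handed to the audio backend */
}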
« Last Edit: November 11, 2021, 10:08:05 AM by els »

just a coding gal
vdapps
Level 3
Head against wall since 2013
« Reply #4 on: November 11, 2021, 12:29:29 PM »

Hey, looks pretty cool Kiss . I also have a project with "ray-something" in its description, albeit mine is different in nature. But anyway, welcome aboard, and I'm looking forward to more screenshots / animations! The fact that you're doing everything in software by itself says you're a capable programmer. Beer!

els
Level 0
« Reply #5 on: November 13, 2021, 08:46:02 AM »

Quote
capable programmer
Thanks so much for the praise!! However, I still feel like there are many more capable people, including those who made the original DOS engines, which run circles around mine on hardware with 1/10th or less of the horsepower, despite my own code doing okay'ish on something like a Raspberry Pi 3. Oof Blink

Let me rant about software renderer perf:

Part of it is that I'm just not low-level enough. My C code is quite optimized for my own usual level, including duplicate specialized copies to avoid some hot-path branching, but it's still C and not the assembly magic those old engines did. SDL2 has such assembly too, but I wrote the 2D blitters from scratch as well, doing really every operation manually (without SDL2's assembly blitters, I mean) up to the final pre-upscale image. This was very fun, but not the smartest choice. I suspect I also use 64-bit ints too excessively where 32-bit or 16-bit would have done, but it's not always easy for me to predict in advance what precision will really work without later overflow headaches, and I often preferred to be on the safer side. (The grainy fixed-point precision artifacts on e.g. nearby walls are achieved by intentionally ruining/reducing the precision during select operations, not because I would have picked the perfectly efficient low-range ints.) This made coding many parts more comfy, but it had its consequences.
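
To give an idea of what I mean by generous ints and intentionally ruined precision, here's a made-up sketch (not my actual code, all names invented):

Code:
/* Illustrative 16.16 fixed point with deliberate precision loss -- the
   kind of thing that produces grainy artifacts on nearby walls. */
#include <stdint.h>

typedef int64_t fix;               /* generous 64 bits to dodge overflow */
#define FIX_SHIFT 16
#define FIX_ONE   ((fix)1 << FIX_SHIFT)

static fix fix_mul(fix a, fix b) {
    return (a * b) >> FIX_SHIFT;   /* 64-bit keeps the intermediate safe */
}

/* Intentionally coarsen a value by dropping its low fractional bits. */
static fix fix_degrade(fix v, int bits) {
    return (v >> bits) << bits;
}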

I suspect my choice to render in 16-bit color (originally 32-bit) is also responsible for some unavoidable perf differences. Many coloring look-up tricks which the DOS engines used to cut down on per-pixel divisions don't seem to scale to that fidelity due to exploding table sizes. Starting this project, I didn't realize how impactful 16-bit+ vs 8-bit is in the core blit loop, beyond just copying 2x the data, when doing color tinting, so that was an interesting and surprising lesson.

Despite the above, I did my best to get branching and divisions out of the core blitting loops (walls, ceilings/floors) with some choices that proved smart, the biggest one being that lights have infinitely tall column-shaped falloff rather than spherical. While this was mostly to mirror the old engines visually, it turns out this is also huge for perf, since it means one wall slice is always one single tint of color. For floors/ceilings I just compute the color for the top and bottom end of a slice and interpolate, and the interpolation is only updated every nth pixel (depends on resolution; higher res -> larger chunks). Also, using fixed-point maths almost everywhere, including for all scaling (even with uneven factors), texture coordinates, etc., sped up the few remaining core loop divisions too, and even allowed me to replace some with bit shifts and some modulo with bit masks. Add in me doing sparse raycasting to also speed up the loops around the core blitter loop somewhat, and it at least allows me to run circles around likely 95%+ of casually written modern raycasters, yay! So that's nice. (And I'm not even using BSP like Doom!)
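
As a sketch of the wall part (heavily simplified, made-up names, and assuming the 4096-color pixel format): the light tint is one value for the whole slice, and the per-pixel work is just a fixed-point texture step with a bit mask instead of a modulo:

Code:
/* Simplified single-column wall blit sketch -- not the real RFS loop. */
#include <stdint.h>

#define TEX_H     32   /* power of two, so % becomes & */
#define FIX_SHIFT 16

void draw_wall_column(uint16_t *dst, int dst_stride, int height /* >= 1 */,
                      const uint16_t *tex_column, uint32_t light /* 0..256 */) {
    int32_t v = 0;
    int32_t vstep = (TEX_H << FIX_SHIFT) / height;  /* one division per column */
    for (int y = 0; y < height; y++) {
        uint16_t px = tex_column[(v >> FIX_SHIFT) & (TEX_H - 1)];
        /* one shared light level for the whole slice (column falloff) */
        uint32_t r = ((px >> 8 & 0xF) * light) >> 8;
        uint32_t g = ((px >> 4 & 0xF) * light) >> 8;
        uint32_t b = ((px      & 0xF) * light) >> 8;
        dst[y * dst_stride] = (uint16_t)(r << 8 | g << 4 | b);
        v += vstep;
    }
}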

I think the biggest missed perf gain on modern hardware was not simply using the GPU to draw the slices and all the 2D UI blits. Using pure software blitters, and self-written ones in C with no low-level assembly at that, probably has the biggest overall perf impact.

Still, no huge regrets overall: sticking with C, no assembly, and large ints has made implementation fun and multiple architecture builds easy; currently there's an x86 Windows one for down to Win XP SP3 even, x64 for Windows and Linux, and ARM64 Linux. Sticking with no GPU means it hilariously runs even in an XP VM with zero 3D accel drivers, which is useless but badass. Sticking with 16-bit color has made it less beginner hostile, since 8-bit color doesn't allow dumping in any ripped graphics without tweaking them to work with a pre-existing palette, let alone the headaches of tweaked per-level palettes altering all assets on top, which would confuse many newcomers. Overall I'm still proud of how fast it is, and I learned a lot - but it does low-key annoy me how I still tank compared with the actual MS-DOS engines' speeds Who, Me? oh well.

Hope the rant was interesting!
« Last Edit: December 03, 2021, 04:04:06 PM by els »

just a coding gal
vdapps
Level 3
Head against wall since 2013
« Reply #6 on: November 13, 2021, 01:17:22 PM »

Rant was definitely nerdy and interesting! Smiley

I think it's totally OK that you didn't go into assembler and stayed in C. Today, it's almost nonsense. Maybe, if you can find one or two bottleneck routines, you can write just those in assembler, but moving bigger parts into assembler can hurt code readability and thus its further development. Plus, you must write assembler for every arch separately. I know assembler very well (I started programming in the 8-bit era, but used it also on PC in DOS times), but honestly, this is not a thing I miss very much :D . Leave assembler coding to the 256-byte and 4kB demo-scene maniacs. Cheesy I also think C vs asm blitters will be quite close in perf (I can imagine such code being compiled nearly ideally).

Anyway, very nice to see an untraditional software renderer. That's quite rare these days. Even games which today mimic the old DOS-era look use full HW acceleration; not many coders want to (or can) go the pure software way.

JobLeonard
Level 10
« Reply #7 on: November 15, 2021, 01:04:41 AM »

Quote
Janky fixed point maths
Nice. For a person like me who spends too much time on old, outdated algorithms this is the kind of nerd detail that instantly tells me this is a thread worth following Hand Thumbs Up Left

And I agree with vdapps, good rant!

Quote
I suspect I also use 64-bit ints too excessively where 32-bit or 16-bit would have done, but it's not always easy for me to predict in advance what precision really will work without later overflow headaches and I often preferred to be on the safer side.
I mean, other than hitting a memory wall it shouldn't make a significant difference on modern hardware, no? Although I guess hitting a memory wall is often the biggest bottleneck on modern hardware. Hmm

By the way, it's interesting to come across this a few days after the "SDL will support TRIANGLES!" news: https://www.patreon.com/posts/58563886

... but of course that won't have that delicious retro jank
els
Level 0
« Reply #8 on: November 19, 2021, 07:25:12 PM »

Quote
this is a thread worth following Hand Thumbs Up Left [...] good rant!
aww thanks Kiss here are render details btw

Quote
hitting a memory wall is often the biggest bottleneck on modern hardware
Doubt I'll hit a memory wall with 32x32 texture sizes (the highest supported one  Cheesy) and the levels' super raw C structures-with-connecting-pointers sector graphs, haha. While I do use Lua, the entire level structure, the renderer, and in the future the physics are 100% C for speed. I mean, see my rant above: the only relevant bottleneck, which apparently exists on e.g. the PinePhone and other slow stuff, is 32-bit RGBA colored blits being hard to do without kinda too many operations (at least I struggle), so raw CPU speed.
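
For a rough idea of what I mean by raw structs-with-connecting-pointers (totally simplified sketch, made-up names, nothing like the actual field list):

Code:
/* Purely illustrative sector graph sketch -- not RFS's real layout. */
#include <stdint.h>

typedef struct wall wall;
typedef struct sector sector;

struct wall {
    int32_t x1, y1, x2, y2;   /* fixed point endpoints */
    sector *portal;           /* neighboring sector, or NULL if solid */
    uint8_t texture_id;
};

struct sector {
    int32_t floor_z, ceiling_z;  /* fixed point heights */
    wall   *walls;               /* polygonal boundary, DOOM-style */
    int     wall_count;
};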

Quote
it's interesting to come across this a few days after the "SDL will support TRIANGLES!"
I'm honestly torn on that. I used SDL_Renderer before deciding to hand-roll everything and ripping out the code, without even leaving a compile-time option, just to see if I could Durr...? and you know, I just feel like it wasn't the most necessary addition, especially the planned further expansions. I think its appeal was how minimal it is, and SDL2 honestly is already a bit, well... fat Shocked then again, with how much I hand-rolled, I might just switch away at some point given how relatively little of SDL2 I still use. So a first-world problem in a way. SDL2's gamepad support appears to be hard to beat though, so that's a bummer.
« Last Edit: November 19, 2021, 07:46:32 PM by els »

just a coding gal
JobLeonard
Level 10
« Reply #9 on: November 20, 2021, 02:41:24 AM »

Quote
here are render details btw
Nice, thank you!

Also, using rendering hardware exclusively for upscaling is somehow very funny to me (and it also makes perfect sense, of course: it's not the 3D part, so it's not "cheating", but it still gives better performance and lower energy requirements).

Quote
I mean see my rant above, the only relevant bottleneck which apparently exists on e.g. the PinePhone and other slow stuff is 32bit rgba colored blits being hard to do without kinda too many operations (at least I struggle)
Quote
Many coloring look-up tricks which the DOS engines used to cut down on per-pixel divisions seem to not scale to that fidelity due to exploding table sizes.
Couldn't you get away with keeping smaller tables and then using simple lerps to fill in the gaps? And when I say "lerps" I mean the kind where you can optimize the division part of the lerp into a constant bitshift, since we're always talking about 8, 16 or 32 bits of total color, so a power-of-two number.

That's sort of what I do at work (I'm part of a team working on a web app-based tool for analysis/dataviz of flow cytometry data; not too relevant, just for a bit of context). Instead of a full Cividis color ramp (or any of the other dozens of color ramps they can use for their plots), we save about ten stops per ramp and lerp all the values in between. The errors are absolutely minimal and imperceptible.
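
Something like this is what I mean, sketched in C to match the thread (assuming 8-bit values and 16 stored stops so the per-segment divide collapses into a shift; all names made up):

Code:
/* Small-table ramp lookup with a shift-based lerp -- illustrative only. */
#include <stdint.h>

#define STOPS     16
#define SEG_SHIFT 4   /* 256 / 16 = 16 = 1 << 4 input values per segment */

uint8_t ramp_lookup(const uint8_t stops[STOPS + 1], uint8_t x) {
    uint8_t i = x >> SEG_SHIFT;              /* which segment */
    uint8_t t = x & ((1 << SEG_SHIFT) - 1);  /* position inside it, 0..15 */
    int a = stops[i], b = stops[i + 1];
    return (uint8_t)(a + (((b - a) * t) >> SEG_SHIFT));  /* lerp, no divide */
}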

Quote
SDL2's gamepad support appears to be hard to beat though, so that's a bummer.
Having someone else handle external hardware support, where "someone else" is a battle-tested library, is probably a good idea, yeah. Plus, it's not part of the 3D rendering, so why waste time on something that isn't a "core value" of your engine, right?
els
Level 0
« Reply #10 on: November 20, 2021, 01:55:45 PM »

Quote
Couldn't you get away with keeping smaller tables and then use simple lerps to fill in the gaps?
The expensive part is modulating the texture pixel, not getting the light color itself at all. If you re-read above (I don't mean to imply you didn't read carefully; it's admittedly complex), an entire wall slice is just ONE light color. Basically, I'm already doing what you are suggesting, just taken to a greater extreme. The only thing I see left to do would be to allow plain one-color walls without textures, and that would kind of not be the intended graphics style anymore  Blink

I'm deliberately ignoring floor/ceiling in this response because the answer would be similar, just more complicated. And the bottleneck isn't only floor/ceiling, it's the walls too.

Edit: I guess I could limit lights to 256 colors and then keep 256 cached tinted variants of every texture, but... that already doesn't really scale for low-end hardware, which is what needs the optimization most: 256 variants of a 32x32 32-bit texture is already 1MB per texture. It would scale if the textures were also all just 256 shared colors, which is what the DOS engines do. Hence the 32-bit problem! I hope that clears it up. (Additional note: I'm working under the assumption that flickering lights, or simple fog messing with the light level as soon as there is any movement, will be common. So caching the texture for just the last 1-2 tint colors would probably have a 95%+ cache miss rate or something, while every cache fill would be - comparatively - expensive. I'm somewhat confident that'd make it run slower, without even testing it.)

Quote
Also, using rendering hardware exclusively for upscaling is somehow very funny to me
It also amuses me that basically copying the entire viewport every frame from system memory to GPU memory, just to do that one draw call on top, is still usually faster than scaling it right away on the CPU. Computers can be weird  Grin
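
For reference, that final step is basically the standard SDL2 streaming-texture pattern (a sketch with made-up names, not my real code):

Code:
/* Upload the CPU-rendered frame and let the GPU do the one scaled draw. */
#include <SDL2/SDL.h>

/* `frame` is assumed to be an SDL_TEXTUREACCESS_STREAMING texture of the
   viewport's size; `pixels`/`pitch` hold the finished software frame. */
void present_frame(SDL_Renderer *ren, SDL_Texture *frame,
                   const void *pixels, int pitch) {
    SDL_UpdateTexture(frame, NULL, pixels, pitch);  /* sysmem -> GPU copy */
    SDL_RenderCopy(ren, frame, NULL, NULL);         /* the one draw call */
    SDL_RenderPresent(ren);
}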
« Last Edit: November 20, 2021, 09:21:37 PM by els »

just a coding gal
JobLeonard
Level 10
« Reply #11 on: November 21, 2021, 03:44:35 AM »

Quote
don't mean to imply you didn't carefully, it's admittedly complex
You're too kind, since this was definitely sloppy reading on my end this time Cheesy. Thanks for the detailed explanation, though! Yeah, that sounds like a difficult one to overcome. I'm sure you'll come up with a trick some months from now though, right before going to bed or while taking a shower. That's how it usually goes with these puzzles, for me at least Smiley
vdapps
Level 3
Head against wall since 2013
« Reply #12 on: November 22, 2021, 01:36:58 PM »

So nice to read about "cache misses" and "super raw C structures" in these super-high-level programming language days. Kiss

els
Level 0
« Reply #13 on: November 23, 2021, 03:55:15 PM »

Okay, so not for performance reasons (I'm still measuring, but I doubt it changed much), but rather for visual reasons, I did a larger refactor moving all the surface/blitting/sprite code from 32-bit (24-bit color) to 16-bit (12-bit color / 4096 colors):

16-bit/4096 colors (new):



32-bit/some millions of colors (old):



I just like how retro potato it looks, with quite the color banding, while avoiding the need for custom palettes, unlike 8-bit/256 colors where those are pretty mandatory to get usable results.

Edit: it's good that I didn't expect this to be faster, since after measuring, it actually got 15%-ish slower on my lowest-end test hardware, which is the 3GB RAM model PinePhone. I think that's because the bottleneck doesn't appear to be the amount of bytes "pushed around" (which went down), but the computational load of the instructions to get the color value for each pixel (which, with the shifts/masks to access the now 4-bit channels, went slightly up). It looks amazing though, and it's still usable, so I'm keeping this change in any case.
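
To show what I mean by the per-pixel instruction count going up (an illustrative sketch, not my real blitter; light assumed in 0..256): with 8-bit channels you can tint red and blue together in one multiply via the classic two-lane mask trick, because the lanes have headroom. Packed 4-bit channels don't have that headroom (a blue product would spill into the red lane), so each channel gets its own unpack/multiply/repack:

Code:
#include <stdint.h>

uint32_t tint_rgb888(uint32_t px, uint32_t light) {  /* light: 0..256 */
    uint32_t rb = (((px & 0x00FF00FF) * light) >> 8) & 0x00FF00FF;
    uint32_t g  = (((px & 0x0000FF00) * light) >> 8) & 0x0000FF00;
    return rb | g;  /* two multiplies for three channels */
}

uint16_t tint_rgb444(uint16_t px, uint32_t light) {  /* light: 0..256 */
    uint32_t r = ((px >> 8 & 0xF) * light) >> 8;
    uint32_t g = ((px >> 4 & 0xF) * light) >> 8;
    uint32_t b = ((px      & 0xF) * light) >> 8;
    /* three multiplies plus extra shifts/masks per pixel */
    return (uint16_t)(r << 8 | g << 4 | b);
}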
« Last Edit: November 23, 2021, 11:12:28 PM by els »

just a coding gal
JobLeonard
Level 10
« Reply #14 on: November 24, 2021, 03:13:32 AM »

Retro potato tastes so good when prepared right Hand Thumbs Up Left