Author Topic: The happy programmer room  (Read 673105 times)
ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #5500 on: July 15, 2019, 09:24:39 AM »

You should extend that system so that you can set the state of an object by changing its vtable pointer Wink

I actually did a system like that for space punk slam dunk in C++ where you have tuples of method pointers that represent the state of objects, and each object that has a "virtual state machine" just needs a pointer to one. Of course, the actual implementation ended up being much more complex because I got caught in a quagmire of C++ features that, at a distance, promised conveniences but ended up being more time consuming than anything. Still, the system works fwiw.
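Something like this, presumably (a minimal sketch of the idea only, not the actual Space Punk Slam Dunk code; all the names here are made up for illustration):

Code:
#include <cstdio>

struct Entity;

// One "virtual state machine" state: a tuple of method pointers.
struct StateVTable {
    void (*update)(Entity&, float dt);
    void (*render)(const Entity&);
};

struct Entity {
    const StateVTable* state; // changing this pointer changes the object's state
    float y = 0.0f;
};

void idle_update(Entity&, float)      { std::puts("idling"); }
void idle_render(const Entity&)       {}
void jump_update(Entity& e, float dt) { e.y += 5.0f * dt; std::puts("jumping"); }
void jump_render(const Entity&)       {}

const StateVTable kIdle = { idle_update, idle_render };
const StateVTable kJump = { jump_update, jump_render };

int main() {
    Entity e{ &kIdle };
    e.state->update(e, 0.016f); // "idling"
    e.state = &kJump;           // state transition = swap the vtable pointer
    e.state->update(e, 0.016f); // "jumping"
}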
Logged

ThemsAllTook
Administrator
Level 10
******



View Profile WWW
« Reply #5501 on: July 15, 2019, 09:58:11 AM »

Whoa, interesting. So, instead of implementing a state machine with a switch statement or something, you use a changing function pointer that goes straight to the next set of behaviors? That's neat. In my system, that'd still be doable by putting function pointers into the instance variable area instead of the central vtable. I'll have to remember that technique if I run into a situation where it'd be applicable.
Logged

ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #5502 on: July 15, 2019, 10:21:45 AM »

Yup!

I even added a bit of fluff to it by reserving the first two positions of each vtable for onEnter and onExit methods. Then I can have a changeState method on each class that takes a vtable pointer and performs the onExit/onEnter appropriately.

I could also have used unions to switch between state-specific data, but that seemed like too much of a hassle for what it was worth.
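In sketch form, that might look roughly like this (only the reserved onEnter/onExit slots and the changeState method follow the description above; everything else is invented for illustration):

Code:
struct Entity;

// First two slots are reserved for onEnter/onExit, as described above.
struct StateVTable {
    void (*onEnter)(Entity&);
    void (*onExit)(Entity&);
    void (*update)(Entity&, float dt);
};

struct Entity {
    const StateVTable* state = nullptr;

    void changeState(const StateVTable* next) {
        if (state && state->onExit)  state->onExit(*this);
        state = next;
        if (state && state->onEnter) state->onEnter(*this);
    }
};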
Logged

qMopey
Level 6
*


View Profile WWW
« Reply #5503 on: July 15, 2019, 11:52:54 AM »

There’s also a really nice variant that plays well with shared library hotloading. Instead of implementing a vtable by storing a pointer in your object, it is possible to have your object store an integer id. This id can then index into a global (static) table of vtables. The reason this plays nicely with code hotloading is that the address of the vtables is set by the compiler and hardcoded directly into the assembly.

Code:
struct object {
    uint32_t id; // index into the global table of vtables
};

struct octorok {
    object base;
    int pellet_count;
    int ai_state;
};

void update_octorok(octorok* o, float dt);

// Registers one vtable entry: casts the type-specific update to the generic signature.
#define VTABLE_DEFINE(x) { (void (*)(void*, float))update_##x }

struct vtable {
    void (*update)(void*, float);
};

// Global (static) table of vtables. Objects refer to it by id rather than by
// pointer, so reloading the shared library only has to rebuild this one table.
vtable tables[] = {
    VTABLE_DEFINE(octorok),
};

void update_octorok(octorok* o, float dt) {
    switch (o->ai_state) {
    // ...
    }
}

void generic_update(object* o, float dt) {
    tables[o->id].update(o, dt);
}

The idea is that octorok "inherits" from object, which contains the type id. The type id is an index into the table of vtables. There is one macro for registering a vtable, which fills in all the function pointers for one vtable entry in static memory. The example only has an update function, but any number of functions can be added, and any number of different object types can be added.

New functions and new object types can be added at run-time.

The behavior of an object can change by just changing the id (assuming the underlying data matches for the various functions).
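For example, swapping behavior by id could look like this (a hypothetical extension building on the snippet above; the stunned variant and the enum are made up for illustration):

Code:
// A hypothetical second behavior that reuses octorok's data layout.
void update_octorok_stunned(octorok* o, float dt);

enum { TYPE_OCTOROK = 0, TYPE_OCTOROK_STUNNED = 1 };

// The table now has one entry per behavior, in id order.
vtable tables[] = {
    VTABLE_DEFINE(octorok),
    VTABLE_DEFINE(octorok_stunned),
};

void stun(octorok* o) {
    // Swapping behavior is just swapping the id; the data layout stays the same.
    o->base.id = TYPE_OCTOROK_STUNNED;
}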
« Last Edit: July 15, 2019, 11:58:26 AM by qMopey » Logged
InfiniteStateMachine
Level 10
*****



View Profile
« Reply #5504 on: July 16, 2019, 06:30:34 PM »

Quote from: oahda
I’m glad photorealism/PBR and the latest and greatest GPU features are of no interest to me. Tongue

Heh it was for my job. I doubt I would have gotten into it otherwise.

That said I'm glad I did do it. Rendering stuff has been a blind spot and it was nice to be able to justify all the studying Smiley
Logged

oahda
Level 10
*****



View Profile
« Reply #5505 on: July 17, 2019, 03:48:24 AM »

Yeah, that was a bit tongue in cheek. Tongue I'm not one to frown upon knowledge!

Also, sorry, gimymblert, I missed your response to the same thing: I'm afraid it's way over my head tho. Shocked
Logged

gimymblert
Level 10
*****


The archivest master, leader of all documents


View Profile
« Reply #5506 on: July 19, 2019, 12:25:43 PM »

Don't worry, if it's not implemented it's worth nothing. I just finished my training in web design, so I'll probably have to implement it myself. It might be easier to understand once it's done, and we'll also see what the actual limitations are (it's a very harsh approximation of raytracing).

If I get it done you'll probably do better stuff with it than me anyway lol, I'm still amazed at how you used that Zelda shader to get proper watercolor!

But I'm still crazy, even though I haven't succeeded (yet?) in making the hair-tech shader by analytically raytracing a helix inside a wrapping cube... therefore...

I thought about what would happen if I made the Anthem reveal trailer as a Game Boy game, because that makes sense lol. The obvious way to do a 3D Anthem on GB is to go Space Harrier and use sprite animation for Z placement. But why not free roaming? I mean, we got Elite on the BBC and the NES, and those have 3D spaceships with hidden-line removal... I don't want 3D objects, just free roaming, so at least a wireframe set-up of rooms and corridors, with rooms being convex (and therefore non-overlapping) and corridors being Space Harrier / Hang-On like. I bet rotating 4 points wouldn't cost much more, and we could probably cheat a bit by finding ways to reuse calculations across points (trig is super expensive on an 8-bit machine that only has add). Then I thought I could have spherical rooms and use perspective tricks to get the illusion of moving toward and away from walls at any angle, which also means we can use more tile-graphics tricks instead of plain empty walls. But then why stop at convex shapes? Is there any way to do better? What about Wolfenstein-like 2D raycasting? The problem is that the compute budget is low, instructions have variable cycle counts, and it's tile-based hardware, not a framebuffer. I've seen examples on the NES, C64 and CPC where, instead of operating per pixel, they operate per tile to render, which is much less data to process, and they're still barely scraping by. So how can we accelerate it? Wouldn't it be fun if we could do parallel processing on a Game Boy? Nonsense, obviously, so I decided to investigate that...

What if we consider byte-parallel processing, 8 bits at a time? We could probably raytrace 8 single-bit operations at once! After all, tiles are already stored that way on 8-bit machines: on a C64, for example, a character is an array of 8 numbers, each number being a bitfield representing one row of the graphic. The Game Boy is loosely similar, except it uses 2 numbers per row, stacked to encode colors, so the first bit of both numbers encodes the same pixel (2 bits for 4 colors), and that's how the hardware reads them. What if we extend the idea to raytracing? Basically the field would be an 8x8 bitfield (8 bytes): you'd "rasterize" the ray into an 8x8 field representation, AND it against the world representation, and use a count-leading/trailing-zeroes trick to get the intersection point, then use that to draw the walls (hide or show slot gfx). The open question is what the best FOV is so that a bit shift can stand in for a proper Z calculation. I haven't figured out the rasterize-ray-into-field part yet.

Of course this gave me ideas beyond the Game Boy: bitfield raytracing would probably be awesome on a modern CPU with 64-bit words, and instead of 2D we could do 3D. For the footprint of a regular 32-bit heightmap, you could have a map of 32-voxel-tall columns, and I think it would also play well with the cache. But instead of a heightmap, why not pack each byte as a voxel cube? Each bit is a cube corner and tells you whether it's present or not. Now you have spatial locality, so you can skip empty space or terminate a ray just by checking the whole cube instead of each voxel: if it's 0 it's empty, skip; if it's FF it's full, therefore a hit; for any other value, step through (or use a LUT that covers all cases). That's for a single byte; with 8-byte (64-bit) operations each byte is a cube, and the bytes are themselves arranged into a bigger cube, so not only the bits are corners but the bytes are too. Now you have a 4x4x4 cube you can skip with a single check, so with an array of them you speed up marching through the data set by up to 4 times. And since we use bit operations (ANDs) to trace through it anyway, it's potentially fast and cache friendly; even the cache has latency issues relative to how fast processors are, so we're potentially going even faster because we process 64 cells in a single op. Also, rays are coherent: in an array they're conditioned by slopes, so we can potentially bit-shift the rasterized rays to trace them faster too.
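A rough sketch of how the AND-plus-count-trailing-zeroes step could look on a 64-bit CPU (this is only my reading of the idea, with an intentionally crude ray rasterizer; the per-slope mask precomputation and the Game Boy specifics are left out):

Code:
#include <bit>       // std::countr_zero (C++20); __builtin_ctzll also works
#include <cstdint>

// An 8x8 occupancy grid packed into one 64-bit word, bit index = y * 8 + x.
using Field8x8 = std::uint64_t;

inline Field8x8 set_cell(Field8x8 f, int x, int y) {
    return f | (Field8x8{1} << (y * 8 + x));
}

// "Rasterize" a ray into the same kind of bitfield: walk the cells the ray
// crosses and set their bits. (Deliberately crude; the post's open question of
// precomputing these masks per slope / FOV is not answered here.)
Field8x8 rasterize_ray(float ox, float oy, float dx, float dy) {
    Field8x8 mask = 0;
    float x = ox, y = oy;
    for (int step = 0; step < 32; ++step) {
        if (x < 0.0f || x >= 8.0f || y < 0.0f || y >= 8.0f) break;
        mask = set_cell(mask, static_cast<int>(x), static_cast<int>(y));
        x += dx * 0.5f;
        y += dy * 0.5f;
    }
    return mask;
}

// One AND tests the ray against all 64 cells at once; counting trailing zeroes
// returns the index of a set bit in the overlap (ordering hits along the ray is
// the part that still needs the FOV/bit-shift trick from the post).
int first_hit_cell(Field8x8 world, Field8x8 ray_mask) {
    Field8x8 hits = world & ray_mask;
    return hits ? std::countr_zero(hits) : -1; // -1 = no hit
}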

You could tell I was bored coding HTML & CSS all day long ...
Logged

oahda
Level 10
*****



View Profile
« Reply #5507 on: July 19, 2019, 02:13:28 PM »

That's the thing tho, I might know my way around modern tech, but the prospect of software 3D rendering on a GB, even just the theory, is complete magic to me—I'm not understanding much at all of this. Who, Me? Raytracing on a GB… I haven't even learned how to do it on current hardware. Cheesy Why GB tho, I thought you were doing that Sonic / princess thing?
« Last Edit: July 21, 2019, 02:32:57 AM by Prinsessa » Logged

gimymblert
Level 10
*****


The archivest master, leader of all documents


View Profile
« Reply #5508 on: July 20, 2019, 03:55:41 PM »

Just fighting boredom, I wasn't expecting to actually implement it, it's just a thought experiment in lateral thinking. But now I'm curious about its actual practicability Huh?

You do much more complicated stuff anyway, I read about it all the time here lol



Raytracing (at least as I present it here, which is more accurately raymarching) is dead simple though: you look up a value in a grid; if it's the right value, you proceed to the next relevant cell (in the direction of the ray), otherwise you return the position of the cell or the value in it. On the GB thing I found a way to do it faster at the bit level using bit tricks.

Otherwise it's basic line/primitive-intersection math: you loop the primitives against a ray and take the closest hit, and do that for each pixel of the screen. Anyone can get a raytracer under 100 lines of code.
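Something like this, conceptually (a toy sketch of that loop with made-up sphere primitives; no shading, no recursion):

Code:
#include <cmath>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3  operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b)       { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Sphere { Vec3 center; float radius; };

// Distance along the ray to the nearest intersection with one sphere, if any.
// Assumes dir is normalized.
std::optional<float> intersect(Vec3 origin, Vec3 dir, const Sphere& s) {
    Vec3 oc = origin - s.center;
    float b = dot(oc, dir);
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return std::nullopt;
    float t = -b - std::sqrt(disc);
    if (t <= 0.0f) return std::nullopt;
    return t;
}

// The whole "raytracer" for one pixel's ray: loop every primitive, keep the closest hit.
std::optional<float> trace(Vec3 origin, Vec3 dir, const std::vector<Sphere>& scene) {
    std::optional<float> closest;
    for (const Sphere& s : scene)
        if (auto t = intersect(origin, dir, s))
            if (!closest || *t < *closest) closest = t;
    return closest;
}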

The real difficulty with raytracing is less the raytracing part and more containing the explosion of resources needed for GI. Basically, for each hit you need to generate reflected rays, and they form a hemisphere of points above the hit. So for every hit, say you need 1000 more rays; those rays will hit things and each generate 1000 more rays... oops... The code is the same, you just add more rays to the loop.

The other difficult thing about raytracing is still not the raytracing, it's the "light transport", i.e. how much energy a ray carries in a given direction after a hit, which is based on the BRDF (shader) of the material it hit.

The other difficulty is still not raytracing, it's how you spend your ray budget to get a nice image. Even with many rays you have sampling issues and the image gets noisy; more rays look better but are slower, so you need strategies to get more from less, such as importance sampling (concentrate rays where they matter most, usually based on the BRDF) and denoising (remove the appearance of noise by guessing how smooth the result should be, based on the noise and a few heuristics).

The other difficulty, still not raytracing, is how to test objects efficiently without looping through all of them and discarding the ones we'll never hit. That's spatial partitioning: you can use voxel-type or tree-type structures to query the objects in a ray's path. It's the same thing in physics engines.

On the Game Boy I was only thinking about the first hit, to detect very wide areas of the screen (not pixels) with a shallow number of elements, all based on a small 2D grid sized around bytes.



On the other project with cubemaps, I simply treat the screen as the first hit (that's what an image is: all the rays that are reflected back to your eyes). So I take a picture of all the objects with their UV colors, from the cubemap's position. Normally, for every pixel, the shader uses the normal to find a color on the surface of the cubemap's sphere; instead of using that color directly, I use it to look up the color at the UV position it encodes, like in a flow map or glass shader. If you've ever made that kind of distortion shader, you have half the work done. Then I mix that with other techniques (box projection, dynamic textures) to simulate sampling above each point (that is, I take multiple samples by shifting the normal 90° around its direction, in a circle), which I accumulate each iteration in the dynamic texture. There are some more shenanigans (like properly weighting the accumulation by distance, angle from the original normal, etc.) but they aren't hard.

On the cubemap thing, I also use the lightmap as the lighting surface directly. Generally you write a shader, and in the shader you write how the light interacts with the point; when you use a lightmap, you skip the lighting and just sample the lightmap. But what if you do the lighting directly in the lightmap? That's a lightmap G-buffer: basically I store the normal (in world space) and the shadow masking (there are many ways to do that) that will block the light. Then for each pixel I compute the light and store it in a dynamically writable texture (the light accumulation), which is the texture running the shader and taking the G-buffer lightmap as its input.

For the shadow masking I have many options. I can store the XYZ position of the pixel (precision issues with 8 bits) so I can compare it with the shadowmap; the result is used to mask the usual light computation with the normal and the light position. Another option (the hard one) is to bake shadows into a spherical-harmonics texture; if we need time of day, instead of using full sphere data we can use a circle, since time of day is generally circular, which means less data to store. The simple way is just to have another regular shadowmap texture and read that to attenuate the lighting.

So the whole operation, run from the dynamic texture, is:

1 - read the Lightmap Gbuffer input (normal [RGB], world position [RGB], shadowmask [A], cubemap index [A])
2 - compute lighting with the normal and shadowmasking (like in regular shader) of the Gbuffer
3 - sample the UV cubemap atlas texture using the index stored in a channel of the Gbuffer
4 - use the cubemap sample to read the Gbuffer again
5 - compute the GI light of that sample
6 - mix and store all the data into itself (accumulation)
7 - Repeat next frame/iteration (can be frame independent)

A - Shaders on static environments: read the lightmap, done!
B - Shaders on dynamic objects: read the box-projected UV cubemap with the normal, sample the lightmap using the UV data, DONE!


Another way is to separate the lighting calculation (steps 1 and 2) into another dynamic texture (a direct-lighting texture, basically automatic baking), so we sample the light directly instead of calculating it each frame (the lights probably won't move) and each time we take a GI sample (step 5).

OpenGL ES 2.0 is limited to 8 texture samples:
- 2 RGBA texture samples for the LM G-buffer
- 1 texture sample of the cubemap per GI sample
- 2 RGBA texture samples of the LM G-buffer per GI sample
- That is 5 texture samples out of 8 for the simple case, and 3 texture samples per "ray", i.e. 2 rays max in GL ES 2.0. But I've omitted the albedo sampling in the GI pass; with it, that's 4 samples per ray, i.e. only one ray if we want colored GI, with 2 samples left unused.
- With the separate direct-light texture, we still need 2 texture samples per GI pass, since we need the position (for attenuation) on top of the lighting. But since we've separated the light pass, we get the color back, because it can be done in the direct-light pass (3 samples per direct update: normal+shadowmask, position and albedo); we lose complex light transport, though (which is fine on low end).

Some notes on optimizations (quality or speed), thanks to the flexibility of the shader, which means we can scale it to be as complex as we want or need. I'll put the simple version in bold; anything else is complexity built on top and probably not needed for a simple working version, so if it's not bold you can probably skip it.

in 4 - It uses a simple approximation technique called box projection (google it). Basically, the volume that defines the space of the projection is how we get the index per point, i.e. when the volume overlaps or is close to a point in the lightmapped triangle. We could probably store more than one index and interpolate in order to have fewer artifacts. Deciding how many cubemap samples to use per point and how to store the indices seems like something to experiment with to improve the final render. How we store the cubemap atlas is another point of flexibility: the simplest and most intuitive is a lat-long 2D cubemap (indexed by the normal's x, y and z), but an icosahedron layout seems more efficient; it looks simple but I still need to google the formula.

in 5 - That's the most involved part. It's not necessarily complex, but there's a lot of room to scale complexity there; basically that's where the GI light transport happens.
* The first thing is that we can sample a single point per frame, advancing the loop to the next sample on the next frame (a parameter passed by a script to the shader). Useful for low-end OpenGL ES 2.0 hardware, which limits how many texture samples we can take; on high-end machines we can take many points in a single pass of the shader instead.
* We can probably do some importance-sampling optimization somewhere in the sampling loop to reduce the number of samples needed and improve quality. No idea how to do it yet, but it's something we could do...
* Ideally we should recalculate the whole shader at the sampled points (i.e. sample their normal, shadow, colors (albedo), etc.) to get proper lighting: compute the attenuation using the two positions to get the distance, compute the angular attenuation and specular contribution by taking the vector toward the sample and dotting it with the sample's normal, and then weight the result by dotting the point's normal with the direction toward the sample...
BUT THAT'S TOO MUCH WORK; we'd just sample the direct lighting (by doing the normal calculation OR by sampling it from the "direct light texture" idea) and attenuate it with the distance to the sample.
* If we have a direct-light texture, we can use its mipmaps to sample the average color of nearby points, which counts as multiple samples, but beware of the edges since we're working in a lightmap representation. It's similar to cone tracing in voxel GI, so we can also select which mip level to sample by using the distance and computing the coverage to get the cone size that gives the equivalent mip depth. However, this increases light bleeding from sampling incorrect data from neighboring triangles that might not face the same direction. There's probably some research to do to limit the bleeding, maybe by comparing the samples' normals or adding extra data in the texture?
* Since we have the positions of the start and end points (the point's position and the sample's position), on high-end machines we can use them to intersect a list of primitives and return that data instead of the sample data when there's an intersection.

Edit: since the technique is lightmap based, it's limited to objects that are in the lightmap.
I have the idea of reserving cubemap index 0 as the "farfield", basically the "skybox of the region". This cubemap is accessed for UV points in the sampled cubemap that hold the data (0,0); it covers everything outside the current region. If we compute the other regions and then render them into that farfield cubemap, we can sample their lighting contribution from it instead of from the G-buffer. By updating regions in a specific order we can reach further, and through all the farfield cubemaps we get distant lighting contributions too. Those farfield regions can also be LODded.

Instead of storing the light accumulation as pixel colors, we can store it as SH to get directionality, but that goes from 3 channels (a single texture) to 3*3=9 for SH1 (3 textures) or 3*9=27 for SH2 (9 textures).

Using bent normals would also improve the result. Bent cones are cones biased toward the occlusion average, i.e. where most light would enter if the occlusion above the point is convex; that's something an external tool would bake anyway.
« Last Edit: July 20, 2019, 05:00:14 PM by gimymblert » Logged

qMopey
Level 6
*


View Profile WWW
« Reply #5509 on: July 20, 2019, 04:53:54 PM »

Wow that is a long post  Screamy
Logged
oahda
Level 10
*****



View Profile
« Reply #5510 on: July 21, 2019, 02:40:13 AM »

You've thought it out in such detail that you pretty much just have to go ahead and implement it now to see if it works! Blink Blink
Logged

gimymblert
Level 10
*****


The archivest master, leader of all documents


View Profile
« Reply #5511 on: July 21, 2019, 06:07:09 AM »

Yep, the exam is tomorrow; after that I'm free to do it! I'll report results asap!

edit:
Let's see if it's as MAGICAL as I say
Mapping Approximation of Global Illumination Compute Applied by Lightprobe
« Last Edit: July 21, 2019, 10:56:42 AM by gimymblert » Logged

oahda
Level 10
*****



View Profile
« Reply #5512 on: August 12, 2019, 10:25:05 AM »



Adding (deferred) lights and shadows to my engine and happy not just because it's working but because I fully understand what I'm doing this time and more or less figured it out myself (but still looked at tutorials to confirm I was doing it right). Matrices are truly amazing. WTF

(not done yet, so there are only shadows in a limited area around the character right now)
Logged

ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #5513 on: August 12, 2019, 10:27:32 AM »

I'm not sure what deferred lighting is, but it looks impressive! Great debug room aesthetic too Wink
Logged

oahda
Level 10
*****



View Profile
« Reply #5514 on: August 12, 2019, 10:39:23 AM »

Basically I'm adding all the lights and shadows in one single pass after I've already rendered all the 3D objects, kind of like applying a filter to a 2D image in an image editing program. That means I can have a lot of lights, since I only have to calculate each light per pixel of the final rendered image instead of for every 3D object I render. c:



(the depth buffers are hard to see here because they look mostly red but in the GIF I've increased their contrast and put them as greyscale in the top-left corner so you can actually see)
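For anyone curious, the core of the lighting pass boils down to something like this (a CPU-style sketch only; the structs, the linear falloff and the loop are illustrative assumptions, the real version runs in a fragment shader over the G-buffer):

Code:
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3  operator-(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  operator*(Vec3 a, float s)  { return { a.x * s, a.y * s, a.z * s }; }
static Vec3& operator+=(Vec3& a, Vec3 b) { a.x += b.x; a.y += b.y; a.z += b.z; return a; }
static float dot(Vec3 a, Vec3 b)         { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One texel of the G-buffer, written during the geometry pass.
struct GBufferTexel { Vec3 position, normal, albedo; };
struct PointLight   { Vec3 position, color; float radius; };

// Cost scales with (pixels x lights), independent of how many objects were drawn.
Vec3 shade_deferred(const GBufferTexel& px, const std::vector<PointLight>& lights) {
    Vec3 result{ 0, 0, 0 };
    for (const PointLight& l : lights) {
        Vec3 to_light = l.position - px.position;
        float dist = std::sqrt(dot(to_light, to_light));
        if (dist <= 0.0f || dist > l.radius) continue;      // light can't reach this texel
        float ndotl = std::max(0.0f, dot(px.normal, to_light * (1.0f / dist)));
        float atten = 1.0f - dist / l.radius;               // simple linear falloff
        Vec3 lit = { px.albedo.x * l.color.x, px.albedo.y * l.color.y, px.albedo.z * l.color.z };
        result += lit * (ndotl * atten);
    }
    return result;
}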
Logged

ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #5515 on: August 12, 2019, 11:04:10 AM »

Huh, neat! I'm guessing the reason is so that you can have as many objects as you want without lights being a significant performance limiter?
Logged

oahda
Level 10
*****



View Profile
« Reply #5516 on: August 12, 2019, 11:35:26 AM »

Yep! It scales super well already. c:
Logged

ProgramGamer
Administrator
Level 10
******


aka Mireille


View Profile
« Reply #5517 on: August 19, 2019, 03:14:09 PM »

Tangentially gamedev related: I discovered today that Python has a module (ctypes) for loading DLLs and calling their functions, and it's actually super easy to use as long as you know what's in there! For instance, here's a Python script that will open a window, wait five seconds, then close it, as long as SDL2.dll is in the same directory!

Code:
from ctypes import *

# loading SDL2.dll
sdl2 = WinDLL("SDL2")

# creating bindings: declare argument and return types for each function we call
sdl_init = sdl2.SDL_Init
sdl_init.argtypes = [c_uint32]
sdl_init.restype = c_int

sdl_quit = sdl2.SDL_Quit
sdl_quit.argtypes = []
sdl_quit.restype = None

sdl_create_window = sdl2.SDL_CreateWindow
sdl_create_window.argtypes = [c_char_p, c_int, c_int, c_int, c_int, c_uint32]
sdl_create_window.restype = c_void_p

sdl_destroy_window = sdl2.SDL_DestroyWindow
sdl_destroy_window.argtypes = [c_void_p]
sdl_destroy_window.restype = None

sdl_delay = sdl2.SDL_Delay
sdl_delay.argtypes = [c_uint32]
sdl_delay.restype = None

sdl_init_video = 0x00000020  # SDL_INIT_VIDEO

# running sdl stuff
sdl_init(sdl_init_video)

window = sdl_create_window(b"SDL application window", 32, 32, 600, 400, 0)
sdl_delay(5000)
sdl_destroy_window(window)

sdl_quit()

More documentation in the Python ctypes docs (https://docs.python.org/3/library/ctypes.html).
Logged

qMopey
Level 6
*


View Profile WWW
« Reply #5518 on: August 19, 2019, 03:43:43 PM »

If only calling Python from C was also as simple  Waaagh!
Logged
Daid
Level 3
***



View Profile
« Reply #5519 on: August 20, 2019, 01:24:05 AM »

Quote from: qMopey
If only calling Python from C was also as simple  Waaagh!
https://github.com/pybind/pybind11
Supports handing a python function to a std::function<> as callback, making calling back to python from C++ code quite easy.
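For example, an embedded-interpreter sketch along these lines (untested; the engine module and on_score function are made up for illustration):

Code:
#include <functional>
#include <pybind11/embed.h>
#include <pybind11/functional.h>

namespace py = pybind11;

// C++ side: something a Python script can hand a callback to.
std::function<void(int)> g_on_score;

PYBIND11_EMBEDDED_MODULE(engine, m) {
    m.def("on_score", [](std::function<void(int)> cb) { g_on_score = std::move(cb); });
}

int main() {
    py::scoped_interpreter guard{};  // start the embedded interpreter

    // Python hands a plain lambda to C++; pybind11 wraps it in a std::function.
    py::exec(R"(
import engine
engine.on_score(lambda points: print("python got", points))
)");

    if (g_on_score) g_on_score(42);  // calling back into Python from C++
}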


Note that ctypes can be tricky, especially on Windows, as you need to agree on a lot of things. For example, if your DLL is 32-bit and you have a 64-bit Python, types might not always be what you expect (c_long in Python isn't a long in C then), and then there are different calling conventions and structure padding differences between compilers.
It's also not a very fast way of calling functions; the overhead is quite big.
Logged

Software engineer by trade. Game development by hobby.
The Tribute Of Legends Devlog Co-op zelda.
EmptyEpsilon Free Co-op multiplayer spaceship simulator