Topic: The happy programmer room

kason.xiv
« Reply #5400 on: February 13, 2019, 11:34:35 AM »

Wow, that's pretty sly. I'll keep an eye out for suspicious behavior 8)

Crimsontide
« Reply #5401 on: February 21, 2019, 06:50:01 AM »

AVX2 matrix multiplication working :) It turned out much cleaner than I expected.

Code:
#include <immintrin.h> // _mm256_dp_ps (AVX), _mm256_i32gather_ps (AVX2)

// Assumes mat4x4f holds 16 contiguous floats indexed [row][col].
void OcclusionMap::MultiplyMatrix(const mat4x4f& s, const mat4x4f& t, __m256& st_AB, __m256& st_CD) noexcept {
    // performs the matrix multiplication s x t (which is the transformation t applied, then s)
    // rows of s are A, B, C, D; the result comes back two rows per register

    // init
    const float* s_ptr = &(s[0][0]);
    const float* t_ptr = &(t[0][0]);

    // load matrix s: rows A,B in one register, rows C,D in the other
    __m256 s_AB = _mm256_loadu_ps(s_ptr);
    __m256 s_CD = _mm256_loadu_ps(s_ptr + 8);

    // load and transpose matrix t: each gather fetches one column of t
    // (stride of 4 floats), duplicated in both 128-bit lanes
    __m256i index0 = _mm256_setr_epi32(0, 4, 8, 12, 0, 4, 8, 12);
    __m256i index1 = _mm256_setr_epi32(1, 5, 9, 13, 1, 5, 9, 13);
    __m256i index2 = _mm256_setr_epi32(2, 6, 10, 14, 2, 6, 10, 14);
    __m256i index3 = _mm256_setr_epi32(3, 7, 11, 15, 3, 7, 11, 15);

    __m256 t_A0B0C0D0 = _mm256_i32gather_ps(t_ptr, index0, 4);
    __m256 t_A1B1C1D1 = _mm256_i32gather_ps(t_ptr, index1, 4);
    __m256 t_A2B2C2D2 = _mm256_i32gather_ps(t_ptr, index2, 4);
    __m256 t_A3B3C3D3 = _mm256_i32gather_ps(t_ptr, index3, 4);

    // perform dot products: within each 128-bit lane, mask 0xFF multiplies
    // all 4 pairs and broadcasts the sum to all 4 slots
    __m256 st_A0_B0 = _mm256_dp_ps(s_AB, t_A0B0C0D0, 0xFF);
    __m256 st_C0_D0 = _mm256_dp_ps(s_CD, t_A0B0C0D0, 0xFF);
    __m256 st_A1_B1 = _mm256_dp_ps(s_AB, t_A1B1C1D1, 0xFF);
    __m256 st_C1_D1 = _mm256_dp_ps(s_CD, t_A1B1C1D1, 0xFF);
    __m256 st_A2_B2 = _mm256_dp_ps(s_AB, t_A2B2C2D2, 0xFF);
    __m256 st_C2_D2 = _mm256_dp_ps(s_CD, t_A2B2C2D2, 0xFF);
    __m256 st_A3_B3 = _mm256_dp_ps(s_AB, t_A3B3C3D3, 0xFF);
    __m256 st_C3_D3 = _mm256_dp_ps(s_CD, t_A3B3C3D3, 0xFF);

    // gather results: unpacklo interleaves columns 0/1 and 2/3 per lane,
    // then the shuffle packs each 128-bit lane into one complete row of s x t
    __m256 st_A0A1_B0B1 = _mm256_unpacklo_ps(st_A0_B0, st_A1_B1);
    __m256 st_A2A3_B2B3 = _mm256_unpacklo_ps(st_A2_B2, st_A3_B3);
    __m256 st_C0C1_D0D1 = _mm256_unpacklo_ps(st_C0_D0, st_C1_D1);
    __m256 st_C2C3_D2D3 = _mm256_unpacklo_ps(st_C2_D2, st_C3_D3);

    st_AB = _mm256_shuffle_ps(st_A0A1_B0B1, st_A2A3_B2B3, _MM_SHUFFLE(1, 0, 1, 0));
    st_CD = _mm256_shuffle_ps(st_C0C1_D0D1, st_C2C3_D2D3, _MM_SHUFFLE(1, 0, 1, 0));
}

Schrompf
« Reply #5402 on: February 21, 2019, 11:45:53 PM »

Wow, nicely done. How much did it yield?

I ask because I suspect that a pathological case like this (exactly known loop counts, MADs all over the place, operation count always a multiple of 4) is an easy target for compilers' auto-vectorizers, and I can imagine this code of yours has been compiled to AVX2 all along.
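One way to answer that question is to keep a plain scalar version around, both as the baseline an auto-vectorizer would see and as a reference to check the intrinsics against. A minimal sketch, assuming a row-major 4x4 float matrix; `Mat4`, `multiplyScalar`, `nearlyEqual` and `checkAgainstReference` are hypothetical names, not part of the engine shown above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for mat4x4f: row-major, indexed m[row][col].
using Mat4 = std::array<std::array<float, 4>, 4>;

// Scalar s x t with fixed trip counts and one multiply-add per step:
// the kind of loop nest a compiler may auto-vectorize on its own.
Mat4 multiplyScalar(const Mat4& s, const Mat4& t) {
    Mat4 st{};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += s[row][k] * t[k][col];
            }
            st[row][col] = sum;
        }
    }
    return st;
}

// Element-wise comparison with an absolute tolerance, since SIMD dot
// products may sum in a different order than the scalar loop.
bool nearlyEqual(const Mat4& a, const Mat4& b, float eps = 1e-5f) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            if (std::fabs(a[r][c] - b[r][c]) > eps) return false;
    return true;
}

// Debug-build check: simdResult is the AVX2 output after being stored
// back into a Mat4 (for example with _mm256_storeu_ps).
void checkAgainstReference(const Mat4& simdResult, const Mat4& s, const Mat4& t) {
    assert(nearlyEqual(simdResult, multiplyScalar(s, t)));
}
```

Timing both versions on the same inputs would then show whether the hand-written intrinsics beat what the compiler already produces.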

Crimsontide
« Reply #5403 on: February 22, 2019, 05:30:30 AM »

I haven't benchmarked it yet, but I agree that it would be interesting to compare the compiler's code with mine.

It's part of a full software occlusion culling class. I still need to test/debug more parts of it. I'd love to have it finished by the end of the week, but I'm not sure.

Ordnas
« Reply #5404 on: February 26, 2019, 02:17:44 AM »

Crimsontide, is that code part of your custom engine?

Crimsontide
« Reply #5405 on: February 26, 2019, 03:22:49 AM »

It will be, why?

Ordnas
« Reply #5406 on: February 27, 2019, 01:33:45 AM »

I am a fan of custom engines; I read books about engine architecture and graphics programming, so I always congratulate programmers who build their engines from scratch ;D

oahda
« Reply #5407 on: February 27, 2019, 02:02:56 AM »

Had great success yesterday slapping Recast + Detour into my engine and generating a navmesh as well as a path between points on it, all in a couple of hours!

Physics collision mesh: [image]

Navmesh: [image]

Points to travel between: [image]

Calculated path: [image]
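As far as I understand Detour, its path query first finds a corridor of navmesh polygons with an A* search and then straightens that corridor into waypoints. A toy version of the first step could look like the sketch below. It is not Detour's API: the mesh is a hypothetical one reduced to polygon centres and adjacency lists, and `NavPoly`, `centreDistance` and `findCorridor` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified navmesh: each polygon is reduced to its
// centre on the ground plane plus the polygons it shares an edge with.
struct NavPoly {
    float x, z;
    std::vector<int> neighbours;
};

float centreDistance(const NavPoly& a, const NavPoly& b) {
    return std::hypot(a.x - b.x, a.z - b.z);
}

// A* over the polygon adjacency graph. Returns the corridor of polygon
// indices from start to goal, or an empty vector if goal is unreachable.
std::vector<int> findCorridor(const std::vector<NavPoly>& mesh, int start, int goal) {
    assert(start >= 0 && start < static_cast<int>(mesh.size()));
    assert(goal >= 0 && goal < static_cast<int>(mesh.size()));
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<float> cost(mesh.size(), inf);  // best known cost from start
    std::vector<int> parent(mesh.size(), -1);
    std::vector<bool> closed(mesh.size(), false);
    using Entry = std::pair<float, int>;        // (cost + heuristic, polygon)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;

    cost[start] = 0.0f;
    open.push({centreDistance(mesh[start], mesh[goal]), start});
    while (!open.empty()) {
        int p = open.top().second;
        open.pop();
        if (closed[p]) continue;  // stale queue entry
        closed[p] = true;
        if (p == goal) break;
        for (int n : mesh[p].neighbours) {
            float c = cost[p] + centreDistance(mesh[p], mesh[n]);
            if (c < cost[n]) {
                cost[n] = c;
                parent[n] = p;
                open.push({c + centreDistance(mesh[n], mesh[goal]), n});
            }
        }
    }
    if (cost[goal] == inf) return {};
    std::vector<int> corridor;
    for (int p = goal; p != -1; p = parent[p]) corridor.push_back(p);
    std::reverse(corridor.begin(), corridor.end());
    return corridor;
}
```

The straight-line distance between centres never overestimates the remaining path cost here, so the search returns the cheapest corridor under this simplified cost model.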

« Last Edit: February 27, 2019, 04:14:21 AM by Prinsessa »

kason.xiv
« Reply #5408 on: February 27, 2019, 01:33:12 PM »

ooo definitely giving this repo a star. gotta check this out later.

oahda
« Reply #5409 on: February 27, 2019, 11:16:13 PM »

The only difficulty was figuring out how to use it, but by following this forum post and cross-referencing the docs I got it working!

Ordnas
« Reply #5410 on: February 28, 2019, 12:59:12 AM »

Cool, Prinsessa! Will we see you working at Unity or Epic next year? ;D

oahda
« Reply #5411 on: February 28, 2019, 02:35:58 AM »

They'd probably want me to implement it myself instead of slapping on a library. :P

Crimsontide
« Reply #5412 on: February 28, 2019, 03:32:07 PM »

The 'rendering' portion of a hierarchical software occlusion culling solution I've been working on for the last few weeks is now up and running. I still need to test the downsampling and occlusion-testing portion of the code, but the software renderer was always going to be the hardest part, and the part where performance mattered most, so I'm happy with how it turned out. It's all done with AVX2 intrinsics, so I can process 8 pixels at a time :)

Sadly no benchmarks at this time.
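For readers wondering what "8 pixels at a time" looks like: a common approach, and the one used in the Intel/fgiesen material linked later in this thread, is edge-function (half-space) rasterization, where the three edge tests are evaluated for a whole group of pixels at once. The scalar sketch below walks each row in groups of 8 to mirror one AVX2 register. `Vtx`, `edge`, `rasterize`, the smaller-is-closer depth convention and the winding rule are assumptions of this sketch, not Crimsontide's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical screen-space vertex: pixel coordinates plus depth.
// Assumed convention: smaller depth = closer.
struct Vtx {
    float x, y, z;
};

constexpr int kLanes = 8;  // one AVX2 register holds 8 floats

// Edge function: twice the signed area of triangle (a, b, p).
float edge(const Vtx& a, const Vtx& b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Rasterizes one triangle into a row-major width x height depth buffer,
// keeping the nearest depth per pixel. Triangles with edge(v0, v1, v2) <= 0
// (degenerate or opposite winding) are skipped. Rows are walked in groups of
// kLanes pixels; SIMD code would process a group with one set of AVX2
// instructions and use the edge tests as a lane mask.
void rasterize(std::vector<float>& depth, int width, int height,
               const Vtx& v0, const Vtx& v1, const Vtx& v2) {
    assert(width % kLanes == 0);
    const float area = edge(v0, v1, v2.x, v2.y);
    if (area <= 0.0f) return;

    // Screen-clamped bounding box; minX is snapped down to a group boundary.
    int minX = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    int maxX = std::min(width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    int minY = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    int maxY = std::min(height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));
    minX -= minX % kLanes;

    for (int y = minY; y <= maxY; ++y) {
        for (int x0 = minX; x0 <= maxX; x0 += kLanes) {
            for (int lane = 0; lane < kLanes; ++lane) {
                const float px = x0 + lane + 0.5f;  // sample at pixel centres
                const float py = y + 0.5f;
                const float w0 = edge(v1, v2, px, py);
                const float w1 = edge(v2, v0, px, py);
                const float w2 = edge(v0, v1, px, py);
                if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) continue;  // lane masked off
                // Barycentric interpolation of depth.
                const float z = (w0 * v0.z + w1 * v1.z + w2 * v2.z) / area;
                float& d = depth[y * width + x0 + lane];
                d = std::min(d, z);
            }
        }
    }
}
```

Because the buffer width is a multiple of 8 and groups start on multiples of 8, a group never runs past the end of a row; real SIMD code also evaluates the edge functions incrementally instead of from scratch per pixel.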

oahda
« Reply #5413 on: March 02, 2019, 01:14:51 PM »

Sounds cool! I'm going to have to tackle occlusion culling eventually too. I've never done it, but I've read up and watched video tutorials so I basically know how it works. :o

Crimsontide
« Reply #5414 on: March 02, 2019, 09:03:30 PM »

If you have any questions I'd be happy to help. 

The technique I used came from this page: https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/

It's a really good write-up and an easy read even if you don't intend to implement it. He also has a series called 'A trip through the Graphics Pipeline' on his blog that should be a must-read for any aspiring game programmer.

You could also just download the Intel code from GitHub, linked at the bottom of the page.
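The downsampling and occlusion-testing stages mentioned earlier in the thread usually rest on one conservative rule: a coarse depth texel stores the farthest depth of the pixels it covers, and an object is culled only if its nearest point lies behind every texel it covers. A minimal sketch of that rule, with hypothetical names (`DepthLevel`, `downsampleMax`, `isOccluded`) and an assumed smaller-is-closer depth convention:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One level of a depth pyramid. Assumed convention: smaller = closer.
struct DepthLevel {
    int width = 0;
    int height = 0;
    std::vector<float> depth;  // row-major, width * height texels
    float at(int x, int y) const { return depth[y * width + x]; }
};

// Halves the resolution. Each coarse texel keeps the FARTHEST of its 2x2
// children, so it never claims more occlusion than the finer level does.
DepthLevel downsampleMax(const DepthLevel& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    DepthLevel dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.depth.resize(dst.width * dst.height);
    for (int y = 0; y < dst.height; ++y)
        for (int x = 0; x < dst.width; ++x)
            dst.depth[y * dst.width + x] =
                std::max({src.at(2 * x, 2 * y), src.at(2 * x + 1, 2 * y),
                          src.at(2 * x, 2 * y + 1), src.at(2 * x + 1, 2 * y + 1)});
    return dst;
}

// Conservative test for an object covering texels [x0, x1] x [y0, y1] of this
// level whose nearest depth is minZ: hidden only if it is behind the stored
// (farthest) occluder depth in every covered texel.
bool isOccluded(const DepthLevel& level, int x0, int y0, int x1, int y1, float minZ) {
    x0 = std::max(x0, 0);
    y0 = std::max(y0, 0);
    x1 = std::min(x1, level.width - 1);
    y1 = std::min(y1, level.height - 1);
    if (x0 > x1 || y0 > y1) return false;  // nothing on screen to test
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (minZ <= level.at(x, y)) return false;  // may be visible here
    return true;
}
```

Testing against a coarse level touches far fewer texels than the full-resolution buffer, at the cost of culling somewhat less.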

Ordnas
« Reply #5415 on: March 04, 2019, 10:01:37 AM »

Quote from: oahda on March 02, 2019, 01:14:51 PM
Sounds cool! I'm going to have to tackle occlusion culling eventually too. I've never done it, but I've read up and watched video tutorials so I basically know how it works. :o

Yeah, the important thing is to understand how it works; then implementing it in your engine shouldn't be too hard once you've grasped the basics.

Guntha
« Reply #5416 on: March 13, 2019, 04:35:37 AM »

Hello,

I finished the bulk of an archive format for the levels of my game Fire Exit (sorry, no devlog here yet :/).

The goal was to be able to compile several levels into a single, nice archive that groups every resource (textures, sounds, meshes, animations...) used by any level at the top, followed by the data specific to each level.

That allows me to:
=>Make every necessary allocation up front, when loading the archive: the "maximum" number and size of each asset type are computed and saved in the header at compile time, so I can allocate enough memory for any level in the archive.
=>When going from one level to another, I can see which resources the next level needs: if I had loaded them for the previous level, I just keep them; if I need anything that wasn't loaded yet, I load only that.

And it kind of works: [image]
My next step is to encode the "meta-game" in this archive (the world map where a level can be selected, how each level is unlocked... Things that aren't implemented in the game yet x)).
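The header data from the first "=>" point above amounts to a per-type maximum over all levels in the archive. A minimal sketch, with hypothetical asset categories and struct names (the post doesn't describe the actual archive layout):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical asset categories; the post lists textures, sounds, meshes,
// animations "...", so the real set may be larger.
enum AssetType { Texture, Sound, Mesh, Animation, AssetTypeCount };

// What a single level needs, per asset type, as seen by the archive compiler.
struct LevelUsage {
    std::array<std::uint32_t, AssetTypeCount> count{};  // number of assets
    std::array<std::uint64_t, AssetTypeCount> bytes{};  // total size in bytes
};

// Header data: the worst case over every level in the archive, so one
// up-front allocation per asset type is enough for any level.
struct ArchiveBudget {
    std::array<std::uint32_t, AssetTypeCount> maxCount{};
    std::array<std::uint64_t, AssetTypeCount> maxBytes{};
};

ArchiveBudget computeBudget(const std::vector<LevelUsage>& levels) {
    assert(!levels.empty());  // an archive without levels is most likely a bug
    ArchiveBudget budget;
    for (const LevelUsage& level : levels) {
        for (int t = 0; t < AssetTypeCount; ++t) {
            budget.maxCount[t] = std::max(budget.maxCount[t], level.count[t]);
            budget.maxBytes[t] = std::max(budget.maxBytes[t], level.bytes[t]);
        }
    }
    return budget;
}
```

The count and byte maxima are taken independently, so they may come from different levels; the result is still an upper bound for every level. It only stays sufficient if assets the next level doesn't use are released before its missing assets are loaded.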

Crimsontide
« Reply #5417 on: March 14, 2019, 11:53:56 AM »

I like the little stick men running around. I'm guessing it's some sort of puzzle game where they have to make it to the spaceship before the fire burns it all down?

oahda
« Reply #5418 on: March 14, 2019, 12:06:26 PM »

Nice!

Guntha
« Reply #5419 on: March 14, 2019, 12:34:08 PM »

Thanks!

Actually, instead of guessing, there is a playable single-level prototype from not too long ago:

https://lhuillia.iiens.net/wejv8/livrables/FireExit.zip

Sorry, the "readme" file is only in French, so here's how to play: click or drag with the left mouse button to issue an order, and use left Ctrl to switch to the next "tool" (there are 3). Some orders can be cancelled by right-clicking a tile. You can also zoom in/out with the mouse wheel, but in this prototype the whole level fits on the screen.

After making that post, I realized the "load only what's needed" part wasn't really working ^^' You can notice a small freeze when switching from level 1 to level 0, when it should really be instantaneous: both levels use the same assets, and one of them just uses a few fewer.
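The intended behaviour boils down to two set differences between what is resident and what the next level needs; when every asset of the next level is already loaded, the load list should come out empty and no freeze should occur. A minimal sketch with a hypothetical string `AssetId` and a made-up `planTransition` helper; it illustrates the expected result and is not a diagnosis of the actual bug:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical asset identifier; an archive would more likely use indices.
using AssetId = std::string;

struct TransitionPlan {
    std::vector<AssetId> toUnload;  // resident now, unused by the next level
    std::vector<AssetId> toLoad;    // needed by the next level, not resident
};

// Both inputs are sorted sets, which is what std::set_difference requires.
TransitionPlan planTransition(const std::set<AssetId>& loaded,
                              const std::set<AssetId>& next) {
    TransitionPlan plan;
    std::set_difference(loaded.begin(), loaded.end(), next.begin(), next.end(),
                        std::back_inserter(plan.toUnload));
    std::set_difference(next.begin(), next.end(), loaded.begin(), loaded.end(),
                        std::back_inserter(plan.toLoad));
    // Assets kept as-is plus assets to load must cover the next level exactly.
    const std::size_t kept = loaded.size() - plan.toUnload.size();
    assert(kept + plan.toLoad.size() == next.size());
    (void)kept;  // silence the unused-variable warning in release builds
    return plan;
}
```

Logging `toLoad` on the level 1 to level 0 transition would show whether assets that are already resident are being reloaded anyway.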