kason.xiv
|
|
« Reply #5400 on: February 13, 2019, 11:34:35 AM » |
|
Wow that's pretty sly. I'll keep an eye out for suspicious behavior
|
Crimsontide
|
|
« Reply #5401 on: February 21, 2019, 06:50:01 AM » |
|
AVX2 matrix multiplication working. Turned out much cleaner than I expected.

void OcclusionMap::MultiplyMatrix(const mat4x4f& s, const mat4x4f& t, __m256& st_AB, __m256& st_CD) noexcept {
    // performs the matrix multiplication s x t (which is the transformation t applied, then s)

    // init
    const float* s_ptr = &(s[0][0]);
    const float* t_ptr = &(t[0][0]);

    // load matrix s (rows A and B in one register, rows C and D in the other)
    __m256 s_AB = _mm256_loadu_ps(s_ptr);
    __m256 s_CD = _mm256_loadu_ps(s_ptr + 8);

    // load and transpose matrix t (each register holds one column of t, repeated in both 128-bit lanes)
    __m256i index0 = _mm256_setr_epi32(0, 4, 8, 12, 0, 4, 8, 12);
    __m256i index1 = _mm256_setr_epi32(1, 5, 9, 13, 1, 5, 9, 13);
    __m256i index2 = _mm256_setr_epi32(2, 6, 10, 14, 2, 6, 10, 14);
    __m256i index3 = _mm256_setr_epi32(3, 7, 11, 15, 3, 7, 11, 15);

    __m256 t_A0B0C0D0 = _mm256_i32gather_ps(t_ptr, index0, 4);
    __m256 t_A1B1C1D1 = _mm256_i32gather_ps(t_ptr, index1, 4);
    __m256 t_A2B2C2D2 = _mm256_i32gather_ps(t_ptr, index2, 4);
    __m256 t_A3B3C3D3 = _mm256_i32gather_ps(t_ptr, index3, 4);

    // perform dot products (0xFF: use all four products per lane, broadcast the sum to all four slots)
    __m256 st_A0_B0 = _mm256_dp_ps(s_AB, t_A0B0C0D0, 0xFF);
    __m256 st_C0_D0 = _mm256_dp_ps(s_CD, t_A0B0C0D0, 0xFF);
    __m256 st_A1_B1 = _mm256_dp_ps(s_AB, t_A1B1C1D1, 0xFF);
    __m256 st_C1_D1 = _mm256_dp_ps(s_CD, t_A1B1C1D1, 0xFF);
    __m256 st_A2_B2 = _mm256_dp_ps(s_AB, t_A2B2C2D2, 0xFF);
    __m256 st_C2_D2 = _mm256_dp_ps(s_CD, t_A2B2C2D2, 0xFF);
    __m256 st_A3_B3 = _mm256_dp_ps(s_AB, t_A3B3C3D3, 0xFF);
    __m256 st_C3_D3 = _mm256_dp_ps(s_CD, t_A3B3C3D3, 0xFF);

    // gather results
    __m256 st_A0A1_B0B1 = _mm256_unpacklo_ps(st_A0_B0, st_A1_B1);
    __m256 st_A2A3_B2B3 = _mm256_unpacklo_ps(st_A2_B2, st_A3_B3);
    __m256 st_C0C1_D0D1 = _mm256_unpacklo_ps(st_C0_D0, st_C1_D1);
    __m256 st_C2C3_D2D3 = _mm256_unpacklo_ps(st_C2_D2, st_C3_D3);

    st_AB = _mm256_shuffle_ps(st_A0A1_B0B1, st_A2A3_B2B3, _MM_SHUFFLE(1, 0, 1, 0));
    st_CD = _mm256_shuffle_ps(st_C0C1_D0D1, st_C2C3_D2D3, _MM_SHUFFLE(1, 0, 1, 0));
}
|
Schrompf
|
|
« Reply #5402 on: February 21, 2019, 11:45:53 PM » |
|
Wow, nicely done. How much did it yield?
Because I suspect that a pathological case like this (exactly known loop counts, MADs all over the place, operation count always a multiple of 4) is an easy target for compilers' auto-vectorizers, and I can imagine this code of yours has always been AVX2 already.
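For comparison, the kind of plain loop an auto-vectorizer would be looking at could be sketched like this. `Mat4` and `MultiplyScalar` are made-up names standing in for the thread's `mat4x4f` (assumed here to be row-major), and whether a given compiler actually turns this into AVX2 code depends on its flags and target, so treat it as a baseline to benchmark against rather than a claim about the generated code.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the thread's mat4x4f: a row-major 4x4 float matrix.
using Mat4 = std::array<std::array<float, 4>, 4>;

// Computes s x t (t applied first, then s) with fixed trip counts of 4.
Mat4 MultiplyScalar(const Mat4& s, const Mat4& t) noexcept {
    Mat4 st{};  // value-initialized, so every element starts at 0
    for (std::size_t row = 0; row < 4; ++row) {
        for (std::size_t k = 0; k < 4; ++k) {
            for (std::size_t col = 0; col < 4; ++col) {
                // one multiply-add per step, 64 in total
                st[row][col] += s[row][k] * t[k][col];
            }
        }
    }
    return st;
}
```

A scalar version like this also doubles as a reference for checking the output of a hand-written intrinsics routine.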
|
Crimsontide
|
|
« Reply #5403 on: February 22, 2019, 05:30:30 AM » |
|
I haven't benchmarked it yet, but I agree that it would be interesting to compare the compiler's code vs mine.
It's part of a full software occlusion culling class. Still need to test/debug more parts of it. I'd love it to be finished by the end of the week, but I'm not sure.
|
Ordnas
|
|
« Reply #5404 on: February 26, 2019, 02:17:44 AM » |
|
Crimsontide, is that code part of your custom engine?
|
Crimsontide
|
|
« Reply #5405 on: February 26, 2019, 03:22:49 AM » |
|
It will be, why?
|
oahda
|
|
« Reply #5407 on: February 27, 2019, 02:02:56 AM » |
|
Had great success yesterday slapping Recast + Detour into my engine and generating a navmesh as well as a path between points on it, all in a couple of hours!
(Images: physics collision mesh, navmesh, points to travel between, calculated path.)
|
|
« Last Edit: February 27, 2019, 04:14:21 AM by Prinsessa »
|
kason.xiv
|
|
« Reply #5408 on: February 27, 2019, 01:33:12 PM » |
|
ooo definitely giving this repo a star. gotta check this out later.
|
oahda
|
|
« Reply #5409 on: February 27, 2019, 11:16:13 PM » |
|
Only difficulty was figuring out how to use it, but by following this forum post and cross-referencing the docs I got it working!
|
oahda
|
|
« Reply #5411 on: February 28, 2019, 02:35:58 AM » |
|
They'd probably want me to implement it myself instead of slapping on a library.
|
Crimsontide
|
|
« Reply #5412 on: February 28, 2019, 03:32:07 PM » |
|
The 'rendering' portion of a hierarchical software occlusion culling solution I've been working on for the last few weeks is now up and running. Still need to test the downsampling and occlusion testing portions of the code, but the software renderer was always going to be the hardest part, and where performance mattered most, so I'm happy with how it turned out. All done with AVX2 intrinsics so I can process 8 pixels at a time. Sadly no benchmarks at this time.
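For anyone wondering what the downsampling and occlusion testing steps usually amount to, here is a rough scalar sketch of one common approach, not the actual code from this class. `DepthLevel`, `Downsample` and `IsOccluded` are invented names, and it assumes larger depth values are farther from the camera and that the buffer dimensions are even.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical depth buffer level; assumes larger depth = farther from the camera.
struct DepthLevel {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<float> depth;  // row-major, width * height entries

    float At(std::size_t x, std::size_t y) const { return depth[y * width + x]; }
};

// Builds the next coarser level: each texel keeps the FARTHEST depth of its
// 2x2 block, so a test against the coarse level stays conservative.
// Assumes fine.width and fine.height are even.
DepthLevel Downsample(const DepthLevel& fine) {
    DepthLevel coarse;
    coarse.width = fine.width / 2;
    coarse.height = fine.height / 2;
    coarse.depth.resize(coarse.width * coarse.height);
    for (std::size_t y = 0; y < coarse.height; ++y) {
        for (std::size_t x = 0; x < coarse.width; ++x) {
            const float top = std::max(fine.At(2 * x, 2 * y), fine.At(2 * x + 1, 2 * y));
            const float bottom = std::max(fine.At(2 * x, 2 * y + 1), fine.At(2 * x + 1, 2 * y + 1));
            coarse.depth[y * coarse.width + x] = std::max(top, bottom);
        }
    }
    return coarse;
}

// Returns true only if the object's nearest depth lies behind every texel in
// the non-empty rectangle [x0, x1) x [y0, y1) of the given level.
bool IsOccluded(const DepthLevel& level, std::size_t x0, std::size_t y0,
                std::size_t x1, std::size_t y1, float nearestDepth) {
    for (std::size_t y = y0; y < y1; ++y) {
        for (std::size_t x = x0; x < x1; ++x) {
            if (nearestDepth <= level.At(x, y)) {
                return false;  // something here might still be visible
            }
        }
    }
    return true;
}
```

Keeping the farthest depth per block means the coarse test can only err toward "visible", which costs some culling but never hides an object that should be drawn.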
|
oahda
|
|
« Reply #5413 on: March 02, 2019, 01:14:51 PM » |
|
Sounds cool! I'm going to have to tackle occlusion culling eventually too. I've never done it, but I've read up and watched video tutorials so I basically know how it works.
|
Crimsontide
|
|
« Reply #5414 on: March 02, 2019, 09:03:30 PM » |
|
If you have any questions I'd be happy to help. The technique I used came from this page: https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/
It's a really good write-up and an easy read even if you don't intend to implement it. He also has a series called 'trip through the graphics pipeline' on his blog that should be a 'must read' for any aspiring game programmer. You could also just download the Intel code from GitHub at the bottom of the page.
|
Ordnas
|
|
« Reply #5415 on: March 04, 2019, 10:01:37 AM » |
|
Quote from: oahda
"Sounds cool! I'm going to have to tackle occlusion culling eventually too. I've never done it, but I've read up and watched video tutorials so I basically know how it works."
Yeah, the important thing is to understand how it works; then implementing it in your engine should not be too hard once you've grasped the basics.
|
Guntha
|
|
« Reply #5416 on: March 13, 2019, 04:35:37 AM » |
|
Hello, I finished the bulk of an archive format for the levels of my game Fire Exit (sorry, no devlog here yet :/). The goal was to be able to compile several levels into a single archive that groups every resource (textures, sounds, meshes, animations...) used by any level at the top, followed by the data specific to each level. That allows me to:
=> Make every necessary allocation only once, when loading the archive: the "maximum" number and size of each asset type is computed and saved in the header when compiling, allowing me to allocate enough memory for any level in the archive.
=> When going from one level to another, see which resources I need for the next level: if they were already loaded for the previous level, I just keep them; if I need anything that wasn't loaded yet, I load only those.
And it kind of works.
My next step is to encode the "meta-game" in this archive (the world map where a level can be selected, how each level is unlocked... Things that aren't implemented in the game yet x)).
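The "load only what's missing" step could be sketched roughly as a set difference over resource IDs. This is only an illustration: `ResourceId` and `ResourcesToLoad` are invented names, and it assumes each level's resource list is stored sorted in ascending order, which the post doesn't say.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

using ResourceId = std::uint32_t;  // hypothetical handle for a texture, sound, mesh...

// Returns the resources the next level needs that are not already loaded.
// Both lists must be sorted ascending (an assumption of this sketch).
std::vector<ResourceId> ResourcesToLoad(const std::vector<ResourceId>& loaded,
                                        const std::vector<ResourceId>& needed) {
    std::vector<ResourceId> toLoad;
    std::set_difference(needed.begin(), needed.end(),
                        loaded.begin(), loaded.end(),
                        std::back_inserter(toLoad));
    return toLoad;
}
```

When the next level's resources are a subset of what is already loaded, the result is empty, so the switch should not need to touch the disk at all.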
|
Crimsontide
|
|
« Reply #5417 on: March 14, 2019, 11:53:56 AM » |
|
I like the little stick men running around. I'm guessing it's some sort of puzzle game where they have to make it to the spaceship before the fire burns it all down?
|
oahda
|
|
« Reply #5418 on: March 14, 2019, 12:06:26 PM » |
|
Nice!
|
Guntha
|
|
« Reply #5419 on: March 14, 2019, 12:34:08 PM » |
|
Thanks! Actually, instead of guessing, there is a playable single-level prototype from not too long ago: https://lhuillia.iiens.net/wejv8/livrables/FireExit.zip
Sorry, the "readme" file is only in French; here's how to play: click or drag with the left mouse button to issue an order, use left Ctrl to switch to the next "tool" (there are 3), and some orders can be cancelled by right-clicking on a tile. You can also zoom in/out with the mouse wheel, but in this prototype the whole level fits on the screen.
After making that post, I realized the "load only what's needed" part wasn't really working ^^' You can notice a small freeze when switching from level 1 to 0, where really it should be instantaneous because they use the same assets; one of them only uses a few fewer.
|