Upcoming ranges and an issue with them...

Crimsontide

Level 5

« on: December 03, 2018, 10:32:59 PM »

So with C++20 (or something like that) it looks like we will finally get ranges. I'm not a huge fan of all the new stuff in C++, but this is one thing I'll be happy to see standardized. The current implementation is here: https://github.com/ericniebler/range-v3 and it has links to all the relevant papers/docs/etc...

Back in the day when the D programming language first came out, Andrei Alexandrescu made his video/talk about ranges vs iterators and I was inspired. I tried implementing some of the STL around ranges, just to see how it felt. There were advantages, disadvantages, and after a few months I sort of left it at that. An idea for another time... But after implementing custom containers and algorithms for my own uses over the years, I've come to realize both iterators and ranges are missing an important 'piece of the pie'.

Both iterators and ranges represent, from a theoretical/virtual perspective, a linear view over a 'range' or 'list' of items. But this isn't necessarily how they are stored in memory. If the range/iterators comes from a std::vector then the underlying items are continuous, if its linked list they are not, a tree may have small sections of continuous items, a circular buffer may be 1 or 2 continuous sections, etc...

So... why does this notion of 'continuous items' even matter? Simple, I think it forms a true 'basis' for abstract algorithms (ie. algorithms that can work on any container). Ranges and iterators are nice from a programmer/theoretical perspective, but they are not nice from a performance perspective. For example copying a string character by character is going to be slower than simply using strcpy or memcpy, because the latter can work on continuous chunks of data (keeps cache happy) and can work with chunks of data that better work with the underlying hardware (copying 64 bits at a time rather than 8 for example). Iterators and ranges throw this notion away, of continuous chunks of data, which in turn throws away any chances for having both abstract and high performance algorithms.

I think we need another notion of, or an extension to, ranges/iterators, I'm going to call them intervals (but the name isn't important, we simply need a name for a physically continuous sequence of items). An interval would simply be a pointer [begin, end) pair or something similar (pointer/count). The idea is each interval represents a *physically continuous sequence of one or more items, and a range then would consist of zero or more intervals. This abstraction would give us near maximum performance for any container, simple because all memory 'under the hood' on current systems is linear. Every single container we write, arrays 1d/2d/3d/4d, maps, graphs, dequeues, lists, sets, trees, tables, you name it, under the hood has to be mapped to zero or more intervals. What I'm trying to get at is that this idea of intervals isn't just some abstract idea, there's actually a physical basis to it, and why I feel it is a more optimal, if not the optimal, way to represent abstract container traversal.

Now adding intervals to iterators doesn't work, they are very different, but with the new addition of ranges, its easy to see how a range could 'wrap' or contain intervals. So a range would iterate over consecutive intervals, and then intervals iterate over items. For the most part it would require minimal or no syntactical changes. 'Tacking on' intervals after the fact would be near impossible though.

So why am I posting this here... This is important enough, I feel, that this idea should be considered prior to this becoming part of the standard. There's a significant amount of performance that can be gained here, with minimal or no additional work on the programmers part if this notion of intervals is added in from the beginning (in fact I found it easier to implement container traversal over intervals than ranges or iterators, simply because under the hood that's the way the memory is already ordered). That said I don't know how to tell them, or even if I should bother.

Am I completely out to lunch here? Is the idea of intervals simply stupid? Should I try? Is it too late? Is this simply a cathartic post for an overactive mind?

* I know I said 'physically continuous' a few times above. I am well aware that due to virtual memory mapping and such, that it might be a bit of a misnomer and that consecutive memory addresses may not be physically adjacent. That said, for all intents and purposes, both from a theoretical and performance standpoint, they might as well be; so I don't feel that's an important distinction, at least in the context of iterators/ranges/intervals.


	Logged

Schrompf

Level 9

C++ professional, game dev sparetime

Re: Upcoming ranges and an issue with them...

« Reply #1 on: December 03, 2018, 11:41:32 PM »

What does this gain us? In daily C++ work, you either have sort of an ArrayRef - contiguous set of elements - or you have an iterator pair. The latter is cumbersome to work with, I never really liked that part of the STL, but it suits your requirement to represent any sequence of elements, whatever the storage.

Now if I understand your Sequence-Of-Intervals concept correctly, you want to take advantage of the fact that in some containers, some parts are contiguous in memory. The only example I can think of are the hash maps. I don't think you can gain a lot there. To my knowledge the compiler already combines reads and writes if the underlying type allows, so a for() copy loop will already be replaced by a memcpy if the type is std::trivially_copyable. What would your intervals concept gain us over this?


	Logged

Snake World, multiplayer worm eats stuff and grows DevLog

Ordnas

Level 10

Re: Upcoming ranges and an issue with them...

« Reply #2 on: December 04, 2018, 01:13:07 AM »

If it will be add in C++20, probably someone asked for that and it will be use in some scenario.


	Logged

Games:

Daid

Level 3

Re: Upcoming ranges and an issue with them...

« Reply #3 on: December 04, 2018, 01:44:27 AM »

a) Drop the name. "Interval" has a horrible name.

b) Having algorithms using memmove/memcpy possibly bypasses constructors/assignment operators. Making things even more complex, as then they only work on POD data types.

c)

Quote

Iterators and ranges throw this notion away, of continuous chunks of data, which in turn throws away any chances for having both abstract and high performance algorithms.

I think you are overfitting your data. The byte arrays is an edge case where this could potentually be useful. But that's about it, anywhere else, it makes thing unneededly convuluted and complex. Also, what are the odds that your compile isn't already optimizing things somewhere without you know it?


	Logged

Software engineer by trade. Game development by hobby.
The Tribute Of Legends Devlog Co-op zelda.
EmptyEpsilon Free Co-op multiplayer spaceship simulator

Crimsontide

Level 5

Re: Upcoming ranges and an issue with them...

« Reply #4 on: December 04, 2018, 02:33:25 AM »

Quote from: Schrompf on December 03, 2018, 11:41:32 PM

What does this gain us?

Performance.

Quote

In daily C++ work, you either have sort of an ArrayRef - contiguous set of elements - or you have an iterator pair. The latter is cumbersome to work with, I never really liked that part of the STL, but it suits your requirement to represent any sequence of elements, whatever the storage.

From a purely theoretical standpoint iterators and ranges are fine. From a performance one they aren't.

Quote

Now if I understand your Sequence-Of-Intervals concept correctly, you want to take advantage of the fact that in some containers, some parts are contiguous in memory.

Yes

Quote

The only example I can think of are the hash maps. I don't think you can gain a lot there.

Dequeues/circular queues, 2d/3d/4d arrays in particular if they have padding, textures/sprites/images, trees (its not uncommon to have tree containers where groups of items are is small 'chunks'), graphs.

Quote

To my knowledge the compiler already combines reads and writes if the underlying type allows, so a for() copy loop will already be replaced by a memcpy if the type is std::trivially_copyable. What would your intervals concept gain us over this?

Only when the iterators are trivial (ie. pointers or simple pointers around wrappers), which is often not the case. Intervals would ensure that does happen (read/write combines) since intervals are defined to be simple/trivial, any complexity (wrapping, jumping past sentinels, traversing linked lists/trees/graphs, etc...) is part of the range. The separation between traversal of intervals (which a range handles) and traversal of items (which intervals handle) ensures the compiler can maximize performance.


	Logged

Crimsontide

Level 5

Re: Upcoming ranges and an issue with them...

« Reply #5 on: December 04, 2018, 02:46:36 AM »

Quote from: Daid on December 04, 2018, 01:44:27 AM

a) Drop the name. "Interval" has a horrible name.

Any suggestions?

Quote

b) Having algorithms using memmove/memcpy possibly bypasses constructors/assignment operators. Making things even more complex, as then they only work on POD data types.

Almost all implementations of the standard library functions uses templates and type traits to revert to memcpy and/or similar when appropriate. Of course no one's suggesting using similar when they types used aren't POD. This just saves a lot of hassle since that would rarely be required with intervals.

Quote

c)
I think you are overfitting your data. The byte arrays is an edge case where this could potentually be useful. But that's about it, anywhere else, it makes thing unneededly convuluted and complex. Also, what are the odds that your compile isn't already optimizing things somewhere without you know it?

Actually I came across this idea when implementing a flat dequeue. It made things really easier implementation-wise. I then began to apply it to other containers and realized... it just works. It fits. You can take any data structure (linear or otherwise), reduce it to intervals, and then process it easily as with iterators, with all the performance of a custom loop.


	Logged

Pages: [1]

« previous next »