Author Topic: Procedurally Generated Compression  (Read 18077 times)
st33d
« Reply #100 on: June 03, 2010, 08:56:34 AM »

I'm guessing the decompressor will be so large as to contain a set of keys for all possible data arrangements. Therefore the compressed file will need no size at all other than its key, which can be used to retrieve the data.

Or am I thinking of Rapidshare?
Gnarf
« Reply #101 on: June 03, 2010, 08:59:41 AM »

Eh.

At the moment, I'm trying to find how many bits it would take to store one value from 0-63, another 0-62, another 0-61, another 0-60 ... 0-5, 0-4, 0-3, 0-2, 0-1 in one value.

0-1 -> 1 bit
0-2 ... 0-3 -> 2 bits per value -> 4 bits
0-4 ... 0-7 -> 3 bits per value -> 12 bits
0-8 ... 0-15 -> 4 bits per value -> 32 bits
0-16 ... 0-31 -> 5 bits per value -> 80 bits
0-32 ... 0-63 -> 6 bits per value -> 192 bits

That would be 321 bits or so.
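For anyone who wants to check those totals, a quick back-of-the-envelope calculation in Python (just the arithmetic above, nothing to do with the actual algorithm being discussed):

Code: [Select]
import math

# Bits needed if each value 0..n gets its own whole number of bits.
per_value = sum(math.ceil(math.log2(n + 1)) for n in range(1, 64))

# Bits needed if all 63 values are packed into one big integer instead
# (equivalent to numbering one of the 64! possible arrangements).
packed = math.ceil(math.log2(math.factorial(64)))

print(per_value)  # 321
print(packed)     # 296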
Pineapple
« Reply #102 on: June 03, 2010, 09:33:40 AM »

@Gnarf

The thing is, a lot of those values will be wasting space - I'm trying to consider doing things like storing (...)

Yeah, that comes back to the 64!, which is too large to store.


Alright, assuming a library of 64 values of 1-16 has the most variation of contents possible (4 of each value), X different arrangements will result in all the groups of four having all values adjacent to each other.

Must solve for X.

16! different ways to order the groups of four, 4! different ways the individual four values can fit into those slots. 16!*4! is ~2^49. 2^296/2^49 is 2^247. With this maximum variation, 32 bytes would be reduced to 16. 247 bits is about 31 bytes in itself; the value noting the arrangement used should not exceed a storage space of 14 bytes. The compression should still be very effective, but this would be what prevents it from becoming something magical.
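Plugging those magnitudes into Python as a sanity check (only the figures above, not the scheme itself):

Code: [Select]
import math

groups = math.factorial(16) * math.factorial(4)      # the 16!*4! figure
print(math.log2(groups))                             # ~48.8, so roughly 2^49

total_bits = math.log2(math.factorial(64))           # ~296 bits for any arrangement of 64 values
leftover = total_bits - math.log2(groups)            # ~247 bits
print(leftover, leftover / 8)                        # ~247 bits, about 31 bytes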

One of the things that would discourage chunks of data from becoming incompressible too soon would be that in each pass, the data would be stored in a slightly different format (but still the same size, not counting the chunk header) so that the next pass would have sparkly-new contents to work with in place of the old maddening ones.

@st33d
The arrangement would be derived from the value via an algorithm, not some sort of table.
shrimp
« Reply #103 on: June 03, 2010, 09:53:15 AM »

The only thing that's foolish is not taking the time to understand the proofs available of why you can't have a general purpose compression algorithm that can compress every possible file down to 36 bytes. The mathematical proof isn't too hard to understand; the beauty of a good proof is that it's simple enough that you can be certain there are no loopholes.

This needs quoting more. Exploring new ideas for compression is not silly. What is silly is completely ignoring, or not understanding, the 20+ extremely well-written and painstakingly explained posts about the 1 or 2 fundamental reasons (Shannon entropy and maybe the pigeonhole principle) why you can't compress data beyond a certain limit.

Einstein didn't invent relativity by ignoring or not understanding physics; he stood on the shoulders of giants (Newton).

  • Have you written a decompressor which works for all files? I'm surprised you're not writing the compressor and decompressor in parallel. I feel like I'm being trolled just asking.
  • What did Shannon get wrong when formulating the concept of entropy?
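To make the pigeonhole point concrete, a tiny counting sketch (illustrative only, not anyone's actual code):

Code: [Select]
# There are 256**3 possible 3-byte files, but far fewer possible outputs
# of 2 bytes or less. Any compressor mapping every 3-byte input to <= 2
# bytes must send two different inputs to the same output, so no
# decompressor can recover both.
inputs = 256 ** 3
outputs = sum(256 ** n for n in range(3))   # lengths 0, 1 and 2
print(inputs, outputs, inputs > outputs)    # 16777216 65793 True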

Pineapple
« Reply #104 on: June 03, 2010, 10:24:24 AM »

Have you written a decompressor which works for all files? I'm surprised you're not writing the compressor and decompressor in parallel. I feel like I'm being trolled just asking.

Why write the decompression code before a file exists to decompress? I'm still working at the compression.

What did Shannon get wrong when formulating the concept of entropy?

He mistook nothing. It's that the entire purpose of the algorithm is to eliminate or at least minimize entropy.
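For reference, the entropy being argued about here is just a frequency count over the data; a minimal sketch of how it's measured (nothing to do with the algorithm under discussion):

Code: [Select]
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

print(entropy_bits_per_byte(b"AB" * 32))        # 1.0 -> very repetitive, very compressible
print(entropy_bits_per_byte(bytes(range(64))))  # 6.0 -> 64 distinct values, little to squeeze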
Overkill
« Reply #105 on: June 03, 2010, 11:11:17 AM »

Your logic continues to astound me. What don't you understand about these proofs that people are throwing at you? We are at least remotely aware of what you're trying to do. We don't need specifics, because it is impossible by very simple mathematics.

Rearranging the order of bits is still a function of an input string. You are mapping input strings with one ordering to output strings with some other ordering. That is your function. You have not sidestepped anything here.

Also, composing two compression functions f and g gives yet another function h = f o g, that is, h(x) = f(g(x)). Not only are the individual functions unable to losslessly compress all strings to a length smaller than the original input, the same holds for their composition. In the best case, you may actually "double compress". But in other cases, the different lossless compression functions will undo each other's work, or, even worse, both compressions fail to lower the size (and you are left with a string much larger than the one you started with).


I think if you actually bothered to write your decompressor at the same time, you would realize this fact a lot faster.

And if it helps you, the word "compression" is a bit of a misnomer when it comes to lossless algorithms. Lossless algorithms can at most remove SOME redundancy from the data by re-encoding it into a different string. That string might be smaller, but there are necessarily instances where this is not the case, because a lossless algorithm by definition cannot lose the data it cannot optimally rearrange. So these failed strings must necessarily be larger than or equal to the size of their original input.

If you are attempting lossy compression, then you are able to smash anything into smaller bits, at the expense of not being able to reverse it to the exact input you started with. But as far as I know you're not trying to do that. You are trying to make a lossless algorithm.
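A quick experiment along those lines, using zlib as a stand-in for any lossless compressor (not the algorithm being discussed here):

Code: [Select]
import os
import zlib

# Half compressible, half random. Feed the compressor its own output:
# after the first pass there is little redundancy left, so later passes
# stop shrinking and start adding header overhead instead.
data = bytes(4096) + os.urandom(4096)
for i in range(5):
    data = zlib.compress(data)
    print(i, len(data))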
« Last Edit: June 03, 2010, 07:29:32 PM by Overkill »

shrimp
« Reply #106 on: June 03, 2010, 11:23:49 AM »

Right.

bateleur
« Reply #107 on: June 03, 2010, 11:27:49 AM »

I hope there are no trolls reading this thread, because this is unintentionally a perfect tutorial in how to troll maths geeks! Big Laff

Kekskiller
« Reply #108 on: June 03, 2010, 11:33:19 AM »

Math sucks.
Glaiel-Gamer
« Reply #109 on: June 03, 2010, 11:38:55 AM »

nah the best way to troll math geeks is to argue that .99999... != 1
shrimp
« Reply #110 on: June 03, 2010, 12:01:09 PM »

Lots more here: http://en.wikipedia.org/wiki/Pseudomathematics

See third paragraph in section headed "Current trends in pseudomathematics"....

David Pittman
« Reply #111 on: June 03, 2010, 12:23:25 PM »

Why write the decompression code before a file exists to decompress? I'm still working at the compression.

Because you won't know the compression is working (outside of simple test cases you can verify manually in a hex editor) until you can decompress it to its original state.
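A round-trip check is cheap to set up; something along these lines, with compress/decompress standing in for whatever the real pair ends up being (hypothetical names):

Code: [Select]
import os

def roundtrip_ok(compress, decompress, data: bytes) -> bool:
    """True if decompressing the compressed data gives back the original bytes."""
    return decompress(compress(data)) == data

# Exercise the pair on awkward inputs, not just hand-picked ones.
samples = [b"", b"\x00" * 64, bytes(range(256)), os.urandom(4096)]
# assert all(roundtrip_ok(compress, decompress, s) for s in samples)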

Zaratustra
« Reply #112 on: June 03, 2010, 12:25:33 PM »


He mistook nothing. It's that the entire purpose of the algorithm is to eliminate or at least minimize entropy.

You can't reduce the entropy of a file without removing information from it. What you are doing is shuffling all the file's entropic content into a different representation, which may be larger or smaller than the original file.

Let me see if I understand: This algorithm takes blocks of 64 bytes, sorts the bytes into groups of the same value (possibly from smallest to largest), then stores the groups, RLE compressed (something between 2 and 80 bytes per block), along with the data used to shuffle them (296 bits, or 37 bytes).

This algorithm will obviously do no compression if there are no repeated bytes in the block. Furthermore, since the position information takes 37 bytes, it needs to be that much more efficient than the regular RLE algorithm.

The only case I can envision this happening is a block that contains something like:

ABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABAB

Yes, that stores horribly in RLE, which is why most compressors do not use RLE.
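A rough size check of that bookkeeping (a sketch of the accounting only, not anyone's actual implementation):

Code: [Select]
import math
import os

def sorted_rle_size(block: bytes) -> int:
    """Bytes needed to store the sorted block as (value, run length) pairs."""
    runs = len(set(block))   # sorted, so each distinct value is one run
    return 2 * runs

index_bytes = math.ceil(math.log2(math.factorial(64)) / 8)   # 37 bytes to record the shuffle

for name, block in [("ABAB...", b"AB" * 32), ("random", os.urandom(64))]:
    total = sorted_rle_size(block) + index_bytes
    print(name, total, "bytes vs 64 original")
# ABAB... -> 4 + 37 = 41 bytes; random -> typically ~110-115 + 37, far worse than 64.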




David Pittman
« Reply #113 on: June 03, 2010, 12:26:17 PM »

nah the best way to troll math geeks is to argue that .99999... != 1

It's not. 1 - .99999... == .00000...1. PROVE'D!!

Kitoari
« Reply #114 on: June 03, 2010, 01:30:33 PM »

I hope there are no trolls reading this thread, because this is unintentionally a perfect tutorial in how to troll maths geeks! Big Laff


It's also a comedy that has a lot of genius bonuses, but those are not required to understand the humor on a basic level. :weneedapopcorneatingsmiley:

Also, the topic will probably die now because I linked to tvtropes.   Concerned
Tycho Brahe
« Reply #115 on: June 03, 2010, 01:54:25 PM »

Math sucks.
Burn in hell
nah the best way to troll math geeks is to argue that .99999... != 1
And you also
Why write the decompression code before a file exists to decompress? I'm still working at the compression.
It's probably a good idea to try and tackle both problems at once, in case any algorithm you create produces compressed files which are un-decompressable.

To use a common simplification found earlier in this thread: the algorithm which reduces a file to 0 bytes looks fine on paper, and the compressor works, but any decompressor which tries to work on those files will fail. That's a problem with the compression, but one you might not pick up on until you try to write the decompressor.

(also, does tigsource support any of those mathematical notation thingies?)
BorisTheBrave
« Reply #116 on: June 03, 2010, 01:56:41 PM »

Also, the topic will probably die now because I linked to tvtropes.   Concerned

Dude, I just spent two hours trying to get off that site!

And yes, this is good troll material, but I consider my own contributions worth it now we get the payback of madk slowly but inevitably eating his words. What's that? You fell foul of the exact problem I warned you about? But you "think you can fix it". Please, do go on..
Tycho Brahe
« Reply #117 on: June 03, 2010, 02:00:11 PM »

@BorisTheBrave

The only possible response, I'm sorry.
Dacke
« Reply #118 on: June 03, 2010, 02:13:54 PM »

And yes, this is good troll material, but I consider my own contributions worth it now we get the payback of madk slowly but inevitably eating his words. What's that? You fell foul of the exact problem I warned you about? But you "think you can fix it". Please, do go on..

That's how you learn stuff. Trial and error is fun.

Overconfidence can also be a good thing, if it means you are courageous enough to tackle seemingly impossible challenges.

I see no need for ridicule.

You knew better to begin with? So what? Don't be a jerk about it. Just because you disliked _Madk's tone doesn't mean you have to use a bad tone yourself.

This goes for everyone who's been acting like a smug jerk here. That's not a good way to encourage people to learn new things and try new ideas. Nor will it help _Madk to better himself in any way.

Tycho Brahe
« Reply #119 on: June 03, 2010, 02:19:01 PM »

Not to want to quote a Batman film, but:
Quote from: Thomas Wayne
Why do we fall, Bruce? So we can learn to pick ourselves up.