Author Topic: Procedurally Generated Compression  (Read 18053 times)
Eraser
Guest
« Reply #20 on: May 31, 2010, 04:38:13 PM »

I say go for it. Even if it doesn't work, or only works partially, it's one heck of a good learning exercise. I'm very interested to see how your algorithm(s) turn out; compression is something I've always been interested in but too scared to tinker with. Wink

salade
Level 4
« Reply #21 on: May 31, 2010, 05:06:39 PM »

The compression algorithm then looks up the file on the megaserver and reduces the file to its index.
Considering there are far fewer than 1.16x10^77 files in the world (and the count won't ever really get close to that), this algorithm will compress every single file people use into 16 bytes.

A system where files are specified to be downloaded by their locations on a system of servers? I think you're on to something...  Wink

LemonScented
Level 7
« Reply #22 on: May 31, 2010, 05:50:48 PM »

_Madk,

I think you'll fail. I've played with compression myself, and that rabbit hole goes way, way deeper than I could have imagined. If your idea were as simple and bomb-proof as you make it sound, someone else would have done it by now - and if you're really good enough to make it work when nobody else in history has been able to, why aren't you posting the completed solution here already? Or patenting it and eyeing up which island you're going to buy with the riches?

Also, if you don't have a working decompressor, you don't have anything at all. None of the compression stuff is interesting unless you can prove that you can decompress it too. That's why some people here are mocking you.

That said, I wouldn't want to discourage you from trying. This stuff is fun, and educational!

But yeah, a post about every minor issue and its resolution is a bit much. You should be able to put the whole thing together with a reasonable test case and either have it work, or understand why it doesn't, in an evening or two.

Pineapple
Level 10
« Reply #23 on: May 31, 2010, 06:08:32 PM »

I've thoroughly debugged the compression code; I know for sure that it's working well. The decompressor is able to throw the original data back at me without issue, but it's not writing correctly to the output file.

In other words, the algorithm works, but the program doesn't.

PsySal
Level 8
« Reply #24 on: May 31, 2010, 09:40:59 PM »

_madk,

I know that everyone is telling you you'll fail, but... The truth is I think it will be a productive exercise.

One way to think of this particular entropy problem is like so:

- There *is* a random seed to go with an RNG that will convert your data to something more compressible. In fact, whatever bytes of data you want to compress will likely appear directly in your RNG stream, if it's a good enough stream and you aren't trying to compress too many bytes.

- So to compress any data, it remains to just store the seed that you use to generate that stream, and/or the stream position where that data appeared.

... however...

- The data you need to record in order to store that seed and/or its position in the stream will eat up, on the whole, whatever benefit you gain from storing it this way. You actually will be able to compress your data this way, but not beyond a certain point (see the sketch below).
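
(To make that concrete, here's a rough illustrative sketch in Python - nothing to do with _Madk's actual algorithm, just a toy that hunts for a short target inside a seeded PRNG byte stream and compares the cost of writing down the offset against the cost of the data itself. The stream length, seed and target are arbitrary choices of mine.)

import math
import random

def find_in_stream(target, seed, max_bytes=2_000_000):
    """Scan a seeded PRNG byte stream for the target bytes; return the offset or None."""
    rng = random.Random(seed)
    window = bytearray()
    for i in range(max_bytes):
        window.append(rng.randrange(256))
        if len(window) > len(target):
            del window[0]
        if bytes(window) == target:
            return i - len(target) + 1
    return None

target = bytes([0x42, 0x13])            # just 2 bytes = 16 bits of data
offset = find_in_stream(target, seed=12345)
if offset is not None:
    # The expected offset of an n-byte match is around 256**n, so the offset
    # alone takes roughly 8*n bits to write down - no net saving on average.
    print("found at offset", offset, "- storing that offset needs about",
          math.ceil(math.log2(offset + 1)), "bits")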

Another simple "proof" that there are limits to compression:

- Imagine you have a compression algorithm that will always reduce the file size by 1%. That is, suppose there were no theoretical limit to compression and you had a very modest algorithm that compresses a file by only 1%. Further, assume it's general purpose: it can compress anything.

- Start with any input file, and keep re-compressing your output, over and over again.

- Eventually you will have a file that is, say, 99 bits long (99 bits * 1% is less than 1 bit, which rounds down to 0 bits - a rounding we'll allow to support our "modest" argument - so this is as far as you can go if your algorithm always compresses by 1%).

- Now, 99 bits is a lot of bits. Not only that, but your typical file will have to have been reduced a lot of times to get it down to 99 bits. However...

There are certainly more than 2^99 different possible files! For instance, you can easily imagine creating them by brute force. But you've only got 2^99 possible compression outcomes, and since each of those 2^99 "final" outcomes must decompress back to exactly one original file, at most 2^99 distinct files can ever be recovered.

So there must be files it doesn't know how to compress. But this conflicts with our original proposition that it's a general-purpose compression algorithm. So there has to be a problem somewhere in our assumptions about how this theoretical algorithm could work: either it can't compress every type of file, or it must reach a point where it can't even reduce a file by 1%.
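
(Here's that counting argument in miniature, as a sketch: shrink the scale from 99 bits to 3-bit "files" so everything can be enumerated. The numbers are mine, but the logic is the same pigeonhole argument.)

from itertools import product

# All possible 3-bit "files": 2^3 = 8 of them.
inputs = [''.join(bits) for bits in product('01', repeat=3)]

# All strictly shorter outputs (lengths 0, 1, 2): 1 + 2 + 4 = 7 of them.
outputs = [''.join(bits) for n in range(3) for bits in product('01', repeat=n)]

print(len(inputs), "possible inputs, but only", len(outputs), "shorter outputs")
# By the pigeonhole principle, any compressor that always shrinks must send two
# distinct inputs to the same output, so at least one of them can't be recovered.
assert len(inputs) > len(outputs)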

Well, this is a somewhat lackadaisically stated version of the proof, but there is a more robust version here:

http://en.wikipedia.org/wiki/Magic_compression_algorithm#Limitations

Something else that is really interesting to read up on is:

http://en.wikipedia.org/wiki/Kolmogorov_complexity
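
(A quick, hand-wavy way to get a feel for that idea, using zlib's compressed size as a crude stand-in for "length of the shortest description" - purely an illustration I'm adding, not Kolmogorov complexity proper:)

import os
import zlib

regular = b'ab' * 50_000         # highly regular: a tiny program could print this
noise = os.urandom(100_000)      # random bytes: essentially incompressible

print("regular:", len(zlib.compress(regular, 9)), "bytes")   # a few hundred bytes
print("random: ", len(zlib.compress(noise, 9)), "bytes")     # ~100,000 bytes, no saving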

The point is, I think trying these things out is very useful learning and experimental experience. The truth is, even if you found a novel way of compressing something that wasn't actually better than what we already have, it could be quite useful. That alone would be worth the experiment. Maybe it would be useful for things other than compression, too!

But even if you don't, it's really a great exercise to wrap your head around these kinds of problems-- it's what makes math so interesting! And you never know what you'll actually discover. I don't pretend to be an expert in these kinds of topics, but the way I see it they are quite deep and definitely worth experimenting with.

Kekskiller
Guest
« Reply #25 on: June 01, 2010, 12:11:25 AM »

Oh guys, stop talking so much. Everybody can start his own odyssey, even if it turns out to be just a jump between two muddy barrels. And compression especially is an area where we really need a non-theory-only approach to find something more interesting. See, I'm not talking about anything specific. Years pass in science, and then suddenly someone discovers something that was thought to be theoretically impossible - but it is, in fact, possible. Maybe he'll find a way to revolutionize compression this way - maybe not.

So, no matter what the result is, rants are much more annoying than actual development logs.

Core Xii
Level 10
« Reply #26 on: June 01, 2010, 03:28:08 AM »

I don't understand all the... unnecessary complexity in your algorithm. Why 4 bits? Why shuffle anything around?

The way I'd approach procedural compression is to make a program that can evolve other programs. So you build a virtual machine and generate some random instruction sequences as programs. Your VM then executes these programs and they output some bytes. Then you run them through a fitness function that measures how close each output came to the original file. You clone and mutate the best programs and drop the worst. Repeat. Eventually, you'll have evolved a program that generates something close to the original file.

Since you won't evolve a program that generates the correct file 1:1, you'll just need to stop evolving at some point when you're close enough and patch the differences. So you take the best program's output, diff it with the original file, and store the differences separately.

To decompress, your VM runs the evolved program, takes the output, applies the diff to patch up the last pieces, and there you have it.

Your fitness function should probably also rank programs based on speed, so you get a fast decompressor.

The advantage of this method is that you can also distribute the compression phase. Just give a hash of the original file to all the clients (on second thought, it has to be an incremental hash in order to measure closeness to the original file) and have them run simultaneous evolution. Have them exchange their best programs at intervals. Compressing@Home!

Now, I should note that evolving a compressor takes a long time. You'll probably have to evolve for weeks on a single file. And I'm not good with statistics or even compression, so I don't even know if this is viable in the first place. But this is how I'd do it.
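
(For what it's worth, here's a minimal sketch of that idea in Python: a toy VM with three made-up opcodes, a matching-bytes fitness function, and clone-and-mutate selection. Every name and parameter here is my own invention for illustration, the target is tiny so it finishes quickly, and it only demonstrates the loop - it is not a practical compressor.)

import random

TARGET = b"ABABABCD" * 4            # tiny stand-in for "the original file"
OPS = ("LIT", "REP", "ADD")         # emit literal / repeat last byte / last byte + n

def run(program, limit):
    """Execute a program (a list of (op, arg) pairs) and return its output bytes."""
    out = bytearray()
    for op, arg in program:
        last = out[-1] if out else 0
        if op == "LIT":
            out.append(arg)
        elif op == "REP":
            out.extend([last] * (arg % 8))
        else:                        # "ADD"
            out.append((last + arg) % 256)
        if len(out) >= limit:
            break
    return bytes(out[:limit])

def random_instruction():
    return (random.choice(OPS), random.randrange(256))

def random_program(length=16):
    return [random_instruction() for _ in range(length)]

def fitness(program):
    out = run(program, len(TARGET))
    return sum(a == b for a, b in zip(out, TARGET))    # bytes matching the target

def mutate(program):
    child = list(program)
    child[random.randrange(len(child))] = random_instruction()
    return child

population = [random_program() for _ in range(200)]
for generation in range(300):
    population.sort(key=fitness, reverse=True)
    best = population[:20]                             # keep the fittest programs
    population = best + [mutate(random.choice(best)) for _ in range(180)]

winner = max(population, key=fitness)
out = run(winner, len(TARGET)).ljust(len(TARGET), b"\x00")
diff = [(i, TARGET[i]) for i in range(len(TARGET)) if out[i] != TARGET[i]]
print("matched", fitness(winner), "of", len(TARGET), "bytes;",
      "the patch would need", len(diff), "corrections")
# "Decompression" would be: run(winner) and then apply the stored diff. The catch
# is that the program plus the diff is usually no smaller than the file itself.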

Pineapple
Level 10
« Reply #27 on: June 01, 2010, 04:53:34 AM »

Thanks for the conversation and feedback; once I've ironed out the bugs I'll likely be seeking a software patent. The algorithm works; I just haven't quite finished the program yet.

Kekskiller
Guest
« Reply #28 on: June 01, 2010, 05:30:00 AM »

So, how good is the compression ratio? Did you test it with random data, too? Or already highly compressed files? If it works like you said, then it may be a fabulous thing you have there.

muku
Level 10
« Reply #29 on: June 01, 2010, 07:19:30 AM »

Regardless of how well it compresses (of which I'm still very doubtful), if it really takes several minutes or more to compress a 128-byte chunk, then it's profoundly useless for any real-world application.

Not that I want to dissuade you from doing this; as others have said, this is a great learning experience. I had a similar phase, maybe 10 years ago or so, when I went through some excellent tutorial and implemented several major compression algorithms for fun. It's quite gratifying to have the magic taken out of compression, to understand how it works and to know that you could do it yourself if you really needed to.

Just don't expect that you can turn decades of coding theory on its head with some humdrum idea. That's a bit arrogant, frankly, and will just lead to disappointment.

Pineapple
Level 10
« Reply #30 on: June 01, 2010, 07:23:33 AM »

Quote
Regardless of how well it compresses (of which I'm still very doubtful), if it really takes several minutes or more to compress a 128-byte chunk, then it's profoundly useless for any real-world application.
That was a very pessimistic estimate to keep my expectations low; fortunately, it has exceeded them significantly (never more than 2 seconds per chunk, but considering a chunk is only 128 bytes, the time adds up).

It reduced a 169 KB executable to 1.09 KB in 100 passes over the output data. I didn't record the time; it wasn't a solid test, but I think it took about 20 minutes. Based on tests I've run on other files, additional passes would likely bring it down to about 600 bytes.

Note that I have not yet taken a single step toward optimization.

muku
Level 10
« Reply #31 on: June 01, 2010, 07:30:26 AM »

It reduced a 169 KB executable to 1.09 KB in 100 passes over the output data. I didn't record the time; it wasn't a solid test, but I think it took about 20 minutes. Based on tests I've run on other files, additional passes would likely bring it down to about 600 bytes.

That's all very nice, but last I heard you still haven't managed to uncompress it again... Giggle

Unless that executable is like 99% zeros, in which case I totally believe you.

Pineapple
Level 10
« Reply #32 on: June 01, 2010, 07:31:26 AM »

 Big Laff

Overkill
Level 3
« Reply #33 on: June 01, 2010, 08:12:24 AM »

You're acting like one of those people that the comp.compression FAQ talks about. One section, in particular, I found interesting:

Another common (but still incorrect) argument is to assume that for any file,
some still to be discovered algorithm might find a seed for a pseudo-random
number generator which would actually generate the whole sequence of bytes
contained in the file. However this idea still fails to take into account the
counting argument. For example, if the seed is limited to 64 bits, this
algorithm can generate at most 2^64 different files, and thus is unable to
compress *all* files longer than 8 bytes. For more details about this
"magic function theory", see http://www.dogma.net/markn/FAQ.html#Q19

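A quick back-of-the-envelope check of those numbers (this little sketch is mine, not from the FAQ):

seeds = 2 ** 64              # every possible 64-bit seed
files_of_9_bytes = 256 ** 9  # distinct 9-byte files alone: 2^72 of them
print(files_of_9_bytes // seeds)   # 256 -> already 256x more 9-byte files than seeds
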
Anyways, why do you continue to post about this here? You don't want to listen to other people's advice. You have failed to come up with decisive evidence that your compression scheme even works. And now you've <snipped> out every detail about the algorithm, effectively removing any chance of useful discussion.
« Last Edit: June 01, 2010, 08:30:28 AM by Overkill »

Zaratustra
Level 7
« Reply #34 on: June 01, 2010, 09:09:19 AM »

By the way, software is not patentable in the US unless it's tied to a specific hardware setup.

I guess you don't care anyway. Good luck with whatever you're doing.
« Last Edit: June 01, 2010, 09:19:47 AM by Zaratustra »

Pineapple
Level 10
« Reply #35 on: June 01, 2010, 09:31:43 AM »

@Overkill
That's entirely different from what my algorithm does. And I <snipped> the details because further discussion is unnecessary and I'm not interested in my algorithm being copied/stolen. It works because it doesn't rely entirely upon the random seed, only for a portion of the activity. I don't care to explain myself, but it works.

@Zaratustra
My apologies - I was mistaken in referring to it as a software patent. I wouldn't be patenting the program, but rather the algorithm I'm using, which is patentable.
« Last Edit: June 01, 2010, 09:35:25 AM by _Madk »

David Pittman
Level 2
« Reply #36 on: June 01, 2010, 09:53:53 AM »

Can we turn this into a discussion on the pros and cons of software patents? Pretty please?

Because patenting software if you don't already have a product pipeline in place to capitalize on the patent is among the douchier things a developer can do. Patents exist to protect R&D investments, essentially. You sink a lot of money into finding something new that works, and the patent gives you time to recoup those costs without other people taking your ideas. If you're patenting something you just developed over the 3-day weekend and aren't going to profit from, you're just stifling creativity and further development (which has been the tone of this thread, incidentally).

Overkill
Level 3
« Reply #37 on: June 01, 2010, 10:12:09 AM »

@Overkill
That's entirely different from what my algorithm does. And I <snipped> the details because further discussion is unnecessary and I'm not interested in my algorithm being copied/stolen. It works because it doesn't rely entirely upon the random seed, only for a portion of the activity. I don't care to explain myself, but it works.

I am aware you don't necessarily rely ONLY on a random number generator, but as I remember reading earlier, you segment the file into chunks which are then compressed -- but these chunks are longer in byte length than the seed used to generate them. Anyways: "The net result is always the same: you cannot create a function using X bytes that will generate all sequences of length Y, when X is less than Y." [Mark Nelson's FAQ]

Alright, whatever: if you don't explain yourself, don't expect anyone to adopt your algorithm or to remotely care about what you're making. Also, there are other means of discouraging misappropriation of work without patents (for example, licensing your implementation as open-source software). And besides, since most people in this thread have been discouraging you from working on this idea, I cannot fathom why they would want to steal your work. It is clear they saw flaws in your idea and have been providing several counterexamples to that effect.

Best of luck! That's all.
« Last Edit: June 01, 2010, 10:26:59 AM by Overkill »

Pineapple
Level 10
« Reply #38 on: June 01, 2010, 10:36:27 AM »

@Overkill
The seed is not used to regenerate the data; it's used to reorganize it. There aren't 16^128 ways to go, as you seem to be stuck on, but rather only 2^15 (256 factorial). With some refining of the algorithm I've been doing recently, this value will instead become 136 (16 factorial).
As for stealing the idea, it won't be terribly long before I've got a working demonstration program. Once the naysayers are silenced, I have no intention of making this some sort of giveaway.


@Pitt
If I don't file a patent, I don't imagine it would be easy to prevent a large corporation from claiming the same or similar idea as their own.

The algorithm works. I've been doing some mathematical analysis: it is mathematically impossible for the algorithm (using a different though still brute-force method I've conceived) to be unable to compress a file over 128 bytes. With some changes, this minimum size will be reduced, but the main benefit will be that it'll actually perform much faster than it does now (not on a single pass, but fewer total passes will be required).

Dispute it all you want, but it works. I understand all these compression rules you're throwing at me, but I daresay I've found some loopholes.

David Pittman
Level 2
« Reply #39 on: June 01, 2010, 10:58:45 AM »

If I don't file a patent, I don't imagine it would be easy to prevent a large corporation from claiming the same or similar idea as their own.

But what do you lose in that case?