Author Topic: Procedurally Generated Compression  (Read 18077 times)
Pineapple
« Reply #120 on: June 03, 2010, 02:35:58 PM »

Not to quote a Batman film, but:
Quote from: Thomas Wayne
Why do we fall, Bruce? So we can learn to pick ourselves up.

You just made my day - I love that film.


@Zarasutra
It's completely unnecessary to attribute that many bytes to the possible arrangements. As I said in a previous post, I'm just trying to figure out how many of the arrangements really do need to be available without sacrificing compression to a horrid extent. As for that ABABABAB example you provided, the algorithm is designed to transform it into a run of A's and a run of B's, or, if that exact arrangement isn't available, it will be smart enough to know that one or two misplaced values is not the end of the world. Also, since I'm handling values as 4 bits each rather than 8, it's much harder to stumble upon an uncompressible portion of data.

Also, about going over the same file many times, which Overkill pointed out could be counterproductive: any chunk where no arrangement could be found that reduced the total size will be saved in a different format, most likely a per-value shift (0 would become 2, 1 -> 3, 2 -> 4, 13 -> 15, 15 -> 1, etc.). That way, in addition to sitting at a different offset (and therefore spreading its contained data across the end of one chunk and the beginning of the next), the values themselves will change, hopefully resulting in a greater chance of significant compression.
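For what it's worth, here's a minimal sketch of what I mean by that per-value shift, assuming the offset simply wraps around modulo 16 (the function name and the sample data are just for illustration, not from the actual compressor):

Code:
SuperStrict

Framework brl.StandardIO 'Standard console IO

' Rotate each 4-bit value by a fixed offset, wrapping modulo 16,
' so with offset 2: 0 -> 2, 1 -> 3, ..., 13 -> 15, 14 -> 0, 15 -> 1.
Function shiftvalues(values:Int[], offset:Int=2)
	For Local i:Int=0 Until values.length
		values[i]=(values[i]+offset) Mod 16
	Next
End Function

Local chunk:Int[]=[0,1,2,13,15]
shiftvalues(chunk)
For Local v:Int=EachIn chunk
	Print v 'prints 2, 3, 4, 15, 1
Next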



Also, I will be mindful to begin developing the decompressor alongside the compressor. I do thank you for the advice.
BorisTheBrave
« Reply #121 on: June 03, 2010, 02:50:52 PM »

I see no need for ridicule.

No, I see plenty of cause for ridicule. Not for experimenting - that is fine. Nor for ignorance of general compression knowledge - that's precisely the point of experimenting. _Madk's sin is one of arrogance: he didn't take the time to consider that he might be wrong, which shows in every action, but particularly in his failure to grasp the impossibility proof repeated at every opportunity.

But he's owning up to his mistakes, somewhat, and he's never been insulting, so that's good. I'm sure this will eventually be a learning experiment for him, though perhaps not the one he wanted (or, more likely, in addition to the one he wanted).
Rob Lach
« Reply #122 on: June 03, 2010, 07:35:44 PM »

_Madk, I don't mean to be offensive, but you seem to have fallen into the same trap that has caught many inventors of perpetual motion machines: you come across a very ingenious idea, but you either look past or refuse to believe the point where that idea fails.
gimymblert
« Reply #123 on: June 03, 2010, 08:47:51 PM »

But there are also the tales of Langley vs. the Wright brothers.
You never know who the Langley is.

The problem is also that "naysayers" assume ideal conditions, as in theory, but what is the real degree of entropy of most files? I may be wrong, and be proven wrong, and yet find something that works in most practical cases. I have seen that happen a LOT.

EDIT: I don't believe the claim that all files can be reduced to 69 bytes.
« Last Edit: June 03, 2010, 09:00:24 PM by neoshaman »

Alex May
« Reply #124 on: June 04, 2010, 01:31:42 AM »

The way I see it (in my mathematically uncultured mind), you can't do it because it's a look-up, either into an algorithm that describes the file completely or into a database. Either way, you need to have all possible files described at the decompression location before you "decompress" the file. A database is out of the question, because if you already have all the files then you don't need to decompress anything. Good luck finding an algorithm that can reproduce any given megabyte-long file yet is so trivially short that a) it can be redistributed and b) it can be computed in the first place.
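To put rough numbers on that (just my own back-of-the-envelope counting):

$$256^{1\,048\,576} \approx 10^{2\,525\,000} \ \text{distinct 1 MB files}, \qquad \sum_{k=0}^{69} 256^{k} < 256^{70} \approx 10^{169} \ \text{files of at most 69 bytes.}$$

A decompressor maps each input to exactly one output, so at most about 10^169 of those 10^2,525,000 megabyte-long files could ever come out of it; every other one simply has no 69-byte representation.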

edit: hadn't read some essential posts
« Last Edit: June 04, 2010, 01:42:41 AM by Alex May »

Pineapple
« Reply #125 on: June 04, 2010, 02:43:26 AM »

Quote from: gimymblert
EDIT: I don't believe the claim that all files can be reduced to 69 bytes.

Neither do I - that was a delusional claim, thanks to my mistake of thinking that 64 factorial was 64+63+62... rather than 64*63*62...
However, I'm still confident the compression will perform better than other algorithms; we will see. I've had a lot going on lately, so I haven't been able to work on it much.
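For the record, the actual numbers:

$$64 + 63 + \dots + 1 = \frac{64 \cdot 65}{2} = 2080, \qquad 64! \approx 1.27 \times 10^{89}, \qquad \log_2(64!) \approx 296\ \text{bits} \approx 37\ \text{bytes},$$

so an index that can name any one of the 64! arrangements of 64 distinct values already takes about 37 bytes, more than the 32 bytes the nibbles themselves occupy.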
Balrog
« Reply #126 on: June 04, 2010, 10:21:31 AM »

I've been kicking around an idea for compression. Basically, you would scan a file for long, frequently occurring patterns of bits, and also for short patterns that never occur, then replace the large patterns with the small ones and define the mapping in some kind of header at the start of the file. Something like:

00010101111111001111111 Original

Map 1111111(large occurring pattern) to 11 (smaller non-occurring pattern)

0001010110011 End result

I dunno, it seems like any file of significant size would have some large pattern that you could map to a smaller one.
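Here's the example above as a toy sketch, treating the data as a string of '0'/'1' characters purely for illustration (a real version would work on actual bits, and the mapping here is hard-coded rather than discovered by scanning):

Code:
SuperStrict

Framework brl.StandardIO 'Standard console IO

Local original:String = "00010101111111001111111"
Local compressed:String = original.Replace("1111111", "11") 'map the long pattern to the short one

Print original + " (" + String(original.length) + " bits)"
Print compressed + " (" + String(compressed.length) + " bits)"

' The catch: to reverse the mapping, the short code must never occur in the
' leftover data (or the header has to escape it somehow).

It works out for this particular string, but telling the mapping codes apart from ordinary data in general is the hard part.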
Chromanoid
« Reply #127 on: June 04, 2010, 10:51:53 AM »

The problem with these approaches is how to separate the "mapping keys" from the normal data.
Tycho Brahe
« Reply #128 on: June 04, 2010, 11:57:56 AM »

This seems a lot like RLE, with the added benefit that you can apply it to patterns as well as blocks of identical bits.
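A quick sketch of the plain run-length case for comparison (my own toy example, nothing to do with _Madk's code):

Code:
SuperStrict

Framework brl.StandardIO 'Standard console IO

' Classic run-length encoding: each run of identical characters becomes
' its length followed by the character itself.
Function runlengthencode:String(data:String)
	Local out:String = ""
	Local i:Int = 0
	While i < data.length
		Local runlen:Int = 1
		While i + runlen < data.length
			If data[i + runlen] <> data[i] Then Exit
			runlen :+ 1
		Wend
		out :+ String(runlen) + data[i..i+1]
		i :+ runlen
	Wend
	Return out
End Function

Print runlengthencode("AAAABBBCCD") 'prints 4A3B2C1D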
shrimp
« Reply #129 on: June 04, 2010, 01:10:59 PM »

See LZW and Huffman coding
st33d
« Reply #130 on: June 04, 2010, 03:51:17 PM »

Quote from: Tycho Brahe
This seems a lot like RLE, with the added benefit that you can apply it to patterns as well as blocks of identical bits.

Oh, so like LZW?

I mean, come on, _Madk - either you have something here or you don't. If you posted a snippet, people would jump on it in a good way.

We all like computer science on this part of the board. Even if you've got it wrong, we can be grown-ups and say, well, that's an interesting problem.

You don't cite references to established compression algorithms, and that's bad. Always cite references; that's how you pass a degree.

Put some code on the table that will run. We'll solve the problem together. Either it will be wrong, or we'll all go, "shit, that _Madk kid had a point." It doesn't matter that you solved it; it matters that WE solved it, that WE then knew you were right, and we'll support you.

No code. No dice.
Pineapple
« Reply #131 on: June 04, 2010, 04:21:12 PM »

Alright.

Here's what I've got down in code at the moment, it's written in BlitzMax.

Code:
SuperStrict

Framework brl.StandardIO 'Standard console IO
Import brl.FileSystem
Import brl.Stream



Global chunksize%=32  '# of values to store in each chunk. Lower values are faster, but higher values (up to 256) result in more efficient compression. This does not consider the availability of different arrangements.

Global persist:Byte=0 '0 is quick, 1 is regular, and 2 is dedicated for the persistence setting.

    'PERSISTENCE:

' QUICK: would search only 1/32 of all available seeds, would use the first detected to result in a size any smaller,
' and would leave the data uncompressed if none of the seeds tested resulted in a size reduction.

' REGULAR: would search exactly 1/4 of all available seeds and use the most efficient of those tested, or leave
' uncompressed if no seeds succeeded.

' DEDICATED: would not stop searching until it found a suitable seed or exhausted all possibilites, in which case it
' would leave the data uncompressed.



Global maxtries!
'If persist=




Global combos!=1 'number of possible ways to arrange the values in a chunk (chunksize factorial). Starts at 1 and uses a Double, since 32! overflows any integer type.
For Local calc_combos%=1 To chunksize 'calculate
combos:*calc_combos
Next


Global fil:TStream
Function compress(in$,out$)
fil=ReadFile(in) 'open the input file first, so a missing file is caught before anything is allocated
If Not fil error "Could not open input file"

Local a%=FileSize(in)*2 'number of 4-bit values needed to hold the file (two per byte)
Local c%=a/chunksize 'number of complete chunks in the file
Local extra_bytes%=a Mod chunksize 'leftover values at the end, too few to form a full chunk

Local mem@[a] 'original data array
Local sec@[c,chunksize] 'shuffled & organized data array
Local cmp@[c] 'per-chunk workspace used when seeking to rearrange each chunk

'put file data into mem array
Local mem_array_index%=0 'index into mem; advances two values per byte read
Repeat
Local b@=ReadByte(fil) 'read the byte from the file stream
mem[mem_array_index+0]=b Mod 16 'first memory address is the right 4 bits. ( [.][.][.][.][!][!][!][!] )
mem[mem_array_index+1]=(b Shr 4) Mod 16 'second memory address is the left 4 bits. ( [!][!][!][!][.][.][.][.] )
mem_array_index:+2
Until Eof(fil)
CloseFile fil

'organize mem array data into second (sec) array
For Local into%=0 To c*chunksize-1 'only complete chunks; the extra_bytes leftover values are not chunked here
sec[into/chunksize,(into Mod chunksize)]=mem[into]
Next

'find smallest compression possibility for each chunk
Local newloc@@[chunksize] 'where each value would most desirably be moved to

'

fil=WriteFile(out) 'write the output file
If Not fil error "Could not create output file"

End Function



Function expand(in$,out$)

End Function


Function error(txt$)
If fil Then CloseFile fil
Print txt
Print "Press enter to exit.."
Input
End
End Function


Eraser
« Reply #132 on: June 04, 2010, 09:47:46 PM »

Ha, I have a newfound respect for anyone who can create a working file compression algorithm. I got really bored today and spent a few hours creating an LZW-like algorithm that indexes 8-byte segments. Unfortunately, it has no real application. Aside from the compressor exponentially eating memory and being slower than a snail, most files are not so perfect as to have lots of repeating 8-byte patterns, so it raised the file size 95% of the time (because it uses a 4-byte tag to describe how each 8-byte segment should be decompressed). Anyway, this was my first and probably last attempt at rolling my own compression, heh.

_madk, thanks for sharing your code; perhaps (if the compressor code is complete) someone can write a decompressor to see whether it works properly, if you don't get around to it.
bateleur
« Reply #133 on: June 05, 2010, 12:24:13 AM »

Quote from: Pineapple
Here's what I've got down in code at the moment, it's written in BlitzMax.

First thing I notice is that you seem to be calculating the number of combos. No need for that. It's pretty obvious which ordering will compress most effectively: just placing all equal-valued digits adjacent to one another. So all you need to do is find a way to describe the resulting permutation.

(I say "all"... actually the most efficient encoding in the general case is going to be the uncompressed data! I'll leave you to find ways around that issue, if possible!)
Tycho Brahe
« Reply #134 on: June 05, 2010, 01:15:16 AM »

Quote from: st33d
Oh, so like LZW?
I hadn't heard of that, thanks!
drChengele
« Reply #135 on: June 06, 2010, 05:26:13 AM »

What a coincidence, when I was 12 I wrote a kick-ass compression algorithm that was able to compress every file in the world into a 256-byte file.

Now if only there was a way to decompress them...  Well, hello there!
baconman
« Reply #136 on: June 06, 2010, 05:40:24 AM »

 Roll Eyes

1. Understand the multiple efficient compression algorithms that already exist.

2. Analyze file contents to determine the optimal algorithm for each particular file (images, sound, text, video, code, executables, and other file types all have different consistency patterns anyway), and group the files accordingly.

3. Make compressed file groups based on the optimal choice, and index the files and formats accordingly.

4. Bundle these groups into a single file/format for convenient transfer and storage, indexing them by compression algorithm; prepare the file for optimal decompression (i.e. add a quick switch command to change the algorithm on the fly as the main compressed file is decompressing); and finally, add a decompressing agent that responds to this.

5. Rock on.

I could be wrong, but isn't this pretty much what "maximal" RAR compression does, btw? (And why it sometimes takes so long? "Quicker" settings just try fewer compression possibilities, resulting in bigger files but quicker decompression.)