TIGSource Forums > Community > Jams & Events > Competitions > Old Competitions > Assemblee: Part 1 > Assembleech - The Assemblee Link Leech (19/12 update)
Topic: Assembleech - The Assemblee Link Leech (19/12 update) (Read 38408 times)
Martin 2BAM
@iam2bam
« Reply #20 on: November 26, 2009, 06:28:36 PM »

I made a script to download everything listed by this script into the proper author sub-folder, so it's compatible with the Indie Asset Organizer.

I just need confirmation if Sos is willing to modify his script a bit for that.

I'm building it for Windows. It's open-source and depends mostly on "wget", so Mac/Linux users could build it too.
Thanks to wget's "-c" (continue) parameter, you can resume a download that failed or broke, and I think it doesn't re-download files that already finished, so you can just run the script again.
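A rough sketch of the resume idea, in C++ like the tool posted later in this thread. The helper `buildWgetCommand` is a name made up for this example and only assembles the command line: `-c` tells wget to continue a partially downloaded file, and `-P` picks the directory it is saved in. The `./assets/<author>/` layout and the double-quoting are assumptions for illustration.

```cpp
#include <string>

// Hypothetical helper: assemble a resumable wget call for one file.
//  -q : quiet
//  -c : continue a partially downloaded file instead of starting over
//  -P : directory prefix (where the file is saved)
// Arguments are double-quoted so URLs containing '&' survive the shell.
std::string buildWgetCommand(const std::string &author, const std::string &url)
{
    return "wget -q -c -P \"./assets/" + author + "/\" \"" + url + "\"";
}
```

The string would then be handed to `std::system`; running the same commands again after an interruption is what makes "just run the script again" work.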

Regards.


Working on HeliBrawl
Sos
I make bad games
« Reply #21 on: November 27, 2009, 03:15:57 AM »

Quote
I made a script to download everything listed by this script into the proper author sub-folder, so it's compatible with the Indie Asset Organizer.

I just need confirmation if Sos is willing to modify his script a bit for that.

I'd be delighted to, but you didn't say what I need to change.

Martin 2BAM
« Reply #22 on: November 27, 2009, 07:20:45 AM »

I sent you 2 PMs yesterday, haven't you received them?
Perhaps, if you're not too busy, we could work out together an easy way to integrate it with the IAO, dunno.

If you can manage to get the author along with each link, I can make a tool to download everything into the correct folders.
Instead of HTML, a raw output like:

Quote
AUTHOR1
LINK_A
AUTHOR1
LINK_B
AUTHOR2
LINK_C
....

Looking at the first post, I found a pattern:

In the first TD tag with class "windowbg", you go
TD class=windowbg -> table -> tr -> td -> b -> a -> innerText
and that's the author's name :)

Code:
<td class="windowbg">
<table width="100%" cellpadding="5" cellspacing="0" style="table-layout: fixed;">
<tr>
<td valign="top" width="16%" rowspan="2" style="overflow: hidden;">
<b><a href="http://forums.tigsource.com/index.php?action=profile;u=3670" title="View the profile of Sos">Sos</a></b>
<div class="smalltext">
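A sketch of that lookup with plain string searching (not a real HTML parser), assuming the markup looks exactly like the excerpt above. The function name `firstPostAuthor` is hypothetical, and it takes a shortcut: it jumps from the first `<td class="windowbg">` straight to the first `<b><a ...>` after it instead of walking table -> tr -> td one element at a time.

```cpp
#include <string>

// Return the inner text of the first <b><a ...>...</a> that follows the
// first <td class="windowbg">, or "" if the pattern isn't found.
std::string firstPostAuthor(const std::string &html)
{
    std::string::size_type td = html.find("<td class=\"windowbg\">");
    if (td == std::string::npos) return "";

    std::string::size_type b = html.find("<b><a ", td);
    if (b == std::string::npos) return "";

    // b + 3 is the '<' of "<a"; the next '>' closes the <a ...> tag
    std::string::size_type textStart = html.find('>', b + 3);
    if (textStart == std::string::npos) return "";
    ++textStart;

    std::string::size_type textEnd = html.find("</a>", textStart);
    if (textEnd == std::string::npos) return "";
    return html.substr(textStart, textEnd - textStart);
}
```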

Let me know what you think!

Regards! Beer!




OK, I made a preliminary version of the utility. It depends on wget.

If you want to do it, I just need this format:
Quote
author\n
url\n
author\n
url\n

It needs "\n" line breaks instead of <BR>, so I can read the file line by line as raw text.
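For reference, reading that format back is a two-lines-at-a-time loop. A minimal C++ sketch (`readPairs` is a made-up name), which also drops a trailing '\r' in case the list is served with Windows line endings:

```cpp
#include <istream>
#include <sstream>   // for std::istringstream in the usage example
#include <string>
#include <utility>
#include <vector>

// Read "author\nurl\nauthor\nurl\n..." into (author, url) pairs.
// Looping on getline() (rather than on eof()) stops cleanly at the end.
std::vector<std::pair<std::string, std::string>> readPairs(std::istream &in)
{
    std::vector<std::pair<std::string, std::string>> pairs;
    std::string author, url;
    while (std::getline(in, author) && std::getline(in, url)) {
        if (!author.empty() && author.back() == '\r') author.pop_back();
        if (!url.empty() && url.back() == '\r') url.pop_back();
        pairs.emplace_back(author, url);
    }
    return pairs;
}
```

Usage: `std::istringstream in("Sos\nhttp://a/x.png\n"); auto p = readPairs(in);` gives one pair.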

Thanks and best regards!

Sorry for the misunderstanding :)

Sos
« Reply #23 on: November 27, 2009, 07:38:50 AM »

Sorry, didn't notice them. Extracting the authors' names seems like a tough job, but I'll try it.

Martin 2BAM
« Reply #24 on: November 27, 2009, 07:44:22 AM »

Check these out:
http://keithdevens.com/software/phpxml
http://php.net/manual/en/book.xml.php
http://www.phpclasses.org/browse/file/17412.html

Sos
« Reply #25 on: November 27, 2009, 02:25:25 PM »

These are XML parsers. Forums output some major HTML mess (I dare you to take a peek :P)

Martin 2BAM
« Reply #26 on: November 27, 2009, 03:52:12 PM »

But HTML is like XML; it will parse it, believe me, I've done it a thousand times (not in PHP, though).

It's not that messy, really. As I told you, if you search for the first TD whose "class" attribute is "windowbg", you can easily get the first post.

Sos
« Reply #27 on: November 27, 2009, 05:39:11 PM »

I'd rather just use preg_match to get the text between '"View the profile of' and '">'; however, the hard part is getting it to assign each post to its poster.
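A C++ counterpart of that preg_match idea, using `std::regex`. It is a sketch that assumes every post carries one `title="View the profile of NAME"` link; unlike a single preg_match call, it collects every match in page order (closer to preg_match_all). It does not by itself solve assigning posts to posters. `profileNames` is a made-up name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every NAME found in "View the profile of NAME"> on the page,
// in the order the matches appear.
std::vector<std::string> profileNames(const std::string &html)
{
    static const std::regex re("\"View the profile of ([^\"]*)\">");
    std::vector<std::string> names;
    for (std::sregex_iterator it(html.begin(), html.end(), re), end; it != end; ++it)
        names.push_back((*it)[1].str());
    return names;
}
```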

Martin 2BAM
« Reply #28 on: November 27, 2009, 05:49:28 PM »

Oh, but I was talking about something way simpler:
Just look for the original poster (1st post) and assume all the links in the thread are from him :)

I mean, it's a heuristic, but a pretty reasonable one.
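The heuristic can be sketched like this; `Post` and `creditToOriginalPoster` are hypothetical names, and the scraping that fills in each post's author and links is left out:

```cpp
#include <string>
#include <utility>
#include <vector>

// One post as scraped from a thread: who wrote it and which links it contains.
struct Post {
    std::string author;
    std::vector<std::string> links;
};

// Credit every link in the thread to the original poster (author of the
// first post), regardless of who actually posted it.
std::vector<std::pair<std::string, std::string>>
creditToOriginalPoster(const std::vector<Post> &thread)
{
    std::vector<std::pair<std::string, std::string>> out;
    if (thread.empty()) return out;
    const std::string &op = thread.front().author;
    for (const Post &post : thread)
        for (const std::string &link : post.links)
            out.emplace_back(op, link);
    return out;
}
```

It's cheap, but any asset posted by someone else in the same thread ends up in the original poster's folder.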

Sos
« Reply #29 on: November 27, 2009, 06:17:37 PM »

I can't, since I rely on pages, not threads (some threads span more than one page).

Anyway, let's get all this tech mumbo-jumbo out of this thread. I'll do my best.

Fifth
« Reply #30 on: November 27, 2009, 07:44:19 PM »

You could try the threads in print mode. You'd get all of the posts on one page, with the author's name at the top of each.
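A sketch of how a print-mode page could be used: remember the most recent author header and credit every following link to that author. ASSUMPTION: each post in the print view starts with a line like `Post by: NAME on DATE`, and the text is processed line by line; the real print-mode markup should be checked first. `linksByAuthor` is a made-up name.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Walk the page line by line; a header line sets the current author,
// and every link on the following lines is credited to that author.
// ASSUMPTION: headers look like "Post by: NAME on DATE".
std::vector<std::pair<std::string, std::string>> linksByAuthor(const std::string &page)
{
    static const std::regex header("Post by: (.+?) on ");
    static const std::regex link("https?://[^\\s\"<>]+");
    std::vector<std::pair<std::string, std::string>> out;
    std::string author, line;
    std::istringstream in(page);
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_search(line, m, header)) {
            author = m[1].str();
            continue;
        }
        for (std::sregex_iterator it(line.cbegin(), line.cend(), link), end; it != end; ++it)
            out.emplace_back(author, it->str());
    }
    return out;
}
```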
Martin 2BAM
« Reply #31 on: November 28, 2009, 12:42:16 PM »

Well, if someone manages to do it before tomorrow, that would be cool.

Here's the source of the download tool and the correct wget for Windows (the command-line one, not GnuWin32's GUI one).

Src + Wget.exe

GET THE ONE FROM THE FIRST POST! :)

You can easily compile it on Mac with:
Quote
$ g++ -Wall leech_tool.cpp -o leech_tool

And on Windows with MSVC, create a "Console app/Empty Project" solution,
drag leech_tool.cpp into the "Sources" folder in the tree and compile.

I could provide executables, but I need to know the leech link to compile it in directly (so people don't have to open a console/terminal and type the link themselves).

Or here's the code, if you're too lazy to download the zip:

Code:
// leech_tool:
// Tool to download an "<author>\n<url>\n<author>\n<url>\n"... file list and then
// put each file under the correct author folder.
//
// It was built to work with Indie Asset Organizer and
// download everything using the Assembleech script (by "Sos", Thanks to you!)
//
// Released under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license.
//
// You need to have wget in the same folder this is run from.

//v1.0

#include <cctype>     // isspace
#include <cstdio>     // getchar
#include <cstdlib>    // system
#include <fstream>
#include <iostream>
#include <list>
#include <sstream>
#include <string>
#include <utility>    // pair, make_pair
#ifndef _WIN32
#include <sys/wait.h> // WIFEXITED, WEXITSTATUS
#endif
using namespace std;

#error You must define a URL before you compile
#define URL_LEECH //"http://so0os.pl/assembleech/"
#define TEMP_LIST "./temp-list.txt"


//composite string, wget exit code
list<pair<string, int>> failed;

const char * const wgetExitCodes[] = {
    "No problems occurred.",
    "Generic error code.",
    "Parse error, for instance when parsing command-line options, the .wgetrc or .netrc...",
    "File I/O error.",
    "Network failure.",
    "SSL verification failure.",
    "Username/password authentication failure.",
    "Protocol errors.",
    "Server issued an error response.",
};
const int numWgetExitCodes = sizeof(wgetExitCodes) / sizeof(wgetExitCodes[0]);

const char *describeWgetError(int code)
{
    return (0 <= code && code < numWgetExitCodes) ? wgetExitCodes[code] : "???";
}

//system() returns a raw wait status on Mac/Linux; decode it to wget's exit code
int runCommand(const string &cmd)
{
    int status = system(cmd.c_str());
#ifndef _WIN32
    if(status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
#else
    return status;
#endif
}

//drop the '\r' left over when the list uses Windows line endings
void stripCR(string &s)
{
    if(!s.empty() && s[s.length()-1] == '\r')
        s.erase(s.length()-1);
}

void getFile(const string &author, const string &url)
{
    string::size_type slash = url.find_last_of('/');
    string filename = slash != string::npos ? url.substr(slash+1) : url;

    cout << author << "'s " << filename << "...";

    //quiet, continue, no host directories and output to author directory;
    //arguments are quoted so URLs with '&' or spaces survive the shell
    stringstream call;
    call << "wget -q -c -nH -P \"./assets/" << author << "/\" \"" << url << "\"";

    int ret = runCommand(call.str());
    if(ret != 0) {
        failed.push_back(make_pair(author + "'s " + url, ret));
        cout << "FAIL" << endl;
    }
    else
        cout << "OK" << endl;
}

bool getList()
{
    cout << "Retrieving full list..." << endl;
    //quiet and output to specific path
    int ret = runCommand("wget -q -O " TEMP_LIST " \"" URL_LEECH "\"");
    if(ret != 0) {
        cout << "Couldn't retrieve list. wget error " << ret << ": " << describeWgetError(ret) << endl;
        return false;
    }

    ifstream flist(TEMP_LIST);
    if(!flist) {
        cout << "Couldn't open " TEMP_LIST << endl;
        return false;
    }

    string author;
    string url;
    //read author/url pairs; unlike looping on eof(), this never
    //processes a bogus empty pair after the last line
    while(getline(flist, author) && getline(flist, url)) {
        stripCR(author);
        stripCR(url);
        if(author.empty() || url.empty())
            continue;

        //Fix to work with IAO: whitespace in author names becomes '_'
        for(string::size_type i=0; i<author.length(); i++) {
            if(isspace(static_cast<unsigned char>(author[i])))
                author[i] = '_';
        }

        getFile(author, url);
    }

    return true;
}



int main()
{
    cout << "leech_tool: download everything from the assembleech by Sos to" << endl;
    cout << " Indie Asset Organizer compatible folders" << endl;
    cout << endl;
    cout << "Links must be direct for this to work" << endl;
    cout << "Failed links will be listed last" << endl;
    cout << endl;

    if(!getList())
        return 1;

    cout << "================== FAILED LIST ==================" << endl;
    for(list<pair<string, int>>::const_iterator it = failed.begin(); it != failed.end(); ++it) {
        cout << "FAILED(" << it->second << ": " << describeWgetError(it->second)
             << "): " << it->first << endl;
    }

    cout << endl << "Press enter to exit" << endl;
    getchar(); //Mac friendly I believe.
    return 0;
}
« Last Edit: December 02, 2009, 03:13:11 AM by nitram_cero »

Craig Stern
I'm not actually all that stern.
« Reply #32 on: November 28, 2009, 11:35:30 PM »

Dude, this is awesome! The Assembleech Search seems to have problems when I search for mp3, though. Its results for this thread, for example, are:

Quote

Just so you know.

Tesse
« Reply #33 on: November 29, 2009, 03:31:27 AM »

I don't know if my idea is a good one or not, but...

I've downloaded all the links with the DownThemAll plugin for Firefox.
Now I have a folder with more than 1200 files, and I don't know where they come from, etc.

I don't know how to code PHP, otherwise I would already have done it.
Basically, I think every link pointing to a png/jpg/gif should be displayed as an image that links to the original post when clicked.

Yes, it would take time to load, but if it gives you a complete gallery of what you're looking for, maybe it's worth it.
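A rough sketch of that gallery idea in C++ (the names `isImageLink` and `galleryEntry` are made up): keep only links ending in .png, .jpg or .gif and wrap each one in a link back to its post. It ignores query strings (`a.png?x=1` is not recognized) and does no HTML escaping, so it's only a starting point.

```cpp
#include <algorithm>
#include <cctype>
#include <initializer_list>
#include <string>

// True if the URL ends in .png, .jpg or .gif (case-insensitive).
bool isImageLink(std::string url)
{
    std::transform(url.begin(), url.end(), url.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    for (const char *ext : {".png", ".jpg", ".gif"}) {
        std::string e(ext);
        if (url.size() >= e.size() && url.compare(url.size() - e.size(), e.size(), e) == 0)
            return true;
    }
    return false;
}

// One gallery entry: the image itself, linking back to the post it came from.
// Non-image links produce an empty string.
std::string galleryEntry(const std::string &imageUrl, const std::string &postUrl)
{
    if (!isImageLink(imageUrl)) return "";
    return "<a href=\"" + postUrl + "\"><img src=\"" + imageUrl + "\"></a>";
}
```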

Please be friend of me.
Martin 2BAM
« Reply #34 on: November 29, 2009, 01:10:26 PM »

I made the Indie Asset Organizer almost exclusively for this competition.
It went through almost unnoticed but it has really useful features. It even runs on Mac.

If there were a way to get the Assembleech to output authors along with file links, leech_tool would download everything into author-named subfolders, which the Indie Asset Organizer can then use to easily preview (audio and graphics), drag-copy, etc.
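leech_tool turns whitespace in author names into underscores before using them as folder names. A slightly stricter, hypothetical variant (`authorFolder` is not part of leech_tool) also replaces the characters Windows doesn't allow in file names:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: map a forum user name to an ./assets/<name>/ folder.
// Whitespace becomes '_' (as leech_tool does); characters Windows forbids
// in file names (\ / : * ? " < > |) are replaced as well.
std::string authorFolder(const std::string &author)
{
    const std::string forbidden = "\\/:*?\"<>|";
    std::string out = author;
    for (char &c : out) {
        if (std::isspace(static_cast<unsigned char>(c)) || forbidden.find(c) != std::string::npos)
            c = '_';
    }
    return "./assets/" + out + "/";
}
```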

It's a shame I spent time doing those tools and nobody is using them. Oh well.



« Last Edit: November 29, 2009, 01:14:02 PM by nitram_cero »

Sos
« Reply #35 on: November 29, 2009, 02:01:22 PM »

Sorry for not having that done already. I was a bit busy, and now I'm away. Will do it tmrw.

Pencerkoff
Hello I am Pencerkoff
« Reply #36 on: November 29, 2009, 07:53:20 PM »

Hello this is Pencerkoff

Quote
It's a shame I spent time doing those tools and nobody is using them. Oh well.

Yeah, too bad you spent time expanding your horizons by creating something instead of playing Call of Duty 4.

-PENCERKOFF

Martin 2BAM
« Reply #37 on: November 29, 2009, 08:31:37 PM »

Quote
Yeah, too bad you spent time expanding your horizons by creating something instead of playing Call of Duty 4.

I didn't expand my horizons, it was quite boring to do.
Well, I guess the grass is always greener on the other side... of Jamaica.

Quote
Sorry for not having that done already. I was a bit busy, and now I'm away. Will do it tmrw.

No sweat! And thanks!

JamesGecko
« Reply #38 on: November 29, 2009, 11:16:20 PM »

Getting all this stuff in a torrent would be pretty great. Unless there wouldn't be enough people downloading at once to make it work at a reasonable speed?
chris_b
« Reply #39 on: November 30, 2009, 01:21:10 AM »

Quote
I made the Indie Asset Organizer almost exclusively for this competition.
It went through almost unnoticed but it has really useful features. It even runs on Mac.

I noticed it and would totally use it, but to be honest I was being lazy and sort of waiting until maybe someone else organized all the stuff into the correct subfolders and put the sorted files up for download.