Oh Moldy Crabapples
I REALLY went off topic there, didn't I

more on the wget:
up to 8.5 GB and 108,770 files
still growing.
I can click on any downloaded thread and see the individual page and discussion in its entirety, along with
usernames, blah blah blah. It's literally an html snapshot in time.
At this point, clicking on "next" sends me to the live BG page. When wget finishes up, it is "supposed to"
have linked in the appropriate links via secret magic index pixie dust.
and when it's done ( in a week? a month?) I should have mine very own
snapshot copy of brassgoggles everything!
to hold and view
and share with everybody
via the wayback!

prof rambles
Huzzah! Good show, Prof. Marvel!
That last property of the last indexing link connecting to a live page gives me the idea that the snapshot could be used as a "kernel" for a new forum!
That would seem possible! since this mirror is all html, I don't see why I couldn't edit a bunch of links to point to "the new one".
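A rough Python sketch of that link edit, for what it's worth. It assumes the saved pages carry absolute links back to the live forum inside href attributes, which has not been checked against the mirror. The function name repoint_links and the address https://forum.example.org/ are made up for illustration.

```python
import re


def repoint_links(html, old_base, new_base):
    """Rewrite href values that start with old_base so they start with new_base."""
    # Only touch links that begin with the old base; everything else is left alone.
    pattern = re.compile(r'(href=["\'])' + re.escape(old_base))
    return pattern.sub(lambda m: m.group(1) + new_base, html)


# Hypothetical example: one link from a saved page, pointed at a made-up new home.
page = '<a href="http://brassgoggles.co.uk/forum/index.php/topic,104.0.html">next</a>'
print(repoint_links(page, "http://brassgoggles.co.uk/forum/", "https://forum.example.org/"))
```

Run over every file in the mirror, something like this would swap the base address in bulk. Relative links would need a different approach.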
ok, about 36 hours in we have
14.3 GB (15,457,669,120 bytes)
193,288 Files, 8,069 Folders
two consecutive file examples - under local files /brassgoggles.so.uk/bg-forum/
index.php@PHPSESSID=4ec7d5220eb53a13750e0f9050ca5257&topic=5079.15 96.2 KB (98,509 bytes)
index.php@topic=104.msg183610 112 KB (114,688 bytes)
the first is this complete page of a thread
http://brassgoggles.co.uk/forum/index.php/topic,5079.0.html
the second is this complete page of a thread
http://brassgoggles.co.uk/forum/index.php/topic,104.5925.html
I am hoping that wget somehow mysteriously resolves the indexes going from one page of a thread to the next...
it is also saving the "printer" or text only versions of complete threads as one long file - eg
this
index.php@action=printpage;topic=11505.0 388 KB (397,312 bytes)
is a snapshot of this thread
http://brassgoggles.co.uk/forum/index.php/topic,11505.0.html
but in this form
http://brassgoggles.co.uk/forum/index.php?action=printpage;topic=11505.0
Dang, this is actually saving user profiles and the permissions too! ... which I can't see when I open it because it is not my profile
and I don't have the permissions... hmm wait. I bet it is saving the html error page when wget "tries" to look at the permission.
if wget doesn't link threads in a topic, it will not be hard to write a script to do so, since each thread has its own unique topic number,
followed by a "msg" number that would seem to indicate the thread link, ie
index.php@topic=9039.msg275269
points to this
http://brassgoggles.co.uk/forum/index.php/topic,9039.0
and I got there by manually pasting "topic,9039" onto the base
"http://brassgoggles.co.uk/forum/index.php/"
easy to extract the topic number, and replace "topic=" with "topic,"
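Here is a minimal Python sketch of that extract-and-replace step. It assumes every mirrored thread file has "topic=" followed by the topic number somewhere in its name, as in the examples above. The function name live_url_for is made up, and tacking ".0" on the end (first page of the thread) is an assumption based on the URL the forum showed me.

```python
import re

# Base URL of the live forum, as quoted above.
BASE = "http://brassgoggles.co.uk/forum/index.php/"

# Finds the topic number in mirrored names such as
# "index.php@topic=9039.msg275269" or "index.php@PHPSESSID=...&topic=5079.15".
TOPIC_RE = re.compile(r"topic=(\d+)")


def live_url_for(local_name):
    """Return the live first-page URL for a mirrored file name, or None."""
    match = TOPIC_RE.search(local_name)
    if match is None:
        return None  # no topic number in the name, so not a thread page
    # Swap "topic=" for "topic," and point at offset 0 (assumed first page).
    return f"{BASE}topic,{match.group(1)}.0"


print(live_url_for("index.php@topic=9039.msg275269"))
# prints http://brassgoggles.co.uk/forum/index.php/topic,9039.0
```

Names with the PHPSESSID junk in front work the same way, since the pattern only looks for the "topic=" part.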
I used to do this crap all the time - we got data from one DB, we were "only supposed to be able to view it" ...
but our job required USING IT
so I extracted the operant stuff from html or whatever, looked for links, file names, etc,
and either accessed the actual file itself or re-formatted the query, or just grabbed the live raw data out of the web page
and reformatted it to suit our needs - usually to feed another program or script that acted on it.
I actually have a couple patents on programs doing just that.
One guy wrote a program to "do stuff" and was distributed on servers literally around the world.
If it ran into problems, it would throw errors and the guy expected a human to sit at a console/workstation
and resolve them real-time by hand.
That program worked really well at what it did, but was spaghetti code, he was no longer around, and
so modifying and getting it updated around the globe ...it would have been a nightmare.
Soooo instead of "doing what I was told" I started taking the html error data, parsing it out with "C Shell" scripts
reformatting it into a human readable dashboard to make it easier to deal with....
then ran basic net diagnostics to see what the fault was and where....
Then found we had a global DB of server info that I could "read" with a human realtime query...
so I wrote another script to appear to be a command line query and got the data on the failing server...
then found I could query the company phone book to get contact poop on employees...
and I rebuilt the display to show
error / server that faulted / where it lived / who "owned maintained" by name , company phone and email, and their boss
then I built another backend that not only generated a "trouble ticket" with all that crap
AND the diagnostic data I logged to make it easier for him to fix it!
and emailed it to the server/network admin in question
and copied his boss if he did not reply within a couple hours.
Later I wrote another backend to put all that crap into the actual "trouble ticket system" that the company had.
So instead of four or six of us sitting at a console all the time every third day playing monkey boy, we would come in at regular hours
check my trouble board, and check the emails/trouble tickets/ etc to see if the admins responded to fix it
and check my "repairs not done" board to see if it was back and we could kick off an abbreviated
"redo the job" using my list of "broke but repaired servers"
THAT was very satisfying, esp since the bosses saw value and let me do what was needed. And I got a couple patents
and learned a lot about patent law. I got screwed out of one patent because IBM was pursuing something similar to one
of my tools and they filed first

It was very satisfying work ....
Later we formed a small group to turn the whole thing into a real product with the intent to sell it, but we were under the wrong
VP, and some other VP in the bay area sorta squashed it by not marketing it at all. No sales == dead product.
Hell, for what we had invested in it, we could have GIVEN IT AWAY bundled in with our other product stack, as a freebie wonder tool
for our customers to use themselves but oh no, that would not fly either.
I actually had a chance to fly out on a sales tour to the PAC RIM to promote this thing, Australia, Hong Kong, Singapore, Japan...
But it started getting weird when the manager took what was supposed to be a 4-6 week trip and kept adding more and more cities...
Then he built a slide show presentation that claimed we were monitoring and fixing over 1000 servers worldwide...
but it was only 625. He said, "oh it will be that many by then" and I said... No I can't lie in a preso to our own people let alone lie to
our customers....
Sooo he said ok and got someone else to do it. They were gone for almost 10 WEEKS on a plane like every third day....
looked like hell when she finally got back....
and I moved on to another support team under a better manager who would not lie....
And that's what we "used to" call HACKING
oops i digressed again...
more on wget and
molesting modifying the links if need be later
prof omg he is rambling again .....
OOOOOohhhhhkaaayyyyyy
more messing about.
the structure is becoming clearer
so we have TOPIC number == thread in brass goggles. forum/subforum is not indicated
we have PAGE within a topic, starting at zero and incrementing by 25 by appending said number to the topic
ie
http://brassgoggles.co.uk/forum/index.php/topic,9039.0.html
http://brassgoggles.co.uk/forum/index.php/topic,9039.25.html
http://brassgoggles.co.uk/forum/index.php/topic,9039.50.html
http://brassgoggles.co.uk/forum/index.php/topic,9039.75.html
http://brassgoggles.co.uk/forum/index.php/topic,9039.100.html
somewhere perhaps I could obfuscate the thing to put an entire thread on one very long html page...
....
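The page numbering above can be sketched in a few lines of Python. This is only a guess at the pattern from the five URLs listed. The function name page_urls is made up, and the number of pages in a thread has to be supplied by hand since nothing here works it out.

```python
BASE = "http://brassgoggles.co.uk/forum/index.php/"
POSTS_PER_PAGE = 25  # page offsets step by 25, going by the URLs above


def page_urls(topic, page_count):
    """List the URL of each page of a topic, first page at offset 0."""
    return [
        f"{BASE}topic,{topic}.{page * POSTS_PER_PAGE}.html"
        for page in range(page_count)
    ]


# Reproduces the five page URLs listed above for topic 9039.
for url in page_urls(9039, 5):
    print(url)
```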
and we have individual user entries into the topic designated by "msg######" appended to the topic number
ie
http://brassgoggles.co.uk/forum/index.php/topic,9039.msg168542.html
interesting
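To tell the two link styles apart in a script, something like this Python sketch might do. It assumes a topic link is always "topic," then the topic number, a dot, and either a plain page offset or "msg" plus a message number, which is only what the examples so far show. The name parse_topic_link is made up.

```python
import re

# Matches "topic,<topic>.<offset>" and "topic,<topic>.msg<id>",
# with or without a trailing ".html".
LINK_RE = re.compile(r"topic,(\d+)\.(msg)?(\d+)")


def parse_topic_link(url):
    """Return (topic, kind, number), where kind is "page" or "msg", or None."""
    match = LINK_RE.search(url)
    if match is None:
        return None  # not a topic link in either of the two known forms
    topic, is_msg, number = match.groups()
    return (int(topic), "msg" if is_msg else "page", int(number))


print(parse_topic_link("http://brassgoggles.co.uk/forum/index.php/topic,9039.25.html"))
print(parse_topic_link("http://brassgoggles.co.uk/forum/index.php/topic,9039.msg168542.html"))
```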
wget now up to
200,583 Files, 8,377 Folders
15.0 GB (16,122,040,320 bytes)
prof mumbles