synthetikal.com Forum Index


The Hive Files Now Open
IndoleAmine
Dreamreader Deluxe
Joined: 09 Feb 2005
Posts: 681
Location: Bahamas
18717.10 Points

Wed Mar 09, 2005 7:32 pm

No - whether you view an HTML document or make a backup on your hd, you always have to download the whole file and save it, either temporarily or permanently.

Downloading the whole archive is equivalent to viewing the whole archive, and we have enough bandwidth to allow a good number of users to view the whole archive every month, so this shouldn't be a problem.

And I think not many bees will have the patience to d/l every single file by hand, and those who have the endurance hopefully will not frequent our server that often once they have all the goodies on their hd - so we can in fact spare even more bandwidth for others when you d/l it... Wink

(at least I hope so!? Very Happy )


i_a
Back to top
Novalis

Joined: 28 Feb 2005
Posts: 3
Location: Europe
0.00 Points

Wed Mar 09, 2005 8:10 pm

I hope so too.

But you certainly know that it's possible to mirror such an archive easily with tools like WinHTTrack, etc. And since Rhodium's pdfs are quite popular and were not available for a long time, I think some bees will try to download the whole 1000 MB.
Back to top
java
Consumer
Joined: 07 Feb 2005
Posts: 736
Location: The Mexican Republic
21796.14 Points

Wed Mar 09, 2005 10:08 pm

Quote:
java ......where are all the the fucking picproxie-doc


3base....I've been offline for a week, hence was unable to respond to your inquiry.... As stated, we have tried to recover as much as we can of the Hive archives and the associated articles. This is what we have, and you're welcome to search and read to your heart's content.......I can't reply to your request since we have no direct access to the Hive Archives; we are simply trying to paste together what the contributing bees have pulled together, so don't be an ingrate and stop your unkind demands, as no one owes you anything........java
Back to top
Guest

0.00 Points

Thu Mar 10, 2005 11:31 am

Yes, we don't really want people downloading the archive,
As hypocritical as that may sound, we do, and we don't,
And we will have to look into this,

We want everyone to have access to it, but complete downloads, no, not yet, anyway,


Since we only have 50gb bandwidth per month,
This would kill us,

I would say that a public mirror will have to be set up,

I am open to suggestions.


syn,
Back to top
Polverone

Joined: 12 Feb 2005
Posts: 28
846.64 Points

Fri Mar 11, 2005 8:07 am

The plain HTML takes up much less space than all the picproxie stuff, right? Distribution of the HTML as a zip or rar file from the web site might be acceptable with 50 GB a month, depending on how many people want it. You might even be able to get away with distributing it in one or two pieces using rapidshare.de.

The much more scalable solution, which could easily distribute everything and not just the html, is to set up a .torrent for it. The efficiency will depend on people who download staying online long enough to share, of course.

Another possible solution: create a new gmail account, mail a multipart .rar archive of everything to this account as multiple attachments, then share the password to the account here on the forum. As long as nobody decides to be a smartass and delete the files, people should then be able to rapidly download the files from the shared account. If you don't want google to have a clue about file contents, encrypt all the attachments with a freshly generated public key and share the private key here on the forum along with the login information.
Back to top
jackoozzi
specialist
Joined: 10 Feb 2005
Posts: 135
Location: Australia
39384.40 Points

Fri Mar 11, 2005 8:48 am

yousendit.com is probably the best option for this sort of thing, i think. each link is limited to 25 downloads or 7 days, but then you upload it again and change the link. it will take files up to 1gb

http://s23.yousendit.com/
Back to top
nubee
Master Archiver
Joined: 18 Feb 2005
Posts: 215
Location: homeless
18648.26 Points

Fri Mar 11, 2005 10:40 am

i started a httrack last night and got 30MB through it on a dialup...

when i zipped it up it was like 3-6 mb !!

and of course if it was plain text it would be even smaller,

we need a packaged dl for sure, as it makes an important reference to have available without having to get entangled in the whole loss of anonymity involved in connecting to an internet account each time you want to check something out.... Wink
Back to top
mind

Joined: 10 Feb 2005
Posts: 39
362.48 Points

Fri Mar 11, 2005 3:37 pm

what about a torrent?
of course paranoid people should not connect..

or a more secure filesharing program like mute or filetopia
Back to top
IndoleAmine
Dreamreader Deluxe
Joined: 09 Feb 2005
Posts: 681
Location: Bahamas
18717.10 Points

Fri Mar 11, 2005 6:55 pm

No, torrent isn't a good idea.

Quote:
The plain HTML takes up much less space than all the picproxie stuff, right? Distribution of the HTML as a zip or rar file from the web site might be acceptable with 50 GB a month, depending on how many people want it. You might even be able to get away with distributing it in one or two pieces using rapidshare.de.



Thx for your help, but - what are you talking about, text file being even smaller? Do you have some people at hand to retype all that PDF stuff into txt? Laughing Or are you planning to OCR all files and correct the errors by hand?? Shocked

..PDFs are maybe 5-10% smaller when zipped at "best compression method"... Sad
(maybe some PDF-specific compression could be used without much quality loss, but then again every single file has to be converted..)


i_a
Back to top
Polverone

Joined: 12 Feb 2005
Posts: 28
846.64 Points

Sat Mar 12, 2005 8:36 am

The hive HTML is much smaller than HTML + all images and documents. Distributing a compressed archive of the HTML alone might be possible even without creative solutions.

Actually, I believe that there is a way to easily shrink a lot of the Rhodium/Hive PDFs, now that I think of it. Most journal articles (from JACS and other sources) are stored as PDFs with OCR text underneath bitonal page images. All JACS archives and other journals that I know of use only G4 compression for these bitonal images. JBIG2 compression can do considerably better, like 1/2 to 1/4 the size, and of course the images take up most of the space in the files. The Xerox "Silx" PDF compressor is a command line tool that will convert bitonal images in PDFs to JBIG2 and leave everything else alone. A cracked version of this compressor has been available for a while. It can be provided if you can't find it on your own.

Then you just apply the tool to all of your PDFs. Under Linux, you might do it something like:

# quote "$k" so file names with spaces work; && keeps the original if Silx fails
for k in *.pdf; do Silx "$k" smaller.pdf && mv smaller.pdf "$k"; done

You could do something similar with a batch file in a command shell on Windows.

This will compress all images that can be compressed and leave other images alone, for all pdf files in the directory. I think this would yield quite a bit of space-savings.
Back to top
Guest

0.00 Points

Sat Mar 12, 2005 4:41 pm

polverone,

I like the Gmail idea,
Especially because of its fast bandwidth,
You can get a good 50k/s from them,

I have seen those webhosting places that will host it, but make you wait 20s on a page of ads,
We've just installed the imageshack file upload function, which stores linked images for free, off our server, and it came with php integration code,
How sweet, except it had a few bugs in the php code,
But since Nazlfrag, our coder was near, it was quickly repaired,

With the Gmail, it would be a real trust thing, but I think that would work here, for regular users etc,

This is a good chance for admins of similar board themes to discuss board related problems,

We have noticed a dedicated server host with 120gb space and 500gb bandwidth for $60US, that's pretty cheap,


syn
Back to top
Polverone

Joined: 12 Feb 2005
Posts: 28
846.64 Points

Mon Mar 14, 2005 12:10 pm

I noticed that the posts stored in the Hive archive still have <meta name="robots" content="noindex,nofollow"> near the top of each thread. This means that Google and other mainstream search engines will never index these files. If you want them to be more accessible and searchable, you need to remove these directives from the HTML files. It's just a big search-and-replace across all the HTML.
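A rough sketch of that search-and-replace, assuming the archive sits in one local directory and the tag appears exactly as quoted above (same case, same spacing). The function name `strip_robots_meta` and the `./hiveboard` path in the usage line are made up for illustration:

```shell
# Hypothetical helper: remove the robots "noindex,nofollow" meta tag from
# every .html file under the directory given as $1.
strip_robots_meta() {
    dir="$1"
    find "$dir" -type f -name '*.html' | while IFS= read -r f; do
        # Delete only the exact tag quoted in the post; variants with
        # different case or spacing are NOT matched and stay in place.
        sed 's|<meta name="robots" content="noindex,nofollow">||g' "$f" > "$f.tmp" &&
            mv "$f.tmp" "$f"
    done
}
```

Usage would be something like `strip_robots_meta ./hiveboard`, run on a copy of the files first. Writing to a temp file and moving it back avoids relying on `sed -i`, which behaves differently between GNU and BSD sed.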
Back to top
nubee
Master Archiver
Joined: 18 Feb 2005
Posts: 215
Location: homeless
18648.26 Points

Mon Mar 14, 2005 12:25 pm

let me admit i downloaded the hive files with httrack:

it's over 8,500 files and 260+mb, but when zipped it's only 88mb,

i'm currently creating an offline search page/index for it, then it's done...
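To put the sizes quoted in this thread next to the 50 GB monthly bandwidth mentioned earlier, here is a back-of-the-envelope calculation. It assumes 1 GB = 1000 MB and ignores normal browsing traffic, so the real numbers would be lower; the variable and function names are only illustrative:

```shell
# Figures quoted in this thread; 1 GB is taken as 1000 MB for round numbers.
QUOTA_MB=$((50 * 1000))   # 50 GB monthly bandwidth
FULL_MB=1000              # whole archive ("the whole 1000 MB")
ZIPPED_MB=88              # zipped HTTrack mirror

downloads_per_month() {
    # $1 = size of one download in MB; integer division rounds down
    echo $((QUOTA_MB / $1))
}

downloads_per_month "$FULL_MB"     # prints 50
downloads_per_month "$ZIPPED_MB"   # prints 568
```

So roughly 50 complete downloads would use up the month, against several hundred of the 88mb zip.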
Back to top
IndoleAmine
Dreamreader Deluxe
Joined: 09 Feb 2005
Posts: 681
Location: Bahamas
18717.10 Points

Mon Mar 14, 2005 12:45 pm

nubee: Good work! Cool Thanks for taking the initiative here.
For the search page, just search & replace all paths with the appropriate ones for your offline archive, e.g. replace
http://12.162.180.114:90/synthetika/hiveboard/methods/000170447.html

with

file://archive/methods/000170447.html

..but I think you know that, right? Very Happy
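A minimal sketch of that replacement in shell, assuming the links in the search page start with the server prefix from the example URL above. `rewrite_links` and the two variable names are invented for this example, and the local prefix should be adjusted to wherever the offline copy really lives (a plain local path would normally be written `file:///...`):

```shell
# Assumed prefixes: the old one comes from the example URL in this post
# (dots escaped so sed treats them literally); the new one is the suggested
# offline location.
OLD_PREFIX_RE='http://12\.162\.180\.114:90/synthetika/hiveboard/'
NEW_PREFIX='file://archive/'

rewrite_links() {
    # $1 = HTML file to rewrite; goes through a temp file so it works with
    # both GNU and BSD sed.
    sed "s|${OLD_PREFIX_RE}|${NEW_PREFIX}|g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

To run it over a whole folder: `for f in archive/*.html; do rewrite_links "$f"; done`. Only links that begin with the old prefix are touched; relative links stay as they are.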


Polverone: thanks for your help, I was thinking of something like the G4 compression, but the Silx tool and JBIG2 look considerably better of course - sadly I am not familiar with unix shell commands.. Neutral (but I'm sure one of our admins or our coder nazlfrag is?)


i_a
Back to top
nubee
Master Archiver
Joined: 18 Feb 2005
Posts: 215
Location: homeless
18648.26 Points

Mon Mar 14, 2005 1:15 pm

i didn't know that, i'll have a look. where do i find the reference to change, and in which file ???
Back to top
All times are GMT + 5.5 Hours
Page 2 of 3

 


