synthetikal.com Forum Index


Document Management System?
CherrieBaby
chouchou
Joined: 01 Mar 2005
Posts: 67
3070.02 Points

Tue Apr 12, 2005 8:55 am

I don't know which forum to put this in.

One of the discussions at the Hive related to how Rhodium should store his site. One problem Rhodium faced was the size and complexity of the site. Many files related to more than one area of chemistry and the simple use of an index to classify documents under one system was not a perfect solution.

Another problem was that many documents were images, and such images were completely invisible to a text-based search.

I've recently come across similar problems with my own modest archive of articles.

The most rational solution to this is to use a database to store information about each document.

This is my first trial solution.

size - filesize in bytes
filename - as stored on disk
journal - abbreviated journal name
volume - (of journal)
page - first page of article
year - year published
title - article title
author - authors' names
location - directory where it is stored

These are two fields that can also be added (later):
abstract
keywords
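For illustration, the trial layout above could be sketched as a single SQLite table. This is only a sketch under assumptions: nobody in the thread names a database engine, so Python's built-in sqlite3 module is assumed here, and the table name `document` and the sample row are made up.

```python
import sqlite3

# Hypothetical single-table layout mirroring the field list above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE document (
        size     INTEGER,  -- filesize in bytes
        filename TEXT,     -- as stored on disk
        journal  TEXT,     -- abbreviated journal name
        volume   INTEGER,  -- volume of journal
        page     INTEGER,  -- first page of article
        year     INTEGER,  -- year published
        title    TEXT,     -- article title
        author   TEXT,     -- authors' names
        location TEXT,     -- directory where it is stored
        abstract TEXT,     -- optional, can be filled in later
        keywords TEXT      -- optional, can be filled in later
    )
""")

# Made-up example row; every value here is a placeholder.
conn.execute(
    "INSERT INTO document"
    " (size, filename, journal, volume, page, year, title, author, location)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (123456, "example.pdf", "JOC", 1, 100, 1990,
     "Example title", "A. Author", "/archive/joc"),
)

# Look up where a document lives from its citation details.
rows = conn.execute(
    "SELECT filename, location FROM document WHERE journal = ? AND year = ?",
    ("JOC", 1990),
).fetchall()
print(rows)
```

Leaving `abstract` and `keywords` nullable means they can be added later without touching existing rows.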

What do people think about storing PDFs and DjVu files with a database to help look up information? It would probably be a lot of work at first, and even once it is up there would still be quite a bit of work to keep it updated. The other problem is that many files are available in more than one form (and we may want to keep multiple copies), e.g. a) DjVu and PDF versions; b) text (OCRed) and original image versions; c) compact (small) and large file versions.

The solution to that would be to have one database table for the files and a separate table for the articles, with a one-to-many relationship between them (one article, many files).

The new (file) table would have 6 fields:

filekey - not strictly needed (this is just a numerical unique key)
size - filesize in bytes
filename - as stored on disk
location - directory on drive where it is stored
journalkey - foreign database key
format - PDF/DjVu/JPEG/text/HTML/etc.

The journal table is now:

journalkey - numerical unique key
journal - abbreviated journal name (eg. JOC, JACS, TL)
volume - (of journal)
page - first page number of article
year - year published
title - article title
author - authors' names

4 optional fields for the above table are:
abstract
keywords
number of pages
last page number of article
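The two-table design above might look something like this in SQLite. Again this is a sketch under assumptions, not a finished design: Python's sqlite3 module is assumed, the key column is spelled `journalkey` here, and the sample article and files are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE journal (
        journalkey INTEGER PRIMARY KEY,  -- numerical unique key
        journal    TEXT,                 -- abbreviated journal name
        volume     INTEGER,
        page       INTEGER,              -- first page of article
        year       INTEGER,
        title      TEXT,
        author     TEXT
    );
    CREATE TABLE file (
        filekey    INTEGER PRIMARY KEY,
        size       INTEGER,              -- filesize in bytes
        filename   TEXT,
        location   TEXT,
        journalkey INTEGER NOT NULL REFERENCES journal(journalkey),
        format     TEXT                  -- PDF, DjVu, JPEG, text, HTML...
    );
""")

# One placeholder article...
cur = conn.execute(
    "INSERT INTO journal (journal, volume, page, year, title, author)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    ("JACS", 1, 100, 1990, "Example title", "A. Author"),
)
article_key = cur.lastrowid

# ...stored as two different files (the "many" side of the relationship).
conn.executemany(
    "INSERT INTO file (size, filename, location, journalkey, format)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (2000000, "example.pdf", "/archive/jacs", article_key, "PDF"),
        (400000, "example.djvu", "/archive/jacs", article_key, "DjVu"),
    ],
)

# List every stored copy of the article, smallest file first.
copies = conn.execute(
    "SELECT f.format, f.filename"
    " FROM journal AS j JOIN file AS f ON f.journalkey = j.journalkey"
    " WHERE j.title = ? ORDER BY f.size",
    ("Example title",),
).fetchall()
print(copies)
```

The article details are entered once, and each extra copy (DjVu, PDF, OCRed text) only costs one more row in the file table.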

Any ideas about design improvements from the database designers out there?

It's because this seems such a useful way to organise (and find) information that I'm suggesting synthetikal add such a system.
java
Consumer
Joined: 07 Feb 2005
Posts: 736
Location: The Mexican Republic
21794.14 Points

Tue Apr 12, 2005 9:53 am

I agree that we need some type of system to file the journal archives in some type of order. In my own files I have a hard time filing them by title and journal, or by what the article is about. There are too many variables; even Rhodium had a hard time. As I go through his PDF archives I almost have to open each one to get the gist of the article. Listing by journal makes no sense by itself, since it tells you nothing.

I thought of picking up one of those free forum software packages and using it as a document manager, since it comes with a search function, but I haven't tried it. I think the reference section should have a better method of indexing the journals and books... but until we can come up with something we can at least try to keep archiving the journal articles used throughout the forum and the ones people ask to be found. I have hundreds of journal citations but can't really tell what they're all about because of the said problem. So I create folders for hydrogenations, catalysts, reductions, chlorinations... and so on.

Let's hope something is developed, as I agree with you, CherrieBaby, in your assessment and solution......java
nubee
Master Archiver
Joined: 18 Feb 2005
Posts: 213
Location: homeless
18579.78 Points

Tue Apr 12, 2005 10:18 am

that afsearch prog i've been using has support for indexing pdfs, but i guess that depends on the journal having working OCR...

but it's not really detailed enough...

Rolling Eyes

i'm not sure if there's really any easy answer...
n00dle

Joined: 03 Apr 2005
Posts: 21
743.88 Points

Tue Apr 12, 2005 3:04 pm

As a punter, SWIM most often found the digests on particular compounds the most useful, as all the information was located in one central thread.

Perhaps we could expand on this idea: the digests were only for stuff like p2p or mdp2p, but we could extend them to include whatever, PAA, indole, etc.
Guest

0.00 Points

Wed Apr 13, 2005 7:57 am

This is a very important subject.

Our thinking is that we html everything, pdfs as well, and make one super-indexed document that can be searched.

This is not at all a big job, as we have thought about this often;
we've done it to the hiveboard, and to a bit of rhod's pdfs.

We spent a bit of time seeing what would happen if we html'd all of rhodium's site,
and it wasn't too bad: we had a space saving of around 60%, which is around 300 MB of searchable html.

Keep up this interest and we will do this faster, if everyone thinks this is a good idea.

That way we can have one search engine that searches through everything that we have ever written, including this forum.

Where we stand at the moment is that we are halfway through putting up the old hiveboard texts, and then we will attempt to turn this forum into an html doc.
We are having a few problems with this conversion, but we will get there in the end, and hopefully reach this Grand Searchable Database.

The GSD must be done.

syn
CherrieBaby
chouchou
Joined: 01 Mar 2005
Posts: 67
3070.02 Points

Wed Apr 13, 2005 11:58 am

synthetika wrote:
This is a very important subject

Our thinking is that we html everything, pdf's as well, and make one super indexed document, that can be searched

syn


I can see the logic behind this but I personally disagree with it. For instance, I now have so many documents on my own PC that it takes ages to search them. The net result is that I prefer to look for something first by directory, based on where I think it is, and only search for it as a last resort. I prefer the DBMS approach - it will always be faster. I don't think, for a moment, that I'll change your minds, but the site will grind to a halt if you try to implement the "HTML everything, including the database" policy.

I believe it's more important to catalog journals and patents and key threads than it is to just catalog discussion. A lot of the discussion at the Hive, for instance, was based on instinct or guesswork, and some discussion was just people patting each other on the back: OK for a forum, but just noise when you're searching archives ages later.

Maybe we should start by cataloging the Rhodium PDF and DjVu files to give them abstracts when they are only images? (unless you can find someone to OCR them).

I don't understand the idea of a GSD? Do you mean database as "repository of data" or database as it is meant in computing (with a DBMS)? I think only a DBMS will be fast enough. You have to assume that this site will grow to be at least the size of the hive. A web-site has to be scalable.
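To make the DBMS argument a little more concrete, here is a small sketch of an indexed lookup. It does not prove that a database is always faster than a text search; it only shows the database answering a citation query through an index. SQLite via Python's sqlite3 module, the cut-down table, the index name and the generated rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal ("
    " journalkey INTEGER PRIMARY KEY, journal TEXT, year INTEGER, title TEXT)"
)
# Hypothetical index covering the columns people are likely to search on.
conn.execute("CREATE INDEX idx_journal_year ON journal (journal, year)")

# 1000 placeholder rows spread over the years 1980-1999.
conn.executemany(
    "INSERT INTO journal (journal, year, title) VALUES (?, ?, ?)",
    [("JOC", 1980 + i % 20, f"Placeholder title {i}") for i in range(1000)],
)

# Ask SQLite how it would run the query; the last column of each row
# is a human-readable description of that step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN"
    " SELECT title FROM journal WHERE journal = ? AND year = ?",
    ("JOC", 1990),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)

hits = conn.execute(
    "SELECT COUNT(*) FROM journal WHERE journal = ? AND year = ?",
    ("JOC", 1990),
).fetchone()[0]
print(hits)
```

The printed plan should mention `idx_journal_year`, meaning the lookup goes through the index instead of reading every row; the exact wording of the plan varies between SQLite versions.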
primathon
modified
Joined: 23 Mar 2005
Posts: 190
Location: Unknown
98616.26 Points

Sun Apr 24, 2005 11:46 am

[outdated information]

Last edited by primathon on Thu Jul 07, 2005 1:08 am; edited 1 time in total
Polverone

Joined: 12 Feb 2005
Posts: 28
846.64 Points

Tue Apr 26, 2005 10:18 am

Set up a Wiki to index articles and ebooks. A wiki is collaborative content creation-and-management software. Imagine a mini-Wikipedia just describing and organizing documents and ebooks of interest. You could leave access public, or make it private to some degree: hide it behind a password-protected page, or just control who's allowed to edit it. This would let multiple people work to build a well-organized, well-described corpus of documents that can be full-text searched at any time, just like a regular web page. The tricky part might be ensuring availability of the documents themselves. It's probably too bandwidth-intensive and legally risky to host all of the documents on a public web server. RapidShare and the like really aren't reliable for long-term storage. Perhaps the actual documents/ebooks could live on a private FTP server, and the Wiki would just say where documents can be found in the FTP server's directory hierarchy.
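The split between a searchable description and a pointer to where the file lives could be sketched roughly like this. It is plain Python and not any particular wiki's API; the `Entry` class, the `search` helper, the entries and the paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    description: str  # the human-written text that gets searched
    ftp_path: str     # where the file sits in the (hypothetical) FTP hierarchy


def search(entries, query):
    """Return entries whose title or description contains every query word."""
    words = query.lower().split()
    return [
        e for e in entries
        if all(w in (e.title + " " + e.description).lower() for w in words)
    ]


# Placeholder catalog; the documents themselves are stored elsewhere.
catalog = [
    Entry("Example review",
          "Overview of catalytic hydrogenation methods",
          "/journals/example/review.pdf"),
    Entry("Example handbook",
          "General laboratory techniques ebook",
          "/ebooks/handbook.djvu"),
]

found = search(catalog, "hydrogenation catalytic")
print([e.ftp_path for e in found])
```

Only the descriptions need to be hosted and searched; image-only scans become findable through whatever text people write about them.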
PSY420

Joined: 30 Jun 2005
Posts: 30
1174.16 Points

Sun Jul 03, 2005 2:38 pm

Yeah, a wiki is exactly what we need here, I have been thinking along those lines for quite a while now.

Look at what we're discussing here all the time: compounds and reactions from one chemical to another. There is a list of maybe 300 compounds that are of high value as end products around here, a handful of them of very high value, judging by the amount of discussion regarding them. The problem I see with all that discussion is the missing links: reaction steps with their respective references are posted, but only the refs are kept, as they are regarded as highly valuable (which they are indeed, as long as they are linked to a reaction). Over the years many of us, me included, have grown a nice collection of articles, but if you look at them, somehow they've lost their magic compared to the day they were put into the archive. This is the moment you realize that the reactions these articles belong to are scattered all over this forum, other forums and further parts of the net.

If you ask me, the wiki that brings back that magic should be based on linked substance monographs, where the first part of the page would give chemical, physical and statistical information about the molecule (picture of structure, CAS + Beilstein #s, mp, bp, toxicity, etc.), followed by the list of links to precursors and then all the reactions to higher compounds. That way you could easily navigate through 3-dimensional reaction schemes which, if everything is nicely pictured, can add enormously to the understanding of 'the bigger picture'.
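Structurally, linked pages like these form a directed graph. A minimal sketch of the navigation, using placeholder page names and hypothetical helper functions (nothing here comes from an existing wiki engine):

```python
from collections import deque

# Each page lists the pages it links forward to; all names are placeholders.
links = {
    "Substance A": ["Substance B", "Substance C"],
    "Substance B": ["Substance D"],
    "Substance C": ["Substance D"],
    "Substance D": [],
}


def backlinks(links, page):
    """Pages that link to `page` (the 'what leads here' list)."""
    return sorted(src for src, targets in links.items() if page in targets)


def reachable(links, start):
    """All pages reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
                order.append(nxt)
    return order


print(backlinks(links, "Substance D"))
print(reachable(links, "Substance A"))
```

Storing only the forward links is enough; the backward list can be derived, so editors never have to keep both directions in sync by hand.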

I think the ASCII text of the experimental details relevant to the reaction in question would be enough, in combination with the ref to the article it came from, of course. I'm no copyright attorney, but I think that wouldn't qualify as copyright infringement (reproduction in part for personal, educational, scientific use... that's still free, isn't it??)

If there are articles that deal with a topic we're all interested in (like reductive amination) but speak of none of the substances we'd like to see (it's only some strangely branched heterocycles, not MDP2P, that they reduce), these articles should be grouped and presented under a topic heading, just as in Rh's archive. But the main load of refs *do* deal with exactly those compounds we want them to, so a pictured and linked substance/reaction-based wiki would bring the biggest part of anyone's article database back to life and use.

It would also put an end to those damn limited 2-dimensional overviews that are never complete; here's my best example, built some years ago :)

[image: overview chart, not preserved]

A wiki is quickly built if a handful of people work on it: every 10-year-old can draw molecule structures with ISIS-draw, and reaction and substance information only has to be copied and pasted. But before I keep on jabbering I'd better stop and ask for your thoughts about this.

--PSY420--
primathon
modified
Joined: 23 Mar 2005
Posts: 190
Location: Unknown
98616.26 Points

Mon Jul 04, 2005 12:03 am

Notice: There's a Wiki on the way...
loki
guinea pig
Joined: 09 Mar 2005
Posts: 391
14167.88 Points

Mon Aug 29, 2005 1:29 am

this is highly amusing to find this discussion of a wiki...

i am a moderator at dmt world and we started putting this 'tek section' thing in with how-to's but it was kinda shoe-horned into the interface of the forum software. i was just saying just now 'what we need here is a wiki and then people can adopt pages to maintain and choose and adopt other people to help and add to things...'

like psy says, wikis capitalise on the fact that people tend to have a bee in their bonnet about something or other and if you let them they will do it for you... the entirety of rhodium could be substituted with a wiki, people could convert pdfs and djvu documents for references into wiki pages, meaning the data is all in one system, searchable, user-modifiable... much easier to manage than having people run their own subsections too, and it automatically takes care of making the look of the site uniform.

by the way psy420: that chart there is a work of art Very Happy

choice selections from the hive archives and so forth could be worked into it as well... and it should be possible to have a way for the whole system to be archived by a user.

i know that i'd be happy to be given the job of working up a heap of things onto the wiki, i love doing that sort of thing Very Happy
primathon
modified
Joined: 23 Mar 2005
Posts: 190
Location: Unknown
98616.26 Points

Mon Aug 29, 2005 8:26 am

Glad to hear it. Unfortunately, I can't set this up without Syn's access credentials, and the offshore hosting I was looking into seems to run ~$1,000/yr., which is exactly the kind of money that I just don't have. Right now...
All times are GMT + 5.5 Hours