Log in

View Full Version : Looking for find and copy software.


megalomania
July 26th, 2006, 08:53 PM
There are plenty of “find and replace” applications out there; I have a few myself. I want a program that does something similar, but instead of REPLACING a search string, I want it to COPY that search string, and preferably load said string into a list.

I have found only one program that does exactly what I want: give it a search term, and it copies the ENTIRE LINE of text containing it from the document. Since I know what my search string starts with, but not necessarily what it ends with, I want the entire line.

The name of that program is “Extract Data & Text in Multiple Files.” Unfortunately, this program is only a crippled demo. It can extract the lines I need, but it will not export the results as a list unless it is registered. I have not found a crack or serial for this app. It just came out on June 6, 2006, so it may be a bit new yet.

To explain what I am doing, I have approximately 10,000 archived html documents. Almost all of them contain a rapidshare link. I want that rapidshare link. Obviously I want to search for “rapidshare.de” as my term, but I do not know what the rest of the string will be, so I want to copy the entire line. Every mention of rapidshare.de is on one line in these documents.

I have tried several data mining and extraction apps, but none do it as simply or as effectively as Extract Data & Text in Multiple Files. They either don’t do multiple documents in batch, or they don’t extract a line, or they don’t export a list of what I want to extract.

I would think this would be a simple app to have, but I cannot find anything that can do this. Please, if anyone can help me by suggesting a working app, or if anyone has a “fix” for Extract Data & Text in Multiple Files, I would greatly appreciate it.
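For anyone who would rather script this than hunt for an app, here is a minimal Python sketch of the "copy the whole matching line into a list" idea; the folder name and output filename are placeholders of mine, not anything from the post:

```python
import os

def collect_matching_lines(root, needle):
    """Walk root recursively and collect every line that contains needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as fh:
                for line in fh:
                    if needle in line:
                        matches.append(line.strip())
    return matches

if __name__ == "__main__":
    # "archive" and "links.txt" are placeholder names.
    links = collect_matching_lines("archive", "rapidshare.de")
    with open("links.txt", "w") as out:
        out.write("\n".join(links))
```

Point it at the top of the archive and it writes one matched line per row, which is exactly the "export as a list" step the demo cripples.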

furdog
July 26th, 2006, 09:23 PM
I just asked a friend who works on mainframe software to look for a program. He thinks he understands: what you're doing is like a data-miner program. I hope what you're looking for is a way to take data files in a list and extract from them, am I correct?

megalomania
July 26th, 2006, 10:12 PM
Yes, what I am doing is data mining. I am looking for a particular line of text from thousands of documents that I then want copied into a single document.

furdog
July 26th, 2006, 10:52 PM
But do you wish to also export them? I assume you're wanting to file-share them, copy them, etc. He's looking into whether they have a program for it on their system; I'll let you know!

Anira
July 26th, 2006, 10:58 PM
Use this serial: 11089567

akinrog
July 26th, 2006, 11:32 PM
Quote (megalomania): "To explain what I am doing, I have approximately 10,000 archived html documents. Almost all of them contain a rapidshare link. I want that rapidshare link. Obviously I want to search for “rapidshare.de” as my term, but I do not know what the rest of the string will be, so I want to copy the entire line. Every mention of rapidshare.de is on one line in these documents."

Sir,
If you are trying to extract rapidshare links, the rapidget application that rapidshare provides does that. When you open the application, click "add files"; in the dialog box shown, click the "text" button at the top right. Now open the HTML file, view its source, and copy the source code block by block, pasting it into the box entitled "enter text / html source".

After that, click "grab links", and the application extracts the links. I know it's quite crude, but it is useful for getting links. I use this method since I hate right-clicking and choosing "copy shortcut", and most importantly the links get shortened in the browser. Instead I open the source and have rapidget grab the links.
HTH.

FUTI
July 27th, 2006, 11:02 AM
Mega, did you consider using the Copernic desktop search engine for the task you mentioned? It will find the requested string wherever it is on your computer, or in the folders you tell it to look in. Extraction and manipulation must be manual AFAIK, but I guess it can't be too hard to write some kind of script that will do the manual job...

megalomania
July 30th, 2006, 01:31 PM
Hmm, I have Copernic desktop search, and I did not know it could do that. Thanks to the key provided by Anira, I was able to get the “Extract Data & Text in Multiple Files” program to work properly.

Now in a few WEEKS I may be able to download the nearly 100GB worth of books I have rapidshare links for :(

Third_Rail
July 31st, 2006, 01:33 AM
Slightly OT, but how the heck do you find rapidshare links? I know there's no way to search the database from the main site. I wonder if the computer programs I'm looking for (not available through torrents) would be somewhere in there.

nbk2000
July 31st, 2006, 07:52 PM
Google is your friend. :)

It pisses me off too that Rapidshare doesn't have a search function, but what can you do?

At least megaupload offers searching for their premium members.

Bugger
July 31st, 2006, 08:31 PM
You can try the site http://www.rapidshared.org , which lists rapidshare uploads available to the public. Their database of links can be searched by subject. I think they find them by periodically searching Google for the word "rapidshare.de" (in quotation marks). This brings up results for rapidshare.de links published on various forum websites, Yahoo groups, etc., although not links which have been shared only privately. One can add subject-related words and terms to the Google search string, like chemistry, physics, Windows, Unix, organic, azide, nitrate, SMG, explosive, drug, etc.

To Megalomania: How about posting your 10,000 rapidshare.de links here, or at least those not so far published? They could all be collected in a TXT file, put into a ZIP archive, and posted as an attachment.

nbk2000
July 31st, 2006, 10:42 PM
First he'll download them before sharing them publicly, as it seems there are untermensch in this world who'll download a file, then rat it out to Rapidshare, to keep others from benefiting from it. :mad:

No need to help them out by giving them a huge list that they can simply e-mail out, eh? ;)

Bugger
August 1st, 2006, 02:59 AM
In that case, Megalomania could send the list privately by email or PM to all roguesci.org members, or at least to those who have made posts.

FUTI
August 1st, 2006, 07:35 AM
Mega, I tested whether the Copernic desktop search engine will find files containing a specific string: if you search for html documents containing "www.r" as the search string, for example, it works. I will try the other part of the task you described in a couple of days (if the temperature here drops enough that my internal CPU doesn't freeze up and refuse the task). I will be testing an offline browser in a totally unrelated experiment, so I hope this experiment doesn't come a little late.

midtown
August 1st, 2006, 03:24 PM
Another idea, if that doesn't work, is to use the command:

grep [search word] [filenames] > [output file]

(Piping cat into grep works too, but grep can read the files directly.) If all the files are in the same directory, then you can just use an asterisk to match all of them.

Pb1
August 6th, 2006, 12:30 AM
This would be a pain in the ass for large, multidirectory html archives. You can pass all the files in a folder with a single * on the command line, but if you have thousands of folders with one web page each, you aren't gonna get very far with this. It would only work if all the html files were in a small number of folders.

This is assuming, of course, that you can run linux in the first place.

In my opinion, the best way to do this is to learn perl (O'Reilly's ebook on it is superb and on several FTPs) and write a script to go through your files and extract the necessary information. The advantage of this is that you have total control over what the script does (what kinds of expressions it extracts, etc.). The disadvantage is that you have to learn perl. It's an easy language, but not for those with no programming experience.
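A script along these lines can be quite short. Here is a sketch in Python rather than perl, purely as an illustration; the regex is my own guess at the shape of a rapidshare.de URL, not anything taken from the thread:

```python
import re
from pathlib import Path

# Rough pattern for a rapidshare.de URL; the character class stops the
# match at a quote, angle bracket, or whitespace. Adjust to taste.
LINK_RE = re.compile(r'https?://(?:www\.)?rapidshare\.de/[^"\'<>\s]+')

def extract_links(root):
    """Recursively pull every rapidshare.de URL out of the HTML files under root."""
    links = []
    for path in Path(root).rglob("*.htm*"):
        text = path.read_text(errors="ignore")
        links.extend(LINK_RE.findall(text))
    return links
```

Unlike the plain grep approach, this handles arbitrarily nested folders and returns just the URLs rather than whole lines.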

megalomania
August 13th, 2006, 06:36 PM
Oh gods! It has been nearly 3 weeks and I have 50GB of science related books. I still have thousands more to go :( I'll release the big list all right, as soon as I am done with my bookz orgy.

ravn
November 5th, 2006, 04:28 PM
This is a little late but possibly useful for others attempting to do the same thing.

The base install of cygwin (available at www.cygwin.com) is your best friend in this case. Once you have this installed, run a Cygwin bash shell, and cd to the top level directory where all your files are. For example:

cd /cygdrive/c/rapidfiles

This assumes you have created a directory on your c: drive called rapidfiles, and placed your 10000 files there. (sub-directories are fine)

From this directory enter the following command:

grep -R -h rapidshare\.de * > links.txt

This will Recurse all subdirectories, suppress file names, and find the string rapidshare.de, spitting out all lines containing that string to a file in the current directory called links.txt.

Nihilist
November 8th, 2006, 02:57 AM
http://download.yousendit.com/CC618A10556C20A2

Enjoy :). Just replace your .exe with that one. No worries about virii and the like - cracked it myself.

electricdetonator
November 26th, 2006, 10:28 AM
Why not simply use an XML parser with appropriate rules?

You're looking for HTML Tags like:

<a href="http://www.rapidshare....>Sometext</a>


Just take any free XML parser and extend its rule for finding the href into finding href="http://www.rapidshare* and you'll get all the HTML tags containing the full links ;)

Just google for "XML freeware parser" and you'll find a bunch of them ;)

BTW, it's much more flexible than any other search engine, because you can make special rules for it.

Hope this helps.
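One wrinkle with this suggestion: real-world HTML is often not well-formed XML, so a tolerant HTML parser is safer than a strict XML one. As an illustration of the rule-based approach, here is a sketch using the HTMLParser class from Python's standard library; the class and function names are mine:

```python
from html.parser import HTMLParser

class RapidshareLinkParser(HTMLParser):
    """Collect href attributes of <a> tags that point at rapidshare."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and "rapidshare" in value:
                self.links.append(value)

def grab_links(html_text):
    parser = RapidshareLinkParser()
    parser.feed(html_text)
    return parser.links
```

Because the parser works on tags and attributes instead of raw lines, it keeps matching even if a link is split across lines or surrounded by other markup.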