I added a lot of new features to RADAR today. First, radar-web-search has been extended. This program lets you search the net for a topic, say, "event extraction". It builds a large query that combines the topic with terms commonly associated with software, searches Yahoo, and then looks at each result page to see whether any software is linked:
andrewdo@box:/var/lib/myfrdcsa/codebases/internal/event-system/IE$ radar-web-search "event extraction"
QUERY: "event extraction" system OR java OR project OR library OR php OR web OR framework OR open OR manager OR linux OR engine OR net OR server OR management OR game OR tool OR tools OR client OR simple OR editor OR cms OR database OR file OR generator OR software OR network OR xml OR python OR based OR source OR plugin OR data OR amp OR language OR application OR control OR online OR toolkit OR interface OR 3d OR irc OR eclipse OR free OR api OR windows OR code OR os OR perl OR virtual OR development OR gui OR driver OR content OR module OR mail OR image OR suite OR player OR portal OR monitor OR platform OR simulator OR script OR object OR log OR media OR text OR easy OR browser OR search OR service OR viewer OR de OR chat OR remote OR parser OR mysql OR time OR bot OR mobile OR converter OR sql OR daemon OR tracker OR rpg OR programming OR test OR gnu OR environment OR class OR utility OR gnome OR compiler OR internet OR 0 OR user OR utilities OR html OR package OR desktop
Result: #1
Url: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15494078
Summary: With the explosion of molecular data, tools developed by computer scientists are ... BIND-The Biomolecular Interaction Network Database. Nucleic Acids Research. ...
Title: PASBio: predicate-argument structures for event extraction in molecular ...
$VAR1 = ;
Result: #2
Url: http://nlp.cs.nyu.edu/info-extr/
Summary: This system combines a web crawler (which searches for reports of outbreaks on a ... engine, and a data base browser to examine the extracted events (Proteus Project ...
Title: Proteus Project: Information Extraction
$VAR1 = ;
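The query-building step can be sketched roughly as follows. This is a minimal illustration in Python, not the actual radar-web-search code: the function name build_query and the short KEYWORDS list are assumptions made for the example, and the real keyword list shown in the QUERY line above is much longer.

```python
def build_query(topic, keywords):
    """Quote the topic and OR together the software-related keywords."""
    return f'"{topic}" ' + " OR ".join(keywords)


# Hypothetical short keyword list, taken from the start of the QUERY line above.
KEYWORDS = ["system", "java", "project", "library"]

if __name__ == "__main__":
    print(build_query("event extraction", KEYWORDS))
```

Quoting the topic keeps it as an exact phrase, while the OR'd keywords bias the results toward pages that talk about software.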
So, that's what it does. One problem it had was that it only looked one layer deep for tar.gz and zip files and the like. I wanted it to look further, but following every link would have been expensive in both bandwidth and time. So what I did was download a dataset from:
which contained a large collection of web links. For each URL that linked to a set of files, I then rated its last directory or file name by how many "desirable" files were found there.
i.e. in the above URL it would be "database"
17.0028327481393 jars 40 9 211
14.3298883847943 download.html 116 168 498
12.9734278116825 edit.html 67 33 67
10.8830909612802 patches 8 2 368
10.4918735220202 Debug 38 16 42
10.0194507146122 golem 29 6 30
8.97441185481296 canaries13 20 0 20
So I added that, and now it can speculatively search one extra ply, following only the links whose last path component rates well. It has already helped to find some new software.
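Here is a rough sketch of the idea in Python. It is not RADAR's code, and the scoring is an assumption: this example simply uses the average number of desirable files per page for each last path component, whereas the scores in the table above come from a formula not given here. The names DESIRABLE_SUFFIXES, rate_segments and pick_links are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath

# Assumed definition of "desirable" files: common archive suffixes.
DESIRABLE_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def last_segment(url):
    """Return the last directory or file name in the URL path."""
    path = urlparse(url).path.rstrip("/")
    return posixpath.basename(path)


def is_desirable(url):
    return urlparse(url).path.lower().endswith(DESIRABLE_SUFFIXES)


def rate_segments(link_graph):
    """link_graph maps a page URL to the list of URLs it links to.

    Returns {segment: average desirable files per page with that segment}.
    """
    totals = defaultdict(int)
    pages = defaultdict(int)
    for page, links in link_graph.items():
        seg = last_segment(page)
        pages[seg] += 1
        totals[seg] += sum(is_desirable(link) for link in links)
    return {seg: totals[seg] / pages[seg] for seg in pages}


def pick_links(candidates, ratings, limit=3):
    """Choose the links most worth following for one extra ply."""
    ranked = sorted(
        candidates,
        key=lambda url: ratings.get(last_segment(url), 0.0),
        reverse=True,
    )
    return ranked[:limit]
```

With ratings learned from the link dataset, the crawler only needs to fetch the few highest-rated links on each result page instead of all of them, which is what keeps the extra ply cheap.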
Secondly, I went ahead and added the ability to search within PDF and other documents and extract the URLs from them, so that research papers (which often link to systems, or at least name them) can be searched as well. This is a separate script that will eventually be integrated with RADAR.
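The URL extraction step might look something like the sketch below. It assumes the document has already been converted to plain text by some other tool (that conversion is not shown), and the regular expression and the function name extract_urls are illustrative assumptions rather than what the actual script does.

```python
import re

# Matches http, https and ftp URLs up to whitespace or a closing delimiter.
URL_PATTERN = re.compile(r"""(?:https?|ftp)://[^\s<>"')\]]+""")


def extract_urls(text):
    """Return the distinct URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        # Drop sentence punctuation that often trails a URL in running prose.
        url = match.rstrip(".,;:")
        if url and url not in seen:
            seen.append(url)
    return seen


if __name__ == "__main__":
    sample = "The system is described at http://nlp.cs.nyu.edu/info-extr/."
    print(extract_urls(sample))
```

The extracted URLs could then be handed to the same link-checking step that radar-web-search applies to search results.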