Update: Sorry, the link has been going up and down. It is still worth trying, but I will look for a more stable option when time cycles free up.
This past week I decided to cook up a service (link in bold near the middle of this post) I feel will greatly assist users in developing advanced Google Custom Search Engines (CSE’s). I read through the Co-op discussion posts, digg/blog comments, reviews, emails, etc. and learned many of our users are fascinated by the refinements feature – in particular, building search engines that produce results like this:
“linear regression” on my Machine Learning Search Engine
… but unfortunately, many do not know how to do this, and either don’t understand or don’t want to hack up the XML. Additionally, I think it’s fair to say many users interested in building advanced CSE’s have already done similar site tagging/bookmarking through services like del.icio.us. del.icio.us really is great. Here are a couple of reasons why people should (and do) use del.icio.us:
- It’s simple and clean
- You can multi-tag a site quickly (comma separated field; don’t have to keep reopening the bookmarklet like with Google’s)
- You can create new tags on the fly (don’t choose the labels from a fixed drop-down like with Google’s)
- The bookmarklet provides auto-complete tag suggestions; shows you the popular tags others have used for that current site
- Can have bundles (two level tag hierarchies)
- Can see who else has bookmarked the site (can also view their comments); builds a user community
- Generates a public page serving all your bookmarks
Understandably, we received several requests to support del.icio.us bookmark importing. My part-time role with Google just ended last Friday, so, as a non-Googler, I decided to build this project. Initially, I was planning to write a simple service to convert del.icio.us bookmarks into CSE annotations – and that’s it – but realized, as I learned more about del.icio.us, that there were several additional features I could develop that would make our users’ lives even easier. Instead of just generating the annotations, I decided to generate the CSE contexts as well.
Ok, enough talk, here’s the final product:
http://basundi.com:8000/login.html
If you don’t have a del.icio.us account, and just want to see how it works, then shoot me an email (check the bottom of the Bio page) and I’ll send you a dummy account to play with (can’t publicize it or else people might spam it or change the password).
Here’s a quick feature list:
- Can build a full search engine (like the machine learning one above) in two steps, without having to edit any XML, and in less than two minutes
- Auto-generates the CSE annotations XML from your del.icio.us bookmarks and tags
- Provides an option to auto-generate CSE annotations just for del.icio.us bookmarks that have a particular tag
- Provides an option to auto-calculate each annotation’s boost score (log-normalized against the maximum number of Others, i.e. other del.icio.us users who saved the same link, across your bookmarks)
- Provides an option to auto-expand links (appends a wildcard * to any links that point to a directory)
- Auto-generates the CSE context XML
- Auto-generates facet titles
- Since there’s a four-facet by five-label restriction (that’s the max that one can fit in the refinements display on the search results page), I provide two options for automatic facet/refinement generation:
- The first uses a machine learning algorithm to find the four most frequent disjoint 5-item-sets (based on the # of del.icio.us tag co-occurrences; it then does query-expansion over the tag sets to determine good facet titles)
- The other option returns the user’s most popular del.icio.us bundles and corresponding tags
- Any refinements that do not make it in the top 4 facets are dumped in a fifth facet in order of popularity. If you don’t understand this then don’t worry, you don’t need to! The point is all of this is automated for you (just use the default Cluster option). If you want control over which refinements/facets get displayed, then just choose Bundle.
- Provides help documentation links at key steps
- And best of all … You don’t need to understand the advanced options of Google CSE/Co-op to build an advanced CSE! This seriously does all the hard, tedious work for you!
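To make the annotation-related options above concrete, here is a minimal Python sketch of the boost-score and link-expansion steps. It is not the tool's actual code. The scoring formula (`log(1 + n) / log(1 + max)`), the helper names, and the bookmark dictionary shape are assumptions for illustration, and the XML element names follow the Custom Search annotations format as commonly documented, so they should be checked against Google's documentation before use.

```python
import math
from xml.sax.saxutils import quoteattr


def expand_link(url):
    """Append a wildcard to links that point to a directory."""
    return url + "*" if url.endswith("/") else url


def boost_score(others, max_others):
    """Log-normalize an Others count into the range [0, 1] (assumed formula)."""
    if max_others <= 0:
        return 0.0
    return math.log(1 + others) / math.log(1 + max_others)


def build_annotations(bookmarks, cse_label):
    """Render bookmarks as annotations XML.

    Each bookmark is a dict with hypothetical keys: url, tags, others.
    """
    max_others = max((b["others"] for b in bookmarks), default=0)
    lines = ["<Annotations>"]
    for b in bookmarks:
        about = quoteattr(expand_link(b["url"]))
        score = boost_score(b["others"], max_others)
        lines.append('  <Annotation about=%s score="%.2f">' % (about, score))
        # The CSE label ties the annotation to the engine; tags become refinements.
        for label in [cse_label] + b["tags"]:
            lines.append("    <Label name=%s/>" % quoteattr(label))
        lines.append("  </Annotation>")
    lines.append("</Annotations>")
    return "\n".join(lines)
```

Calling `build_annotations(bookmarks, "_cse_demo")` with a list of such dictionaries returns a string that can be saved as the annotations file; `_cse_demo` is a placeholder label, not a real engine ID.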
In my opinion, there’s no question that this is the easiest way to make a fancy search engine. If I make any future examples I’m using this – I can simply use del.icio.us, sign in to this service, and voilà, I have a search engine with facets and multi-label support.
Please note that this tool is not officially endorsed by nor affiliated with Google or Yahoo! It was just something I wanted to work on for fun that I think will benefit many users (including myself). Also, send your feedback/issues/bugs to me or post them on this blog.
January 3, 2007 at 2:02 pm |
[...] Singh, who used to work at Google, has created a really amazing tool that lets you create a highly Google Custom Search Engine without knowing how [...]
January 5, 2007 at 12:57 am |
Is this like http://sandbox.sourcelabs.com/kibbutz/generate.php ?
January 5, 2007 at 1:04 am |
Yea I saw that when researching this. The main difference is the tool that I provide actually works
This one looks like a UI prototype or something – try a bad delicious login and it works, it doesn’t even ask for a CSE ID (how does it know where to upload to?), nor does it show you the annotations output. My tool gives you the output for the annotations, plus uses some AI tricks to also autogenerate the context.
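For readers curious about the context generation mentioned here, the Cluster option described in the post can be approximated with a greedy sketch like the one below. It is not the tool's actual algorithm, which the post only summarizes. This version seeds each facet with the most frequent unused tag, then grows it with the unused tag that co-occurs most often with the facet so far. The function name `cluster_tags` and the input shape (one list of tags per bookmark) are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cluster_tags(bookmark_tags, n_facets=4, facet_size=5):
    """Greedily group tags into disjoint facets by co-occurrence.

    Returns (facets, leftovers); leftovers are ordered by popularity.
    """
    tag_freq = Counter()
    pair_freq = Counter()
    for tags in bookmark_tags:
        unique = sorted(set(tags))
        tag_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    def cooccur(a, b):
        # Pairs are stored with the alphabetically smaller tag first.
        return pair_freq[(a, b) if a < b else (b, a)]

    unused = set(tag_freq)
    facets = []
    while unused and len(facets) < n_facets:
        # Seed with the most frequent unused tag (ties broken alphabetically).
        seed = max(sorted(unused), key=lambda t: tag_freq[t])
        facet = [seed]
        unused.discard(seed)
        while len(facet) < facet_size:
            candidates = [t for t in sorted(unused)
                          if any(cooccur(t, m) for m in facet)]
            if not candidates:
                break
            best = max(candidates,
                       key=lambda t: sum(cooccur(t, m) for m in facet))
            facet.append(best)
            unused.discard(best)
        facets.append(facet)

    # Tags that did not fit go into a final catch-all facet, most popular first.
    leftovers = sorted(unused, key=lambda t: (-tag_freq[t], t))
    return facets, leftovers
```

The defaults mirror the four-facet by five-label limit from the post, and the `leftovers` list corresponds to the fifth facet that collects the remaining refinements in order of popularity.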
January 5, 2007 at 4:49 am |
Thanks, nice to see this.
I visited eagerly, but I got stuck at this error:
http://idlivada.vpsland.com:8000/coop_delicious.py/design?user=webarmi&passw=?????????&cse=_cse_1_zcj0f6hhk&title=My%20Delicious&keywords=java&descr=My%20Delicious&volunteers=false&groupby=c&subrlinks=true
(The password shown here has been changed.)
January 5, 2007 at 5:36 am |
Hi, during the login process, do you capture/record my del.icio.us login ID and password?
January 5, 2007 at 5:38 am |
Thanks for the bug report, Riza. Try the link now; it should work.
Thanks again.
* I wasn’t UTF-8 encoding the tag strings in my title generation step. Haven’t seen tag names like that before, but hey, you have every right to use them.
January 5, 2007 at 5:52 am |
Hi Babs – Technically it does get captured (GET request logs) since those values are passed as CGI parameters to the URL. I will not use/sell them. I plan to delete the logs periodically.
It’s tough to provide full encryption since I don’t have an SSL certificate. In the meantime, if you’re concerned about security/privacy, why not go to del.icio.us, change your password to some temporary value, run this wizard, then change your password back? The wizard takes about a minute. Later today I will (finally) encrypt passwords in all URL requests to provide a decent level of security.
I’m also wondering if I should release this as a desktop application. Would people prefer this?
January 5, 2007 at 7:39 am |
Please do me one favour, please my second comment and this one too.
I think u knew why i’m saying this
January 5, 2007 at 7:44 am |
Please do me one favour, please delete my second comment and this one too.
I think u knew why
Note:
I’ve to take at least one minute to check it before posting a comment cos no editing here . One new resolution for this year to follow.
January 5, 2007 at 8:02 am |
Hi Riza – I deleted your comment with the link that exposed your password. I’m keeping (and replying to) your later comments so users know that they can directly email me such links in the future (check bottom of bio for contact information).
January 5, 2007 at 9:51 am |
Thanks Singh, here’s one more bug (I hope so).
Well, as you know, my password got exposed in my last post, so I changed it to a strong one (which includes $, * and alphanumeric characters).
Here is the bug: my new password produces this error:
GetXmlResponse Error: HTTP 401 Code: Bad user/password webarmi ?????????
Don’t worry, I masked the password shown above; I used my real one while checking.
One more clue for you: my old password works fine now, so I think the problem is only with my new password, which contains special characters.
[Note: I double-checked this comment before posting.]
January 5, 2007 at 10:20 am |
Hi!
Eager to test something new to boost my del.icio.us account, but…
When I try the tool I only get the error message “Failed to connect to Yahoo! Search” when I try to generate the context XML in step 2.
Generating the annotation xml only generates an empty file…
Perhaps I didn’t figure everything out;)
January 5, 2007 at 10:26 am |
Riyaz – Done. I wasn’t escaping the password before.
IP – In the process of doing this fix I may have messed up some connection settings. If it’s still not working for you then shoot me an email (you can find my contact info in the bio).
Really appreciate the feedback.
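The two fixes in this thread (UTF-8 encoding the tag strings, and escaping the password) both come down to encoding values before they go into a URL. Here is a minimal Python sketch, assuming the `user` and `passw` parameter names visible in the wizard's URLs. `build_query` is a hypothetical helper, not the tool's code, and the extra `keywords` parameter is only there to show a non-ASCII tag.

```python
from urllib.parse import urlencode, parse_qs


def build_query(user, password, keywords=""):
    """Build a query string with every value percent-encoded as UTF-8."""
    params = {"user": user, "passw": password}
    if keywords:
        params["keywords"] = keywords
    # urlencode escapes reserved characters such as $, *, & and =,
    # and encodes non-ASCII text as UTF-8 by default.
    return urlencode(params)


query = build_query("webarmi", "p$ss*w&rd=1", keywords="café")
# Decoding the query string gives back the original values unchanged.
decoded = parse_qs(query)
```

Without the escaping step, a `&` or `=` inside the password would be read as a parameter separator, and the server would see a truncated password. That kind of mismatch could plausibly produce the 401 reported above.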
January 5, 2007 at 1:13 pm |
Just updated the service so all post-login requests encrypt the password parameter. The server rekeys every 30 minutes, which should provide ample time for a user to generate his/her XML. If the login does not work, the key has most likely expired, so just try logging in again. If all else fails, just post the issue here or email me. Thanks.
January 5, 2007 at 9:28 pm |
Just fixed a tag parsing issue. If you were getting extremely long label names that was due to a bug. Should be fixed now.
January 8, 2007 at 4:18 pm |
[...] search mashup that google-dexes your delicious bookmarks has been unleashed on the blogosphere. You can scroll down to see it in action with my own [...]
January 8, 2007 at 6:47 pm |
This blog was very interesting to read and I like your writing style. Nice blog!
January 8, 2007 at 8:15 pm |
[...] Google Co-op and del.icio.us!: build a full search engine without having to edit any XML, auto-generate the Custom Search Engines (CSE) annotations XML from del.icio.us bookmarks and tags « zooie’s blog (tags: engine google mashup del.icio.us search searchengine tagging xml) [...]
January 9, 2007 at 9:52 am |
Thanks for the awesome product, Zooie. I have one question. Every time I add a new delicious link, do I need to go through this process of creating the annotations XML and uploading it again?
If so, is there a way to simplify that?
January 9, 2007 at 11:01 am |
My 3500+ account’s annotation file generated a “413- Your client issued a request that was too large.” error while loading it into coop. Can I break the file into separate bits and load them one by one?
Tom
January 9, 2007 at 2:48 pm |
[...] Vik’s Blog [...]
January 9, 2007 at 5:06 pm |
I tried this, and it only seems to have moved across a few urls, as shown by this output on the sites page:
http://www.longfocus.com/firefox/gmanager/* Firefox Extensions Google
http://www.awriterz.org/Fantasy/* Awriterz Fantasy
beautifulbeta.blogspot.com/2006/10/pullquotes-for-your-blog.html Article Blog Publishing
http://www.eusing.com/CDRipper/CDRipper.htm Computers Entertainment Music Software
wiki.rubyonrails.com/rails/pages/HowtoSetupApacheWithFastCGIAndRubyBindings Article Linux Ruby Research
All of the tags seem to be imported, but not most of the actual bookmarks…
January 9, 2007 at 6:21 pm |
Sanjay – For now yes. The delicious API does support an update (which will push only new links since the last call) so it’s definitely feasible. When time cycles free up I’ll add that in. Thanks for the feature suggestion.
Tom – Yeah there’s a limit on the XML file size being pushed back through the browser. Two solutions: (1) I can save the file on the server (but I’m reluctant to use server storage at the moment) (2) My wizard allows the user to generate annotations per delicious tag. Try that – so produce annotation files for your favorite delicious tags and just upload each one sequentially in the CSE.
Stephen – Did you check the Rank option? Or filter your bookmarks by a tag? The Rank option most likely won’t process every bookmark because retrieving the Others counts is expensive (the delicious API really needs to expose these numbers in the posts/all call). If you didn’t do either, then shoot me an email (I have my contact info in my bio page).
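For anyone who, like Tom, already has one large annotations file, splitting it locally is another way to stay under the upload size limit. This is a rough Python sketch, assuming the file has an `<Annotations>` root with `<Annotation>` children. `split_annotations` is a hypothetical helper and is not part of the wizard.

```python
import xml.etree.ElementTree as ET


def split_annotations(xml_text, chunk_size):
    """Split one annotations document into several smaller documents."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    root = ET.fromstring(xml_text)
    annotations = root.findall("Annotation")
    chunks = []
    for start in range(0, len(annotations), chunk_size):
        # Each chunk gets its own root element with the same tag.
        part = ET.Element(root.tag)
        part.extend(annotations[start:start + chunk_size])
        chunks.append(ET.tostring(part, encoding="unicode"))
    return chunks
```

Each returned string can be written to its own file and uploaded one after another. A workable chunk size depends on the upload limit, so it may take a little trial and error.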
January 9, 2007 at 11:34 pm |
[...] Google Co-op just got del.icio.us! « zooie’s blog Incorporate your delicious entries in your Google searches (tags: del.icio.us mashup searchengine tagging) [...]
January 10, 2007 at 1:37 am |
Hello,
Can you send me an account to play with?
Thank you,
Rod Guzzo
January 25, 2007 at 1:38 pm |
[...] Singh, V. 2007. Google Co-op just got del.icio.us!. Available online: http://zooie.wordpress.com/2007/01/03/google-co-op-just-got-delicious/. [...]
January 27, 2007 at 4:37 pm |
Hey, great tool. It would be nice if you could go into the XML creation a little, though … that will help further development of similar search engines…
January 31, 2007 at 1:08 am |
Great blog Vik! The material is interesting and smart and I enjoy following it. Continued success!
February 11, 2007 at 5:37 am |
[...] Also see how to use it to search your del.icio.us account. [...]
February 13, 2007 at 9:23 pm |
Hello,
Very interesting work. I wonder if you would send me an account so that I could play with the CSE. Thanks!
February 15, 2007 at 9:52 pm |
Great tool. Unfortunately I have lots of tagged URL’s and Google limits this to 2000. Any suggestions?
February 16, 2007 at 12:07 pm |
Don’t seem to be able to get your link http://basundi.com:8000/login.html to resolve… what’s going wrong?
February 16, 2007 at 12:18 pm |
Hi Matthew – It works for me. Hmm. Try again (refresh and clear the cache if necessary). It should be working.
February 21, 2007 at 3:25 pm |
Check this out http://www.googlepowersearch.com.
I created GooglePowerSearch so you can power search for Video, News, Maps, Images and more…
Google Power Search helps to unleash the built in power of Googles special features.
Using Google Power Search you are able to get better-targeted results.
Check out Google Power Search and let me know what you think.
Thanks
Steve
February 27, 2007 at 12:15 pm |
I am liking this idea
March 26, 2007 at 9:35 am |
yours is the second blog (or actually third maybe) I have ever bookmarked (yea I don’t use aggregators)
I already was like deleting my co-op account then I read through this post once and then TA-DA
http://taxa.search.googlepages.com/home
I even licensed it with same exact license as you had just to be sure.
but I think I have committed at least a dozen copyright infringements as well
I always get all giggly seeing the google labs logo but this is just close to too exciting
so yea thanks for pointing out how it’s done and I haven’t even started with implementing that facets x labels thing which sounds great (probably first I have to make a del.icio.us account)
so yea this blog has been valuable content for me.
April 11, 2007 at 1:00 am |
[...] VideoGoogle GroupsGoogle MapsGoogle NewsHow to Get Detailed PPC Keyword Data from Google AnalyticsGoogle Co-op just got del.icio.us! « zooie’s blogGoogle Code – Updates: Four Google open source tools on Google CodeNo comments yet.RSS for comments [...]
April 17, 2007 at 9:07 pm |
Vik –
Great tool. How would I take one of my subscriptions and turn it into a CSE?
Let’s say that I’ve subscribed to the tag San Francisco. Can I use your tool to take that subscription and generate web URLs that I can feed back into CSE?
April 18, 2007 at 4:04 am |
Hi Farhan – I would recommend looking into the OPML upload feature available in the Advanced tab of the CSE’s control panel. This will take OPML (and various RSS feed formats), extract its URL’s, and import them directly into the CSE. My tool currently just supports a user’s bookmarks available via del.icio.us’s API.
The other option (in case the OPML feature does not work) would be to regex out the URL’s and pump them into a flat file (each link new-line separated), then paste the links in the sites box (Sites tab).
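The regex fallback can be sketched in a few lines of Python. The pattern below is a deliberately simple assumption (any `http://` or `https://` run up to whitespace, a quote, or an angle bracket) and will not handle every URL form. `extract_urls` and `to_flat_file` are hypothetical helper names.

```python
import re

# Simplified pattern: stops at whitespace, quotes, and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(text):
    """Return the URLs found in text, in order, without duplicates."""
    seen = set()
    urls = []
    for url in URL_RE.findall(text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def to_flat_file(text):
    """Join the extracted URLs with newlines, ready to paste into the sites box."""
    return "\n".join(extract_urls(text))
```

Running `to_flat_file` over the raw feed or page source gives one link per line, which is the shape the Sites tab expects for pasting.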
April 18, 2007 at 5:01 am |
Hi VIk’s,
Ok, I give up, just gimme your dummy in my email
Thanks pal…
April 26, 2007 at 11:27 pm |
could this work with ma.gnolia bookmarks?
April 26, 2007 at 11:38 pm |
Yeah it’s possible if there’s an API or XML feeds available for retrieving the bookmarks. When time cycles free up I’ll look into that.
May 30, 2007 at 10:36 pm |
Throw me an account, this looks awesome. Was thinking of purpose-building an app to do same, but running with Google is even better. Any chance of getting a copy of this to run on my own server/alter?
May 31, 2007 at 12:24 pm |
I use delicious with my blog every day.
June 26, 2007 at 4:58 am |
[...] own collaboratively-created Google CSE or Swicki of favorite, subject-specific sites (or have a CSE generated from a del.icio.us account’s links). Librarians should seek to be familiar with technologies for finding and organizing online [...]
July 5, 2007 at 6:42 pm |
Is this different from the Google search-for-your-domain thing that they offered a long time back?
July 27, 2007 at 10:11 pm |
Sundaize Search Engine
July 29, 2007 at 1:17 pm |
I use delicious every day for my bookmarks. Great tips.
Thanks
August 4, 2007 at 6:26 pm |
Any news on the ma.gnolia integration?
August 4, 2007 at 11:55 pm |
Zooie, any update on the ma.gnolia integration?
August 4, 2007 at 11:55 pm |
oops, sorry for the double post
August 5, 2007 at 2:04 am |
Hey Danny – Sorry for the delay. Haven’t had a chance to get to it. You might want to look at the Google Custom Search site. They have a new feature called ‘Linked CSE’ – I think this might do what you want.
August 6, 2007 at 12:58 pm |
I looked at the linked CSE; your plugin would suit my needs much better. I’ll keep watching this space – crossing my fingers for ma.gnolia integration
August 18, 2007 at 3:46 am |
That is a very slick implementation. I have been looking into the custom search engine and more specifically their Ajax API. I created a JavaScript class using the prototype.js library to allow for a completely customizable Ajax search. They now support binding a GWebSearch object to a specific CSE.
http://positionabsolute.net/blog/2007/08/implement-custom-search.php
Cheers,
Matt
October 29, 2007 at 2:13 pm |
I can’t get your page to load! Dying to see what you’ve done here, but it keeps telling me the connection timed out. Any hints?
October 29, 2007 at 4:24 pm |
Hi EB – Sorry, I was running it on my friend’s server and I think I may have overstayed my welcome.
Let me see what I can do.
– Vik
November 25, 2007 at 4:23 am |
Any luck relocating this service? I’m eager to try it!
November 25, 2007 at 5:41 am |
Hi David – Not yet – Sorry! Anyone out there got a server available running apache/mod_python?
December 18, 2007 at 2:42 am |
Zooie, I am interested in helping you out. However, do you have any idea how much CPU and bandwidth your app requires?
December 19, 2007 at 10:01 am |
I stumbled across this page while searching for a way to search my Ma.gnolia bookmarks. I wasn’t able to find anything else, so I wrote my own little tool (http://nemti.awardspace.com/goo.gnolia/). It’s just a rather simple implementation of Google linked CSE. Yours sounds much more featureful – I hope you can find a host, and it would be great if you could add Ma.gnolia support.
February 18, 2008 at 9:02 pm |
I tried this, but when I uploaded my XML annotations and skeleton, I got an “error parsing XML at line 3” message in both cases. Can you tell me what I did wrong? Thanks.
February 18, 2008 at 9:07 pm |
Hi Kristen – Good chance Google changed their XML formats since I developed this tool. Could you send me your XML (the one which produces the bug)? vik.singh [at gmail]. Thanks.
March 13, 2008 at 9:02 am |
As far as I can see, deligoo.com does the same thing with del.icio.us search, but you must install their plugin.
March 27, 2008 at 6:12 pm |
[...] zooie’s work I almost duplicate it besides Linked CSEs introduced. Then you need not to download and upload the annotations XML file and you can copy a piece of code to your page then get the cse toolbar. [...]
April 15, 2008 at 4:50 am |
[...] Importing del.icio.us bookmarks for CSE (aka Vik’s tool) [...]
April 28, 2008 at 11:15 am |
[...] Singh, V. 2007. Google Co-op just got del.icio.us!. Saatavilla www-muodossa: http://zooie.wordpress.com/2007/01/03/google-co-op-just-got-delicious/. [...]
June 30, 2008 at 8:48 pm |
hi,
We used Google CSE and delicious to generate results based on all delicious bookmarks and created refinements based on the tags. Check it out at http://www.scoofers.com
July 9, 2008 at 4:14 am |
Thx…
July 10, 2008 at 10:53 am |
[...] a CSE that searched over sites I have tagged in del.icio.us. I found a couple of examples of this. One wanted my delicious username and password. No thanks. The other, deligoo, looks good, but wanted me [...]
September 11, 2008 at 1:19 am |
here’s an easier way:
http://tips.dennyhalim.com/2008/09/google-cse-delicious-widgetbox-instant.html
October 8, 2008 at 1:05 pm |
Really good initiative.
I wonder if Google didn’t rework this with their makeannotation tool:
http://www.google.com/coop/docs/cse/tools.html
May 3, 2009 at 7:39 am |
Hi… is there a way to add your delicious account as a subscribed link in Google search?
June 14, 2009 at 3:32 am |
Try deligoo
http://www.deligoo.com/en/