Google Co-op just got del.icio.us!

Update: Sorry, the link is going up and down. It’s worth trying, but I will look for a more stable option when time frees up.

This past week I decided to cook up a service (link in bold near the middle of this post) I feel will greatly assist users in developing advanced Google Custom Search Engines (CSE’s). I read through the Co-op discussion posts, digg/blog comments, reviews, emails, etc. and learned many of our users are fascinated by the refinements feature – in particular, building search engines that produce results like this:

‘linear regression’ on my Machine Learning Search Engine

… but unfortunately, many do not know how to do this, or do not understand (or want to hack up) the XML. Additionally, I think it’s fair to say many users interested in building advanced CSE’s have already done similar site tagging/bookmarking through services like del.icio.us. del.icio.us really is great. Here are a couple of reasons why people should (and do) use del.icio.us:

  • It’s simple and clean
  • You can multi-tag a site quickly (comma separated field; don’t have to keep reopening the bookmarklet like with Google’s)
  • You can create new tags on the fly (don’t choose the labels from a fixed drop-down like with Google’s)
  • The bookmarklet provides auto-complete tag suggestions; shows you the popular tags others have used for that current site
  • Can have bundles (two level tag hierarchies)
  • Can see who else has bookmarked the site (can also view their comments); builds a user community
  • Generates a public page serving all your bookmarks

Understandably, we received several requests to support del.icio.us bookmark importing. My part-time role with Google just ended last Friday, so, as a non-Googler, I decided to build this project. Initially, I was planning to write a simple service to convert del.icio.us bookmarks into CSE annotations – and that’s it – but realized, as I learned more about del.icio.us, that there were several additional features I could develop that would make our users’ lives even easier. Instead of just generating the annotations, I decided to generate the CSE contexts as well.

Ok, enough talk, here’s the final product:
http://basundi.com:8000/login.html

If you don’t have a del.icio.us account, and just want to see how it works, then shoot me an email (check the bottom of the Bio page) and I’ll send you a dummy account to play with (can’t publicize it or else people might spam it or change the password).

Here’s a quick feature list:

  • Can build a full search engine (like the machine learning one above) in two steps, without having to edit any XML, and in less than two minutes
  • Auto-generates the CSE annotations XML from your del.icio.us bookmarks and tags
  • Provides an option to auto-generate CSE annotations just for del.icio.us bookmarks that have a particular tag
  • Provides an option to auto-calculate each annotation’s boost score (log normalizes over the max # of Others per bookmark)
  • Provides an option to auto-expand links (appends a wildcard * to any links that point to a directory)
  • Auto-generates the CSE context XML
  • Auto-generates facet titles
  • Since there’s a four-facet by five-label restriction (that’s the max that one can fit in the refinements display on the search results page), I provide two options for automatic facet/refinement generation:
    • The first uses a machine learning algorithm to find the four most frequent disjoint 5-item-sets (based on the # of del.icio.us tag co-occurrences; it then does query-expansion over the tag sets to determine good facet titles)
    • The other option returns the user’s most popular del.icio.us bundles and corresponding tags
    • Any refinements that do not make it in the top 4 facets are dumped in a fifth facet in order of popularity. If you don’t understand this then don’t worry, you don’t need to! The point is all of this is automated for you (just use the default Cluster option). If you want control over which refinements/facets get displayed, then just choose Bundle.
  • Provides help documentation links at key steps
  • And best of all … You don’t need to understand the advanced options of Google CSE/Co-op to build an advanced CSE! This seriously does all the hard, tedious work for you!
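
To make the annotation-related features above concrete, here is a minimal Python sketch of how bookmarks might be turned into annotations XML. It is not the tool’s actual code. The function names, the input dictionary keys (`url`, `tags`, `others`), and the exact boost formula `log(1 + others) / log(1 + max_others)` are assumptions made for illustration, and the `Annotations`/`Annotation`/`Label` element names reflect the CSE annotations format as commonly documented, which may have changed.

```python
import math
import xml.etree.ElementTree as ET


def expand_link(url):
    """Append a wildcard to links that point to a directory (end in '/')."""
    return url + "*" if url.endswith("/") else url


def boost_score(others, max_others):
    """Log-normalize a bookmark's 'Others' count into the range [0, 1].

    Assumed formula: log(1 + others) / log(1 + max_others).
    """
    if max_others <= 0:
        return 0.0
    return math.log(1 + others) / math.log(1 + max_others)


def build_annotations(bookmarks, cse_label):
    """Build annotations XML from bookmarks.

    bookmarks: list of dicts with 'url', 'tags', and 'others' keys
    (a hypothetical input shape, not the del.icio.us API response).
    """
    max_others = max((b["others"] for b in bookmarks), default=0)
    root = ET.Element("Annotations")
    for b in bookmarks:
        ann = ET.SubElement(
            root,
            "Annotation",
            about=expand_link(b["url"]),
            score="%.2f" % boost_score(b["others"], max_others),
        )
        # One label ties the site to the search engine;
        # one label per tag makes the tag available as a refinement.
        ET.SubElement(ann, "Label", name=cse_label)
        for tag in b["tags"]:
            ET.SubElement(ann, "Label", name=tag)
    return ET.tostring(root, encoding="unicode")
```

With this formula, a bookmark saved by 9 others scores 0.5 when the most popular bookmark has 99, so very popular links do not completely drown out the rest.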

In my opinion, there’s no question that this is the easiest way to make a fancy search engine. If I make any future examples, I’m using this – I can simply use del.icio.us, sign in to this service, and voilà, I have a search engine with facets and multi-label support.
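
For readers curious about the Cluster option in the feature list, the following Python sketch shows one simple way to group tags into disjoint facets from co-occurrence counts. It is a greedy stand-in, not the frequent item-set algorithm the tool uses, and it skips the query-expansion step for facet titles. The function name and the input shape (one list of tags per bookmark) are assumptions.

```python
from collections import Counter
from itertools import combinations


def cluster_facets(bookmark_tags, max_facets=4, max_labels=5):
    """Greedily group tags into disjoint facets by pairwise co-occurrence.

    Returns (facets, leftovers); leftovers stay in order of popularity,
    which is how a catch-all fifth facet could be filled.
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in bookmark_tags:
        unique = sorted(set(tags))
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    def cooccur(a, b):
        # Pairs are stored with the alphabetically smaller tag first.
        return pair_counts[(a, b) if a < b else (b, a)]

    # Most popular tags first; ties broken alphabetically.
    remaining = sorted(tag_counts, key=lambda t: (-tag_counts[t], t))
    facets = []
    while remaining and len(facets) < max_facets:
        facet = [remaining.pop(0)]  # seed with the most popular unused tag
        while remaining and len(facet) < max_labels:
            best = max(remaining, key=lambda t: sum(cooccur(t, m) for m in facet))
            if sum(cooccur(best, m) for m in facet) == 0:
                break  # nothing left co-occurs with this facet
            facet.append(best)
            remaining.remove(best)
        facets.append(facet)
    return facets, remaining
```

The defaults mirror the four facets by five labels limit described above. A real item-set miner would consider whole tag sets rather than pairwise counts, so treat this only as an intuition aid.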


Please note that this tool is not officially endorsed by nor affiliated with Google or Yahoo! It was just something I wanted to work on for fun that I think will benefit many users (including myself). Also, send your feedback/issues/bugs to me or post them on this blog.

74 Comments

Filed under AI, Co-op, CS, CSE, Google, Machine Learning, Research, Tagging

74 responses to “Google Co-op just got del.icio.us!”

  1. Pingback: » AutoGenerate A Google Custom Search Engine With del.icio.us » InsideGoogle » part of the Blog News Channel

  2. Yea I saw that when researching this. The main difference is the tool that I provide actually works :) This one looks like a UI prototype or something – try a bad delicious login and it works, it doesn’t even ask for a CSE ID (how does it know where to upload to?), nor does it show you the annotations output. My tool gives you the output for the annotations, plus uses some AI tricks to also autogenerate the context.

  3. Thanks, nice to see this.

    I visited eagerly but I got stuck on this error:

    http://idlivada.vpsland.com:8000/coop_delicious.py/design?user=webarmi&passw=?????????&cse=_cse_1_zcj0f6hhk&title=My%20Delicious&keywords=java&descr=My%20Delicious&volunteers=false&groupby=c&subrlinks=true

    (The password here has been changed.)

  4. Babs

    Hi, during the login process, Do you capture/record my login ID and password of del.icio.us?

  5. Thanks for the bug Riza. Try the link now it should work.
    * I wasn’t UTF-8 encoding the tag strings in my title generation step. Haven’t seen tag names like that before but hey you have every right to :) Thanks again.

  6. Hi Babs – Technically it does get captured (GET request logs) since those values are passed as CGI parameters to the URL. I will not use/sell them. I plan to delete the logs periodically.

    It’s tough to provide full encryption since I don’t have an SSL certificate. In the meantime, if you’re concerned about security/privacy, why not go to del.icio.us, change your password to some temporary value, run this wizard, then change your password back? The wizard takes like a minute to do. Later today I will (finally) encrypt passwords in all URL requests to provide a decent level of security.

    I’m also wondering if I should release this as a desktop application. Would people prefer this?

  7. Please do me one favour, please my second comment and this one too.

    I think u knew why i’m saying this ;-)

  8. Please do me one favour, please delete my second comment and this one too.

    I think u knew why ;-)

    Note:
    I’ve to take at least one minute to check it before posting a comment cos no editing here . One new resolution for this year to follow. :-)

  9. Hi Riza – I deleted your comment with the link that exposed your password. I’m keeping (and replying to) your later comments so users know that they can directly email me such links in the future (check bottom of bio for contact information).

  10. Thanks Singh, here’s one more bug (I hope so ;-)).

    Well, as u knew my password got exposed in last post, so i changed my password to a strong one (which includes $ * and alpha numeric)

    Here is the bug, my new password shows the err

    GetXmlResponse Error: HTTP 401 Code: Bad user/password webarmi ?????????

    Don’t worry i changed the password shown above with my real one while checking.

    One more clue for u, my old password working fine now, i hope the prob is only with my new pass contains special characters

    [Note: I double checked this comment b4 posting ;-) ]

  11. IP

    Hi!

    Eager to test something new to boost my del.icio.us account, but…

    When I try the tool I only get an error message “Failed to connect to Yahoo! Search” when I try to generate content XML on step 2.

    Generating the annotation xml only generates an empty file…

    Perhaps I didn’t figure everything out;)

  12. Riyaz – Done. I wasn’t escaping the password before.

    IP – In the process of doing this fix I may have messed up some connection settings. If it’s still not working for you then shoot me an email (you can find my contact info in the bio).

    Really appreciate the feedback.
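
    Both bugs in this thread (non-ASCII tag names and passwords containing characters such as `$` and `*`) come down to escaping values before they go into a URL. Here is a minimal Python sketch of that step using the standard library. The parameter names `user`, `passw`, and `keywords` are taken from the URL quoted in an earlier comment; the helper name and the sample values are made up, and this is not the service’s code.

```python
from urllib.parse import urlencode, parse_qs


def build_query(params):
    """Percent-encode every key and value, encoding non-ASCII text as UTF-8."""
    return urlencode(params, encoding="utf-8")


# Hypothetical values: a password with URL-special characters, a non-ASCII tag.
query = build_query({"user": "demo", "passw": "p$ss*w&rd=1", "keywords": "café"})

# Decoding the query string recovers the original values unchanged.
decoded = parse_qs(query)
```

    Without escaping, the `&` and `=` inside the password would be read as parameter separators, which would plausibly produce a “Bad user/password” response like the one reported above.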

  13. Just updated the service so all post login requests encrypt the password parameter. The server rekeys every 30 minutes which should provide ample time for a user to generate his/her XML. If the login does not work, it most likely happened due to the key expiration, so just try re-logging in. If all else fails, just post the issue here or email me. Thanks.

  14. Just fixed a tag parsing issue. If you were getting extremely long label names that was due to a bug. Should be fixed now.

  15. Pingback: del.icio.us arsblog - Aquatic Inference Engine

  16. This blog was very interesting to read and I like your writing style. Nice blog!

  17. Pingback: links for 2007-01-08 at Metaverse Territories

  18. Thanks for the awesome product Zooie. I have one question. Every time I add a new delicious link, do I need to go through this process again of creating the annonate.xml and uploading it?
    If so, is there a way to simplify that?

  19. tom

    My 3500+ account’s annotation file generated a “413- Your client issued a request that was too large.” error while loading it into coop. Can I break the file into separate bits and load them one by one?

    Tom

  20. Pingback: Generate A Google Custom Search Engine to Search Your del.icio.us Bookmarks » D’ Technology Weblog — Technology, Blogging, Gadgets, Fashion, Life Style.

  21. I tried this, and it only seems to have moved across a few urls, as shown by this output on the sites page:

    http://www.longfocus.com/firefox/gmanager/* Firefox Extensions Google

    Include all pages whose address contains this URL
    Include just the specific page or URL pattern I have entered

    http://www.awriterz.org/Fantasy/* Awriterz Fantasy

    Include all pages whose address contains this URL
    Include just the specific page or URL pattern I have entered

    beautifulbeta.blogspot.com/2006/10/pullquotes-for-your-blog.html Article Blog Publishing

    Include all pages whose address contains this URL
    Include just the specific page or URL pattern I have entered

    http://www.eusing.com/CDRipper/CDRipper.htm Computers Entertainment Music Software

    Include all pages whose address contains this URL
    Include just the specific page or URL pattern I have entered

    wiki.rubyonrails.com/rails/pages/HowtoSetupApacheWithFastCGIAndRubyBindings Article Linux Ruby Research

    All of the tags seem to be imported, but not most of the actual bookmarks…

  22. Sanjay – For now yes. The delicious API does support an update (which will push only new links since the last call) so it’s definitely feasible. When time cycles free up I’ll add that in. Thanks for the feature suggestion.

    Tom – Yeah there’s a limit on the XML file size being pushed back through the browser. Two solutions: (1) I can save the file on the server (but I’m reluctant to use server storage at the moment) (2) My wizard allows the user to generate annotations per delicious tag. Try that – so produce annotation files for your favorite delicious tags and just upload each one sequentially in the CSE.

    Stephen – Did you check the Rank option? Or filter your bookmarks by a tag? The rank option most likely won’t do every bookmark due to the expensiveness of retrieving the Other counts (the delicious API really needs to expose these numbers in the posts/all call). If you didn’t do either, then shoot me an email (I have my contact info in my bio page).
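
    The per-tag workaround for large accounts can also be approximated offline. The Python sketch below groups bookmark URLs by tag and splits a long list into fixed-size batches, each of which could be written to its own annotations file and uploaded in turn. The function names and the input shape (dicts with `url` and `tags` keys) are hypothetical, and the batch size to use is left to the caller since the upload limit is not specified here.

```python
from collections import defaultdict


def group_by_tag(bookmarks):
    """Map each tag to the list of bookmark URLs carrying it."""
    groups = defaultdict(list)
    for b in bookmarks:
        for tag in b["tags"]:
            groups[tag].append(b["url"])
    return dict(groups)


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

    A bookmark with several tags appears in several groups, so uploading every group means some URLs are sent more than once.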

  23. Pingback: links for 2007-01-09 « timtowle

  24. Hello,
    Can you send me an account to play with?

    Thank you,
    Rod Guzzo

  25. Pingback: Tiedon hallintaa sosiaalisten kirjanmerkkien avulla? at Hypermediaa ja elämää.

  26. Hey, Great tool, It would be nice if you could go into the xml creation a little thought … that will help further development of similar search engines…

  27. Nita Singh

    Great blog Vik! The material is interesting and smart and I enjoy following it. Continued success!

  28. Pingback: Library clips :: Google CSE and dynamic OPML :: February :: 2007

  29. Ann Hulton

    Hello,

    Very interesting work. I wonder if you would send me an account so that I could play with the CSE. Thanks!

  30. Great tool. Unfortunately I have lots of tagged URL’s and Google limits this to 2000. Any suggestions?

  31. Matthew

    Don’t seem to be able to get your link http://basundi.com:8000/login.html to resolve… what’s going wrong?

  32. Hi Matthew – It works for me. Hmm. Try again (refresh and clear the cache if necessary). It should be working.

  33. Check this out http://www.googlepowersearch.com.
    I created GooglePowerSearch so you can power search for Video, News, Maps, Images and more…
    Google Power Search helps to unleash the built in power of Googles special features.
    Using Google Power Search you are able to get better-targeted results.
    Check out Google Power Search and let me know what you think.
    Thanks
    Steve

  34. yours is the second blog (or actually third maybe) I have ever bookmarked (yea I don’t use aggregators)
    I already was like deleting my co-op account then I read through this post once and then TA-DA
    http://taxa.search.googlepages.com/home
    I even licensed it with same exact license as you had just to be sure.
    but I think I have comitted at least a dozen of copyright infringements as well

    I always get all giggly seeing the google labs logo but this just close to too exciting
    so yea thanks for pointing out how it’s done and I haven’t even started with implementing that facets x labels thing which sounds great (probably first I have to make a del.icio.us account)
    so yea this blog has been valuable content for me.

  35. Pingback: Top 5 Daily Questions About Google

  36. Vik –

    Great tool. How would I take one of my subscriptions and turn it into a CSE.

    Let’s say that I’ve subscribed to the tag San Francisco, can I use your tool to take that subscribtion and generate web URLs that I can feed back into CSE?

  37. Hi Farhan – I would recommend looking into the OPML upload feature available in the Advanced tab of the CSE’s control panel. This will take OPML (and various RSS feed formats), extract its URL’s, and import them directly into the CSE. My tool currently just supports a user’s bookmarks available via del.icio.us’s API.

    The other option (in case the OPML feature does not work) would be to regex out the URL’s and pump them into a flat file (each link new-line separated), then paste the links in the sites box (Sites tab).
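
    The regex fallback can be sketched in a few lines of Python. The pattern is deliberately loose (it grabs anything starting with `http://` or `https://` up to whitespace, a quote, or an angle bracket), and the function names are made up for illustration.

```python
import re

# Loose pattern: scheme, then everything up to whitespace, a quote, or < >.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for url in URL_PATTERN.findall(text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def to_flat_file(text):
    """Join the extracted URLs with newlines, ready to paste into the sites box."""
    return "\n".join(extract_urls(text))
```

    Running `to_flat_file` over a saved feed or subscription page gives one link per line; it is worth skimming the result before pasting, since a pattern this loose can pick up trailing punctuation.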

  38. Hi VIk’s,

    Ok, I give up, just gimme your dummy in my email :(
    Thanks pal…

  39. could this work with ma.gnolia bookmarks?

  40. Yeah it’s possible if there’s an API or XML feeds available for retrieving the bookmarks. When time cycles free up I’ll look into that.

  41. Throw me an account, this looks awesome. Was thinking of purpose-building an app to do same, but running with Google is even better. Any chance of getting a copy of this to run on my own server/alter?

  42. I use delicious with my blog every day.

  43. Pingback: davidrothman.net » Blog Archive » Social Search for Health Librarians

  44. is this different with the google search for your domain thing that they offered long time back

  45. Pingback: Sundaize Blog

  46. I use delicious every day for my bookmarks, great tips
    Thanks

  47. Any news on the ma.gnolia integration?

  48. Zooie, any update on the ma.gnolia integration?

  49. oops, sorry for the double post :(

  50. Hey Danny – Sorry for the delay. Haven’t had a chance to get to it. You might want to look at the Google Custom Search site. They have a new feature called ‘Linked CSE’ – I think this might do what you want.

  51. I looked at the linked CSE, your plugin would suit my needs much better. I’ll keep watching this space – crossing my fingers for ma.gnolia integration :-)

  52. That is a very slick implementation. I have been looking into the custom search engine and more specifically their Ajax API. I created a javascript class using the prototype.js library to allow for a completely customizable ajax search. They now support binding a GWebSearch Object to a specific CSE.

    http://positionabsolute.net/blog/2007/08/implement-custom-search.php

    Cheers,
    Matt

  53. EB

    I can’t get your page to load! Dying to see what you’ve done here, but it keeps telling me the connection timed out. Any hints?

  54. Hi EB – Sorry, I was running it on my friend’s server and I think I may have over-welcomed my stay ;)

    Let me see what I can do.

    – Vik

  55. Any luck relocating this service? I’m eager to try it!

  56. Hi David – Not yet – Sorry! Anyone out there got a server available running apache/mod_python?

  57. Jeremy

    Zooie, I am interested in helping you out – However, do you have any idea how much cpu usage and bandwidth does your app require?

  58. I stumbled across this page while searching for a way to search my Ma.gnolia bookmarks. I wasn’t able to find anything else, so I wrote my own little tool (http://nemti.awardspace.com/goo.gnolia/). It’s just a rather simple implementation of Google linked CSE. Yours sounds much more featureful – I hope you can find a host, and it would be great if you could add Ma.gnolia support.

  59. Kristen

    I tried this, but when I uploaded my XML annotations and skeleton, I got an “error parsing XML at line 3” message in both cases. Can you tell me what I did wrong? Thanks.

  60. Hi Kristen – Good chance Google changed their XML formats since I developed this tool. Could you send me your XML (the one which produces the bug)? vik.singh [at gmail]. Thanks.

  61. L.A. Buddy

    As I see deligoo.com do the same thing with del.icio.us search, but you must install their plugin.

  62. Pingback: Charlie’s path to dEAth » 用Google自定义搜索引擎(CSE)搜索del.icio.us

  63. Pingback: Google blog» Архив блога » Here’s some love for our Custom Search friends

  64. Pingback: Saved: Tiedon hallintaa sosiaalisten kirjanmerkkien avulla? at Ip’s

  65. hi,

    we used google cse and delicious to generate results based on all delicious bookmarks and created refinements based on the tags. Check it out at http://www.scoofers.com

  66. Pingback: del.icio.us driven Google custom search « The Ancient Geeks

  67. Really good initiative.

    I wonder if Google didn’t rework this with their makeannotation tool :

    http://www.google.com/coop/docs/cse/tools.html

  68. 8928

    hi…..is there a way to add your delicious account as subscribed linked in google search???

  69. cool features, very useful del.icio.us tool
