<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for Vik Singh</title>
	<atom:link href="http://zooie.wordpress.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://zooie.wordpress.com</link>
	<description>&#34;Let&#039;s party on the data!&#34; -- Jim Gray</description>
	<lastBuildDate>Fri, 18 Dec 2009 18:31:49 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Yahoo Boss &#8211; Google App Engine Integrated by suman</title>
		<link>http://zooie.wordpress.com/2008/08/04/yahoo-boss-google-app-engine-integrated/#comment-19483</link>
		<dc:creator>suman</dc:creator>
		<pubDate>Fri, 18 Dec 2009 18:31:49 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=36#comment-19483</guid>
		<description>Is there any sandbox account? Until my application is ready I will not be hosting it, so in this scenario how would I get an app ID?</description>
		<content:encoded><![CDATA[<p>Is there any sandbox account? Until my application is ready I will not be hosting it, so in this scenario how would I get an app ID?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Some Stats about Twitter&#8217;s Content by Babak</title>
		<link>http://zooie.wordpress.com/2009/10/12/some-stats-about-twitters-content/#comment-19475</link>
		<dc:creator>Babak</dc:creator>
		<pubDate>Wed, 02 Dec 2009 03:35:55 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=371#comment-19475</guid>
		<description>Hi Vik,

Very interesting stats. I would like to know how you generate a single key for each message by coalescing its representatives. How can I use the keys to find exact duplicates or near-duplicates? Also, do you know of a source that can provide stemmed terms?

Example:

Wild cat
Cat behaves wildly
The cat is wild
Is she like a wild cat?
wildcats.com
Wild animals like to eat cat
She looks wild like her cat
John saved the cat from the wild animal

According to your algorithm, all of the above should produce a unique key, right? And how would your data analysis work in this example?


Thanks for your help.</description>
		<content:encoded><![CDATA[<p>Hi Vik,</p>
<p>Very interesting stats. I would like to know how you generate a single key for each message by coalescing its representatives. How can I use the keys to find exact duplicates or near-duplicates? Also, do you know of a source that can provide stemmed terms?</p>
<p>Example:</p>
<p>Wild cat<br />
Cat behaves wildly<br />
The cat is wild<br />
Is she like a wild cat?<br />
wildcats.com<br />
Wild animals like to eat cat<br />
She looks wild like her cat<br />
John saved the cat from the wild animal</p>
<p>According to your algorithm, all of the above should produce a unique key, right? And how would your data analysis work in this example?</p>
<p>Thanks for your help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Some Stats about Twitter&#8217;s Content by Industry News: Twitter Trends Data Boot Camp Digital &#8211; Digital and Interactive Marketing Training and Certification : Boot Camp Digital &#8211; Digital and Interactive Marketing Training and Certification</title>
		<link>http://zooie.wordpress.com/2009/10/12/some-stats-about-twitters-content/#comment-19474</link>
		<dc:creator>Industry News: Twitter Trends Data Boot Camp Digital &#8211; Digital and Interactive Marketing Training and Certification : Boot Camp Digital &#8211; Digital and Interactive Marketing Training and Certification</dc:creator>
		<pubDate>Tue, 01 Dec 2009 19:39:36 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=371#comment-19474</guid>
		<description>[...] Singh (the engineer behind Yahoo Boss) conducted an analysis of 10 million tweets and shared some trends and [...]</description>
		<content:encoded><![CDATA[<p>[...] Singh (the engineer behind Yahoo Boss) conducted an analysis of 10 million tweets and shared some trends and [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Some Stats about Twitter&#8217;s Content by Twitter Doesn&#8217;t Track The Zeitgeist. Only 2 Percent Of Tweets Overlap With Search Trends.</title>
		<link>http://zooie.wordpress.com/2009/10/12/some-stats-about-twitters-content/#comment-19466</link>
		<dc:creator>Twitter Doesn&#8217;t Track The Zeitgeist. Only 2 Percent Of Tweets Overlap With Search Trends.</dc:creator>
		<pubDate>Mon, 30 Nov 2009 05:31:33 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=371#comment-19466</guid>
		<description>[...] stats came from an analysis of 10 million Tweets he crawled last summer. He looked at all Tweets, not just trending topics. When he stripped out the [...]</description>
		<content:encoded><![CDATA[<p>[...] stats came from an analysis of 10 million Tweets he crawled last summer. He looked at all Tweets, not just trending topics. When he stripped out the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A Comparison of Open Source Search Engines by sheetal</title>
		<link>http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/#comment-19464</link>
		<dc:creator>sheetal</dc:creator>
		<pubDate>Thu, 26 Nov 2009 10:30:26 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=172#comment-19464</guid>
		<description>Good to see such a comparison and rich discussion in one single place. Vik, I appreciate your entrepreneurship.

Testing-Associates provides independent verification and validation services for your software release and software acquisitions requirements.  Contact us at info@testing-associates.com or  +91-9481482882 / +1-(415)-944-1435 / +44 – (208)-196-6233 

Professional software testing services</description>
		<content:encoded><![CDATA[<p>Good to see such a comparison and rich discussion in one single place. Vik, I appreciate your entrepreneurship.</p>
<p>Testing-Associates provides independent verification and validation services for your software release and software acquisitions requirements.  Contact us at <a href="mailto:info@testing-associates.com">info@testing-associates.com</a> or  +91-9481482882 / +1-(415)-944-1435 / +44 – (208)-196-6233 </p>
<p>Professional software testing services</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Some Stats about Twitter&#8217;s Content by Vik</title>
		<link>http://zooie.wordpress.com/2009/10/12/some-stats-about-twitters-content/#comment-19463</link>
		<dc:creator>Vik</dc:creator>
		<pubDate>Thu, 26 Nov 2009 02:28:20 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=371#comment-19463</guid>
		<description>I did a quick search and found:

http://hasin.wordpress.com/2009/06/20/collecting-data-from-streaming-api-in-twitter/

Not the one I used but it looks reasonable.</description>
		<content:encoded><![CDATA[<p>I did a quick search and found:</p>
<p><a href="http://hasin.wordpress.com/2009/06/20/collecting-data-from-streaming-api-in-twitter/" rel="nofollow">http://hasin.wordpress.com/2009/06/20/collecting-data-from-streaming-api-in-twitter/</a></p>
<p>Not the one I used but it looks reasonable.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Some Stats about Twitter&#8217;s Content by Ramanean</title>
		<link>http://zooie.wordpress.com/2009/10/12/some-stats-about-twitters-content/#comment-19462</link>
		<dc:creator>Ramanean</dc:creator>
		<pubDate>Thu, 26 Nov 2009 02:10:11 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=371#comment-19462</guid>
		<description>Do you have a script for scraping (indexing) Tweets? I need one so that I can apply AI to it.</description>
		<content:encoded><![CDATA[<p>Do you have a script for scraping (indexing) Tweets? I need one so that I can apply AI to it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A Comparison of Open Source Search Engines by Lee Giles</title>
		<link>http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/#comment-19458</link>
		<dc:creator>Lee Giles</dc:creator>
		<pubDate>Thu, 19 Nov 2009 13:57:46 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=172#comment-19458</guid>
		<description>We just integrated Heritrix with Solr using Java middleware, calling it YouSeer. The code is on SourceForge. We think this will compete with any of the open source search engines, such as Nutch and Hounder. We would appreciate any comments.

Best

Lee Giles</description>
		<content:encoded><![CDATA[<p>We just integrated Heritrix with Solr using Java middleware, calling it YouSeer. The code is on SourceForge. We think this will compete with any of the open source search engines, such as Nutch and Hounder. We would appreciate any comments.</p>
<p>Best</p>
<p>Lee Giles</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A Comparison of Open Source Search Engines by SEO</title>
		<link>http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/#comment-19367</link>
		<dc:creator>SEO</dc:creator>
		<pubDate>Fri, 30 Oct 2009 11:41:32 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=172#comment-19367</guid>
		<description>Eye-opening, especially for Lucene users.</description>
		<content:encoded><![CDATA[<p>Eye-opening, especially for Lucene users.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A Comparison of Open Source Search Engines by Open Source Search Engine Comparisons</title>
		<link>http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/#comment-19324</link>
		<dc:creator>Open Source Search Engine Comparisons</dc:creator>
		<pubDate>Sun, 25 Oct 2009 00:52:53 +0000</pubDate>
		<guid isPermaLink="false">http://zooie.wordpress.com/?p=172#comment-19324</guid>
		<description>[...] Engine Comparisons by lmetzler 7. July 2009 15:51 There&#039;s a nice little writeup on a comparison of various open source search engines.&#160; We&#039;ve been using Lucene and Lucene.Net (not explicitly tested) here at Bambit for many years [...]</description>
		<content:encoded><![CDATA[<p>[...] Engine Comparisons by lmetzler 7. July 2009 15:51 There&#39;s a nice little writeup on a comparison of various open source search engines.&nbsp; We&#39;ve been using Lucene and Lucene.Net (not explicitly tested) here at Bambit for many years [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
