<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Linking Bollywood</title>
	<atom:link href="http://silentideas.com/wordpress/2008/10/01/linking-bollywood/feed/" rel="self" type="application/rss+xml" />
	<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/</link>
	<description>Ideas that make a difference</description>
	<pubDate>Tue, 07 Sep 2010 00:42:03 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: Kumar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-90</link>
		<dc:creator>Kumar</dc:creator>
		<pubDate>Fri, 14 Nov 2008 06:11:42 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-90</guid>
		<description>Sorry, friends, for this late reply. I was caught up in some work.
So, as Tushar mentioned, let's first decide on our source of information. If we crawl one particular website heavily, its firewall may flag us as spam and restrict access (this occurred to me only recently). Less sophisticated servers may not notice. Either way, we don't intend to create extra traffic for the host server and application while gathering data. So we need to design the crawler so that its requests appear to come from different IP addresses and arrive at intervals, rather than running in a tight loop.</description>
		<content:encoded><![CDATA[<p>Sorry, friends, for this late reply. I was caught up in some work.<br />
So, as Tushar mentioned, let&#8217;s first decide on our source of information. If we crawl one particular website heavily, its firewall may flag us as spam and restrict access (this occurred to me only recently). Less sophisticated servers may not notice. Either way, we don&#8217;t intend to create extra traffic for the host server and application while gathering data. So we need to design the crawler so that its requests appear to come from different IP addresses and arrive at intervals, rather than running in a tight loop.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tushar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-89</link>
		<dc:creator>Tushar</dc:creator>
		<pubDate>Thu, 23 Oct 2008 01:33:28 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-89</guid>
		<description>A few hours of research and reading have cleared up my understanding of web crawlers. In fact, there is plenty of web crawler code and many 'how to write' strategies available on the Internet.

My first phase of web crawler design:
Step 1. Source of the information (URL): Which web pages are going to be the initial sources from which we (the crawler) extract information? Your suggestions required!

Step 2. Identify the container and design a method to access its contents: Web pages comprise different "containers" - tables, charts, etc. (What’s the "etc."? I don’t know any besides these :)). The crawler then parses the page and retrieves the information of interest.

Step 3. Data or information processing: The crawler will process the information. (Information on the Internet leads to other information via links, or more specifically hyperlinks. We will decide how to process links in the next phase.)

Step 4. Network connections: sockets

Once we have a crawler ready that retrieves data, we can start discussing how to present the contents, like Digg Arch, at the initial level. Then we move on to the next phase of crawler design - there is still more to it :)

What do you think?</description>
		<content:encoded><![CDATA[<p>A few hours of research and reading have cleared up my understanding of web crawlers. In fact, there is plenty of web crawler code and many &#8216;how to write&#8217; strategies available on the Internet.</p>
<p>My first phase of web crawler design:<br />
Step 1. Source of the information (URL): Which web pages are going to be the initial sources from which we (the crawler) extract information? Your suggestions required!</p>
<p>Step 2. Identify the container and design a method to access its contents: Web pages comprise different &#8220;containers&#8221; - tables, charts, etc. (What’s the &#8220;etc.&#8221;? I don’t know any besides these :)). The crawler then parses the page and retrieves the information of interest.</p>
<p>Step 3. Data or information processing: The crawler will process the information. (Information on the Internet leads to other information via links, or more specifically hyperlinks. We will decide how to process links in the next phase.)</p>
<p>Step 4. Network connections: sockets</p>
<p>Once we have a crawler ready that retrieves data, we can start discussing how to present the contents, like Digg Arch, at the initial level. Then we move on to the next phase of crawler design - there is still more to it <img src='http://silentideas.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>What do you think?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-88</link>
		<dc:creator>Kumar</dc:creator>
		<pubDate>Wed, 15 Oct 2008 05:22:57 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-88</guid>
		<description>There are two aspects of any project that we take up.
Learning and Business
I think we should start with "learning", but cautiously. We should not reinvent the wheel. We'll start with something that's unexplored and learn while designing and developing it.
And when I say unexplored, I mean there will be a market and a business for it; we just need to work at it to make it available.
So how do we start?
Where are the other guys? Why aren't they contributing? Come on, people, let's look beyond. We need to slog for these things, but giving at least a bit will help. And one more thing: this is not restricted to our group only.</description>
		<content:encoded><![CDATA[<p>There are two aspects of any project that we take up.<br />
Learning and Business<br />
I think we should start with &#8220;learning&#8221;, but cautiously. We should not reinvent the wheel. We&#8217;ll start with something that&#8217;s unexplored and learn while designing and developing it.<br />
And when I say unexplored, I mean there will be a market and a business for it; we just need to work at it to make it available.<br />
So how do we start?<br />
Where are the other guys? Why aren&#8217;t they contributing? Come on, people, let&#8217;s look beyond. We need to slog for these things, but giving at least a bit will help. And one more thing: this is not restricted to our group only.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maulik</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-87</link>
		<dc:creator>Maulik</dc:creator>
		<pubDate>Tue, 14 Oct 2008 15:48:32 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-87</guid>
		<description>It's not that much silence, don't worry. Tushar and I had a long discussion on this topic last weekend, and we felt that we should move further on this idea.

Now, I like your reasoning and your idea of an application that leads to discovering new information. A similar idea is implemented on websites like Amazon. Amazon generally tries to figure out what kind of stuff you like and what you might like, and it shows you related items based on that.

In my previous comment, what I wanted to figure out was what we really want to accomplish with this project. Are we targeting a particular audience, or are we doing a completely experimental program that may or may not be audience-oriented? It would be great if we could come up with a system that can be applied to any information system similar to Bollywood. But then again, we should be clear on what we ultimately want from this project.</description>
		<content:encoded><![CDATA[<p>It&#8217;s not that much silence, don&#8217;t worry. Tushar and I had a long discussion on this topic last weekend, and we felt that we should move further on this idea.</p>
<p>Now, I like your reasoning and your idea of an application that leads to discovering new information. A similar idea is implemented on websites like Amazon. Amazon generally tries to figure out what kind of stuff you like and what you might like, and it shows you related items based on that.</p>
<p>In my previous comment, what I wanted to figure out was what we really want to accomplish with this project. Are we targeting a particular audience, or are we doing a completely experimental program that may or may not be audience-oriented? It would be great if we could come up with a system that can be applied to any information system similar to Bollywood. But then again, we should be clear on what we ultimately want from this project.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-86</link>
		<dc:creator>Kumar</dc:creator>
		<pubDate>Tue, 14 Oct 2008 07:31:34 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-86</guid>
		<description>why is there so much silence??
where is everybody??

silent !!!!!!!!!!!!</description>
		<content:encoded><![CDATA[<p>why is there so much silence??<br />
where is everybody??</p>
<p>silent !!!!!!!!!!!!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-85</link>
		<dc:creator>Kumar</dc:creator>
		<pubDate>Fri, 10 Oct 2008 05:14:41 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-85</guid>
		<description>Tying up with a website/organization to get the data is surely the straightforward way. But that needs an agreement and a lot of talks, which means a lot of time. Why not use data that is already available on the internet? We ask the website for permission to use their data in this way and give them the application as a promotional item (with limited features), which should be incentive enough for them to let us use their data.

When you are looking for something specific, getting the data straight is necessary. But when you are just browsing with the intention of discovering something, navigation and how you present the data become important. Anyway, we can have different views. But to start with, let's have a small bit of code that grabs the data from the web.</description>
		<content:encoded><![CDATA[<p>Tying up with a website/organization to get the data is surely the straightforward way. But that needs an agreement and a lot of talks, which means a lot of time. Why not use data that is already available on the internet? We ask the website for permission to use their data in this way and give them the application as a promotional item (with limited features), which should be incentive enough for them to let us use their data.</p>
<p>When you are looking for something specific, getting the data straight is necessary. But when you are just browsing with the intention of discovering something, navigation and how you present the data become important. Anyway, we can have different views. But to start with, let&#8217;s have a small bit of code that grabs the data from the web.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maulik</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-84</link>
		<dc:creator>Maulik</dc:creator>
		<pubDate>Fri, 10 Oct 2008 00:39:02 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-84</guid>
		<description>Alright, here is my opinion. We had discussed this application in the past, but not much. I think it's a really great idea because there is also a business in this area. Two questions: is writing a crawler an efficient way to fetch data? And how about making a Flash application and partnering with some big website? Just some ideas on that front.

I really love what people do with Flash and Digg. But I think an application like this (the one you posted) is not meant for the largest possible audience. It asks for more brain usage than a normal person would like. In my experience, there are times when I just need the information right in my face; I don't want to interpret it from a different representation. That's why IMDb works really well, as does &lt;a href="http://www.boxofficemojo.com/alltime/" rel="nofollow"&gt;boxofficemojo&lt;/a&gt;, which uses simple table representations. So if you are getting into a commercial application, you have to think about what works for the greatest number of people rather than what is a really clever idea.</description>
		<content:encoded><![CDATA[<p>Alright, here is my opinion. We had discussed this application in the past, but not much. I think it&#8217;s a really great idea because there is also a business in this area. Two questions: is writing a crawler an efficient way to fetch data? And how about making a Flash application and partnering with some big website? Just some ideas on that front.</p>
<p>I really love what people do with Flash and Digg. But I think an application like this (the one you posted) is not meant for the largest possible audience. It asks for more brain usage than a normal person would like. In my experience, there are times when I just need the information right in my face; I don&#8217;t want to interpret it from a different representation. That&#8217;s why IMDb works really well, as does <a href="http://www.boxofficemojo.com/alltime/" rel="nofollow">boxofficemojo</a>, which uses simple table representations. So if you are getting into a commercial application, you have to think about what works for the greatest number of people rather than what is a really clever idea.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-83</link>
		<dc:creator>Kumar</dc:creator>
		<pubDate>Thu, 09 Oct 2008 16:12:53 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-83</guid>
		<description>OK, if you are comfortable with Perl, can you write a script that extracts particular data from a set of web pages that share the same kind of HTML layout/structure?
Take, for example, the different movie pages on the musicindiaonline website.</description>
		<content:encoded><![CDATA[<p>OK, if you are comfortable with Perl, can you write a script that extracts particular data from a set of web pages that share the same kind of HTML layout/structure?<br />
Take, for example, the different movie pages on the musicindiaonline website.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tushar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-82</link>
		<dc:creator>Tushar</dc:creator>
		<pubDate>Thu, 09 Oct 2008 04:16:47 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-82</guid>
		<description>Since you have an idea of how the crawler will work, you might have thought of the most suitable scripting language for it!

From my side, I think Perl will be a good candidate!</description>
		<content:encoded><![CDATA[<p>Since you have an idea of how the crawler will work, you might have thought of the most suitable scripting language for it!</p>
<p>From my side, I think Perl will be a good candidate!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kumar</title>
		<link>http://silentideas.com/wordpress/2008/10/01/linking-bollywood/#comment-81</link>
		<dc:creator>Kumar</dc:creator>
		<pubDate>Tue, 07 Oct 2008 04:21:04 +0000</pubDate>
		<guid isPermaLink="false">http://silentideas.com/wordpress/?p=133#comment-81</guid>
		<description>Well then, here we go.
Tushar, which scripting language are you comfortable with? Pick one and write a crawler as I said. I don't know how it is done; maybe I will search on the internet a bit. We can then host the app on silentideas itself, in the lab section.
What say??</description>
		<content:encoded><![CDATA[<p>Well then, here we go.<br />
Tushar, which scripting language are you comfortable with? Pick one and write a crawler as I said. I don&#8217;t know how it is done; maybe I will search on the internet a bit. We can then host the app on silentideas itself, in the lab section.<br />
What say??</p>
]]></content:encoded>
	</item>
</channel>
</rss>
