<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>paidContent &#187; Derrick Harris Archives</title>
	<atom:link href="http://paidcontent.org/author/dharrisstructure/feed/" rel="self" type="application/rss+xml" />
	<link>http://paidcontent.org</link>
	<description>The economics of digital content</description>
	<lastBuildDate>Mon, 20 May 2013 20:55:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='paidcontent.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/89ee7e1250b4095eefb87d28e6e64947?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>paidContent &#187; Derrick Harris Archives</title>
		<link>http://paidcontent.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://paidcontent.org/osd.xml" title="paidContent" />
	<atom:link rel='hub' href='http://paidcontent.org/?pushpress=hub'/>
		<item>
		<title>Aereo CEO says free content might be on the way</title>
		<link>http://paidcontent.org/2013/04/17/aereo-ceo-says-free-content-might-be-on-the-way/</link>
		<comments>http://paidcontent.org/2013/04/17/aereo-ceo-says-free-content-might-be-on-the-way/#comments</comments>
		<pubDate>Wed, 17 Apr 2013 22:00:53 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[aereo]]></category>
		<category><![CDATA[chet-kanojia]]></category>
		<category><![CDATA[paidcontent live 2013]]></category>

		<guid isPermaLink="false">http://paidcontent.org/?p=227930</guid>
		<description><![CDATA[Aereo CEO Chet Kanojia wants to disrupt TV pricing again, this time by rolling out movie and news packages at a fraction of the price of traditional ones. News, he said, might even be free.]]></description>
				<content:encoded><![CDATA[<p>Aereo’s approach to letting consumers access broadcast TV content on their mobile devices and computers is nothing if not disruptive, and Wednesday at our <a href="http://event.gigaom.com/paidcontent/?utm_source=media&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=227930+aereo-ceo-says-free-content-might-be-on-the-way&amp;utm_content=dharrisstructure">paidContent Live</a> conference in New York, CEO Chet Kanojia upped the ante even more. Discussing how the company will be able to expand its channel offerings without falling into the old traps of cable pricing, he suggested that a free or low-cost news package is likely on the horizon.</p>
<p>It’s part of a bigger plan to figure out how to address consumers’ base needs first and foremost, before then adding the nice-to-have features for a price. Aereo sees the future of television content as being what Kanojia calls “skinny live, deep library,” so the live parts are only for the content people really need in real time — stuff like news and sports.</p>
<p>“(The consumer is) the one constituent in this industry that’s unserved,” Kanojia said. “Everyone’s businesses are stacked to take advantage of the consumer, not to serve the consumer.”</p>
<p>If, on the other hand, the value-add of a movie channel (oh, Aereo’s probably going to add one of those, too) is to watch stuff on your own time, people will probably be willing to pay 50 cents or a dollar a month, he said. The same thing goes for programming from, hypothetically, a content provider like Viacom that has a broad range of shows that people don’t really need or want to see only while they’re airing.</p>
<p>The only way to do this correctly, though, is to avoid traditional licensing models that have jacked cable prices through the roof and have led to a lot of bloat because consumers are getting way more channels than they ever would want to watch. Kanojia wants Aereo to provide 50 percent of the value for 10 percent of the cost of cable, and then let partners and services like Netflix or Amazon Prime fill in the rest.</p>
<p>“The last time I checked,” he joked, “there’s no need to have <em>Desperate Housewives</em> or the <em>Real Housewives of Orange County</em> running on four channels at the same time.”</p>
<p>As for those lawsuits that have plagued the company since its inception, Kanojia said he’s not surprised but he’s disappointed by threats from companies such as Fox and CBS to pull their stations off the public airwaves (the spectrum on which is provided for free because stations are supposed to operate in part in the public interest).</p>
<p>“I just don’t understand the logic behind that,” he said. “I think it’s disappointing to say the least.”</p>
<p>But with significant legal victories already behind it, the future looks a little clearer. He expects the company’s model could realistically net it 20 percent of the American television market, and the company is expanding fast outside of New York. It’s supposed to be in 22 more cities by July.</p>
<p>“The one thing that would float my boat more than anything else,” Kanojia said, “is I get a chance to put my product in front of consumers and be judged by the consumers.”</p>
<p><a href="http://paidcontent.org/2013/04/17/paidcontent-live-2013-coverage/">Check out the rest of our paidContent Live 2013 coverage here</a>, and a video embed of the session follows below:</p>
<p><iframe src="http://new.livestream.com/accounts/74987/events/2000322/videos/16663340/player?autoPlay=false&amp;height=360&amp;mute=false&amp;width=640" height="360" width="640" frameborder="0" scrolling="no"></iframe><br>
A transcription of the video follows on the next page</p>
<p><a href="http://paidcontent.org/2013/04/17/aereo-ceo-says-free-content-might-be-on-the-way/2/">Go to page 2 (of 2) on paidContent.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://paidcontent.org/2013/04/17/aereo-ceo-says-free-content-might-be-on-the-way/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:thumbnail url="http://gigaompaidcontent.files.wordpress.com/2013/04/img_3519.jpg?w=150" />
		<media:content url="http://gigaompaidcontent.files.wordpress.com/2013/04/img_3519.jpg?w=150" medium="image">
			<media:title type="html">paidContent Live 2013 Chet Kanojia Aereo</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>Gravity giving away personalization to whichever publishers want it</title>
		<link>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/</link>
		<comments>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/#comments</comments>
		<pubDate>Fri, 01 Feb 2013 18:11:45 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[Gravity]]></category>
		<category><![CDATA[Interest Graph]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[personalization]]></category>
		<category><![CDATA[publishing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=606615</guid>
		<description><![CDATA[Gravity, a startup that personalizes reader content for web publishers, is opening up its recommendation engine to anyone who wants to use it. Considering the increasing importance of personalization online, this could be a good deal.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.gravity.com/">Gravity</a>, a Santa Monica, Calif.-based startup that personalizes reader content for web publishers, is opening up its recommendation engine to anyone who wants to use it. If you don’t mind a few sponsored stories popping up in the newsfeed — a condition of using the free platform — this could be a pretty good deal.</p>
<p>Gravity’s recommendation system is based on its <a href="http://gigaom.com/2012/03/15/the-personalized-web-is-just-an-interest-graph-away/">interest graph</a> technology, which we detailed last year. Here’s <a href="http://gigaom.com/2012/03/11/can-big-data-fix-a-broken-system-for-software-patents/">how I described it then</a>:</p>
<blockquote id="quote-the-gist-is-that-hum"><p>[T]he gist is that humans first serve as guides for machine-learning algorithms by determining connections between terms within large data sets, then the algorithms take over to complete the job faster than humans ever could. When they’re done, the humans step in one more time to kill any bad connections between terms. The result is a system that can determine with high accuracy that a person tweeting about Vanessa Laine (Los Angeles Laker Kobe Bryant’s ex-wife), for example, is probably more interested in basketball than about Laine’s date of birth or other accurate but irrelevant information.</p></blockquote>
<p>As new content streams into Gravity’s system, it’s analyzed and categorized in real time, then presented to users accordingly based on their interests and behavioral history.</p>
<div id="attachment_606730" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg"><img alt="How Gravity's platform works" src="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg?w=708&#038;h=306" width="708" height="306" class="size-large wp-image-606730"></a><p class="wp-caption-text">How Gravity’s platform works</p></div>
<p>Graph processing and <a href="http://gigaom.com/2011/10/24/springsource-links-up-with-neo-technology-on-nosql/">graph databases</a> — which store and analyze data based on their relationship to one another — are critical to our online lives, powering everything from <a href="http://gigaom.com/2013/01/29/you-might-also-like-to-know-how-online-recommendations-work/">online recommendations</a> to <a href="http://gigaom.com/2013/01/15/a-really-tiny-explanation-of-how-facebooks-graph-search-works/">social search</a> to <a href="http://gigaom.com/2012/08/08/for-google-keeping-search-relevant-means-baking-big-data-into-everything/">knowledge discovery</a>. Graph technologies are also the focal point of some impressive life sciences work from companies such as <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">Syapse</a> and <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">Ayasdi</a>, which will be presenting at <a href="http://event.gigaom.com/structuredata/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=224002+gravity-giving-away-personalization-to-whichever-publishers-want-it&amp;utm_content=dharrisstructure">Structure: Data</a> in New York next month.</p>
<p>But publishers struggling to stand out on a noisy web might have the most to gain from graphs and personalization, generally. At our <a href="http://event.gigaom.com/paidcontent/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=224002+gravity-giving-away-personalization-to-whichever-publishers-want-it&amp;utm_content=dharrisstructure">PaidContent Live</a> conference (April 17 in New York), executives from Prismatic, Zite and Bluefin Labs will take the stage to talk about the importance of personalization for helping consumers filter through the deluge of content online so they can find what they really want. It’s arguable that the trick to keeping readers happy is knowing what they want to read — possibly better than they do themselves.</p>
<p>According to Gravity, its platform currently “delivers more than 25 million personalized content recommendations per day to more than 200 million users. Beta partners have reported click through rates two to three times above previous levels, return visitation increases of 300 percent and session length increases up to 40 percent.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy.jpeg?w=150" medium="image">
			<media:title type="html">canvas-copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg?w=708" medium="image">
			<media:title type="html">How Gravity&#039;s platform works</media:title>
		</media:content>
	</item>
		<item>
		<title>Researchers mine 2.5M news articles to prove what we already know</title>
		<link>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/</link>
		<comments>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/#comments</comments>
		<pubDate>Tue, 27 Nov 2012 02:54:38 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Academia]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[data-mining]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=588140</guid>
		<description><![CDATA[A group of British researchers recently analyzed 2.5 million newspaper articles in order to prove that new data analysis techniques, such as machine learning and natural-language processing, can accurately classify media content. They hope their approach can save academicians untold hours of manual labor.]]></description>
				<content:encoded><![CDATA[<p>A group of British researchers has <a href="http://mediapatterns.enm.bris.ac.uk/AnalysisOfMillionsOfArticles">published the results of a data mining experiment</a> that analyzed nearly 2.5 million articles from 498 newspapers on criteria such as topic selection, writing style and sensationalism, and found &#8212; no surprise &#8212; that tabloids are the easiest to read and reporters don&#8217;t often cover women&#8217;s sports. If these findings sound predictable, that was exactly what the researchers were aiming for.</p>
<p>The experiment&#8217;s techniques actually point to a future where researchers are spared the grunt work of poring through thousands of pages of news or watching hundreds of hours of programming, and can instead focus their energy on explaining. As the researchers <a href="https://patterns.enm.bris.ac.uk/files/DigitalJournalism.pdf">note in their paper</a>, the real ramifications of this research lie more in what it accomplished than in what it found.</p>
<p>Namely, they demonstrated that with new big data techniques such as machine learning and natural-language processing, it&#8217;s possible to accurately analyze millions of pieces of content spanning almost a year without requiring humans to read and score it all. Choosing hypotheses with predictable results meant it was easier to verify their accuracy.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg"><img  title="newspapers_writing_style" alt="" src="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg?w=604&#038;h=454" height="454" width="604" class="aligncenter size-large wp-image-588153" /></a></p>
<p>Here&#8217;s how they explain the promise of their work and some potential use cases, the latter of which they go into far more detail about in the paper:</p>
<blockquote id="quote-it-allows-researcher"><p>&#8220;[I]t allows researchers to focus their attention on a scale far beyond the sample sizes of traditional forms of content analysis. Rather than spending precious labour on the coding phase of raw data, analysts could focus on designing experiments and comparisons to test their hypotheses, leaving to computers the task of finding all articles of a given topic, measuring various features of their content such as their readability, use of certain forms of language, sources etc. (just a few of the tasks that can now be automated).</p>
<p>&#8230; Our approach &#8212; apart from freeing scholars from more mundane tasks &#8212; allows researchers to turn their attention to higher level properties of global news content, and to begin to explore the features of what has become a vast, multi-dimensional communications system.&#8221;</p></blockquote>
<p>Put more simply: This research underscores the common big data maxim that knowing the right questions to ask is now the biggest challenge in gleaning insights from data. It&#8217;s increasingly easy to get data, analyze it and visualize it, so humans really just need to hypothesize and be able to explain the results. (This also seems like a good place to plug <a href="https://scraperwiki.com">ScraperWiki</a> as a great source for gathering potential research data from websites.)</p>
<p>Creating the workflows for gathering and analyzing the data as the authors suggest still isn&#8217;t child&#8217;s play (it might take some assistance from the computer science department), but it&#8217;s a lot better than the alternative.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-731887p1.html">Shutterstock user Ruggiero Scardigno.</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_113800528.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_113800528.jpg?w=150" medium="image">
			<media:title type="html">newspapers</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg?w=604" medium="image">
			<media:title type="html">newspapers_writing_style</media:title>
		</media:content>
	</item>
		<item>
		<title>Data isn&#8217;t just the new oil, it&#8217;s the new money. Ask Zoë Keating</title>
		<link>http://gigaom.com/2012/11/20/data-isnt-just-the-new-oil-its-the-new-money-ask-zoe-keating/</link>
		<comments>http://gigaom.com/2012/11/20/data-isnt-just-the-new-oil-its-the-new-money-ask-zoe-keating/#comments</comments>
		<pubDate>Wed, 21 Nov 2012 02:31:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[digital-copyright]]></category>
		<category><![CDATA[online data]]></category>
		<category><![CDATA[pandora]]></category>
		<category><![CDATA[streaming media]]></category>
		<category><![CDATA[user data]]></category>
		<category><![CDATA[web privacy]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=586855</guid>
		<description><![CDATA[In the fight about royalties from streaming media services like Pandora, popular cellist Zoë Keating says she's willing to give up the money in exchange for data. It's an idea that's gaining traction elsewhere, too, as more companies are paying consumers for their truly valuable data.]]></description>
				<content:encoded><![CDATA[<p>People love to call data the new oil, but that might be selling it short. It&#8217;s only oil when we&#8217;re talking about pools of unrefined data like the stuff web companies collect, which has to be processed and transformed into something useful. There are certain types of data, though &#8212; especially data about consumers &#8212; that are as good as money in the bank without any work at all. And if you don&#8217;t believe me, ask popular cellist Zoë Keating.</p>
<p>As <a href="http://www.nytimes.com/2012/11/05/business/media/fight-growing-over-online-royalties.html?_r=0">a bill attempting to lower the royalty rates</a> paid to artists by streaming music services such as Pandora works its way through Congress, Keating <a href="http://zoekeating.tumblr.com/post/35737991443/what-i-want-from-internet-radio">took to her Tumblr blog last week</a> and offered a solution that both sides should listen to, but won&#8217;t. You might have <a href="http://www.billboard.biz/bbbiz/industry/digital-and-mobile/value-of-music-streaming-is-data-says-artist-1008018162.story">read about her stance in Billboard</a> or <a href="http://www.itworld.com/big-data/317769/data-ultimate-internet-music-royalty?page=0,1">ITworld</a> already, <a href="http://entertainment.slashdot.org/story/12/11/20/0312215/one-musicians-demand-from-pandora-mandatory-analytics">or perhaps on Slashdot</a>. If you haven&#8217;t, here it is in a nutshell, from Keating&#8217;s blog: &#8220;The law only demands I be paid in money, which at this point in my career is not as valuable as information. I’d rather be paid in data.&#8221;</p>
<p>Leaving aside the entire issue about royalties and copyright (and privacy policies), her statement is still powerful. Keating understands that in order to prosper in a world of digital music &#8212; just like in the world of e-commerce, digital publishing, you name it &#8212; information is power. The names, email addresses and perhaps mobile numbers of individuals listening to her music are nice, clean data that Keating could use with little to no analytic effort by reaching out to fans when a new tour is coming to town or a new album drops.</p>
<p>Actually, Keating <a href="http://zoekeating.tumblr.com/post/36160121213/more-about-data-vs-royalties">noted in a subsequent blog post on Tuesday</a> that even less-personal data can have a material impact on a performer&#8217;s bottom line. Using postal code data provided to her from iTunes sales, she&#8217;s able to plan tours more efficiently because she knows, or can make a safe assumption, that she has paying fans in certain cities.</p>
<p>Touring and merchandise sales remain most artists&#8217; primary means of income, and the current royalty rate of $.0011 per play doesn&#8217;t add up fast (<a href="https://docs.google.com/spreadsheet/ccc?key=0AkasqHkVRM1OdGhjdExSMzYyMXFZUkZNSUJrY3MwNXc&amp;pli=1#gid=0">at least according to Keating&#8217;s math</a>), so it&#8217;s easy to see why she &#8212; and probably many other up-and-coming or niche performers &#8212; would rather have the data that properties like Pandora almost certainly have.</p>
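<p>To put that rate in perspective, here is a rough arithmetic sketch. The $0.0011 per-play figure is the one cited above; the play counts and the <code>royalties</code> helper are hypothetical, chosen only for illustration.</p>

```python
# Illustrative arithmetic only: the per-play rate is the one cited in the
# article; the play counts below are hypothetical.
from decimal import Decimal

RATE_PER_PLAY = Decimal("0.0011")  # dollars per play


def royalties(plays: int) -> Decimal:
    """Return the royalty in dollars for a given number of plays."""
    return RATE_PER_PLAY * plays


for plays in (1_000, 100_000, 1_000_000):
    print(f"{plays:>9,} plays -> ${royalties(plays):,.2f}")
```

<p>At that rate, a million plays works out to about $1,100, before any splits or fees, which this sketch does not model.</p>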
<p>And whether Keating knows it or not, the idea of using data as a substitute for money extends beyond web radio stations and musicians arguing about royalties. A couple weeks ago, I <a href="http://gigaom.com/data/will-consumers-trade-the-keys-to-the-data-castle-for-a-5-gift-card/">highlighted a handful of attempts</a> to convince consumers to hand over, in exchange for cash rewards or product discounts, valuable data that advertisers can&#8217;t collect by tracking their online activity. This is data such as recent and future purchases, personal interests, your web-surfing habits and where you shop in the physical world.</p>
<p>Just like Keating is willing to forgo one-tenth of one cent per play (real money, even if not a lot) in exchange for data, these brands are willing to trade cash or something like it for data <a href="http://gigaom.com/data/5-ideas-to-help-everyone-make-the-most-of-big-data/">they don&#8217;t have to run through a Hadoop cluster and seven segmentation algorithms</a> before they can tie it to a real person. They know they have to give a little bit in order to improve upon the status quo that&#8217;s good, but not nearly good enough for their purposes.</p>
<p>Previously, the notion that data is the currency of the web meant users gave away their behavior data to web sites in exchange for free services. Slowly but surely, however, that notion seems to be evolving. Maybe Zoë Keating wants data in lieu of royalties for the privilege of streaming her music, and maybe a web site wants my offline location data enough to give me a gift card worth enough that I&#8217;d hand it over. Either way, it&#8217;s all about the realization that some data is worth its weight &#8212; and then some &#8212; in cold, hard cash.</p>
<p><em>Feature image courtesy of <a href="http://www.flickr.com/photos/eschipul/3351462308/sizes/m/in/photostream/">Flickr user eschipul</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/20/data-isnt-just-the-new-oil-its-the-new-money-ask-zoe-keating/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/zoe-keating.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/zoe-keating.jpg?w=150" medium="image">
			<media:title type="html">Zoe Keating</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>MIT researcher says he can predict Twitter trends</title>
		<link>http://gigaom.com/2012/11/01/mit-researcher-says-he-can-predict-twitter-trends/</link>
		<comments>http://gigaom.com/2012/11/01/mit-researcher-says-he-can-predict-twitter-trends/#comments</comments>
		<pubDate>Thu, 01 Nov 2012 18:06:11 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[data-science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[social-media]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=579682</guid>
		<description><![CDATA[An MIT researcher says he has created an algorithm that can identify Twitter trends hours before the service can itself. If the algorithm works as he says, it could help Twitter -- and many more companies -- make a lot of money.]]></description>
				<content:encoded><![CDATA[<p>A researcher at MIT claims to have developed an algorithm that can accurately predict what topics will trend on Twitter. But Twitter being a relatively minor business in the grand scheme of things, the algorithm might end up being more useful elsewhere, predicting stock prices, ticket sales and other dynamically changing quantities.</p>
<p>According to <a href="http://web.mit.edu/press/2012/predicting-twitter-trending-topics.html">a release from the MIT News Office</a>, Associate Professor Devavrat Shah says his model has been 95 percent accurate during testing and has been predicting trends hours before they appear on Twitter&#8217;s list. The algorithm incorporates a new approach to machine learning that compares real-time data with historical data and predicts outcomes based on past events that most closely align with the current situation. So, rather than analyzing a topic&#8217;s chances of trending equally against the entire historical corpus of topics, it will assign more weight to topics whose paths followed similar trajectories up the ranks of top trends.</p>
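<p>The weighting idea can be sketched in a few lines. This is not Shah&#8217;s implementation, which the release does not spell out; it is a minimal illustration that assumes each topic is a short series of activity counts, that similarity is an exponentially decaying function of squared distance, and that the prediction is a simple weighted vote. The function names, the <code>gamma</code> parameter and the sample numbers are all made up.</p>

```python
import math


def similarity_weight(observed, example, gamma=1.0):
    """Weight a historical example by how closely its path matches the observed one."""
    # Squared Euclidean distance between the two trajectories.
    dist = sum((o - e) ** 2 for o, e in zip(observed, example))
    # Closer trajectories get weights near 1; distant ones decay toward 0.
    return math.exp(-gamma * dist)


def predict_trending(observed, trended, not_trended, gamma=1.0):
    """Return True if the similarity-weighted vote favors the 'trended' examples."""
    vote_yes = sum(similarity_weight(observed, ex, gamma) for ex in trended)
    vote_no = sum(similarity_weight(observed, ex, gamma) for ex in not_trended)
    return vote_yes > vote_no


# Hypothetical activity levels per time interval (not real Twitter data).
trended = [[0.1, 0.3, 0.9, 2.0], [0.2, 0.5, 1.1, 2.4]]
not_trended = [[0.1, 0.2, 0.2, 0.3], [0.3, 0.3, 0.4, 0.3]]

print(predict_trending([0.1, 0.4, 1.0, 2.1], trended, not_trended))
print(predict_trending([0.2, 0.2, 0.3, 0.3], trended, not_trended))
```

<p>In this toy setup, a topic whose activity climbs the way past trending topics did is flagged, while a flat one is not. A real system would need far more examples and a carefully tuned similarity measure.</p>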
<p>And Twitter is certainly interested in the research. A company spokesperson emailed me to point out that Shah&#8217;s graduate research assistant, Stanislav Nikolov, is a Twitter employee.</p>
<div id="attachment_579769" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/11/trends.jpg"><img  title="trends" alt="" src="http://gigaom2.files.wordpress.com/2012/11/trends.jpg?w=300&#038;h=217" height="217" width="300" class="size-medium wp-image-579769" /></a><p class="wp-caption-text">Imagine knowing these topics before Twitter does.</p></div>
<p>However, the algorithm&#8217;s level of accuracy and speed would have to translate to a much-larger and more-complex stage &#8212; Twitter&#8217;s real-life firehose and stockpile of historical tweets &#8212; if the company were to use its predictions to charge premiums for ads associated with certain topics, as Shah suggests. Advertisers might not be happy to pay premium rates for topics that fizzle out before ever becoming top trends (although a tiered rate system based on the model&#8217;s confidence or, perhaps, projected ranking among top trends could work). Thus far, the algorithm has been trained using a set of 400 topics, half of which trended and half of which did not.</p>
<p>Shah thinks it&#8217;s a great fit for Twitter data because the data is relatively clean and he has found a strong correlation between past and future activity. Other historical data sets might be messier or noisier than Twitter&#8217;s, which would make it much more difficult to filter out extraneous data and discern the real factors that lead to a particular result. However, even Twitter has presented research showing, in the case of its search engine at least, how the sheer volume of data it receives and the speed at which it comes in <a href="http://gigaom.com/cloud/twitter-shows-when-we-tweet-and-explains-why-its-search-sucks/">can make it difficult to accurately predict what someone wants to see</a>.</p>
<p>The good news, though, for anyone willing to give Shah&#8217;s algorithm a try is that it&#8217;s designed to process data in parallel across scale-out systems like those used by large web companies. Therefore, training it and then running it in production across a voluminous data set <a href="http://gigaom.com/cloud/skytree-intros-machine-learning-for-the-masses/">won&#8217;t run into the same obstacles traditionally faced by machine learning algorithms</a> as data sizes increase. And there are potentially more lucrative and rewarding endeavors that could benefit from this type of predictive power: Shah suggests stock markets, movie ticket sales and public transportation as possibilities, but others might include combating cybercrime by identifying threats earlier or predicting the severity of disease outbreaks.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-932215p1.html">Shutterstock user turtleteeth</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/01/mit-researcher-says-he-can-predict-twitter-trends/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/twitter-network-data.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/twitter-network-data.jpg?w=150" medium="image">
			<media:title type="html">twitter network data</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/trends.jpg?w=300" medium="image">
			<media:title type="html">trends</media:title>
		</media:content>
	</item>
		<item>
		<title>Forget your fancy data science, try overkill analytics</title>
		<link>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/</link>
		<comments>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/#comments</comments>
		<pubDate>Fri, 21 Sep 2012 17:00:24 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[data-science]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[kaggle]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=565355</guid>
		<description><![CDATA[Carter S. won his first-ever Kaggle competition -- our own GigaOM WordPress Challenge -- using a brute force method of data science he calls overkill analytics. Rather than spend untold hours perfecting complex models, Carter used simple algorithms and let powerful microprocessors do the rest.]]></description>
				<content:encoded><![CDATA[<p>Meet Carter S. He used to be a lawyer, but now he writes predictive models for an insurance company. Admittedly green in certain new or advanced modeling methods, he prefers to use simple algorithms and throw as much computing power as possible at problems. He <a href="http://www.overkillanalytics.net/about-overkill-analytics/">calls the technique &#8220;overkill analytics,&#8221;</a> and it just won him his first contest on Kaggle, defeating more than 80 other competitors in the <a href="http://www.kaggle.com/c/predict-wordpress-likes">GigaOM WordPress Challenge: Splunk Innovation Prospect</a> <em>(see disclosure)</em>.</p>
<p>Not only was this Carter&#8217;s first win, it was also his first contest. You can <a href="http://www.overkillanalytics.net/kaggles-wordpress-challenge-the-like-graph/">read the detailed explanation of his victory</a> on his blog, but the gist is that he didn&#8217;t get too involved with complex social graphing to determine relationships or natural language processing to determine topics readers liked. He figured out that most of what people liked came from blogs they&#8217;ve already read, and that the vast majority of posts people liked fell within a three-node radius on a simple social graph.</p>
<p>Statistically speaking, he fit a <a href="http://en.wikipedia.org/wiki/Generalized_linear_model">generalized linear regression model</a>, followed by a <a href="http://en.wikipedia.org/wiki/Random_forest">random forest model</a>, and averaged the results. &#8220;I&#8217;m not sure it&#8217;s a very unique technique,&#8221; he told me, &#8220;but it&#8217;s certainly a very powerful one.&#8221;</p>
<div id="attachment_565426" class="wp-caption aligncenter" style="width: 590px"><a href="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg"><img  title="blog-wordpress-centralitylift-580x295" src="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg?w=708" alt=""   class="size-full wp-image-565426" /></a><p class="wp-caption-text">Source: Overkill Analytics</p></div>
<p>And therein lies the beauty of overkill analytics, a term that Carter might have coined, but that appears to be catching on &#8212; especially in the world of web companies and big data. Carter says he doesn&#8217;t want to spend a lot of time fine-tuning models, writing complex algorithms or pre-analyzing data to make it work for his purposes. Rather, he wants to utilize some simple models, reduce things to numbers and process the heck out of the data set on as much hardware as is possible.</p>
<p>It&#8217;s not about big data so much as it is about big computing power, he said. There&#8217;s still work to be done on the smaller data sets that most of the world deals with, but Hadoop clusters and other architectural advances let you do more to that data in less time than was previously possible. Now, Carter said, as long as you account for the effects of overprocessing data, you can create a black-box-like system and run every combination of simple techniques on data until you get the most accurate answer.</p>
<p>I <a href="http://gigaom.com/data/5-ideas-to-help-everyone-make-the-most-of-big-data/">wrote about the same general theory recently</a> in explaining why Sparked.com&#8217;s Daniel Wiesenthal believes that big data (i.e., lots and lots of data combined with new storage and processing technologies) improves the practice of data science (i.e., the application of statistical techniques to data). The gist of his theory is that although complex models are great for small data sets, simple models can close the accuracy gap when applied to large data sets. Combine that with infrastructure that can process a lot of data relatively fast and support a wide variety of jobs, and you have a simpler, faster and equally effective method.</p>
<p>Still, Carter said he didn&#8217;t get involved in Kaggle just to prove the effectiveness of overkill analytics. He does hope to get exposed to new data science techniques that haven&#8217;t yet caught on in the insurance industry, and he also wants to make a name for himself. When you work for a company with little turnover, he said, your professional network doesn&#8217;t grow too much, but doing Kaggle competitions is a great way to meet other data scientists &#8212; and <a href="http://gigaom.com/data/can-kaggle-make-data-science-a-spectator-sport/">winning is a great way to earn respect</a>.</p>
<p>Ali Ahmad (username Xali) won the separate Splunk Innovation portion of the contest. According to a statement from Splunk, he &#8220;used Splunk&#8217;s built in statistical and visualization features to map out the relationship between blogs containing YouTube videos with those that are most likely to be viral, as measured by likes and shares. As a bonus, he fed the data into an app to view the YouTube videos most commonly liked and shared via WordPress blogs!&#8221;</p>
<p><em><strong>Disclosure</strong>: Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, GigaOm. Om Malik, founder of GigaOm, is also a venture partner at True.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-674152p1.html">Shutterstock user nasirkhan</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_86909912.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_86909912.jpg?w=150" medium="image">
			<media:title type="html">workflow</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg" medium="image">
			<media:title type="html">blog-wordpress-centralitylift-580x295</media:title>
		</media:content>
	</item>
		<item>
		<title>How India&#8217;s favorite TV show uses data to change the world</title>
		<link>http://gigaom.com/2012/08/11/how-indias-favorite-tv-show-uses-data-to-change-the-world/</link>
		<comments>http://gigaom.com/2012/08/11/how-indias-favorite-tv-show-uses-data-to-change-the-world/#comments</comments>
		<pubDate>Sat, 11 Aug 2012 19:00:36 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[bollywood]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[india]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[television]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=551595</guid>
		<description><![CDATA[Satyamev Jayate, one of India's highest-rated television shows, is using data as a means to effect meaningful change. The show's producers are aggregating and analyzing the millions of messages they receive on controversial issues to do everything from planning future episodes to pushing for political change.]]></description>
				<content:encoded><![CDATA[<p>Every Sunday morning, millions of people in India tune in to watch Bollywood star <a href="http://en.wikipedia.org/wiki/Aamir_Khan">Aamir Khan</a> host one of the country&#8217;s highest-rated television shows, <a href="http://www.satyamevjayate.in/">Satyamev Jayate</a>. Unlike so many popular programs, <a href="http://www.satyamevjayate.in/">Satyamev Jayate</a> doesn&#8217;t involve a singing competition or a collection of volatile strangers living under the same roof. It&#8217;s a documentary program tackling some of the country&#8217;s most sensitive topics, and it has the whole country &#8212; indeed, the whole world &#8212; talking. In order to funnel millions of messages a week into something valuable, the show&#8217;s producers have turned to big data.</p>
<p>Aside from Khan&#8217;s star power, the show is so popular because of the types of issues it tackles &#8212; <a href="http://en.wikipedia.org/wiki/Female_foeticide_in_India">female feticide</a>, caste discrimination, dowry deaths, child abuse and medical practice among them. According to one of the show&#8217;s producers, the amount of engagement and the number of responses from viewers are &#8220;completely unprecedented.&#8221; Here&#8217;s a sample of what we&#8217;re talking about, just 13 episodes into the show&#8217;s existence:</p>
<ul>
<li>400 million viewers on Indian television and across the world on YouTube.</li>
<li>More than 1.2 billion people have connected with Satyamev Jayate across its website, Facebook, Twitter, YouTube and mobile devices.</li>
<li>More than 8 million people have contributed a total of more than 14 million responses to the show&#8217;s content via Facebook, web comments, text-message votes and a telephone hotline. More than 100,000 new people respond each week.</li>
</ul>
<p>The responses take all sorts of forms, from votes on a weekly poll question to long, heartfelt letters explaining a viewer&#8217;s experience with an issue or how the show has changed their thinking on an issue. And although 95 percent of responses come from India, the show has received them from 5,000 locations in 165 countries, including as far away as northern Canada and Alaska. The show&#8217;s topics regularly rank among the top trends on Twitter shortly after each episode airs.</p>
<p>Surprisingly, the producer said, the India-created Satyamev Jayate has not received a single piece of hate mail from bitter geopolitical rival Pakistan. In fact, there have been numerous requests for an episode on India-Pakistan unity. (If you have 90 minutes, here&#8217;s an episode on human dignity.)</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/7OUoXsryE3c?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<h2 id="parsing-through-millions-of-me">Parsing through millions of messages</h2>
<p>In order to keep up with all the messages, Satyamev Jayate turned to <a href="http://www.persistentsys.com/">Persistent Systems</a>, an Indian IT consultancy with offices around the world, which created a system for automating their analysis. Here&#8217;s how the process works.</p>
<p>About a day and a half before each show, Satyamev Jayate&#8217;s production company tells Persistent what the issue will be and the two groups come up with a taxonomy that will help the system sort through messages based on what topics will be brought up during Sunday&#8217;s show. But it&#8217;s not by any means the definitive list. As activity ramps up on Twitter while the show airs (tweet rates are highest during commercials and immediately after it ends, by the way), the team gets a sense of what topics are resonating with viewers and what themes they can expect in the nearly a million responses that will follow.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/08/satyamev1.jpg"><img  title="satyamev" src="http://gigaom2.files.wordpress.com/2012/08/satyamev1.jpg?w=206&#038;h=300" alt="" width="206" height="300" class="alignright size-medium wp-image-551830" /></a>When the responses actually do start pouring in after lunch, they hit a system designed by Persistent to automatically tag them and score them based on interest level and sentiment. So, as Mukund Deshpande, head of business intelligence and analytics at Persistent, told me, a long message with an interesting story will be marked as higher quality, while a short, congratulatory note will be scored lower. Because so many viewers write in &#8220;Hinglish,&#8221; a combination of Hindi and English, an off-the-shelf system wouldn&#8217;t have been as accurate for processing these messages.</p>
<p>In the future, he&#8217;d like to train the system to recognize various gradients of emotion, too, beyond just simple sentiment. That means not just &#8220;positive&#8221; or &#8220;negative,&#8221; but also &#8220;happy,&#8221; &#8220;sad,&#8221; &#8220;angry&#8221; and any other way a viewer might be feeling.</p>
<p>The best messages are then sent to a team of trained analysts &#8212; often college students and graduates, along with some Persistent employees &#8212; who decide which ones are worth following up on for a Friday radio show Khan does, and for <a href="http://www.satyamevjayate.in/issue06/indiasays/">placement on Satyamev Jayate&#8217;s web site</a>. These analysts try to ensure that the stories shared are truthful and that the messages don&#8217;t contain personal information that could get viewers in trouble or affect their privacy. Data visualizations about how many people have responded and where they come from are available on the <a href="http://www.satyamevjayate.in/impact/impact.php/">Impact section of the show&#8217;s site</a>, as well as on separate Impact pages for each episode.</p>
<h2 id="making-a-difference-with-data">Making a difference with data</h2>
<div id="attachment_551814" class="wp-caption alignleft" style="width: 209px"><a href="http://gigaom2.files.wordpress.com/2012/08/khan-copy.jpg"><img  title="khan copy" src="http://gigaom2.files.wordpress.com/2012/08/khan-copy.jpg?w=199&#038;h=300" alt="" width="199" height="300" class="size-medium wp-image-551814" /></a><p class="wp-caption-text">Aamir Khan</p></div>
<p>All this feedback has an impact, both on the show itself and on India. Satyamev Jayate&#8217;s voting process, in particular, has yielded some impressive results. After the first episode about female feticide, or the selective abortion of female fetuses, 99.8 percent of viewers said they agreed with the idea of a fast-track court to prosecute doctors who perform such operations. When Khan presented the results to the Indian government, officials <a href="http://articles.timesofindia.indiatimes.com/2012-05-11/jaipur/31668741_1_chief-justice-rajasthan-high-court-female-feticide">agreed almost immediately</a> to amend the court system accordingly, the producer told me.</p>
<p>Sometimes, though, the results simply present an interesting &#8212; if not troubling &#8212; view into the Indian subconscious. Almost 32 percent of respondents, for example, voted in favor of the right of families to use force to prevent the marriage of two willing adults (subsequent analysis uncovered some reasons why, including continuing opposition to inter-caste marriage), while almost 14 percent of respondents one week said that beating a woman is a sign of masculinity. And although women make up only about 32 percent of the show&#8217;s audience, they have accounted for the majority of responses on shows addressing issues important to them.</p>
<p>The producer said his team also uses the data to inspire ideas for future shows and to populate a weekly radio show that Khan does with a local journalist. The Satyamev Jayate team analyzes the week&#8217;s messages in order to pick the most powerful and determine trends in viewers&#8217; feelings, and Khan shares them during the interview. The second season, he said, will be shaped in part by how viewers responded to the format during the first season and the issues they want covered next.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/08/sat2.jpg"><img  title="sat2" src="http://gigaom2.files.wordpress.com/2012/08/sat2.jpg?w=178&#038;h=300" alt="" width="178" height="300" class="alignright size-medium wp-image-551817" /></a>Beyond just the next season, though &#8212; and the occasional political victory &#8212; the hope is that all the data Satyamev Jayate generates will have continuing utility. Deshpande said he&#8217;d like to see it used for ethnographic and social science research, because the dataset is larger than most academic studies could generate (something that&#8217;s <a href="http://gigaom.com/cloud/better-medicine-brought-to-you-by-big-data/">already happening with crowdsourced medical research</a>) and it&#8217;s very high quality because of the demographic and geographic information attached to it.</p>
<p>However, the producer with whom I spoke seems perfectly content right now with the way Satyamev Jayate is resonating with the public. For example, he said, viewers are reporting crimes they previously might not have considered too big a deal and are reaching out to disabled citizens. This is the first time many people are speaking openly about these issues, he said, and they&#8217;re able to track the effects because they&#8217;re able to ensure no message is left behind.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/08/11/how-indias-favorite-tv-show-uses-data-to-change-the-world/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/satyamev2.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/satyamev2.jpg?w=150" medium="image">
			<media:title type="html">satyamev</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/satyamev1.jpg?w=206" medium="image">
			<media:title type="html">satyamev</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/khan-copy.jpg?w=199" medium="image">
			<media:title type="html">khan copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/sat2.jpg?w=178" medium="image">
			<media:title type="html">sat2</media:title>
		</media:content>
	</item>
	</channel>
</rss>
