<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>paidContent &#187; big-data</title>
	<atom:link href="http://paidcontent.org/tag/big-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://paidcontent.org</link>
	<description>The economics of digital content</description>
	<lastBuildDate>Tue, 21 May 2013 06:29:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='paidcontent.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/89ee7e1250b4095eefb87d28e6e64947?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>paidContent &#187; big-data</title>
		<link>http://paidcontent.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://paidcontent.org/osd.xml" title="paidContent" />
	<atom:link rel='hub' href='http://paidcontent.org/?pushpress=hub'/>
		<item>
		<title>MLB plans ad exchanges to target premium baseball fans</title>
		<link>http://paidcontent.org/2013/04/11/mlb-plans-ad-exchanges-to-target-premium-baseball-fans/</link>
		<comments>http://paidcontent.org/2013/04/11/mlb-plans-ad-exchanges-to-target-premium-baseball-fans/#comments</comments>
		<pubDate>Fri, 12 Apr 2013 00:43:59 +0000</pubDate>
		<dc:creator>Jeff John Roberts</dc:creator>
				<category><![CDATA[ad exchanges]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[bluekai]]></category>
		<category><![CDATA[bob bowman]]></category>
		<category><![CDATA[Major League Baseball]]></category>
		<category><![CDATA[Programmatic]]></category>

		<guid isPermaLink="false">http://paidcontent.org/?p=227555</guid>
		<description><![CDATA[Major League Baseball is using new data tools to create more detailed profiles of people who visit team and league websites. MLB plans to use the extra data to create profiles of affluent customers, and to let brands target those profiles on private ad exchanges.]]></description>
				<content:encoded><![CDATA[<p>Major League Baseball will start collecting more data about customers who visit its websites as part of a plan to create new premium categories for online advertisers and increase revenue from MLB’s media properties.</p>
<p>MLB is also planning online ad exchanges in which brands can bid in real time to show ads to a specialty audience (say, affluent female car buyers) on MLB-owned sites such as the New York Yankees team page. According to a Thursday <a href="http://www.marketwatch.com/story/play-ball-bluekai-adds-mlbam-to-dmp-lineup-2013-04-11">press release</a>:</p>
<p>“Advertisers now can identify affluent audiences based on a range of demographic, behavioral and purchasing attributes and target them across all MLB properties.” The ad initiative reflects MLB’s role as the <a href="http://gigaom.com/2013/02/26/passbook-mobile-ticketing-expanding-to-13-mlb-ballparks-this-season/">most tech-savvy sports league</a>; to learn more, come see MLB Advanced Media CEO Bob Bowman join us at <a href="http://event.gigaom.com/paidcontent?utm_source=media&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=227555+mlb-plans-ad-exchanges-to-target-premium-baseball-fans&amp;utm_content=jeffjohnroberts">paidContent Live</a> on April 17.</p>
<p>The release was put out by BlueKai, a company that collects and analyzes data from consumers as they move across different websites. In a phone interview, BlueKai Director of Business Development Gina Kim said the company is providing MLB with a data platform but that the league will not use it to sell customer information to other publishers.</p>
<p>MLB’s plans to create more granular advertising segments also reflect a broader trend across major websites. Facebook, for instance, <a href="http://gigaom.com/2013/04/10/facebook-expands-ad-targeting-will-let-partners-show-ads-based-on-web-activity/">announced this week</a> that it’s now using third-party data companies to offer super-specific audience segments like “children’s cereals” or “full-size sedan buyers.”</p>
<p>The BlueKai executive said the data deal for now just covers desktop browsers and not mobile devices.</p>
]]></content:encoded>
			<wfw:commentRss>http://paidcontent.org/2013/04/11/mlb-plans-ad-exchanges-to-target-premium-baseball-fans/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:thumbnail url="http://gigaompaidcontent.files.wordpress.com/2012/02/major-league-baseball-advanced-media-o.jpg?w=150" />
		<media:content url="http://gigaompaidcontent.files.wordpress.com/2012/02/major-league-baseball-advanced-media-o.jpg?w=150" medium="image">
			<media:title type="html">Major League Baseball</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/05dfcf765f1554b08954bb9e1ee63363?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">jeffjohnroberts</media:title>
		</media:content>
	</item>
		<item>
		<title>How the fastest-growing media site could help Democrats win the next election</title>
		<link>http://paidcontent.org/2013/03/06/how-the-fastest-growing-media-site-could-help-democrats-win-the-next-election/</link>
		<comments>http://paidcontent.org/2013/03/06/how-the-fastest-growing-media-site-could-help-democrats-win-the-next-election/#comments</comments>
		<pubDate>Wed, 06 Mar 2013 22:51:56 +0000</pubDate>
		<dc:creator>Jeff John Roberts</dc:creator>
				<category><![CDATA[big-data]]></category>
		<category><![CDATA[buzzfeed]]></category>
		<category><![CDATA[Democratic Party]]></category>
		<category><![CDATA[elections]]></category>
		<category><![CDATA[Eli Pariser]]></category>
		<category><![CDATA[Peter Koechley]]></category>
		<category><![CDATA[social-media]]></category>
		<category><![CDATA[Upworthy]]></category>
		<category><![CDATA[viral media]]></category>

		<guid isPermaLink="false">http://paidcontent.org/?p=225563</guid>
		<description><![CDATA[Upworthy is attracting attention for its headlines and its viral videos about gay marriage, women's rights and other social causes. But the site's real value may be its potential to help the Democrats maintain their lead in social media and big data.]]></description>
				<content:encoded><![CDATA[<p>Upworthy, a viral media site, is less than a year old but already has more than 9 million monthly viewers. That outpaces the early days of other viral sites like BuzzFeed and Business Insider, and makes Upworthy the fastest-growing media company on the internet. It&#8217;s also one of the most unusual.</p>
<p>If you&#8217;re not familiar, <a href="http://www.upworthy.com/">Upworthy</a> adds splashy headlines to photos and videos it culls from across the internet, and encourages viewers to share them on Facebook and other social media sites. This is akin to what sites like BuzzFeed do but with two major differences: Upworthy doesn&#8217;t have advertising and it focuses exclusively on political and social issues like gender equality and climate change.</p>
<p>So far, the site has made a splash with fare like &#8220;<a href="http://www.upworthy.com/if-this-video-makes-you-uncomfortable-then-you-make-me-uncomfortable">If this video makes you uncomfortable, then you make me uncomfortable</a>&#8221; (advocating for gay marriage) and &#8220;<a href="http://www.upworthy.com/bully-calls-news-anchor-fat-news-anchor-destroys-him-on-live-tv">Bully Calls News Anchor Fat, News Anchor Destroys Him On Live TV</a>.&#8221; Upworthy also stands out for its editorial process: curators prepare 25 versions of each headline and engage in extensive A/B testing to find out which version is most likely to go viral.</p>
<p>&#8220;When we look at the media landscape, we see there being more of a demand problem than a supply problem &#8211; how do you get people to care about important stuff amidst the avalanche of content we all face each day?&#8221; said co-founder Peter Koechley.</p>
<p>So far, Upworthy is off to a roaring start and not just thanks to its millions of visitors. The press has <a href="http://www.poynter.org/latest-news/mediawire/167906/upworthy-seeks-the-serious-side-of-shareable-content/">praised</a> Upworthy for using viral tricks to promote content unrelated to cats, while high-profile media figures like BuzzFeed co-founder John Johnson and Facebook co-founder Chris Hughes have put their own money into it. The site has also <a href="http://allthingsd.com/20121016/with-six-million-uniques-upworthy-gets-4m-from-nea-to-find-more-virals-that-arent-cat-videos/">received $4 million</a> from venture capital firm NEA.</p>
<p>All of this has made Upworthy a darling of the startup scene. But what is the company&#8217;s business model? As noted, Upworthy has no advertising income, nor does it plan to have any. Meanwhile, the company is in the midst of a mini-hiring spree, while also maintaining a high-gloss website and social media operation.</p>
<p>Upworthy says it earns money by connecting &#8220;readers with non-profits and other organizations who are looking to grow their memberships via the sign-up boxes&#8221; on its site. In other words, the company is collecting email addresses and social media profiles for &#8220;lead generation.&#8221;</p>
<p>The company adds it will not work with just any organization &#8212; only those that &#8220;create positive social change.&#8221; In response to an email query, Upworthy co-founder Peter Koechley declined to provide financial figures but did say the site has been taking in revenue since its third month of operation.</p>
<p>It seems far-fetched, however, to build a major media venture on the backs of the Sierra Club, the American Worker or other social-change groups. Unless, that is, Upworthy&#8217;s primary goal is instead to build a political operation aimed at gathering voter data and boosting the Democratic Party in upcoming elections.</p>
<p>Recall how the Obama administration won the 2012 race by <a href="http://gigaom.com/2012/11/12/how-obamas-tech-team-helped-deliver-the-2012-election/">using big data</a> to identify and energize individual voters. The Democratic Party&#8217;s campaign, which relied heavily on Facebook connections and custom email messages, ran circles around Mitt Romney&#8217;s TV-based campaign. Now, with the help of Upworthy, the Democrats could be in a position to do it all over again &#8212; the site could not only help identify passionate supporters of liberal issues, but also be a laboratory to experiment with headlines and marketing messages like the ones used in an election.</p>
<p>Some members of the Upworthy team certainly have the pedigree for it. Koechley&#8217;s co-founder is Eli Pariser, the former head of Moveon.org, a liberal activist group closely tied to Democratic presidential campaigns. Meanwhile, Upworthy staff also include Zane Shelby and Ryan Resell, who worked on analytics and tech for the Obama campaign. And, according <a href="http://www.wired.com/business/2013/02/tabloid-chic-the-rise-of-racy-headlines/">to Wired</a>, Koechley is closely connected to Obama&#8217;s chief digital strategist, who gained fame for focus-tested emails like &#8220;I will be outspent&#8221; and &#8220;Do this for Michelle.&#8221;</p>
<p>&#8220;We don&#8217;t view ourselves as a political organization, although some of us do have backgrounds in politics,&#8221; Koechley told me. &#8220;Some of the most popular stuff on Upworthy is about the wonders of science, building women&#8217;s self esteem, or feel-good stories about overcoming adversity.&#8221; He also pointed to the <a href="http://www.upworthy.com/about">site&#8217;s own description</a> of itself as a &#8220;mission-driven media company.&#8221;</p>
<p>There is no reason to doubt Koechley and Upworthy&#8217;s sincerity about using viral media to advance social change. And it&#8217;s hard not to support much of what they&#8217;re doing; I don&#8217;t like homophobia or bullying either. But it&#8217;s also pretty easy to look at the company and see the seeds of something far more potent than just another viral media site.</p>
<p><em>Correction: This story was updated at 12:08pm to say that BuzzFeed co-founder John Johnson is an investor in Upworthy; an earlier version listed BuzzFeed co-founder Jonah Peretti.</em></p>
<p>(Image by <a id="portfolio_link" href="http://www.shutterstock.com/gallery-589084p1.html">SoulCurry</a> via Shutterstock)</p>
]]></content:encoded>
			<wfw:commentRss>http://paidcontent.org/2013/03/06/how-the-fastest-growing-media-site-could-help-democrats-win-the-next-election/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaompaidcontent.files.wordpress.com/2013/03/democrat-election.jpg?w=150" />
		<media:content url="http://gigaompaidcontent.files.wordpress.com/2013/03/democrat-election.jpg?w=150" medium="image">
			<media:title type="html">Democrat election</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/05dfcf765f1554b08954bb9e1ee63363?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">jeffjohnroberts</media:title>
		</media:content>
	</item>
		<item>
		<title>Gravity giving away personalization to whichever publishers want it</title>
		<link>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/</link>
		<comments>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/#comments</comments>
		<pubDate>Fri, 01 Feb 2013 18:11:45 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[Gravity]]></category>
		<category><![CDATA[Interest Graph]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[personalization]]></category>
		<category><![CDATA[publishing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=606615</guid>
		<description><![CDATA[Gravity, a startup that personalizes reader content for web publishers, is opening up its recommendation engine to anyone that wants to use it. Considering the increasing importance of personalization online, this could be a good deal.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.gravity.com/">Gravity</a>, a Santa Monica, Calif.-based startup that personalizes reader content for web publishers, is opening up its recommendation engine to anyone that wants to use it. If you don’t mind a few sponsored stories popping up in the newsfeed (a condition of using the free platform), this could be a pretty good deal.</p>
<p>Gravity’s recommendation system is based on its <a href="http://gigaom.com/2012/03/15/the-personalized-web-is-just-an-interest-graph-away/">interest graph</a> technology, which we detailed last year. Here’s <a href="http://gigaom.com/2012/03/11/can-big-data-fix-a-broken-system-for-software-patents/">how I described it then</a>:</p>
<blockquote id="quote-the-gist-is-that-hum"><p>[T]he gist is that humans first serve as guides for machine-learning algorithms by determining connections between terms within large data sets, then the algorithms take over to complete the job faster than humans ever could. When they’re done, the humans step in one more time to kill any bad connections between terms. The result is a system that can determine with high accuracy that a person tweeting about Vanessa Laine (Los Angeles Laker Kobe Bryant’s ex-wife), for example, is probably more interested in basketball than about Laine’s date of birth or other accurate but irrelevant information.</p></blockquote>
<p>As new content streams into Gravity’s system, it’s analyzed and categorized in real time, then presented to users accordingly based on their interests and behavioral history.</p>
<div id="attachment_606730" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg"><img alt="How Gravity's platform works" src="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg?w=708&#038;h=306" width="708" height="306" class="size-large wp-image-606730"></a><p class="wp-caption-text">How Gravity’s platform works</p></div>
<p>Graph processing and <a href="http://gigaom.com/2011/10/24/springsource-links-up-with-neo-technology-on-nosql/">graph databases</a>, which store and analyze data based on their relationship to one another, are critical to our online lives, powering everything from <a href="http://gigaom.com/2013/01/29/you-might-also-like-to-know-how-online-recommendations-work/">online recommendations</a> to <a href="http://gigaom.com/2013/01/15/a-really-tiny-explanation-of-how-facebooks-graph-search-works/">social search</a> to <a href="http://gigaom.com/2012/08/08/for-google-keeping-search-relevant-means-baking-big-data-into-everything/">knowledge discovery</a>. Graph technologies are also the focal point of some impressive life sciences work from companies such as <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">Syapse</a> and <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">Ayasdi</a>, which will be presenting at <a href="http://event.gigaom.com/structuredata/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=224002+gravity-giving-away-personalization-to-whichever-publishers-want-it&amp;utm_content=dharrisstructure">Structure: Data</a> in New York next month.</p>
<p>But publishers struggling to stand out on a noisy web might have the most to gain from graphs and personalization, generally. At our <a href="http://event.gigaom.com/paidcontent/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=224002+gravity-giving-away-personalization-to-whichever-publishers-want-it&amp;utm_content=dharrisstructure">PaidContent Live</a> conference (April 17 in New York), executives from Prismatic, Zite and Bluefin Labs will take the stage to talk about the importance of personalization for helping consumers filter through the deluge of content online so they can find what they really want. It’s arguable that the trick to keeping readers happy is knowing what they want to read — possibly better than they do themselves.</p>
<p>According to Gravity, its platform currently “delivers more than 25 million personalized content recommendations per day to more than 200 million users. Beta partners have reported click through rates two to three times above previous levels, return visitation increases of 300 percent and session length increases up to 40 percent.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy.jpeg?w=150" medium="image">
			<media:title type="html">canvas-copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg?w=708" medium="image">
			<media:title type="html">How Gravity&#039;s platform works</media:title>
		</media:content>
	</item>
		<item>
		<title>Researchers mine 2.5M news articles to prove what we already know</title>
		<link>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/</link>
		<comments>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/#comments</comments>
		<pubDate>Tue, 27 Nov 2012 02:54:38 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Academia]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[data-mining]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=588140</guid>
		<description><![CDATA[A group of British researchers recently analyzed 2.5 million newspaper articles in order to prove that new data analysis techniques, such as machine learning and natural-language processing, can accurately classify media content. They hope their approach can save academicians untold hours of manual labor.]]></description>
				<content:encoded><![CDATA[<p>A group of British researchers has <a href="http://mediapatterns.enm.bris.ac.uk/AnalysisOfMillionsOfArticles">published the results of a data mining experiment</a> that analyzed nearly 2.5 million articles from 498 newspapers on criteria such as topic selection, writing style and sensationalism, and found &#8212; no surprise &#8212; that tabloids are the easiest to read and reporters don&#8217;t often cover women&#8217;s sports. If these findings sound predictable, that was exactly what the researchers were aiming for.</p>
<p>The experiment&#8217;s techniques actually point to a future where researchers are spared the grunt work of poring through thousands of pages of news or watching hundreds of hours of programming, and can instead focus their energy on explaining the results. As the researchers <a href="https://patterns.enm.bris.ac.uk/files/DigitalJournalism.pdf">note in their paper</a>, the real ramifications of this research lie more in what it accomplished than in what it found.</p>
<p>Namely, they demonstrated that with new big data techniques such as machine learning and natural-language processing, it&#8217;s possible to accurately analyze millions of pieces of content spanning almost a year without requiring humans to read and score it all. Choosing hypotheses with predictable results meant it was easier to verify their accuracy.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg"><img  title="newspapers_writing_style" alt="" src="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg?w=604&#038;h=454" height="454" width="604" class="aligncenter size-large wp-image-588153" /></a></p>
<p>Here&#8217;s how they explain the promise of their work and some potential use cases, the latter of which they cover in far more detail in the paper:</p>
<blockquote id="quote-it-allows-researcher"><p>&#8220;[I]t allows researchers to focus their attention on a scale far beyond the sample sizes of traditional forms of content analysis. Rather than spending precious labour on the coding phase of raw data, analysts could focus on designing experiments and comparisons to test their hypotheses, leaving to computers the task of finding all articles of a given topic, measuring various features of their content such as their readability, use of certain forms of language, sources etc. (just a few of the tasks that can now be automated).</p>
<p>&#8230; Our approach &#8212; apart from freeing scholars from more mundane tasks &#8212; allows researchers to turn their attention to higher level properties of global news content, and to begin to explore the features of what has become a vast, multi-dimensional communications system.&#8221;</p></blockquote>
<p>Put more simply: This research underscores the common big data maxim that knowing the right questions to ask is now the biggest challenge in gleaning insights from data. It&#8217;s increasingly easy to get data, analyze it and visualize it, so humans really just need to hypothesize and be able to explain the results. (This also seems like a good place to plug <a href="https://scraperwiki.com">ScraperWiki</a> as a great source for gathering potential research data from websites.)</p>
<p>Creating the workflows for gathering and analyzing the data as the authors suggest still isn&#8217;t child&#8217;s play (it might take some assistance from the computer science department), but it&#8217;s a lot better than the alternative.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-731887p1.html">Shutterstock user Ruggiero Scardigno.</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_113800528.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_113800528.jpg?w=150" medium="image">
			<media:title type="html">newspapers</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg?w=604" medium="image">
			<media:title type="html">newspapers_writing_style</media:title>
		</media:content>
	</item>
		<item>
		<title>Data isn&#8217;t just the new oil, it&#8217;s the new money. Ask Zoë Keating</title>
		<link>http://gigaom.com/2012/11/20/data-isnt-just-the-new-oil-its-the-new-money-ask-zoe-keating/</link>
		<comments>http://gigaom.com/2012/11/20/data-isnt-just-the-new-oil-its-the-new-money-ask-zoe-keating/#comments</comments>
		<pubDate>Wed, 21 Nov 2012 02:31:13 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[digital-copyright]]></category>
		<category><![CDATA[online data]]></category>
		<category><![CDATA[pandora]]></category>
		<category><![CDATA[streaming media]]></category>
		<category><![CDATA[user data]]></category>
		<category><![CDATA[web privacy]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=586855</guid>
		<description><![CDATA[In the fight about royalties from streaming media services like Pandora, popular cellist Zoë Keating says she's willing to give up the money in exchange for data. It's an idea that's gaining traction elsewhere, too, as more companies are paying consumers for their truly valuable data.]]></description>
				<content:encoded><![CDATA[<p>People love to call data the new oil, but that might be selling it short. It&#8217;s only oil when we&#8217;re talking about pools of unrefined data like the stuff web companies collect, which has to be processed and transformed into something useful. There are certain types of data, though &#8212; especially data about consumers &#8212; that are as good as money in the bank without any work at all. And if you don&#8217;t believe me, ask popular cellist Zoë Keating.</p>
<p>As <a href="http://www.nytimes.com/2012/11/05/business/media/fight-growing-over-online-royalties.html?_r=0">a bill attempting to lower the royalty rates</a> paid to artists by streaming music services such as Pandora works its way through Congress, Keating <a href="http://zoekeating.tumblr.com/post/35737991443/what-i-want-from-internet-radio">took to her Tumblr blog last week</a> and offered a solution that both sides should listen to, but won&#8217;t. You might have <a href="http://www.billboard.biz/bbbiz/industry/digital-and-mobile/value-of-music-streaming-is-data-says-artist-1008018162.story">read about her stance in Billboard</a> or <a href="http://www.itworld.com/big-data/317769/data-ultimate-internet-music-royalty?page=0,1">ITworld</a> already, <a href="http://entertainment.slashdot.org/story/12/11/20/0312215/one-musicians-demand-from-pandora-mandatory-analytics">or perhaps on Slashdot</a>. If you haven&#8217;t, here it is in a nutshell, from Keating&#8217;s blog: &#8220;The law only demands I be paid in money, which at this point in my career is not as valuable as information. I’d rather be paid in data.&#8221;</p>
<p>Leaving aside the entire issue about royalties and copyright (and privacy policies), her statement is still powerful. Keating understands that in order to prosper in a world of digital music (just like in the world of e-commerce, digital publishing, you name it), information is power. The names, email addresses and perhaps mobile numbers of individuals listening to her music are nice, clean data that Keating could use with little to no analytic effort by reaching out to fans when a new tour is coming to town or a new album drops.</p>
<p>Actually, Keating <a href="http://zoekeating.tumblr.com/post/36160121213/more-about-data-vs-royalties">noted in a subsequent blog post on Tuesday</a> that even less-personal data can have a material impact on a performer&#8217;s bottom line. Using postal code data provided to her from iTunes sales, she&#8217;s able to plan tours more efficiently because she knows, or can make a safe assumption, that she has paying fans in certain cities.</p>
<p>Touring and merchandise sales remain most artists&#8217; primary means of income, and the current royalty rate of $.0011 per play doesn&#8217;t add up fast (<a href="https://docs.google.com/spreadsheet/ccc?key=0AkasqHkVRM1OdGhjdExSMzYyMXFZUkZNSUJrY3MwNXc&amp;pli=1#gid=0">at least according to Keating&#8217;s math</a>), so it&#8217;s easy to see why she &#8212; and probably many other up-and-coming or niche performers &#8212; would rather have the data that properties like Pandora almost certainly have.</p>
<p>And whether Keating knows it or not, the idea of using data as a substitute for money extends beyond web radio stations and musicians arguing about royalties. A couple weeks ago, I <a href="http://gigaom.com/data/will-consumers-trade-the-keys-to-the-data-castle-for-a-5-gift-card/">highlighted a handful of attempts</a> to convince consumers to hand over, in exchange for cash rewards or product discounts, valuable data that advertisers can&#8217;t collect by tracking their online activity. This is data such as recent and future purchases, personal interests, your web-surfing habits and where you shop in the physical world.</p>
<p>Just like Keating is willing to forgo one-tenth of one cent per play (real money, even if not a lot) in exchange for data, these brands are willing to trade cash or something like it for data <a href="http://gigaom.com/data/5-ideas-to-help-everyone-make-the-most-of-big-data/">they don&#8217;t have to run through a Hadoop cluster and seven segmentation algorithms</a> before they can tie it to a real person. They know they have to give a little bit in order to improve upon the status quo that&#8217;s good, but not nearly good enough for their purposes.</p>
<p>Previously, the notion that data is the currency of the web meant users gave away their behavior data to web sites in exchange for free services. Slowly but surely, however, that notion seems to be evolving. Maybe Zoë Keating wants data in lieu of royalties for the privilege of streaming her music, and maybe a web site wants my offline location data enough to give me a gift card worth enough that I&#8217;d hand it over. Either way, it&#8217;s all about the realization that some data is worth its weight &#8212; and then some &#8212; in cold, hard cash.</p>
<p><em>Feature image courtesy of <a href="http://www.flickr.com/photos/eschipul/3351462308/sizes/m/in/photostream/">Flickr user eschipul</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/20/data-isnt-just-the-new-oil-its-the-new-money-ask-zoe-keating/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/zoe-keating.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/zoe-keating.jpg?w=150" medium="image">
			<media:title type="html">Zoe Keating</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>
	</item>
		<item>
		<title>What news brand has the most pull on Twitter? Finally, some answers</title>
		<link>http://paidcontent.org/2012/10/16/what-news-brand-has-the-most-pull-on-twitter-finally-some-answers/</link>
		<comments>http://paidcontent.org/2012/10/16/what-news-brand-has-the-most-pull-on-twitter-finally-some-answers/#comments</comments>
		<pubDate>Tue, 16 Oct 2012 16:10:21 +0000</pubDate>
		<dc:creator>Jeff John Roberts</dc:creator>
				<category><![CDATA[bbc]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[Devi Bhattachary]]></category>
		<category><![CDATA[ego network]]></category>
		<category><![CDATA[network analytics]]></category>
		<category><![CDATA[news media]]></category>
		<category><![CDATA[Sudha Ram]]></category>
		<category><![CDATA[The Washington Post]]></category>
		<category><![CDATA[the-new-york-times]]></category>

		<guid isPermaLink="false">http://paidcontent.org/?p=219183</guid>
		<description><![CDATA[A new study shows that the BBC and the New York Times have the most reach and influence on Twitter among news organizations. The findings are just a taste of what we can expect as researchers apply data-based network analysis to patterns of news consumption.]]></description>
				<content:encoded><![CDATA[<p>Who has more clout in spreading the news: the <em>New York Times</em>, <em>the Guardian</em> or <em>Wired</em>? Such questions have been the stuff of cocktail chatter but now, thanks to the rise of Twitter and big data analytics, we have some hard evidence.</p>
<p>In a new study, two University of Arizona researchers use Twitter&#8217;s emergence as a &#8220;serious newswire&#8221; to compare the reach and longevity of news stories tweeted by organizations like Reuters, NPR and the Washington Post. Over a three-week period last winter, the researchers looked at tweets containing story links and found that stories from the BBC and the New York Times were the most widely retweeted.</p>
<p>The study&#8217;s authors, Sudha Ram and Devi Bhattacharya, also looked at metrics like articles&#8217; half-life to determine the popularity and longevity of a news story. They found that articles from the BBC, Mashable and the NYT had the longest life span, while the BBC, Mashable and Wired were most likely to publish popular articles &#8212; stories on Twitter that exceeded the average article half-life of 5.5 hours (&#8220;half-life&#8221; is based on a <a href="http://blog.bitly.com/post/9887686919/you-just-shared-a-link-how-long-will-people-pay">bitly definition</a> that says it&#8217;s the amount of time it takes a link to receive half of the clicks it will ever receive after it has reached its peak).</p>
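<p>To make the half-life definition concrete, here is a minimal Python sketch. It assumes clicks are already grouped into hourly counts and that the peak hour itself is left out of the post-peak total; both are simplifying assumptions rather than details taken from bitly or the study, and the function name and sample numbers are hypothetical.</p>

```python
def half_life_hours(clicks_per_hour):
    """Hours after the peak hour by which half of all post-peak clicks have arrived.

    Assumption: clicks_per_hour is a list of non-negative hourly click counts,
    and clicks in the peak hour itself do not count as post-peak clicks.
    """
    if not clicks_per_hour:
        raise ValueError("need at least one hourly count")

    # Index of the busiest hour (the first one, if several hours tie).
    peak = max(range(len(clicks_per_hour)), key=clicks_per_hour.__getitem__)

    after_peak = clicks_per_hour[peak + 1:]
    total = sum(after_peak)
    if total == 0:
        return 0  # no clicks after the peak, so there is nothing left to decay

    running = 0
    for hours_since_peak, count in enumerate(after_peak, start=1):
        running += count
        if 2 * running >= total:
            return hours_since_peak


# Made-up hourly counts: the link peaks in its second hour with 40 clicks.
print(half_life_hours([5, 40, 20, 10, 6, 4]))  # prints 1
```

<p>With the made-up counts above, 40 clicks arrive after the peak and 20 of them land in the first hour that follows, so the sketch reports a half-life of one hour.</p>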
<p>The study also looked at rates of engagement &#8212; how often a Twitter user is likely to tweet a given news source. On this front, financial publications like the FT and Forbes scored lowest while the NYT, NPR and the BBC scored highest.</p>
<p>The BBC&#8217;s prominence can be explained in large part by the fact that it has three major Twitter spigots that frequently retweet each other: &#8220;bbcnews,&#8221; &#8220;bbcbreaking&#8221; and &#8220;bbcworld.&#8221; This means that the BBC has far more of what the study calls &#8220;Maximum Level&#8221; retweets &#8212; Level I is an initial retweet, Level II is a retweet of Level I and so on.</p>
<p><a href="http://paidcontent.org/2012/10/16/what-news-brand-has-the-most-pull-on-twitter-finally-some-answers/screen-shot-2012-10-16-at-9-51-08-am/" rel="attachment wp-att-219192"><img  title="Screen Shot 2012-10-16 at 9.51.08 AM" alt="" src="http://gigaompaidcontent.files.wordpress.com/2012/10/screen-shot-2012-10-16-at-9-51-08-am.png?w=708"   class="aligncenter size-full wp-image-219192" /></a></p>
<p>The study, which draws on methods used for epidemics and network analysis, also uses intriguing graphics to display news organizations&#8217; influence. This picture, for example, shows how the NYT and the Washington Post stories produce similar network effects, but the NYT stories are retweeted by more people in isolation:</p>
<p><a href="http://paidcontent.org/2012/10/16/what-news-brand-has-the-most-pull-on-twitter-finally-some-answers/screen-shot-2012-10-16-at-11-02-11-am/" rel="attachment wp-att-219191"><img  title="Screen Shot 2012-10-16 at 11.02.11 AM" alt="" src="http://gigaompaidcontent.files.wordpress.com/2012/10/screen-shot-2012-10-16-at-11-02-11-am.png?w=300&#038;h=167" height="167" width="300" class="aligncenter size-medium wp-image-219191" /></a></p>
<p>So what to make of all this? One obvious observation is that the pool of data tied to Twitter gives news agencies unprecedented tools to measure their influence and shape strategy. But, as the study notes, one size may not fit all:</p>
<blockquote><p>This leads to the question of what constitutes successful news diffusion on Twitter. Bursts of 1st level tweets within the first hour of diffusion (corresponding to instant reach to a large audience) or a high network diameter indicating multiple levels of exchange of news over a period of time (longer lifespan)? This depends on the objective of the news media source.</p></blockquote>
<p>Another takeaway is that these are still early days for data and news analysis. While the Twitter study is intriguing, it is presented (appropriately) in the language of science &#8212; &#8220;edge/node ratios,&#8221; &#8220;ego network details&#8221; and so on. This means it may take time for the study&#8217;s implications to be translated into everyday guidance for publishers and editors.</p>
<p>The title of the study, <a href="http://uanews.org/story/ua-study-examines-how-news-spreads-twitter">first reported</a> by the University of Arizona, is &#8220;Sharing News Articles Using 140 Characters: A Diffusion Analysis on Twitter.&#8221; It examined tweets from three US news outlets (<em>The New York Times</em>, National Public Radio, and <em>The Washington Post</em>); three non-US outlets (BBC, <em>Reuters</em>, and <em>The Guardian</em>); three financial news agencies (<em>Financial Times</em>, <em>Forbes</em>, and <em>Bloomberg</em>); and three tech news sites (<em>Ars Technica</em>, <em>Mashable</em>, and <em>Wired</em>).</p>
<p><em>(Image by <a href="http://www.shutterstock.com/gallery-59783p1.html">ARENA Creative</a> via Shutterstock)</em></p>
]]></content:encoded>
			<wfw:commentRss>http://paidcontent.org/2012/10/16/what-news-brand-has-the-most-pull-on-twitter-finally-some-answers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaompaidcontent.files.wordpress.com/2012/10/shutterstock_78975556.jpg?w=150" />
		<media:content url="http://gigaompaidcontent.files.wordpress.com/2012/10/shutterstock_78975556.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_78975556</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/05dfcf765f1554b08954bb9e1ee63363?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">jeffjohnroberts</media:title>
		</media:content>

		<media:content url="http://gigaompaidcontent.files.wordpress.com/2012/10/screen-shot-2012-10-16-at-9-51-08-am.png" medium="image">
			<media:title type="html">Screen Shot 2012-10-16 at 9.51.08 AM</media:title>
		</media:content>

		<media:content url="http://gigaompaidcontent.files.wordpress.com/2012/10/screen-shot-2012-10-16-at-11-02-11-am.png?w=300" medium="image">
			<media:title type="html">Screen Shot 2012-10-16 at 11.02.11 AM</media:title>
		</media:content>
	</item>
		<item>
		<title>Forget your fancy data science, try overkill analytics</title>
		<link>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/</link>
		<comments>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/#comments</comments>
		<pubDate>Fri, 21 Sep 2012 17:00:24 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[data-science]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[kaggle]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=565355</guid>
		<description><![CDATA[Carter S. won his first-ever Kaggle competition -- our own GigaOM WordPress Challenge -- using a brute force method of data science he calls overkill analytics. Rather than spend untold hours perfecting complex models, Carter used simple algorithms and let powerful microprocessors do the rest.]]></description>
				<content:encoded><![CDATA[<p>Meet Carter S. He used to be a lawyer, but now he writes predictive models for an insurance company. Admittedly green in certain new or advanced modeling methods, he prefers to use simple algorithms and throw as much computing power as possible at problems. He <a href="http://www.overkillanalytics.net/about-overkill-analytics/">calls the technique &#8220;overkill analytics,&#8221;</a> and it just won him his first contest on Kaggle, defeating more than 80 other competitors in the <a href="http://www.kaggle.com/c/predict-wordpress-likes">GigaOM WordPress Challenge: Splunk Innovation Prospect</a> <em>(see disclosure)</em>.</p>
<p>Not only was this Carter&#8217;s first win, it was also his first contest. You can <a href="http://www.overkillanalytics.net/kaggles-wordpress-challenge-the-like-graph/">read the detailed explanation of his victory</a> on his blog, but the gist is that he didn&#8217;t get too involved with complex social graphing to determine relationships or natural language processing to determine topics readers liked. He figured out that most of what people liked came from blogs they&#8217;ve already read, and that the vast majority of posts people liked fell within a three-node radius on a simple social graph.</p>
<p>Statistically speaking, he fit a <a href="http://en.wikipedia.org/wiki/Generalized_linear_model">generalized linear regression model</a>, followed by a <a href="http://en.wikipedia.org/wiki/Random_forest">random forest model</a>, and averaged the results. &#8220;I&#8217;m not sure it&#8217;s a very unique technique,&#8221; he told me, &#8220;but it&#8217;s certainly a very powerful one.&#8221;</p>
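<p>The averaging step can be sketched in a few lines of Python. This is an illustration only, not Carter&#8217;s code: the scores below are made-up stand-ins for the per-post probabilities that a fitted linear model and a fitted random forest might produce, and the function names are hypothetical.</p>

```python
def blend(glm_scores, forest_scores):
    """Average two models' scores for the same candidates, position by position."""
    if len(glm_scores) != len(forest_scores):
        raise ValueError("both models must score the same candidates")
    return [(g + f) / 2 for g, f in zip(glm_scores, forest_scores)]


def rank_posts(posts, glm_scores, forest_scores):
    """Order candidate posts from highest to lowest blended score."""
    blended = blend(glm_scores, forest_scores)
    return [post for _, post in sorted(zip(blended, posts), reverse=True)]


# Hypothetical scores for three candidate posts (not real contest data).
posts = ["post-a", "post-b", "post-c"]
glm_scores = [0.25, 0.75, 0.5]     # stand-in for linear-model probabilities
forest_scores = [0.5, 0.25, 1.0]   # stand-in for random-forest probabilities

print(blend(glm_scores, forest_scores))              # [0.375, 0.5, 0.75]
print(rank_posts(posts, glm_scores, forest_scores))  # ['post-c', 'post-b', 'post-a']
```

<p>An equal-weight average is the simplest possible blend; it tends to help when the two models make different kinds of mistakes, since one model&#8217;s overconfident score is pulled back toward the other&#8217;s.</p>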
<div id="attachment_565426" class="wp-caption aligncenter" style="width: 590px"><a href="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg"><img  title="blog-wordpress-centralitylift-580x295" src="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg?w=708" alt=""   class="size-full wp-image-565426" /></a><p class="wp-caption-text">Source: Overkill Analytics</p></div>
<p>And therein lies the beauty of overkill analytics, a term that Carter might have coined but an approach that appears to be catching on &#8212; especially in the world of web companies and big data. Carter says he doesn&#8217;t want to spend a lot of time fine-tuning models, writing complex algorithms or pre-analyzing data to make it work for his purposes. Rather, he wants to use some simple models, reduce things to numbers and process the heck out of the data set on as much hardware as possible.</p>
<p>It&#8217;s not about big data so much as it is about big computing power, he said. There&#8217;s still work to be done on smaller data sets like those most of the world deals with, but Hadoop clusters and other architectural advances let you do more to that data in less time than was previously possible. Now, Carter said, as long as you account for the effects of overprocessing data, you can create a black-box-like system and run every combination of simple techniques on data until you get the most accurate answer.</p>
<p>I <a href="http://gigaom.com/data/5-ideas-to-help-everyone-make-the-most-of-big-data/">wrote about the same general theory recently</a> in explaining why Sparked.com&#8217;s Daniel Wiesenthal believes that big data (i.e., lots and lots of data combined with new storage and processing technologies) improves the practice of data science (i.e., the application of statistical techniques to data). The gist of his theory is that although complex models are great for small data sets, simple models can close the accuracy gap when applied to large data sets. Combine that with infrastructure that can process a lot of data relatively fast and support a wide variety of jobs, and you have a simpler, faster and equally effective method.</p>
<p>Still, Carter said he didn&#8217;t get involved in Kaggle just to prove the effectiveness of overkill analytics. He does hope to get exposed to new data science techniques that haven&#8217;t yet caught on in the insurance industry, and he also wants to make a name for himself. When you work for a company with little turnover, he said, your professional network doesn&#8217;t grow too much, but doing Kaggle competitions is a great way to meet other data scientists &#8212; and <a href="http://gigaom.com/data/can-kaggle-make-data-science-a-spectator-sport/">winning is a great way to earn respect</a>.</p>
<p>Ali Ahmad (username Xali) won the separate Splunk Innovation portion of the contest. According to a statement from Splunk, he &#8220;used Splunk&#8217;s built in statistical and visualization features to map out the relationship between blogs containing YouTube videos with those that are most likely to be viral, as measured by likes and shares. As a bonus, he fed the data into an app to view the YouTube videos most commonly liked and shared via WordPress blogs!&#8221;</p>
<p><em><strong>Disclosure</strong>: Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, GigaOm. Om Malik, founder of GigaOm, is also a venture partner at True.</em></p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-674152p1.html">Shutterstock user nasirkhan</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/09/21/forget-your-fancy-data-science-try-overkill-analytics/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_86909912.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/09/shutterstock_86909912.jpg?w=150" medium="image">
			<media:title type="html">workflow</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/09/blog-wordpress-centralitylift-580x295.jpg" medium="image">
			<media:title type="html">blog-wordpress-centralitylift-580x295</media:title>
		</media:content>
	</item>
		<item>
		<title>How India&#8217;s favorite TV show uses data to change the world</title>
		<link>http://gigaom.com/2012/08/11/how-indias-favorite-tv-show-uses-data-to-change-the-world/</link>
		<comments>http://gigaom.com/2012/08/11/how-indias-favorite-tv-show-uses-data-to-change-the-world/#comments</comments>
		<pubDate>Sat, 11 Aug 2012 19:00:36 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[bollywood]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[india]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[television]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=551595</guid>
		<description><![CDATA[Satyamev Jayate, one of India's highest-rated television shows, is using data as a means to effect meaningful change. The show's producers are aggregating and analyzing the millions of messages they receive on controversial issues to do everything from planning future episodes to pushing for political change.]]></description>
				<content:encoded><![CDATA[<p>Every Sunday morning, millions of people in India tune in to watch Bollywood star <a href="http://en.wikipedia.org/wiki/Aamir_Khan">Aamir Khan</a> host one of the country&#8217;s highest-rated television shows, <a href="http://www.satyamevjayate.in/">Satyamev Jayate</a>. But unlike so many popular programs, <a href="http://www.satyamevjayate.in/">Satyamev Jayate</a> doesn&#8217;t involve a singing competition or a collection of volatile strangers living under the same roof. It&#8217;s a documentary program tackling some of the country&#8217;s most-sensitive topics, and it has the whole country &#8212; indeed, the whole world &#8212; talking. In order to funnel millions of messages a week into something valuable, the show&#8217;s producers have turned to big data.</p>
<p>Aside from Khan&#8217;s star power, the show is so popular because of the types of issues it tackles &#8212; <a href="http://en.wikipedia.org/wiki/Female_foeticide_in_India">female feticide</a>, caste discrimination, dowry deaths, child abuse and medical practice among them. According to one of the show&#8217;s producers, the amount of engagement and the number of responses from viewers are &#8220;completely unprecedented.&#8221; Here&#8217;s a sample of what we&#8217;re talking about, just 13 episodes into the show&#8217;s existence:</p>
<ul>
<li>400 million viewers on Indian television and across the world on YouTube.</li>
<li>More than 1.2 billion people have connected with Satyamev Jayate across its website, Facebook, Twitter, YouTube and mobile devices.</li>
<li>More than 8 million people have contributed a total of more than 14 million responses to the show&#8217;s content via Facebook, web comments, text-message votes and a telephone hotline. More than 100,000 new people respond each week.</li>
</ul>
<p>The responses take all sorts of forms, from votes on a weekly poll question to long, heartfelt letters explaining a viewer&#8217;s experience with an issue or how the show has changed their thinking on an issue. And although 95 percent of responses come from India, the show has received them from 5,000 locations in 165 countries, including as far away as northern Canada and Alaska. The show&#8217;s topics regularly rank among the top trends on Twitter shortly after each episode airs.</p>
<p>Surprisingly, the producer said, the India-created Satyamev Jayate has not received a single piece of hate mail from bitter geopolitical rival Pakistan. In fact, there have been numerous requests for an episode on India-Pakistan unity. (If you have 90 minutes, here&#8217;s an episode on human dignity.)</p>
<span class='embed-youtube' style='text-align:center; display: block;'><iframe class='youtube-player' type='text/html' width='604' height='370' src='http://www.youtube.com/embed/7OUoXsryE3c?version=3&#038;rel=1&#038;fs=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;wmode=transparent' frameborder='0'></iframe></span>
<h2 id="parsing-through-millions-of-me">Parsing through millions of messages</h2>
<p>In order to keep up with all the messages, Satyamev Jayate turned to <a href="http://www.persistentsys.com/">Persistent Systems</a>, an Indian IT consultancy with offices around the world, which created a system for automating their analysis. Here&#8217;s how the process works.</p>
<p>About a day-and-a-half before each show, Satyamev Jayate&#8217;s production company tells Persistent what the issue will be and the two groups come up with a taxonomy that will help the system sort through messages based on what topics will be brought up during Sunday&#8217;s show. But it&#8217;s not by any means the definitive list. As activity ramps up on Twitter while the show airs (tweet rates are highest during commercials and immediately after it ends, by the way), the team gets a sense of what topics are resonating with viewers and what themes they can expect in the nearly a million responses that will follow.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/08/satyamev1.jpg"><img  title="satyamev" src="http://gigaom2.files.wordpress.com/2012/08/satyamev1.jpg?w=206&#038;h=300" alt="" width="206" height="300" class="alignright size-medium wp-image-551830" /></a>When the responses actually do start pouring in after lunch, they hit a system designed by Persistent to automatically tag them and score them based on interest level and sentiment. So, as Mukund Deshpande, head of business intelligence and analytics at Persistent, told me, a long message with an interesting story will be marked as higher quality, while a short, congratulatory note will be scored lower. Because so many viewers write in &#8220;Hinglish,&#8221; a combination of Hindi and English, an off-the-shelf system wouldn&#8217;t have been as accurate for processing these messages.</p>
<p>In the future, he&#8217;d like to train the system to recognize various gradients of emotion, too, beyond just simple sentiment. That means not just &#8220;positive&#8221; or &#8220;negative,&#8221; but also &#8220;happy,&#8221; &#8220;sad,&#8221; &#8220;angry&#8221; and any other way a viewer might be feeling.</p>
<p>The best messages are then sent to a team of trained analysts &#8212; often college students and graduates, along with some Persistent employees &#8212; who decide which ones are worth following up on for a Friday radio show Khan does, and for <a href="http://www.satyamevjayate.in/issue06/indiasays/">placement on Satyamev Jayate&#8217;s web site</a>. These analysts try to ensure that the stories shared are truthful and that the messages don&#8217;t contain personal information that could get viewers in trouble or affect their privacy. Data visualizations about how many people have responded and where they come from are available on the <a href="http://www.satyamevjayate.in/impact/impact.php/">Impact section of the show&#8217;s site</a>, as well as on separate Impact pages for each episode.</p>
<h2 id="making-a-difference-with-data">Making a difference with data</h2>
<div id="attachment_551814" class="wp-caption alignleft" style="width: 209px"><a href="http://gigaom2.files.wordpress.com/2012/08/khan-copy.jpg"><img  title="khan copy" src="http://gigaom2.files.wordpress.com/2012/08/khan-copy.jpg?w=199&#038;h=300" alt="" width="199" height="300" class="size-medium wp-image-551814" /></a><p class="wp-caption-text">Aamir Khan</p></div>
<p>All this feedback has an impact, both on the show itself and on India. Satyamev Jayate&#8217;s voting process, in particular, has yielded some impressive results. After the first episode about female feticide, or the selective abortion of female fetuses, 99.8 percent of viewers said they agreed with the idea of a fast-track court to prosecute doctors who perform such operations. When Khan presented the results to the Indian government, officials <a href="http://articles.timesofindia.indiatimes.com/2012-05-11/jaipur/31668741_1_chief-justice-rajasthan-high-court-female-feticide">agreed almost immediately</a> to amend the court system accordingly, the producer told me.</p>
<p>Sometimes, though, the results simply present an interesting &#8212; if not troubling &#8212; view into the Indian subconscious. Almost 32 percent of respondents, for example, voted in favor of the right of families to use force to prevent the marriage of two willing adults (subsequent analysis uncovered some reasons why, including continuing opposition to inter-caste marriage), while almost 14 percent of respondents one week said that beating a woman is a sign of masculinity. And although women make up only about 32 percent of the show&#8217;s audience, they have accounted for the majority of responses on shows addressing issues important to them.</p>
<p>The producer said his team also uses the data to inspire ideas for future shows and to populate a weekly radio show that Khan does with a local journalist. The Satyamev Jayate team analyzes the week&#8217;s messages in order to pick the most powerful and determine trends in viewers&#8217; feelings, and Khan shares them during the interview. The second season, he said, will be shaped in part by how viewers responded to the format during the first season and the issues they want covered next.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/08/sat2.jpg"><img  title="sat2" src="http://gigaom2.files.wordpress.com/2012/08/sat2.jpg?w=178&#038;h=300" alt="" width="178" height="300" class="alignright size-medium wp-image-551817" /></a>Beyond just the next season, though &#8212; and the occasional political victory &#8212; the hope is that all the data Satyamev Jayate generates will have continuing utility. Deshpande said he&#8217;d like to see it used for ethnographic and social science research, because the dataset is larger than most academic studies could generate (something that&#8217;s <a href="http://gigaom.com/cloud/better-medicine-brought-to-you-by-big-data/">already happening with crowdsourced medical research</a>) and it&#8217;s very high quality because of the demographic and geographic information attached to it.</p>
<p>However, the producer with whom I spoke seems perfectly content right now with the way Satyamev Jayate is resonating with the public. For example, he said, viewers are reporting crimes they previously might not have considered too big a deal and are reaching out to disabled citizens. This is the first time many people are speaking openly about these issues, he said, and the show&#8217;s team can track the effects because its system ensures no message is left behind.</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/08/11/how-indias-favorite-tv-show-uses-data-to-change-the-world/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/satyamev2.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/satyamev2.jpg?w=150" medium="image">
			<media:title type="html">satyamev</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/satyamev1.jpg?w=206" medium="image">
			<media:title type="html">satyamev</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/khan-copy.jpg?w=199" medium="image">
			<media:title type="html">khan copy</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/08/sat2.jpg?w=178" medium="image">
			<media:title type="html">sat2</media:title>
		</media:content>
	</item>
		<item>
		<title>Big e-reader is watching you</title>
		<link>http://paidcontent.org/2012/06/29/big-e-reader-is-watching-you/</link>
		<comments>http://paidcontent.org/2012/06/29/big-e-reader-is-watching-you/#comments</comments>
		<pubDate>Fri, 29 Jun 2012 14:02:57 +0000</pubDate>
		<dc:creator>Laura Hazard Owen</dc:creator>
				<category><![CDATA[amazon]]></category>
		<category><![CDATA[barnes & noble]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[e-readers]]></category>
		<category><![CDATA[ebooks]]></category>
		<category><![CDATA[hunger games]]></category>
		<category><![CDATA[jim hilt]]></category>
		<category><![CDATA[kobo]]></category>

		<guid isPermaLink="false">http://paidcontent.org/?p=212803</guid>
		<description><![CDATA[Amazon, Barnes &#038; Noble and other e-reader companies are collecting data about your e-book reading habits, but they're keeping their most interesting findings close to the vest.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaompaidcontent.files.wordpress.com/2012/02/kindle-commercial-the-book-lives-on-o2.png"><img  title="Kindle commercial, &quot;The Book Lives On&quot;" src="http://gigaompaidcontent.files.wordpress.com/2012/02/kindle-commercial-the-book-lives-on-o2.png?w=300&#038;h=160" alt="" width="300" height="160" class="alignright size-medium wp-image-200479" /></a>&#8220;Because sometimes things happen to people and they&#8217;re not equipped to deal with them.&#8221; That non-grammatical sentence, from <em>Catching Fire</em>, the second book in Suzanne Collins&#8217; &#8220;Hunger Games&#8221; trilogy, is the most-highlighted passage ever on Kindle, with nearly 18,000 readers marking it. But you can bet Amazon is collecting much more interesting data about Kindle users than that.</p>
<p>The company isn&#8217;t willing to share such data with the Wall Street Journal, but Barnes &amp; Noble and Kobo talk a bit about the types of data collection they&#8217;re doing in <a href="http://online.wsj.com/article/SB10001424052702304870304577490950051438304.html">this piece</a>. For instance, they can track where a reader stops reading an ebook. The article notes a few non-surprising results from Barnes &amp; Noble: readers of genre fiction (romance, mystery, science fiction) read fast and finish books; &#8220;nonfiction books, particularly long ones, tend to get dropped earlier.&#8221;</p>
<p>Jim Hilt, Barnes &amp; Noble&#8217;s VP of ebooks, says the company is in &#8220;the earliest stages of deep analytics&#8221; and has &#8220;more data than we can use,&#8221; but in some cases it&#8217;s making decisions based on the data:</p>
<blockquote><p>Mr. Hilt says that when the data showed that Nook readers routinely quit long works of nonfiction, the company began looking for ways to engage readers in nonfiction and long-form journalism. They decided to launch &#8220;Nook Snaps,&#8221; short works on topics ranging from weight loss and religion to the Occupy Wall Street movement.</p></blockquote>
<p>For now, Nook Snaps, Barnes &amp; Noble&#8217;s e-singles section, <a href="http://paidcontent.org/2011/12/12/419-with-netflix-for-nook-color-barnes-noble-fights-against-kindle-fire/">remains</a> B&amp;N&#8217;s unimpressive competitor to Kindle Singles. Apparently, though, it&#8217;s an area Barnes &amp; Noble wants to focus on.</p>
<p>Amazon, which is likely doing the most interesting and large-scale data collection and analysis of e-book readers, &#8220;declined to comment on how it analyzes and uses the Kindle data it gathers.&#8221; And from this piece, we learn that <a href="http://paidcontent.org/2012/01/17/419-kindle-startup-focuses-on-interactive-fiction-for-adults/">interactive fiction startup Coliloquy</a>, the first app to be chosen for Amazon&#8217;s Kindle data developer program, can&#8217;t disclose sales figures because of a nondisclosure agreement with Amazon. That suggests that as Amazon starts adding more interactive reading apps to Kindle, it will keep the most interesting data it gleans close to the vest while doling out &#8220;stats&#8221; like the non-surprise that people <a href="https://kindle.amazon.com/most_popular/">really like</a> the first sentence of <em>Pride and Prejudice</em>, but even more than that, they like the Hunger Games.</p>
]]></content:encoded>
			<wfw:commentRss>http://paidcontent.org/2012/06/29/big-e-reader-is-watching-you/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:thumbnail url="http://gigaompaidcontent.files.wordpress.com/2012/02/kindle-commercial-the-book-lives-on-o2.png?w=150" />
		<media:content url="http://gigaompaidcontent.files.wordpress.com/2012/02/kindle-commercial-the-book-lives-on-o2.png?w=150" medium="image">
			<media:title type="html">Kindle commercial, &#34;The Book Lives On&#34;</media:title>
		</media:content>

		<media:content url="http://2.gravatar.com/avatar/83965de6c2033ee5ab075123394cec0a?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">laurahowen38</media:title>
		</media:content>

		<media:content url="http://gigaompaidcontent.files.wordpress.com/2012/02/kindle-commercial-the-book-lives-on-o2.png?w=300" medium="image">
			<media:title type="html">Kindle commercial, &#34;The Book Lives On&#34;</media:title>
		</media:content>
	</item>
		<item>
		<title>GigaOM Data Challenge: Predict which stories get read, win $10K</title>
		<link>http://gigaom.com/cloud/gigaom-meets-kaggle-predict-wholl-read-what-win-10k/</link>
		<comments>http://gigaom.com/cloud/gigaom-meets-kaggle-predict-wholl-read-what-win-10k/#comments</comments>
		<pubDate>Wed, 20 Jun 2012 16:30:18 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[data-science]]></category>
		<category><![CDATA[kaggle]]></category>
		<category><![CDATA[splunk]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=534422</guid>
		<description><![CDATA[In publishing, there's a constant struggle to determine who'll read what posts, what the ideal headline might be and when the best time to publish is. GigaOM is teaming with Splunk to find some answers via a Kaggle competition worth a total of $25,000 in prizes.]]></description>
				<content:encoded><![CDATA[<p><a href="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg"><img title="shutterstock_53433448" src="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=300&#038;h=200" alt="" width="300" height="200" class="alignleft size-medium wp-image-534438"></a>In publishing, analytics matter a lot. There’s a constant struggle to determine who will read what posts or articles, what the ideal headline might be and when publishing makes the most sense. That’s why GigaOM is teaming with <a href="http://www.splunk.com/">Splunk</a> to help find those answers.</p>
<p>We’re <a href="https://www.kaggle.com/c/predict-wordpress-likes">hosting a competition</a> on <a href="http://kaggle.com">Kaggle’s data science platform</a> to find the best models for predicting likely readership across the WordPress <em>(see disclosure) </em>ecosystem of blogs. Here are the details:</p>
<blockquote><p>The challenge is to predict whether a particular user will like a particular WordPress blog post.  The data consists of eight weeks of posts collected by WordPress, along with anonymized user responses to each post.  This challenge is an interesting mix of natural language processing (the raw blog posts) and metadata on the blogs and users. Contestants can download the data and submit prediction through the Kaggle platform, but a <strong>new feature for this competition</strong> is that they will also have free access to a Splunk server containing all the data, which they can employ for data exploration, visualization, feature extraction and modeling.</p></blockquote>
<p>Aside from offering resources to work on the data, Splunk is also putting up $25,000 in total prize money. In the prediction contest, the winning model will receive $10,000, second place $5,000, third place $3,000 and fourth place $2,000.</p>
<p>The remaining $5,000 goes to the Splunk Innovation Prize for the most innovative use of data science, whether that comes in the form of a visualization, app, business model, you name it. Entries for the Innovation Prize can be submitted through <a href="http://gigaom.com/cloud/kaggle-is-now-crowdsourcing-data-science-creativity/">Kaggle’s new Prospect platform</a>. Winners of both competitions will be announced at <a href="http://event.gigaom.com/mobilize/?utm_source=cloud&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=211988+gigaom-meets-kaggle-predict-wholl-read-what-win-10k&amp;utm_content=dharrisstructure">GigaOM Mobilize</a> in September.</p>
<p>You can find out <a href="https://www.kaggle.com/c/predict-wordpress-likes">more about the competition here</a>. Good luck!</p>
<p><em><strong>Disclosure:</strong> Automattic, maker of WordPress.com, is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, GigaOM. Om Malik, founder of GigaOM, is also a venture partner at True.</em></p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com/gallery-421981p1.html">Shutterstock user sukiyaki</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/cloud/gigaom-meets-kaggle-predict-wholl-read-what-win-10k/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=150" medium="image">
			<media:title type="html">shutterstock_53433448</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/06/shutterstock_53433448.jpg?w=300" medium="image">
			<media:title type="html">shutterstock_53433448</media:title>
		</media:content>
	</item>
	</channel>
</rss>
