<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>paidContent &#187; machine-learning</title>
	<atom:link href="http://paidcontent.org/tag/machine-learning/feed/" rel="self" type="application/rss+xml" />
	<link>http://paidcontent.org</link>
	<description>The economics of digital content</description>
	<lastBuildDate>Wed, 19 Jun 2013 13:57:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='paidcontent.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/89ee7e1250b4095eefb87d28e6e64947?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>paidContent &#187; machine-learning</title>
		<link>http://paidcontent.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://paidcontent.org/osd.xml" title="paidContent" />
	<atom:link rel='hub' href='http://paidcontent.org/?pushpress=hub'/>
		<item>
		<title>Gravity giving away personalization to whichever publishers want it</title>
		<link>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/</link>
		<comments>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/#comments</comments>
		<pubDate>Fri, 01 Feb 2013 18:11:45 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[graph database]]></category>
		<category><![CDATA[graph processing]]></category>
		<category><![CDATA[Gravity]]></category>
		<category><![CDATA[Interest Graph]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[personalization]]></category>
		<category><![CDATA[publishing]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=606615</guid>
		<description><![CDATA[Gravity, a startup that personalizes reader content for web publishers, is opening up its recommendation engine to anyone that wants to use it. Considering the increasing importance of personalization online, this could be a good deal.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.gravity.com/">Gravity</a>, a Santa Monica, Calif.-based startup that personalizes reader content for web publishers, is opening up its recommendation engine to anyone who wants to use it. If you don’t mind a few sponsored stories popping up in the newsfeed (a condition of using the free platform), this could be a pretty good deal.</p>
<p>Gravity’s recommendation system is based on its <a href="http://gigaom.com/2012/03/15/the-personalized-web-is-just-an-interest-graph-away/">interest graph</a> technology, which we detailed last year. Here’s <a href="http://gigaom.com/2012/03/11/can-big-data-fix-a-broken-system-for-software-patents/">how I described it then</a>:</p>
<blockquote id="quote-the-gist-is-that-hum"><p>[T]he gist is that humans first serve as guides for machine-learning algorithms by determining connections between terms within large data sets, then the algorithms take over to complete the job faster than humans ever could. When they’re done, the humans step in one more time to kill any bad connections between terms. The result is a system that can determine with high accuracy that a person tweeting about Vanessa Laine (Los Angeles Laker Kobe Bryant’s ex-wife), for example, is probably more interested in basketball than about Laine’s date of birth or other accurate but irrelevant information.</p></blockquote>
<p>As new content streams into Gravity’s system, it’s analyzed and categorized in real time, then presented to users accordingly based on their interests and behavioral history.</p>
<div id="attachment_606730" class="wp-caption aligncenter" style="width: 718px"><a href="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg"><img alt="How Gravity's platform works" src="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg?w=708&#038;h=306" width="708" height="306" class="size-large wp-image-606730"></a><p class="wp-caption-text">How Gravity’s platform works</p></div>
<p>Graph processing and <a href="http://gigaom.com/2011/10/24/springsource-links-up-with-neo-technology-on-nosql/">graph databases</a>, which store and analyze data based on their relationships to one another, are critical to our online lives, powering everything from <a href="http://gigaom.com/2013/01/29/you-might-also-like-to-know-how-online-recommendations-work/">online recommendations</a> to <a href="http://gigaom.com/2013/01/15/a-really-tiny-explanation-of-how-facebooks-graph-search-works/">social search</a> to <a href="http://gigaom.com/2012/08/08/for-google-keeping-search-relevant-means-baking-big-data-into-everything/">knowledge discovery</a>. Graph technologies are also the focal point of some impressive life sciences work from companies such as <a href="http://gigaom.com/2013/01/22/biotech-startup-syapse-wants-to-be-salesforce-com-for-our-genomes/">Syapse</a> and <a href="http://gigaom.com/2013/01/16/has-ayasdi-turned-machine-learning-into-a-magic-bullet/">Ayasdi</a>, which will be presenting at <a href="http://event.gigaom.com/structuredata/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=224002+gravity-giving-away-personalization-to-whichever-publishers-want-it&amp;utm_content=dharrisstructure">Structure: Data</a> in New York next month.</p>
<p>But publishers struggling to stand out on a noisy web might have the most to gain from graphs and personalization, generally. At our <a href="http://event.gigaom.com/paidcontent/schedule/?utm_source=data&amp;utm_medium=editorial&amp;utm_campaign=intext&amp;utm_term=224002+gravity-giving-away-personalization-to-whichever-publishers-want-it&amp;utm_content=dharrisstructure">PaidContent Live</a> conference (April 17 in New York), executives from Prismatic, Zite and Bluefin Labs will take the stage to talk about the importance of personalization for helping consumers filter through the deluge of content online so they can find what they really want. It’s arguable that the trick to keeping readers happy is knowing what they want to read — possibly better than they do themselves.</p>
<p>According to Gravity, its platform currently “delivers more than 25 million personalized content recommendations per day to more than 200 million users. Beta partners have reported click through rates two to three times above previous levels, return visitation increases of 300 percent and session length increases up to 40 percent.”</p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2013/02/01/gravity-giving-away-personalization-to-whichever-publishers-want-it/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy.jpeg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/08/canvas-copy.jpeg?w=150" medium="image">
			<media:title type="html">canvas-copy</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2013/02/gravity.jpg?w=708" medium="image">
			<media:title type="html">How Gravity&#039;s platform works</media:title>
		</media:content>
	</item>
		<item>
		<title>Researchers mine 2.5M news articles to prove what we already know</title>
		<link>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/</link>
		<comments>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/#comments</comments>
		<pubDate>Tue, 27 Nov 2012 02:54:38 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Academia]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[big-data]]></category>
		<category><![CDATA[data-mining]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=588140</guid>
		<description><![CDATA[A group of British researchers recently analyzed 2.5 million newspaper articles in order to prove that new data analysis techniques, such as machine learning and natural-language processing, can accurately classify media content. They hope their approach can save academicians untold hours of manual labor.]]></description>
				<content:encoded><![CDATA[<p>A group of British researchers has <a href="http://mediapatterns.enm.bris.ac.uk/AnalysisOfMillionsOfArticles">published the results of a data mining experiment</a> that analyzed nearly 2.5 million articles from 498 newspapers on criteria such as topic selection, writing style and sensationalism, and found &#8212; no surprise &#8212; that tabloids are the easiest to read and reporters don&#8217;t often cover women&#8217;s sports. If these findings sound predictable, that was exactly what the researchers were aiming for.</p>
<p>The experiment&#8217;s techniques point to a future where researchers are spared the grunt work of poring over thousands of pages of news or watching hundreds of hours of programming, and can instead focus their energy on explaining the results. As the researchers <a href="https://patterns.enm.bris.ac.uk/files/DigitalJournalism.pdf">note in their paper</a>, the real ramifications of this research lie more in what it accomplished than in what it found.</p>
<p>Namely, they demonstrated that with new big data techniques such as machine learning and natural-language processing, it&#8217;s possible to accurately analyze millions of pieces of content spanning almost a year without requiring humans to read and score it all. Choosing hypotheses with predictable results meant it was easier to verify their accuracy.</p>
<p><a href="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg"><img  title="newspapers_writing_style" alt="" src="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg?w=604&#038;h=454" height="454" width="604" class="aligncenter size-large wp-image-588153" /></a></p>
<p>Here&#8217;s how they explain the promise of their work and some potential use cases, the latter of which they cover in far more detail in the paper:</p>
<blockquote id="quote-it-allows-researcher"><p>&#8220;[I]t allows researchers to focus their attention on a scale far beyond the sample sizes of traditional forms of content analysis. Rather than spending precious labour on the coding phase of raw data, analysts could focus on designing experiments and comparisons to test their hypotheses, leaving to computers the task of finding all articles of a given topic, measuring various features of their content such as their readability, use of certain forms of language, sources etc. (just a few of the tasks that can now be automated).</p>
<p>&#8230; Our approach &#8212; apart from freeing scholars from more mundane tasks &#8212; allows researchers to turn their attention to higher level properties of global news content, and to begin to explore the features of what has become a vast, multi-dimensional communications system.&#8221;</p></blockquote>
<p>Put more simply: This research underscores the common big data maxim that knowing the right questions to ask is now the biggest challenge in gleaning insights from data. It&#8217;s increasingly easy to get data, analyze it and visualize it, so humans really just need to hypothesize and be able to explain the results. (This also seems like a good place to plug <a href="https://scraperwiki.com">ScraperWiki</a> as a great source for gathering potential research data from websites.)</p>
<p>Creating the workflows for gathering and analyzing the data as the authors suggest still isn&#8217;t child&#8217;s play (it might take some assistance from the computer science department), but it&#8217;s a lot better than the alternative.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-731887p1.html">Shutterstock user Ruggiero Scardigno.</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/26/researchers-mine-2-5m-news-articles-to-prove-what-we-already-know/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_113800528.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/shutterstock_113800528.jpg?w=150" medium="image">
			<media:title type="html">newspapers</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/newspapers_writing_style.jpg?w=604" medium="image">
			<media:title type="html">newspapers_writing_style</media:title>
		</media:content>
	</item>
		<item>
		<title>MIT researcher says he can predict Twitter trends</title>
		<link>http://gigaom.com/2012/11/01/mit-researcher-says-he-can-predict-twitter-trends/</link>
		<comments>http://gigaom.com/2012/11/01/mit-researcher-says-he-can-predict-twitter-trends/#comments</comments>
		<pubDate>Thu, 01 Nov 2012 18:06:11 +0000</pubDate>
		<dc:creator>Derrick Harris</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[data-science]]></category>
		<category><![CDATA[machine-learning]]></category>
		<category><![CDATA[predictive analytics]]></category>
		<category><![CDATA[social-media]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://gigaom.com/?p=579682</guid>
		<description><![CDATA[An MIT researcher says he has created an algorithm that can identify Twitter trends hours before the service can itself. If the algorithm works as he says, it could help Twitter -- and many more companies -- make a lot of money.]]></description>
				<content:encoded><![CDATA[<p>A researcher at MIT claims to have developed an algorithm that can accurately predict what topics will trend on Twitter. But Twitter being a relatively minor business in the grand scheme of things, the algorithm might end up being more useful elsewhere, predicting stock prices, ticket sales and other dynamically changing quantities.</p>
<p>According to <a href="http://web.mit.edu/press/2012/predicting-twitter-trending-topics.html">a release from the MIT News Office</a>, Associate Professor Devavrat Shah says his model has been 95 percent accurate during testing and has been predicting trends hours before they appear on Twitter&#8217;s list. The algorithm incorporates a new approach to machine learning that compares real-time data with historical data and predicts outcomes based on past events that most closely align with the current situation. So, rather than analyzing a topic&#8217;s chances of trending equally against the entire historical corpus of topics, it will assign more weight to topics whose paths followed similar trajectories up the ranks of top trends.</p>
<p>And Twitter is certainly interested in the research. A company spokesperson emailed me to point out that Shah&#8217;s graduate research assistant, Stanislav Nikolov, is a Twitter employee.</p>
<div id="attachment_579769" class="wp-caption alignleft" style="width: 310px"><a href="http://gigaom2.files.wordpress.com/2012/11/trends.jpg"><img  title="trends" alt="" src="http://gigaom2.files.wordpress.com/2012/11/trends.jpg?w=300&#038;h=217" height="217" width="300" class="size-medium wp-image-579769" /></a><p class="wp-caption-text">Imagine knowing these topics before Twitter does.</p></div>
<p>However, the algorithm&#8217;s level of accuracy and speed would have to translate to a much larger and more complex stage, namely Twitter&#8217;s real-life firehose and stockpile of historical tweets, if the company were to use its predictions to charge premiums for ads associated with certain topics, as Shah suggests. Advertisers might not be happy to pay premium rates for topics that fizzle out before ever becoming top trends (although a tiered rate system based on the model&#8217;s confidence or, perhaps, projected ranking among top trends could work). Thus far, the algorithm has been trained using a set of 400 topics, half of which trended and half of which did not.</p>
<p>Shah thinks it&#8217;s a great fit for Twitter data because the data is relatively clean and he has found a strong correlation between past and future activity. Other historical data sets might be messier or noisier than Twitter&#8217;s, which would make it much more difficult to filter out extraneous data and discern the real factors that lead to a particular result. However, even Twitter has presented research showing, in the case of its search engine at least, how the sheer volume of data it receives and the speed at which it comes in <a href="http://gigaom.com/cloud/twitter-shows-when-we-tweet-and-explains-why-its-search-sucks/">can make it difficult to accurately predict what someone wants to see</a>.</p>
<p>The good news, though, for anyone willing to give Shah&#8217;s algorithm a try is that it&#8217;s designed to process data in parallel across scale-out systems like those used by large web companies. Therefore, training it and then running it in production across a voluminous data set <a href="http://gigaom.com/cloud/skytree-intros-machine-learning-for-the-masses/">won&#8217;t run into the same obstacles traditionally faced by machine learning algorithms</a> as data sizes increase. And there are potentially more lucrative and rewarding endeavors that could benefit from this type of predictive power: Shah suggests stock markets, movie ticket sales and public transportation as possibilities, but others might include combating cybercrime by identifying threats earlier or predicting the severity of disease outbreaks.</p>
<p><em>Feature image courtesy of <a href="http://www.shutterstock.com/gallery-932215p1.html">Shutterstock user turtleteeth</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://gigaom.com/2012/11/01/mit-researcher-says-he-can-predict-twitter-trends/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:thumbnail url="http://gigaom2.files.wordpress.com/2012/11/twitter-network-data.jpg?w=150" />
		<media:content url="http://gigaom2.files.wordpress.com/2012/11/twitter-network-data.jpg?w=150" medium="image">
			<media:title type="html">twitter network data</media:title>
		</media:content>

		<media:content url="http://0.gravatar.com/avatar/9e48ffa0913f65c577727457dd63023f?s=96&#38;d=retro&#38;r=PG" medium="image">
			<media:title type="html">dharrisstructure</media:title>
		</media:content>

		<media:content url="http://gigaom2.files.wordpress.com/2012/11/trends.jpg?w=300" medium="image">
			<media:title type="html">trends</media:title>
		</media:content>
	</item>
	</channel>
</rss>
