<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rethinking Markets &#187; Data</title>
	<atom:link href="http://www.rethinkingmarkets.org/category/data/feed" rel="self" type="application/rss+xml" />
	<link>http://www.rethinkingmarkets.org</link>
	<description>Economic Sociology from the Ground Up</description>
	<lastBuildDate>Thu, 20 Oct 2011 13:36:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>&#039;under the surface&#039;</title>
		<link>http://www.rethinkingmarkets.org/2010/06/16/under-the-surface.html</link>
		<comments>http://www.rethinkingmarkets.org/2010/06/16/under-the-surface.html#comments</comments>
		<pubDate>Wed, 16 Jun 2010 23:19:20 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Ramble]]></category>

		<guid isPermaLink="false">http://www.rethinkingmarkets.org/?p=1231</guid>
		<description><![CDATA[I find this post on data visualization insightful, partly because it connects deeply to how I think about the world, but also because I had never, ever (consciously) noticed the &#8216;arrow&#8217; in the FedEx logo before.]]></description>
			<content:encoded><![CDATA[<p>I find this post on <a href="http://flowingdata.com/2010/06/16/visualization-underneath-the-surface/">data visualization</a> insightful, partly because it connects deeply to how I think about the world, but also because I had never, ever (consciously) noticed the &#8216;arrow&#8217; in the FedEx logo before.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rethinkingmarkets.org/2010/06/16/under-the-surface.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Filed away for future use: Facebook&#039;s findings on user use/retention</title>
		<link>http://www.rethinkingmarkets.org/2010/06/04/filed-away-for-future-use-facebooks-findings-on-user-useretention.html</link>
		<comments>http://www.rethinkingmarkets.org/2010/06/04/filed-away-for-future-use-facebooks-findings-on-user-useretention.html#comments</comments>
		<pubDate>Fri, 04 Jun 2010 11:02:15 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Data]]></category>

		<guid isPermaLink="false">http://www.rethinkingmarkets.org/?p=1224</guid>
		<description><![CDATA[From the Dataspora Blog, ostensibly on the use and abuse of R, comes this gem about Facebook: Itamar Rosenn, Facebook Itamar conveyed how Facebook’s Data Team used R in 2007 to answer two questions about new users: (i) which data points predict whether a user will stay? and (ii) if they stay, which data points [...]]]></description>
			<content:encoded><![CDATA[<p>From the <a href="http://dataspora.com/blog/predictive-analytics-using-r/">Dataspora Blog</a>, ostensibly on the use and abuse of R, comes this gem about Facebook:</p>
<blockquote><p>
Itamar Rosenn, Facebook</p>
<p>Itamar conveyed how Facebook’s Data Team used R in 2007 to answer two questions about new users: (i) which data points predict whether a user will stay? and (ii) if they stay, which data points predict how active they’ll be after three months?</p>
<p>For the first question, Itamar’s team used recursive partitioning (via the rpart package) to infer that just two data points are significantly predictive of whether a user remains on Facebook: (i) having more than one session as a new user, and (ii) entering basic profile information.</p>
<p>For the second question, they fit the data to a logistic model using a least angle regression approach (via the lars package), and found that activity at three months was predicted by variables related to three classes of behavior: (i) how often a user was reached out to by others, (ii) frequency of third party application use, and (iii) what Itamar termed “receptiveness” — related to how forthcoming a user was on the site.</p></blockquote>
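<p>For readers who don&#8217;t use R, the core move in recursive partitioning can be sketched in a few lines of Python: score every candidate split by how well it separates the outcome, keep the best one, and (in a full tree) repeat inside each branch. This minimal sketch, which is not rpart itself, scores binary features by weighted Gini impurity, a common classification criterion. The feature names and rows are hypothetical toy data, not Facebook&#8217;s.</p>

```python
# A minimal sketch of the first step of recursive partitioning:
# find the binary feature whose split best separates the outcome,
# scored by weighted Gini impurity. Feature names and rows are
# hypothetical toy data, not Facebook's.

def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows, features, outcome):
    """Return (feature, weighted_impurity) for the best single split."""
    n = len(rows)
    best = None
    for f in features:
        left = [r[outcome] for r in rows if r[f]]
        right = [r[outcome] for r in rows if not r[f]]
        # Weight each branch's impurity by its share of the rows.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (f, score)
    return best


# Hypothetical new users: more than one session? profile filled in?
# photo uploaded? And did they stay (1) or leave (0)?
users = [
    {"multi_session": True,  "profile": True,  "photo": False, "stayed": 1},
    {"multi_session": True,  "profile": False, "photo": True,  "stayed": 1},
    {"multi_session": True,  "profile": True,  "photo": True,  "stayed": 1},
    {"multi_session": False, "profile": True,  "photo": False, "stayed": 0},
    {"multi_session": False, "profile": False, "photo": True,  "stayed": 0},
    {"multi_session": False, "profile": False, "photo": False, "stayed": 0},
]

feature, impurity = best_split(users, ["multi_session", "profile", "photo"], "stayed")
print(feature, round(impurity, 3))  # multi_session 0.0
```

<p>A real tree-growing routine recurses into each branch and stops by some rule, such as a minimum node size or a complexity penalty; this sketch stops after the first split.</p>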
<p>Now at least you know why Facebook is obsessed with its &#8216;You haven&#8217;t spoken to X in a while. Drop them a line!&#8217; prompts.</p>
<p>I am mostly R-illiterate, and, as with LaTeX, Emacs, and all other things Kieran, I suspect that having made it through graduate school without these tools, I&#8217;ve missed my window for ever really incorporating them into my habitus&#8230; As <a href="http://www.imdb.com/title/tt0100405/">Edward Lewis</a> once said, &#8220;People&#8217;s reaction to R is very dramatic. They love it or they hate it. If they love it, they will always love it. If they don&#8217;t, they will learn to appreciate it, but it will never become part of their soul.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rethinkingmarkets.org/2010/06/04/filed-away-for-future-use-facebooks-findings-on-user-useretention.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Two models for understanding what people like: which is better?</title>
		<link>http://www.rethinkingmarkets.org/2009/11/16/two-models-for-understanding-what-people-like-which-is-better.html</link>
		<comments>http://www.rethinkingmarkets.org/2009/11/16/two-models-for-understanding-what-people-like-which-is-better.html#comments</comments>
		<pubDate>Mon, 16 Nov 2009 12:22:33 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Culture]]></category>
		<category><![CDATA[Data]]></category>

		<guid isPermaLink="false">http://www.rethinkingmarkets.org/?p=1111</guid>
		<description><![CDATA[I have a sort of interesting question, though perhaps it&#8217;s less interesting than I imagine it to be. Given the following two scenarios, which is more likely and why? Scenario 1: In the first case, we have a series of attributes attached to a person, and then we can make arguments (empirical, theoretical) about how [...]]]></description>
			<content:encoded><![CDATA[<p>I have a sort of interesting question, though perhaps it&#8217;s less interesting than I imagine it to be. Given the following two scenarios, which is more likely and why?</p>
<p>Scenario 1:<br />
<a href="http://www.rethinkingmarkets.org/wp-content/uploads/2009/11/introvert.jpg"><img src="http://www.rethinkingmarkets.org/wp-content/uploads/2009/11/introvert.jpg" alt="Introvert" title="Introvert" width="350" height="348" class="alignleft size-full wp-image-1112" /></a> In the first case, we have a series of attributes attached to a person, and then we can make arguments (empirical, theoretical) about how these attributes lead to outcomes. A person who is value-conscious is likely to pass on high-ticket items; a person who is sexually adventurous is likely to seek out partners to participate in kinky sex; a person who likes crappy romantic comedies is likely to see Miss Congeniality.</p>
<p>To make these kinds of arguments, we would have to seek out attributes that are causally related to the outcomes in question. In this case, if we want to predict what kinds of parties a person would want to go to, we would ask questions about their introvertedness/extrovertedness, and the answers would have a positive, negative, or null effect on the kinds of parties they go to (or report going to). That is, the empirical and theoretical difficulty lies in finding variables related to the outcomes we want to know about.</p>
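<p>In its simplest quantitative form, Scenario 1 is a regression of an outcome on an attribute. A minimal Python sketch using ordinary least squares, assuming a hypothetical 1-to-10 introversion score and, as the outcome, the number of guests at the kind of party a person prefers; the numbers are invented for illustration. The negative slope is the &#8216;negative effect&#8217; described above.</p>

```python
# Scenario 1 as a minimal sketch: predict an outcome from a single
# attribute with ordinary least squares. The introversion scores
# (1-10) and preferred party sizes below are hypothetical toy data.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


introversion = [2, 4, 6, 8]
party_size = [40, 30, 20, 10]  # guests at the kind of party each person prefers

a, b = fit_line(introversion, party_size)
print(a, b)  # 50.0 -5.0
```

<p>The arithmetic is the easy part; as the paragraph above says, the hard part is finding attributes that plausibly drive the outcome.</p>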
<p>Scenario 2:<br />
<a href="http://www.rethinkingmarkets.org/wp-content/uploads/2009/11/correspondence.jpg"><img src="http://www.rethinkingmarkets.org/wp-content/uploads/2009/11/correspondence.jpg" alt="Correspondence" title="Correspondence" width="450" height="350" class="alignleft size-full wp-image-1113" /></a> In the second scenario, the argument is that we should give primacy to a likeness or correspondence analysis to help understand what kinds of parties you want to go to &#8211; and a whole host of other things.</p>
<p>In this case, we rely less on the causal relationship between introvertedness and the kinds of parties you would want to go to, and more on the fact that <em>whatever</em> you want to do, another person who is just like you would also want to do those same things. If this were true, then what&#8217;s important is not so much to find a causal link between what kind of person you are and what kinds of parties you want to go to. Instead, the challenge is in finding what makes you different from one person and similar to another. If we can somehow &#8216;clump&#8217; all the similar people, we would be more likely to know what kinds of things they like and do, <em>regardless of their specific characteristics</em>.</p>
<p>In this case, a person who is value-conscious, sexually adventurous, or likes crappy romantic comedies is likely to like the same kinds of things as other people who are value-conscious, sexually adventurous, or who have questionable movie tastes.</p>
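<p>Scenario 2 is roughly what user-based collaborative filtering does. A minimal Python sketch, assuming each person is described only by a 0/1 vector of things they like (the people and items are hypothetical): find the most similar other person by cosine similarity and suggest what they like that you haven&#8217;t tried. No attribute like introvertedness appears anywhere.</p>

```python
import math

# Scenario 2 as a minimal sketch of user-based collaborative filtering.
# People and items are hypothetical; 1 means "likes", 0 means no recorded like.
ITEMS = ["house party", "rave", "book club", "dinner party"]
likes = {
    "ana":  [1, 1, 0, 0],
    "ben":  [1, 1, 0, 1],
    "cara": [0, 0, 1, 1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(person):
    """Suggest items the most similar other person likes but `person` lacks."""
    mine = likes[person]
    others = [p for p in likes if p != person]
    nearest = max(others, key=lambda p: cosine(mine, likes[p]))
    return [item for item, m, t in zip(ITEMS, mine, likes[nearest]) if t and not m]


print(recommend("ana"))  # ['dinner party']
```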
<p>I think that approach #2 is behind much of the recommendation engine work that has emerged of late, but I&#8217;m wondering, more and more pointedly, which of these is the more reasonable approach to understanding what people like. What do you lurkers think?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rethinkingmarkets.org/2009/11/16/two-models-for-understanding-what-people-like-which-is-better.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Types of variables, drop-down menus</title>
		<link>http://www.rethinkingmarkets.org/2008/07/09/types-of-variables-drop-down-menus.html</link>
		<comments>http://www.rethinkingmarkets.org/2008/07/09/types-of-variables-drop-down-menus.html#comments</comments>
		<pubDate>Wed, 09 Jul 2008 22:49:36 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Organizations]]></category>

		<guid isPermaLink="false">http://www.rethinkingmarkets.org/?p=185</guid>
		<description><![CDATA[Over at 37 Signals, they have a regular series detailing their design decisions. It is an insightful feature and an insightful blog. Their latest discussion is about how they managed a question on their support forms. I want to drop some research methodology on this problem. While their discussion is about how to design a [...]]]></description>
			<content:encoded><![CDATA[<p>Over at <a href="http://www.37signals.com/svn/">37 Signals</a>, they have a regular series detailing their design decisions. It is an insightful feature and an insightful blog.</p>
<p>Their <a href="http://www.37signals.com/svn/posts/1111-design-decisions-basecamp-support-request-form">latest</a> discussion is about how they managed a question on their support forms. I want to drop some research methodology on this problem. While their discussion is about how to design a feedback form, it is also about the kinds of questions you should ask on a survey. And it would benefit from a discussion of categorical/nominal variables, ordinal variables, and interval-level variables. Yep.</p>
<p>So, terms. <em>Categorical variables</em> (also called nominal variables) are variables with two or more &#8216;states&#8217; but no intrinsic ordering. Male/female is categorical, as are eye or hair color, race, and what school you went to. <em>Ordinal variables</em> are variables with two or more states <em>that have an ordering</em> to them. Low/medium/high is an ordinal variable, as is less than HS, HS, some college, BA. <em>Interval-level variables</em> are ordered, and the distance between adjacent values is the same everywhere on the scale. Income, height, and years of education are all interval-level variables.</p>
<p>The difference between categorical/nominal variables and ordinal variables is the hierarchical ordering of the latter. What school you went to is nominal, but the tier ranking of that school is ordinal. Tier ranking may be ordinal, but the size of the school&#8217;s endowment is interval-level. And numeric values are a hint that a variable is ordinal or interval, but not a decisive one (zip codes are numbers, yet they are nominal).</p>
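<p>One way to make the three levels concrete is to ask which operations make sense on each. A minimal Python sketch, with hypothetical example values: nominal values support only equality, ordinal values also support &#8216;greater than&#8217;, and interval values also support meaningful differences. Storing the zip code as a string is one way of saying &#8216;this number is really a name&#8217;.</p>

```python
from enum import Enum, IntEnum

# Nominal: distinct states with no intrinsic order. Only == is meaningful;
# plain Enum members raise TypeError if you try to compare them with > or <.
class HairColor(Enum):
    BLACK = "black"
    BROWN = "brown"
    RED = "red"

# Ordinal: states with an order, but unknown spacing. == and > are meaningful.
# Caution: IntEnum will happily subtract these codes; the result means nothing.
class Education(IntEnum):
    LESS_THAN_HS = 1
    HS = 2
    SOME_COLLEGE = 3
    BA = 4

# Interval: ordered and evenly spaced, so differences are meaningful too.
height_cm = {"pat": 180.0, "sam": 165.0}

# A zip code looks numeric, but it is a label; storing it as a string keeps
# the leading zero and discourages averaging or sorting it as a quantity.
zip_code = "02139"

print(HairColor.RED == HairColor.RED)       # True
print(Education.BA > Education.HS)          # True
print(height_cm["pat"] - height_cm["sam"])  # 15.0
```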
<p>So, back to 37 Signals. What they <em>want</em> is some variable that would be easy to understand (from the customer perspective) and helpful to process (from the company perspective). Their first attempt looked something like this:</p>
<form name="myform" method="POST">
<select name="mydropdown">
<option value="confused">I&#8217;m Confused</option>
<option value="worried">I&#8217;m worried something bad happened</option>
<option value="upset">I&#8217;m upset or disappointed</option>
<option value="panic">I&#8217;m panicking right now</option>
<option value="nobiggie">No big deal, just need help</option>
</select>
</form>
<p>It takes time to think through what my state of mind is, because the items are <em>almost</em> ordered, but not really. Confused, worried, upset, and panicked are not points on a continuum; they are just different states of being. The question is asked as a categorical variable question, but it is one that they <em>really</em> wanted to be an ordinal variable question. They tried to solve the problem by formalizing an ordinal variable, putting a numbered ordering system on it to make the ordering explicit:</p>
<form name="myform2" method="POST">
<select name="mydropdown2">
<option value="1">1 &#8211; not a big deal, just need help</option>
<option value="2">2 &#8211; I&#8217;m freaking out a little bit</option>
<option value="3">3 &#8211; This is pretty serious</option>
<option value="4">4 &#8211; I&#8217;m panicking and need help now</option>
</select>
</form>
<p>This is easier to deal with as a customer, since you can quickly place your relative state of panic. In other words, you pick up very quickly that there is a rough ordering, and the numbers help a lot in this respect. But alas, what was good for the customer was no good for 37 Signals. The reason is that while they began by wanting to know the customer&#8217;s subjective state, what they <em>really really</em> wanted to know was, &#8216;how important is this problem for our company?&#8217; Reasonable, but different. This is their final solution:</p>
<form name="myform3" method="POST">
<select name="mydropdown3">
<option value="a">General Feedback</option>
<option value="b">Feature Request</option>
<option value="c">Billing Issue or Inquiry</option>
<option value="d">I&#8217;m confused and don&#8217;t know how something works</option>
<option value="e">Something is broken</option>
<option value="f">Other</option>
</select>
</form>
<p>They have a reason for this, and it is a decent one:</p>
<blockquote><p>Now, if something’s broken, we can spot it and fix it right away. A system failure is much more important to us (and our customers) than a feature request or general feedback. This method lets us prioritize these queries accordingly, instead of treating them like they’re all the same.</p></blockquote>
<p>However, this final solution kind of sucks, I think. The problem is that it moves priority from the customer to the company while giving the customer the illusion of control. That is, the variable is categorical/nominal for the customer, but ordinal for the company. In other words, what is important to the company counts for more than what is important to this particular customer. This is probably even more true for those customers for whom <em>everything</em> is the most important thing in the world. And yet.</p>
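<p>On the company&#8217;s side, the final form amounts to a lookup table that turns a nominal answer into an ordinal priority. A minimal Python sketch: the option codes are the form&#8217;s values above, but the priority numbers are hypothetical, not 37 Signals&#8217; actual ranking.</p>

```python
# The customer picks a nominal category; the company privately maps it
# to an ordinal priority. Codes "a"-"f" are the form's option values;
# the priority numbers are hypothetical (1 = handle first).
COMPANY_PRIORITY = {
    "e": 1,  # Something is broken
    "c": 2,  # Billing Issue or Inquiry
    "d": 3,  # I'm confused and don't know how something works
    "f": 4,  # Other
    "b": 5,  # Feature Request
    "a": 5,  # General Feedback
}

tickets = [("t1", "a"), ("t2", "e"), ("t3", "b"), ("t4", "c")]

# Sort by the company's mapping; the customer never sees this order.
# sorted() is stable, so equal priorities keep their arrival order.
queue = sorted(tickets, key=lambda t: COMPANY_PRIORITY[t[1]])
print([ticket_id for ticket_id, _ in queue])  # ['t2', 't4', 't1', 't3']
```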
<p>I think perhaps a better solution splits the question into two, which provides space for both &#8216;urgency for the customer&#8217; and &#8216;urgency for the organization&#8217;.</p>
<form name="myform3" method="POST">
<select name="mydropdown3">
<option value="a">General Feedback</option>
<option value="b">Feature Request</option>
<option value="c">Billing Issue or Inquiry</option>
<option value="d">I&#8217;m confused and don&#8217;t know how something works</option>
<option value="e">Something is broken</option>
<option value="f">Other</option>
</select>
</form>
<form action="">
Very urgent:
<input type="radio" name="urgency" value="very"/>
<br />Somewhat urgent:
<input type="radio" checked="checked" name="urgency" value="some"/>
<br />Not urgent:
<input type="radio" name="urgency" value="not"/>
</form>
<p>At the cost of adding one more item to your survey/form, you have solved both problems with little cognitive strain: customers default to &#8216;somewhat urgent&#8217; (which is easy to skip over, or to change with little effort), which leaves the organization free to assign its own priority based on the categorical variable. If the customer changes the default to &#8216;not urgent&#8217;, the organization&#8217;s priority still stands. If the customer marks the request &#8216;very urgent&#8217;, then the organization has some discretion over whether to treat it as &#8216;urgent for us&#8217; or not, but at least it has a sense of the customer&#8217;s level of panic.</p>
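<p>The two-question version can be sketched as a two-part sort key: the organization&#8217;s priority for the category first, the customer&#8217;s urgency second. In this minimal Python sketch the category codes and urgency values come from the forms above, while the priority numbers and the optional escalation rule (a &#8216;very urgent&#8217; request moves up one company tier) are hypothetical stand-ins for the discretion the organization keeps.</p>

```python
# Two-part triage: company priority for the category, then customer urgency.
# Category codes and urgency values come from the form; the numbers and
# the escalation rule are hypothetical.
COMPANY_PRIORITY = {"e": 1, "c": 2, "d": 3, "f": 4, "b": 5, "a": 5}
URGENCY = {"very": 0, "some": 1, "not": 2}  # "some" is the form's default


def triage_key(ticket, escalate=True):
    """Sort key: lower tuples are handled first."""
    category = ticket["category"]
    urgency = ticket.get("urgency", "some")
    tier = COMPANY_PRIORITY[category]
    # Discretion: optionally let a "very urgent" customer move up one tier.
    if escalate and urgency == "very":
        tier = max(1, tier - 1)
    return (tier, URGENCY[urgency])


tickets = [
    {"id": "t1", "category": "f", "urgency": "very"},
    {"id": "t2", "category": "b"},  # left at the default
    {"id": "t3", "category": "e", "urgency": "not"},
    {"id": "t4", "category": "d", "urgency": "some"},
]
queue = sorted(tickets, key=triage_key)
print([t["id"] for t in queue])  # ['t3', 't1', 't4', 't2']
```

<p>With escalation switched off, the customer&#8217;s answer only breaks ties within a company tier (here the order would become t3, t4, t1, t2); with it on, customer panic counts for a little more. Either way, the panic level is recorded.</p>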
<p>I&#8217;m sure there is an aesthetic dimension here as well, but the general lesson should be twofold. 1) Consider carefully the meanings behind your survey variables. Categorical variables often require some thought, particularly as the number of categories grows. Ordinal and interval variables (which impose an ordering) are easy to answer, until they become too fine-grained. Let&#8217;s say you are at a hospital, and a doctor asks you &#8216;how do you feel?&#8217; Sorting through a list of adjectives that describe your feelings sucks. And rating your pain on a scale from 1 to 7 is easier than on a scale from 1 to 1000.</p>
<p>The thing is, sometimes what is important to you is not what&#8217;s important to the doctor. Suppose throbbing is a sign of nothing serious, but numbness is a sign of something potentially lethal. In this case, the options are categorical to the patient, but ordinal to the doctor.</p>
<p>Which leads to 2): if you want your customers&#8217; opinions, it may behoove you to give them a way to tell you. In the medical example, &#8216;what kind of pain is it?&#8217; and &#8216;how much does it hurt?&#8217; are two different questions; patients care more about the second, even if doctors care more about the first. So don&#8217;t ask what kind of pain it is without also asking how much it hurts. Even if asking only the first question saves the patient&#8217;s life, they will still be pissed at you for dismissing their subjective reality.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.rethinkingmarkets.org/2008/07/09/types-of-variables-drop-down-menus.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is XBRL, and who does XBRL help?</title>
		<link>http://www.rethinkingmarkets.org/2008/02/07/what-is-xbrl-and-who-does-xbrl-help.html</link>
		<comments>http://www.rethinkingmarkets.org/2008/02/07/what-is-xbrl-and-who-does-xbrl-help.html#comments</comments>
		<pubDate>Thu, 07 Feb 2008 15:02:00 +0000</pubDate>
		<dc:creator>Peter</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Institutional]]></category>
		<category><![CDATA[Markets]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://rethinkingmarkets.org/2008/02/07/what-is-xbrl-and-who-does-xbrl-help.html</guid>
		<description><![CDATA[Put it on your radar screens: the next big thing is going to be XBRL. It stands for eXtensible Business Reporting Language, and it is meant to commensurate business reporting via standardization. So instead of entering plain text into an annual report, companies, governments, NGOs, and anyone else who needs to comply with a governmental mandate will be [...]]]></description>
			<content:encoded><![CDATA[<p>Put it on your radar screens: the next big thing is going to be XBRL. It stands for eXtensible Business Reporting Language, and it is meant to commensurate business reporting via standardization. So instead of entering plain text into an annual report, companies, governments, NGOs, and anyone else who needs to comply with a <a href="http://www.xbrl.org/Announcements/UK-XBRL22March2006.htm">governmental mandate</a> will be using XBRL. You can think of XBRL as a set of metatags for financial and company data: instead of HTML&#8217;s bracket-tags for headers, titles, links, and so on, you would have bracket-tags for earnings, time periods, definitions of costs, and so on.</p>
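<p>To make the &#8216;metatags&#8217; analogy concrete, here is a minimal Python sketch that reads one tagged fact, plus the entity, period, and unit it refers to, out of a tiny XBRL-style instance document using the standard library&#8217;s ElementTree. The <code>xbrli</code> and <code>iso4217</code> namespaces are the XBRL 2.1 ones, but the <code>ex</code> taxonomy namespace, the <code>Revenues</code> concept, the company identifier, and the figure are placeholders; a real filing also points to its taxonomy schema (omitted here) and uses published taxonomies such as US GAAP.</p>

```python
import xml.etree.ElementTree as ET

# A toy XBRL-style instance. The xbrli and iso4217 namespaces are the
# XBRL 2.1 ones; the "ex" taxonomy, concept name, entity, and value are
# hypothetical placeholders. A real instance would also carry a schemaRef.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
            xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2007">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2007-01-01</xbrli:startDate>
      <xbrli:endDate>2007-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD">
    <xbrli:measure>iso4217:USD</xbrli:measure>
  </xbrli:unit>
  <ex:Revenues contextRef="FY2007" unitRef="USD" decimals="0">1000000</ex:Revenues>
</xbrli:xbrl>"""

NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",
    "ex": "http://example.com/taxonomy",
}


def read_fact(xml_text, concept):
    """Return (entity, start, end, unit, value) for the first fact tagged `concept`."""
    root = ET.fromstring(xml_text)
    fact = root.find(concept, NS)
    # The fact points at its context (whose number, for when) and its unit.
    ctx = root.find(f"xbrli:context[@id='{fact.get('contextRef')}']", NS)
    unit = root.find(f"xbrli:unit[@id='{fact.get('unitRef')}']", NS)
    entity = ctx.find("xbrli:entity/xbrli:identifier", NS).text
    start = ctx.find("xbrli:period/xbrli:startDate", NS).text
    end = ctx.find("xbrli:period/xbrli:endDate", NS).text
    measure = unit.find("xbrli:measure", NS).text
    return entity, start, end, measure, float(fact.text)


print(read_fact(INSTANCE, "ex:Revenues"))
# ('ACME', '2007-01-01', '2007-12-31', 'iso4217:USD', 1000000.0)
```

<p>The number 1000000 means little on its own; the tags are what tell a machine whose revenue it is, for which period, and in what currency.</p>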
<p>From <a href="http://www.corefiling.com/insight/20071221-1200.html">CoreFiling&#8217;s</a> insight blog: &#8220;It won&#8217;t be very long before it is those documents &#8211; the bar-coded financial disclosures &#8211; that will be the primary materials consumed by financial market systems to help analysts and investors make decisions about the best way to invest. This is vastly more sophisticated than today&#8217;s processes that rely on slow and inaccurate re-keying of a subset of the financial information published by companies.&#8221;</p>
<p>This is commensuration more than just standardization: the tags are specific enough to a particular business that not everyone is required to give the <em>same</em> information, yet standardized enough that everyone is required to give information that can be made comparable. The pitch for companies (other than &#8216;because otherwise we&#8217;ll fine you and take away your business license&#8217;) is that XBRL will make their financial reporting less costly, less prone to error, and ultimately more efficient.</p>
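<p>The commensuration point fits in a few lines: once filers tag their numbers with shared concept names, a third party can line them up without reading either report, while firm-specific extensions simply have no counterpart. A minimal Python sketch with hypothetical firms, concepts, and figures, assuming the facts have already been extracted from each filing:</p>

```python
# Facts already extracted from two hypothetical filings, keyed by concept.
# Shared "ex:" concepts make the numbers comparable; a firm-specific
# extension such as "firma:LoyaltyProgramCost" has no counterpart.
filings = {
    "FirmA": {"ex:Revenues": 500.0, "ex:NetIncome": 50.0, "firma:LoyaltyProgramCost": 7.0},
    "FirmB": {"ex:Revenues": 2000.0, "ex:NetIncome": 100.0},
}


def comparable(filings):
    """Concepts reported by every filer: the commensurable core."""
    return set.intersection(*(set(facts) for facts in filings.values()))


def net_margin(facts):
    """Net income as a share of revenues."""
    return facts["ex:NetIncome"] / facts["ex:Revenues"]


print(sorted(comparable(filings)))  # ['ex:NetIncome', 'ex:Revenues']
for firm, facts in filings.items():
    print(firm, net_margin(facts))  # FirmA 0.1, then FirmB 0.05
```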
<p>Personally, I think this is a flat-out misrepresentation of what&#8217;s going on here. XBRL helps one group of people orders of magnitude more than anyone else: investors. The imbalance between the gains on one side (more efficient government oversight and streamlined compliance reporting for businesses) and the gains on the other (a greater capacity for data-gatherers at banks, hedge funds, and the rest of the investor class) is totally, totally off the charts. What this will end up doing is: 1) creating a standard way for companies to report financials; 2) creating some increased efficiency for government entities to keep tabs on the finances of these organizations; and 3) creating a <em>massive</em> additional datastream for financial services and investment firms to work with. If you think it is a challenge for public firms to resist making short-term decisions based on financial analysts&#8217; quarterly reports of earnings now, wait until this information is directly readable by quant trading models.</p>
<p>This would be an amazing dissertation topic. I would track: a) the creation of the standard; b) the adoption of the standard around the world; c) how XBRL is being incorporated into financial modeling; d) the before-and-after effects of XBRL on market prices for firms; and e) qualitatively, what gets excised from XBRL, or rather, what remains incommensurable about firms, governments, etc.</p>
<p><a href="http://www.ubmatrix.com/index.htm">UBMatrix</a><br />
<a href="http://www.xbrl.org">XBRL&#8217;s main site</a><br />
<a href="http://www.sec.gov/spotlight/xbrl/xbrlwebapp.htm">US SEC&#8217;s &#8216;Interactive Data Viewers&#8217;</a><br />
<a href="http://www.readwriteweb.com/archives/microsoft_advances_xbrl.php">Microsoft uses XBRL</a><br />
<a href="http://xbrl.us/usgaapreview/Pages/default.aspx">US GAAP XBRL Taxonomy</a> (GAAP is the accounting standard in the US)<br />
<a href="http://www.corefiling.com/">CoreFiling</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rethinkingmarkets.org/2008/02/07/what-is-xbrl-and-who-does-xbrl-help.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

