<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: About 33 Bits</title>
	<atom:link href="http://33bits.org/about/feed/" rel="self" type="application/rss+xml" />
	<link>http://33bits.org</link>
	<description>The End of Anonymized Data and What to Do About It</description>
	<lastBuildDate>Fri, 24 May 2013 20:23:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: dsr</title>
		<link>http://33bits.org/about/#comment-3738</link>
		<dc:creator><![CDATA[dsr]]></dc:creator>
		<pubDate>Fri, 02 Mar 2012 16:00:39 +0000</pubDate>
		<guid isPermaLink="false">#comment-3738</guid>
		<description><![CDATA[Just to give a bit of information theory background for readers who don&#039;t have much exposure to computer science or mathematics, when we say a &quot;bit&quot; we mean a binary digit--a piece of information that can have one of two values (which we express in binary--the base 2 number system--as 1 or 0).  Depending on the context, we often name those value pairs true/false, yes/no, on/off, or high/low, but they can represent any single choice between two options.  We can then encode data by representing the choices it took to describe that data.

For example, let&#039;s say we want to represent different types of people using binary.  We want to encode whether they are male or female, an American citizen or not, and a native speaker of English or not.  I know some people don&#039;t identify as either male or female, but let&#039;s assume for simplicity that the above three parameters all are binary choices.  We now want to find out the number of distinct people we can categorize with these three parameters.  Since there are two options for each choice, each new choice doubles the number of people we can categorize.  So one choice lets us encode 2 people (either male or female).  Two choices let us encode 2*2=4 people (either male American, female American, male non-American, or female non-American).  Three choices let us encode 2*2*2=8 people (two sets of the above four people, with one set speaking English natively and the other not).  We can represent this using exponents as 2^n where n is the number of choices.  So when there are three choices we can differentiate between 2^3=8 different people.  To see this easily, we can assign each type of person a three-digit binary number, where the first digit represents their gender (0 is male and 1 is female), the second represents if they are American (0) or not (1), and the third represents if they speak English natively (0) or not (1).  So we have 000, 001, 010, 011, 100, 101, 110, and 111.  These work out to be the binary representations of the base ten numbers 0 through 7, which is a range of eight distinct values.  So we know the math worked out.  To summarize so far, three binary digits can represent eight different values because three yes/no choices can distinguish between eight different people.  In math terms, 2^3=8.  Likewise, 2^4=16, so four binary digits can represent 16 different values--the numbers from 0 to 15 or just sixteen different people.
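
The doubling argument above can be checked with a few lines of Python. This is only an illustrative sketch: the attribute order (gender, citizenship, native language) follows the example, and the names codes and count are made up here rather than taken from the article.

```python
from itertools import product

# Each code is one combination of three yes/no attributes:
# (gender, citizenship, native language), each written as 0 or 1.
codes = list(product((0, 1), repeat=3))

# Three binary choices give 2 * 2 * 2 distinct codes.
count = len(codes)
```

Here count comes out to 8, and codes runs from (0, 0, 0) to (1, 1, 1), matching 000 through 111.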

So how did the author of the article know that we would need 33 bits to differentiate between the 6.6 billion people (I think it&#039;s over 7 billion now, but we&#039;ll stick to the data in the article) in the world?  Put another way, how many choices between two options must we make to differentiate between 6.6 billion people?  Using the math above, we can translate this into the equation 2^x = 6.6 billion, where x is the number of choices or binary digits or bits of information.  To solve for x, mathematicians take what&#039;s called the logarithm--the inverse of an exponent.  Computer scientists represent the base 2 logarithm with the notation lg.  So to use our above examples, to solve for x in 2^x=8, we can say x = lg(8) = 3.  In English, that&#039;s &quot;the base 2 logarithm of 8 is 3&quot; or &quot;the number of bits required to represent eight distinct things is 3.&quot;  Likewise, lg(16) = 4 and lg(32) = 5.
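
The logarithm step can be sketched in Python as well. The assumptions: math.log2 stands in for lg, the 6.6 billion figure is the one from the article, and the variable names are hypothetical.

```python
import math

# math.log2 plays the role of lg, the base 2 logarithm.
small_cases = [math.log2(n) for n in (8, 16, 32)]

# The 6.6 billion figure is the one used in the article.
population = 6_600_000_000
exact_bits = math.log2(population)   # roughly 32.62
whole_bits = math.ceil(exact_bits)   # round up to a whole number of bits
capacity = 2 ** whole_bits           # how many people that many bits can label
```

small_cases holds 3, 4 and 5; whole_bits is 33, and capacity is 8,589,934,592.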

So getting back to the equation 2^x = 6.6 billion, we can rewrite it as x = lg(6.6 billion), which my calculator gives me as 32.61981887845735.  Since we can&#039;t use a fraction of a bit, we round up: 33 bits can represent up to 2^33 = 8,589,934,592 distinct people, which is more than enough.  Hope that makes this blog accessible to laypeople!]]></description>
		<content:encoded><![CDATA[<p>Just to give a bit of information theory background for readers who don&#8217;t have much exposure to computer science or mathematics, when we say a &#8220;bit&#8221; we mean a binary digit&#8211;a piece of information that can have one of two values (which we express in binary&#8211;the base 2 number system&#8211;as 1 or 0).  Depending on the context, we often name those value pairs true/false, yes/no, on/off, or high/low, but they can represent any single choice between two options.  We can then encode data by representing the choices it took to describe that data.</p>
<p>For example, let&#8217;s say we want to represent different types of people using binary.  We want to encode whether they are male or female, an American citizen or not, and a native speaker of English or not.  I know some people don&#8217;t identify as either male or female, but let&#8217;s assume for simplicity that the above three parameters all are binary choices.  We now want to find out the number of distinct people we can categorize with these three parameters.  Since there are two options for each choice, each new choice doubles the number of people we can categorize.  So one choice lets us encode 2 people (either male or female).  Two choices let us encode 2*2=4 people (either male American, female American, male non-American, or female non-American).  Three choices let us encode 2*2*2=8 people (two sets of the above four people, with one set speaking English natively and the other not).  We can represent this using exponents as 2^n where n is the number of choices.  So when there are three choices we can differentiate between 2^3=8 different people.  To see this easily, we can assign each type of person a three-digit binary number, where the first digit represents their gender (0 is male and 1 is female), the second represents if they are American (0) or not (1), and the third represents if they speak English natively (0) or not (1).  So we have 000, 001, 010, 011, 100, 101, 110, and 111.  These work out to be the binary representations of the base ten numbers 0 through 7, which is a range of eight distinct values.  So we know the math worked out.  To summarize so far, three binary digits can represent eight different values because three yes/no choices can distinguish between eight different people.  In math terms, 2^3=8.  Likewise, 2^4=16, so four binary digits can represent 16 different values&#8211;the numbers from 0 to 15 or just sixteen different people.</p>
<p>So how did the author of the article know that we would need 33 bits to differentiate between the 6.6 billion people (I think it&#8217;s over 7 billion now, but we&#8217;ll stick to the data in the article) in the world?  Put another way, how many choices between two options must we make to differentiate between 6.6 billion people?  Using the math above, we can translate this into the equation 2^x = 6.6 billion, where x is the number of choices or binary digits or bits of information.  To solve for x, mathematicians take what&#8217;s called the logarithm&#8211;the inverse of an exponent.  Computer scientists represent the base 2 logarithm with the notation lg.  So to use our above examples, to solve for x in 2^x=8, we can say x = lg(8) = 3.  In English, that&#8217;s &#8220;the base 2 logarithm of 8 is 3&#8221; or &#8220;the number of bits required to represent eight distinct things is 3.&#8221;  Likewise, lg(16) = 4 and lg(32) = 5.</p>
<p>So getting back to the equation 2^x = 6.6 billion, we can rewrite it as x = lg(6.6 billion), which my calculator gives me as 32.61981887845735.  Since we can&#8217;t use a fraction of a bit, we round up: 33 bits can represent up to 2^33 = 8,589,934,592 distinct people, which is more than enough.  Hope that makes this blog accessible to laypeople!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scotto</title>
		<link>http://33bits.org/about/#comment-3736</link>
		<dc:creator><![CDATA[Scotto]]></dc:creator>
		<pubDate>Thu, 01 Mar 2012 23:18:18 +0000</pubDate>
		<guid isPermaLink="false">#comment-3736</guid>
		<description><![CDATA[Your 33 bits research is very well illustrated by the following blog post:

Death Note: L, Anonymity &amp; Eluding Entropy

http://www.gwern.net/Death%20Note%20Anonymity

It describes how, in the movie Death Note, the super detective is able to narrow down the killer from the world&#039;s population to one individual.

PS: Death Note is an interesting Japanese movie.]]></description>
		<content:encoded><![CDATA[<p>Your 33 bits research is very well illustrated by the following blog post:</p>
<p>Death Note: L, Anonymity &amp; Eluding Entropy</p>
<p><a href="http://www.gwern.net/Death%20Note%20Anonymity" rel="nofollow">http://www.gwern.net/Death%20Note%20Anonymity</a></p>
<p>It describes how, in the movie Death Note, the super detective is able to narrow down the killer from the world&#8217;s population to one individual.</p>
<p>PS: Death Note is an interesting Japanese movie.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rob Miles (@robertskmiles)</title>
		<link>http://33bits.org/about/#comment-3733</link>
		<dc:creator><![CDATA[Rob Miles (@robertskmiles)]]></dc:creator>
		<pubDate>Thu, 01 Mar 2012 17:30:20 +0000</pubDate>
		<guid isPermaLink="false">#comment-3733</guid>
		<description><![CDATA[33 bits is the hard mathematical lower bound on the number of bits of data you need to uniquely identify a person. Any amount of *data* that uniquely identifies a single person out of 6.6 billion equiprobable people transfers 32.6 bits of *information*.

The distinction between &#039;data&#039; and &#039;information&#039; is very important. I could send you a unique identification number in 33 bits of data, or I could send you a full name, date of birth and address in maybe 400 bits of data, and both of those messages would contain 32.6 bits of information for identifying the person.

You can only fit 33 bits of information into 33 bits of data if every single bit eliminates exactly half of the possibilities.]]></description>
		<content:encoded><![CDATA[<p>33 bits is the hard mathematical lower bound on the number of bits of data you need to uniquely identify a person. Any amount of *data* that uniquely identifies a single person out of 6.6 billion equiprobable people transfers 32.6 bits of *information*.</p>
<p>The distinction between &#8216;data&#8217; and &#8216;information&#8217; is very important. I could send you a unique identification number in 33 bits of data, or I could send you a full name, date of birth and address in maybe 400 bits of data, and both of those messages would contain 32.6 bits of information for identifying the person.</p>
<p>You can only fit 33 bits of information into 33 bits of data if every single bit eliminates exactly half of the possibilities.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arvind</title>
		<link>http://33bits.org/about/#comment-1013</link>
		<dc:creator><![CDATA[Arvind]]></dc:creator>
		<pubDate>Fri, 12 Feb 2010 05:15:00 +0000</pubDate>
		<guid isPermaLink="false">#comment-1013</guid>
		<description><![CDATA[No, I&#039;m not implying that. Entropy is simply a mathematical construct that describes how much information there is to be gained from a piece of data. It says nothing about how easy it is to find data about a person. The fact that it is easy to find auxiliary sources of data in order to determine someone&#039;s identity is an empirical observation.

&quot;Also there might only be 6billion humans on this earth but it doesn’t mean we can’t have 6000billion identities, right?&quot;

That is true; however - 

1. unless you make sure your behavior under each identity is completely independent of the others, your various identities can still be tied to each other.

2. the effort needed to create even a single believable alternate virtual identity is too high for most people to bother with.

3. going from 6 billion to 6000 billion identities only increases the entropy requirement from 33 to 43 bits, which is a negligible increase!]]></description>
		<content:encoded><![CDATA[<p>No, I&#8217;m not implying that. Entropy is simply a mathematical construct that describes how much information there is to be gained from a piece of data. It says nothing about how easy it is to find data about a person. The fact that it is easy to find auxiliary sources of data in order to determine someone&#8217;s identity is an empirical observation.</p>
<p>&#8220;Also there might only be 6billion humans on this earth but it doesn’t mean we can’t have 6000billion identities, right?&#8221;</p>
<p>That is true; however &#8211; </p>
<p>1. unless you make sure your behavior under each identity is completely independent of the others, your various identities can still be tied to each other.</p>
<p>2. the effort needed to create even a single believable alternate virtual identity is too high for most people to bother with.</p>
<p>3. going from 6 billion to 6000 billion identities only increases the entropy requirement from 33 to 43 bits, which is a negligible increase!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anonymouscat</title>
		<link>http://33bits.org/about/#comment-1009</link>
		<dc:creator><![CDATA[anonymouscat]]></dc:creator>
		<pubDate>Fri, 12 Feb 2010 03:32:10 +0000</pubDate>
		<guid isPermaLink="false">#comment-1009</guid>
		<description><![CDATA[Arvind Sahib, 
when you say that you only require 33 bits of information and that my hometown contributes 16 or so bits, you are implying that getting more of the relevant bits is as easy as finding my hometown. Also there might only be 6billion humans on this earth but it doesn&#039;t mean we can&#039;t have 6000billion identities, right?]]></description>
		<content:encoded><![CDATA[<p>Arvind Sahib,<br />
when you say that you only require 33 bits of information and that my hometown contributes 16 or so bits, you are implying that getting more of the relevant bits is as easy as finding my hometown. Also there might only be 6billion humans on this earth but it doesn&#8217;t mean we can&#8217;t have 6000billion identities, right?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lars</title>
		<link>http://33bits.org/about/#comment-830</link>
		<dc:creator><![CDATA[Lars]]></dc:creator>
		<pubDate>Mon, 23 Nov 2009 09:37:28 +0000</pubDate>
		<guid isPermaLink="false">#comment-830</guid>
		<description><![CDATA[You probably should mention that it is 32.6 distinguishing bits of data. Although this should be clear to anybody ever having heard the term &#039;entropy&#039; in the computer science context, comments above show that this is not the case for all of your readers.

The actual truth is that you need only 33 bits to encode a unique identity for every person. Needing only 33 bits to identify everyone sounds freakishly alarming (considering that my name above in ASCII already takes up 32 bits). But it could be a little clearer that you need 33 very special bits.

Having said that, I envy you for having had the idea for that title, instead of me that is.]]></description>
		<content:encoded><![CDATA[<p>You probably should mention that it is 32.6 distinguishing bits of data. Although this should be clear to anybody ever having heard the term &#8216;entropy&#8217; in the computer science context, comments above show that this is not the case for all of your readers.</p>
<p>The actual truth is that you need only 33 bits to encode a unique identity for every person. Needing only 33 bits to identify everyone sounds freakishly alarming (considering that my name above in ASCII already takes up 32 bits). But it could be a little clearer that you need 33 very special bits.</p>
<p>Having said that, I envy you for having had the idea for that title, instead of me that is.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arvind</title>
		<link>http://33bits.org/about/#comment-615</link>
		<dc:creator><![CDATA[Arvind]]></dc:creator>
		<pubDate>Sun, 09 Aug 2009 21:44:42 +0000</pubDate>
		<guid isPermaLink="false">#comment-615</guid>
		<description><![CDATA[Thanks for your comment. This thread is getting too deep, so I&#039;ve replied to you below.]]></description>
		<content:encoded><![CDATA[<p>Thanks for your comment. This thread is getting too deep, so I&#8217;ve replied to you below.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arvind</title>
		<link>http://33bits.org/about/#comment-614</link>
		<dc:creator><![CDATA[Arvind]]></dc:creator>
		<pubDate>Sun, 09 Aug 2009 21:43:43 +0000</pubDate>
		<guid isPermaLink="false">#comment-614</guid>
		<description><![CDATA[John,

It would indeed be good to have such a script; at the same time, I think limiting the rate at which an attacker can guess passwords has a far greater effect on password security than making the user choose stronger passwords.

The entropy of a password is only vaguely defined. It can only be measured relative to the algorithm that the attacker is going to use, which of course is unknown. I have a &lt;a href=&quot;http://www.cs.utexas.edu/~shmat/abstracts.html#pwd&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt; on password cracking, which might help explain what I&#039;m talking about.]]></description>
		<content:encoded><![CDATA[<p>John,</p>
<p>It would indeed be good to have such a script; at the same time, I think limiting the rate at which an attacker can guess passwords has a far greater effect on password security than making the user choose stronger passwords.</p>
<p>The entropy of a password is only vaguely defined. It can only be measured relative to the algorithm that the attacker is going to use, which of course is unknown. I have a <a href="http://www.cs.utexas.edu/~shmat/abstracts.html#pwd" rel="nofollow">paper</a> on password cracking, which might help explain what I&#8217;m talking about.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://33bits.org/about/#comment-613</link>
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 09 Aug 2009 21:32:12 +0000</pubDate>
		<guid isPermaLink="false">#comment-613</guid>
		<description><![CDATA[Great work, very interesting, Arvind.

Somewhat unrelated, but I&#039;m hoping someone comes up with a &quot;password meter&quot; that shows actual bits of entropy.

As you are no doubt aware, many folks choose, um, poorly.

http://www.webresourcesdepot.com/10-password-strength-meter-scripts-for-a-better-registration-interface/ has varying levels; some are graphically great, but some call back to servers for parts of their functionality.

I question the algorithms that are used to determine effectiveness; I&#039;m imagining things like the old PGP versions that used to show a calculated entropy in bits (now you know how I found your site!) in real time, as you typed what you hoped was a good pass phrase.

&lt;a href=&quot;https://leveron.com/&quot; rel=&quot;nofollow&quot;&gt;Thanks, John&lt;/a&gt;]]></description>
		<content:encoded><![CDATA[<p>Great work, very interesting, Arvind.</p>
<p>Somewhat unrelated, but I&#8217;m hoping someone comes up with a &#8220;password meter&#8221; that shows actual bits of entropy.</p>
<p>As you are no doubt aware, many folks choose, um, poorly.</p>
<p><a href="http://www.webresourcesdepot.com/10-password-strength-meter-scripts-for-a-better-registration-interface/" rel="nofollow">http://www.webresourcesdepot.com/10-password-strength-meter-scripts-for-a-better-registration-interface/</a> has varying levels; some are graphically great, but some call back to servers for parts of their functionality.</p>
<p>I question the algorithms that are used to determine effectiveness; I&#8217;m imagining things like the old PGP versions that used to show a calculated entropy in bits (now you know how I found your site!) in real time, as you typed what you hoped was a good pass phrase.</p>
<p><a href="https://leveron.com/" rel="nofollow">Thanks, John</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arvind</title>
		<link>http://33bits.org/about/#comment-416</link>
		<dc:creator><![CDATA[Arvind]]></dc:creator>
		<pubDate>Sun, 24 May 2009 01:24:44 +0000</pubDate>
		<guid isPermaLink="false">#comment-416</guid>
		<description><![CDATA[Relax, she didn&#039;t say she was a computer scientist. It is perfectly reasonable for a non-CS/math/physics person not to be able to figure out entropy without any explanation.]]></description>
		<content:encoded><![CDATA[<p>Relax, she didn&#8217;t say she was a computer scientist. It is perfectly reasonable for a non-CS/math/physics person not to be able to figure out entropy without any explanation.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
