<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>33 Bits of Entropy &#187; conference</title>
	<atom:link href="http://33bits.org/tag/conference/feed/" rel="self" type="application/rss+xml" />
	<link>http://33bits.org</link>
	<description>The End of Anonymized Data and What to Do About It</description>
	<lastBuildDate>Mon, 30 Jan 2012 06:39:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='33bits.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>33 Bits of Entropy &#187; conference</title>
		<link>http://33bits.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://33bits.org/osd.xml" title="33 Bits of Entropy" />
	<atom:link rel='hub' href='http://33bits.org/?pushpress=hub'/>
		<item>
		<title>Web Crawlers and Privacy: The Need to Reboot Robots.txt</title>
		<link>http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-robots-txt/</link>
		<comments>http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-robots-txt/#comments</comments>
		<pubDate>Sun, 05 Dec 2010 19:54:56 +0000</pubDate>
		<dc:creator>Arvind Narayanan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[policy]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[robots]]></category>

		<guid isPermaLink="false">http://33bits.org/?p=599</guid>
		<description><![CDATA[This is a position paper I co-authored with Pete Warden and will be discussing at the upcoming IAB/IETF/W3C Internet privacy workshop this week. Privacy norms, rules and expectations in the real world go far beyond the “public/private” dichotomy. Yet in the realm of web crawler access control, we are tied to this binary model via [...]]]></description>
			<content:encoded><![CDATA[<p>This is a position paper I co-authored with <a href="http://petewarden.typepad.com/">Pete Warden</a> and will be discussing at the upcoming <a href="http://www.iab.org/about/workshops/privacy/">IAB/IETF/W3C Internet privacy workshop</a> this week.</p>
<hr />
<p><em>Privacy norms, rules and expectations in the real world go far beyond the “public/private” dichotomy. Yet in the realm of web crawler access control, we are tied to this binary model via the robots.txt allow/deny rules. This position paper describes some of the resulting problems and argues that it is time for a more sophisticated standard.</em></p>
<p><strong>The problem: privacy of public data</strong>. The first author has <a id="esh8" title="argued" href="http://33bits.org/2010/07/06/what-every-developer-needs-to-know-about-public-data-and-privacy/">argued</a> that individuals often expect privacy constraints on data that is publicly accessible on the web. Some examples of such constraints relevant to the web-crawler context are:</p>
<ul>
<li>Data should not be archived beyond a certain period (or at all).</li>
<li>Crawling a small number of pages is allowed, but large-scale aggregation is not.</li>
<li>“Linkage” of personal information to other databases is prohibited.</li>
</ul>
<p>Currently there is no way to specify such restrictions in a machine-readable form. As a result, sites resort to hacks such as <a id="nitz" title="identifying and blocking crawlers" href="http://online.wsj.com/article/SB10001424052748703358504575544381288117888.html">identifying and blocking crawlers</a> whose behavior they don&#8217;t like, without clearly defining acceptable behavior. Other sites specify restrictions in the Terms of Service and <a id="evf8" title="bring legal action" href="http://petewarden.typepad.com/searchbrowser/2010/04/how-i-got-sued-by-facebook.html">bring legal action</a> against violators. This is clearly not a viable solution: for operators of web-scale crawlers, manually interpreting and encoding the ToS restrictions of every site is prohibitively expensive.</p>
<p>There are two reasons why the problem has become pressing: first, there is an ever-increasing quantity of behavioral data about users that is <a id="l6na" title="valuable to marketers" href="http://petewarden.typepad.com/searchbrowser/2010/02/public-profiles-privacy-and-robotstxt.html">valuable to marketers</a> — in fact, there is even a <a id="c1ds" title="black market" href="http://radar.oreilly.com/2010/10/the-black-market-for-data.html">black market</a> for this data — and second, crawlers have become <a id="wprc" title="very cheap" href="http://petewarden.typepad.com/searchbrowser/2010/10/what-rules-should-govern-robots.html">very cheap</a> to set up and operate.</p>
<p>The desire for control over web content is by no means limited to user privacy concerns. Publishers concerned about copyright are equally in search of a better mechanism for specifying fine-grained restrictions on the collection, storage and dissemination of web content. Many site owners would also like to limit the acceptable uses of data for competitive reasons.</p>
<p><strong>The solution space</strong>. Broadly, there are three levels at which access/usage rules may be specified: site-level, page-level and DOM element-level. Robots.txt is an example of a site-level mechanism, and one possible solution is to extend robots.txt. A disadvantage of this approach, however, is that the file may grow too large, especially on sites with user-generated content that may wish to specify per-user policies.</p>
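<p>A minimal sketch of what the site-level model can express today, using Python&#8217;s standard <code>urllib.robotparser</code> module. The robots.txt content, the host <code>example.com</code>, the paths and the crawler names are all made up for illustration; the helper <code>may_fetch</code> is likewise hypothetical.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (all names are assumptions).
ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/private/
Crawl-delay: 10

User-agent: examplebot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent: str, path: str) -> bool:
    """Return the yes/no answer robots.txt gives for one crawler and one path."""
    return parser.can_fetch(agent, "http://example.com" + path)
```

<p>Calling <code>may_fetch("otherbot", "/profiles/public/alice")</code> yields a plain yes, and <code>may_fetch("examplebot", "/profiles/public/alice")</code> a plain no. Apart from a crawl delay, nothing in this model can say how long a fetched page may be kept, how many pages may be aggregated, or whether the data may be linked to other databases.</p>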
<p>A page-level mechanism thus sounds much more suitable. While there is already a <a id="l2ci" title="&quot;robots&quot; attribute" href="http://en.wikipedia.org/wiki/Meta_element#The_robots_attribute">&#8220;robots&#8221; attribute</a> to the META tag, it is part of the robots.txt specification and has the same limitations on functionality. A different META tag is probably an ideal place for a new standard.</p>
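<p>To make the idea of a page-level policy concrete, here is a rough sketch of how a crawler might interpret the <code>content</code> value of such a META tag. No such standard exists: the token names (<code>max-retention-days</code>, <code>no-aggregation</code>, <code>no-linkage</code>), the <code>CrawlPolicy</code> type and <code>parse_policy</code> are all invented here, chosen only to mirror the three example constraints listed earlier.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlPolicy:
    """Hypothetical per-page policy; field names are illustrative only."""
    max_retention_days: Optional[int] = None  # None means no limit was stated
    allow_aggregation: bool = True
    allow_linkage: bool = True


def parse_policy(content: str) -> CrawlPolicy:
    """Parse a comma-separated list of made-up policy tokens."""
    policy = CrawlPolicy()
    for token in content.split(","):
        token = token.strip().lower()
        if token == "no-aggregation":
            policy.allow_aggregation = False
        elif token == "no-linkage":
            policy.allow_linkage = False
        elif token.startswith("max-retention-days="):
            value = token.split("=", 1)[1]
            if value.isdigit():
                policy.max_retention_days = int(value)
        # Unknown or malformed tokens are ignored, so the policy stays permissive.
    return policy
```

<p>For instance, <code>parse_policy("max-retention-days=30, no-linkage")</code> gives a policy that permits aggregation but forbids linkage and caps retention at 30 days. Ignoring unknown tokens is one possible design choice; a real standard would have to define precisely what a compliant crawler does with directives it does not understand.</p>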
<p>Taking it one step further, tagging at the DOM element-level using microformats to delineate personal information has also been <a id="iutl" title="proposed" href="http://blog.80legs.com/proposal-for-a-new-type-of-robotstxt">proposed</a>. A possible disadvantage of this approach is the overhead of parsing pages that crawlers will have to incur in order to be compliant.</p>
<p><strong>Conclusion</strong>. While the need to move beyond the current robots.txt model is apparent, it is not yet clear what should replace it. The challenge in developing a new standard lies in accommodating the diverse requirements of website operators and precisely defining the semantics of each type of constraint without making it too cumbersome to write a compliant crawler. In parallel with this effort, the development of legal doctrine under which the standard is more easily enforceable is likely to prove invaluable.</p>
<p>To stay on top of future posts, <a href="http://33bits.org/feed/">subscribe</a> to the RSS feed or <a href="http://twitter.com/random_walker">follow me on Twitter</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-robots-txt/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aa438b63ff1e9b75693aeabbeddae5eb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">randomwalker</media:title>
		</media:content>

		<media:content url="http://api.tweetmeme.com/imagebutton.gif?url=http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-robots-txt/" medium="image" />
	</item>
		<item>
		<title>Conferences: The Good, the Bad and the Ugly aspects</title>
		<link>http://33bits.org/2010/06/17/conferences-the-good-the-bad-and-the-ugly-aspects/</link>
		<comments>http://33bits.org/2010/06/17/conferences-the-good-the-bad-and-the-ugly-aspects/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 07:24:04 +0000</pubDate>
		<dc:creator>Arvind Narayanan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cfp]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[iapp]]></category>
		<category><![CDATA[law]]></category>
		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://33bits.org/?p=502</guid>
		<description><![CDATA[I attended a couple of conferences this week that are outside my usual community. Taking stock of and interacting with a new crowd is always a very interesting experience. The first was the IAPP Practical Privacy Series. The International Association of Privacy Professionals came about as a result of the fact that the Chief Privacy [...]]]></description>
			<content:encoded><![CDATA[<p>I attended a couple of conferences this week that are outside my usual community. Taking stock of and interacting with a new crowd is always a very interesting experience.</p>
<p>The first was the <a id="e744" title="IAPP Practical Privacy Series" href="https://www.privacyassociation.org/events_and_programs/practical_privacy_series">IAPP Practical Privacy Series</a>. The <a id="g3xm" title="IAPP" href="http://en.wikipedia.org/wiki/International_Association_of_Privacy_Professionals">International Association of Privacy Professionals</a> came about as a result of the fact that the <a id="yldv" title="Chief Privacy Officer" href="http://en.wikipedia.org/wiki/Chief_privacy_officer">Chief Privacy Officer</a> (and equivalent) positions have suddenly emerged &#8212; over the last decade &#8212; and become ubiquitous. The role can be broadly described as &#8220;privacy compliance.&#8221; A big part of the initial impetus seems to have been HIPAA compliance, but the IAPP composition has now diversified greatly, because virtually every company is sitting on a pile of consumer data. There was even someone from Starbucks.</p>
<p>I spoke about anonymization. I was trying to answer the question, &#8220;I need to share/sell my data and you&#8217;re telling me that anonymization is broken. So what should I do?&#8221; It&#8217;s always a fun challenge to make computer science accessible to a non-tech audience (largely lawyers in this case). I think I managed reasonably well.</p>
<p>Next was the ACM <a id="eu6q" title="ACM Computers, Freedom and Privacy" href="http://www.cfp2010.org/wiki/index.php/Main_Page">Computers, Freedom and Privacy</a> conference (which goes on until Friday). As I understand it, CFP was <a id="iw3." title="born" href="http://cpsr.org/prevsite/conferences/cfp91/efballad.html/">born</a> at a time when &#8220;Cyberspace&#8221; was analogous to the Wild West, and there was a big need for self-governance and figuring out the emerging norms. The landscape is of course very different now, since the Internet isn&#8217;t a band of outlaws anymore but is integrated into normal society. The conference has accordingly morphed somewhat, although a lot of the old crowd still definitely comes here.</p>
<p>The quality of the events I attended was highly variable. I checked out the &#8220;unconferences,&#8221; but only a couple had a meaningful level of participation and the one I went to seemed to devolve pretty quickly into a penis-waving contest. The session I liked best was a tutorial by <a id="yanm" title="Mike Godwin" href="http://en.wikipedia.org/wiki/Mike_Godwin">Mike Godwin</a> (of Godwin&#8217;s law, now counsel for the Wikimedia Foundation) on Cyberlaw, mainly First Amendment law.</p>
<p>CFP has parallel sessions. I had a great experience with that format at the <a id="grgu" title="Privacy Law Scholars Conference" href="http://33bits.org/2009/06/10/privacy-law-scholars-conference/">Privacy Law Scholars Conference</a>, but this time I&#8217;m not so sure &#8212; I&#8217;m regularly finding conflicts among the sessions I want to attend.</p>
<p>I&#8217;m bummed about the fact that there is really no mechanism for me to learn about conferences that are relevant to my interests but are outside my community. (I only learned about the IAPP workshop because I was invited to speak, and CFP purely coincidentally.) Do other researchers face this problem as well? I&#8217;m curious to hear about how people keep abreast. I mean, it&#8217;s 2010, and this is exactly the kind of problem that social media is supposed to be great at solving, but it&#8217;s not really working for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://33bits.org/2010/06/17/conferences-the-good-the-bad-and-the-ugly-aspects/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aa438b63ff1e9b75693aeabbeddae5eb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">randomwalker</media:title>
		</media:content>
	</item>
		<item>
		<title>Privacy Law Scholars Conference</title>
		<link>http://33bits.org/2009/06/10/privacy-law-scholars-conference/</link>
		<comments>http://33bits.org/2009/06/10/privacy-law-scholars-conference/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 20:16:28 +0000</pubDate>
		<dc:creator>Arvind Narayanan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[law]]></category>
		<category><![CDATA[privacy]]></category>

		<guid isPermaLink="false">http://33bits.org/?p=221</guid>
		<description><![CDATA[I had a great time at the Privacy Law Scholars Conference in Berkeley last week, perhaps more so than at any CS conference I&#8217;ve attended. A major reason was that there were &#8212; get this &#8212; no talks. Well, just one keynote speech. The format centered around 75-minute-long discussion sessions (which seem to be [...]]]></description>
			<content:encoded><![CDATA[<p>I had a great time at the <a href="http://docs.law.gwu.edu/facweb/dsolove/PLSC/">Privacy Law Scholars Conference</a> in Berkeley last week, perhaps more so than at any CS conference I&#8217;ve attended. A major reason was that there were &#8212; get this &#8212; no talks. Well, just one keynote speech. The format centered around 75-minute-long discussion sessions (which seem to be called workshops), with 5 parallel tracks; in each session, you pick which track you want to attend. You are supposed to have read the paper beforehand, and usually everyone in the room has something to say and gets a chance to do so.</p>
<p>This seems way more sensible to me than the format of CS conferences, where there is only one track. I can&#8217;t imagine that anyone would genuinely want to attend all the talks. Ideally, for any given talk, half the people should skip it and spend their time networking instead, but in my experience this never happens. Worse, the talks are only 20-30 minutes long; while this is enough time to motivate the paper and inspire the listeners to go read it afterward, it is never enough to explain the whole paper. Sometimes speakers don&#8217;t get this concept, and the results are not pretty.</p>
<p>Anyways, I was surprised by the ease with which I could read law papers and participate in the discussions, even if my understanding was (obviously) not nearly as deep as that of a law scholar. This is something to ponder &#8212; while legalese is dense and frequently obfuscated, law papers are a breeze to read, at least based on my small sample size.</p>
<p>There is one paper, by <a href="http://paulohm.com/">Paul Ohm</a>, that I particularly enjoyed: it is about re-examining privacy laws and regulatory strategies in the light of re-identification techniques. This generated a lot of interest at the conference, and I found the discussion fascinating. A major reason I started 33bits was to be able to play a part in informing these developments; it seems that this blog has indeed helped, which is highly gratifying. I learnt a lot about privacy and anonymity in general, and I look forward to writing more about it in future posts, to the extent that I can do so without talking about specific workshop discussions, which are confidential.</p>
]]></content:encoded>
			<wfw:commentRss>http://33bits.org/2009/06/10/privacy-law-scholars-conference/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aa438b63ff1e9b75693aeabbeddae5eb?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">randomwalker</media:title>
		</media:content>
	</item>
	</channel>
</rss>
