2003-04-24

Data Mining Social Cyberspaces: Tools for enhancing online communities
Marc A Smith
http://netscan.research.microsoft.com

ob-blogging. Object blogging. Handheld barcode scanner that records your tags on a private website

. Support groups
. Markets
. Entertainment
. News

Places "where association can take place"
[that's a really nice fundamental definition of "place". Didn't we hit this earlier, with distance? Distance was having things in common, I think. And distance loses meaning with distance.]

Social Cyberspaces created, many kinds of network interaction media
. email
. email list
. irc
. chat

many axes:
  point-to-point, multipoint
  public, private
  asynchronous, synchronous

and there are more media being made! rapid speciation

can we swap anything of value?

info tech catalyzes collective action because it changes the nature of core social resources
. identity
. reputation
. exchange
. initiative

new communication technologies invariably raise issues of trust and authenticity

"not every smart mob is a wise one" (Howard Rheingold)

you have to be able to recognise that someone is the same person they were yesterday in order to have trust in them. these are issues that came up over the telephone.

William H. Whyte's book, City. Location of street conversations, 1952, in New York. it shows more conversations in the middle of the sidewalk on the corner than elsewhere.

* THE CHALLENGE

online, it's hard to determine size of crowd, composition, distribution of people into clusters, and variation. but this is easy offline!

we're social animals. we behave differently depending on whether we hear murmuring, etc. but "online it's like we have our *everything* amputated"

- Existing interfaces are limited
. email clients -- show subject line text, author, not much more
. list of newsgroups

more info: "is he a flame warrior?" [<- nice line!]

okay, so these are the things Netscan Research at Microsoft are doing to make these interfaces better.
they're running a news server, as an experimentation place. so it's: UI design by sociologists giving us a new view onto Usenet

Social Accounting Metrics... for each newsgroup, they show: number of posts, un-replied-to posts, crossposts, returnees, lines/post

and that tells you more about what the newsgroups are like. this is on a normal looking front page... rows with columns. but it's giving signs.

some research about the main people in newsgroups (which is important):
. core group is always a minority
. about 2% of posters produce 60-70% of the content

they're going to make an opt-in thing: you add a netscan email address to your mail list, and they sit there and analyse your group, figuring stuff out. that's pretty cool.

some more stats:
  178GB/day on usenet
  600,000 messages/day
  178 million messages in 2002, up 13% on 2001
  13.1 million unique authors in 2002

ooh, nice graph of usenet daily posts and posters, september 1999 - march 2003. there are spam storms which make big spikes -- they're huge!
[...I'd very much like the graph from this presentation. have to remember to grab this]

more stats... They also have a report card page for each newsgroup, graphs of usage. normal metrics from earlier, but with more. the crosspost metric is a good metric of the on-topicness of the newsgroup. then thread views, how long it takes for a thread to grow that long, etc.
[it would be interesting to do a weblog thing like this. what kind of cool metrics could we get?]

[this is also a nice stat] -> 67% of the threads are 2 messages

there's also an author tracker! how often do they post -- if they post a lot it's what economists call a "costly signal", it's very costly to spoof that kind of thing.

other metrics on authors:
. thread/post ratio
. threads touched, threads initiated

Q - but spammers are going to start behaving like this, if you try to filter them out on metrics.
A - yes, we're going to get into an arms race

thread visualisations look very cool, big trees with where the authors post, etc.

more stats
- most people (67%) post only once (about 3 million last year)
- a minority (2%) post more than 3 days a month over a year

another great map! usenet tree map visualisation, boxes where the size of the box is the size of the usenet hierarchy.
[there are a whole bunch of maps, different slices, but it's a nice way of seeing what's there.]

"What is a healthy community?" - there's a big list of all the usenet metrics, and an interpretation, what does it mean -- what would a healthy community look like.

more great stats! likelihood of messages getting a reply on all of usenet:
. 48% within 6 hours
. 81% within 24 hours

so in the end they're coming up with a better interface, a new personalised interface to microsoft.com.
- at the top of the website, "what we tell other people about you"
- also, data about other people who answer your questions: reputation metrics
- ways to encourage you to give good answers: did other people select your answer as the best answer?

the top data, then, is: have you been replied to? what are the stats you have on them?

aha! data both ways. if you want to know when somebody posts, then you're giving the system data for free: it now knows how many people think X is worth watching! very good, close the loop.

they're stuck on getting it working technically, but the metrics work. netscan will be doing daily scans by the end of the year (monthly at the moment)

* NEW INTERFACES

newsgroup crowds: maps of all the contributors in a newsgroup. a load of circles in a map: the location of the circle tells you a couple of metrics. the core group is top right, for example. bigger circles mean you're a good poster elsewhere, so you can see visiting experts.

different newsgroups have *really* different maps.

"the technical support group with no answers" -- alt.atheism

the VB support group is pretty interesting.
a bunch of people who hang around answering, and loads of tiny people who just pop in to ask occasionally.

authorlines: see how people change over time. threads initiated and contributed to, showing how many posts are made there.

and again, people have really different authorlines! so you can learn things about them.
[this would be really interesting to run over my email]

* BARCODE SCANNERS

you can scan things and join annotations against things. that's what it's for: a series of objects you've seen in the world that you've scanned.
[the semiotcracy]

AURA platforms -- this is worth having a look into.

funny way of making the scanners cheaper -- replace the laser with a video camera, then use software to recognise the barcode.

serviceobjects.net offers free lookups from UPC (barcode) to the metadata. cool!

-> "the mouse for physical objects"... [oh yes!]
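
[a sketch to go with the barcode notes above, not something shown in the talk: once software has read the digits off a camera image, the obvious sanity check before doing any lookup is the UPC-A check digit. the function names here are made up for illustration; the checksum rule itself is the standard UPC-A one (triple the digits in odd positions, add the digits in even positions, the check digit brings the total up to a multiple of 10).]

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(c) for c in first_eleven]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, ..., 11th digits
    even_sum = sum(digits[1::2])  # 2nd, 4th, ..., 10th digits
    total = 3 * odd_sum + even_sum
    return (10 - total % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """True if `code` is 12 digits and its last digit matches the checksum."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])


if __name__ == "__main__":
    # hypothetical scanner output: one clean read, one misread last digit
    for scanned in ("036000291452", "036000291453"):
        print(scanned, "ok" if is_valid_upc_a(scanned) else "rejected")
```

a misread digit from a cheap camera will usually fail this check, so the scan can be retried instead of annotating the wrong object.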