
Story Telling – the Art, Science, and Business of Data: Talking with Edd Dumbill about Strata, NYC, 2011

Wed, Aug 31, 2011

I’m really looking forward to the O’Reilly Strata events that are coming to NYC in a couple of weeks. I’m fascinated to see where the art, science, and business of data has gone since February, when I attended the first Strata Conference in Santa Clara: a sold-out event imbued with an awareness that this was an important gathering of cognoscenti working on the next big thing.

Strata in New York City is a sequence of three events: Strata JumpStart on Sept. 19th, then The Strata Summit, “The Business of Data,” on Sept. 20th & 21st, followed by the Strata Conference, “Making Data Work,” on Sept. 22nd & 23rd.

“The future belongs to those who understand how to collect and use their data successfully.”

Below is a transcript of a conversation I had last Friday with Strata Program Chair Edd Dumbill about some of the highlights of the schedule from my perspective. However, I highly recommend taking a good look at everything planned across the three events, because there is a depth and breadth that could not be covered in one conversation.

The video opening this post is from visual.ly.com – a start-up making it easier for people to create, explore, share, and promote data visualizations and infographics.

Talking with Edd Dumbill

Tish Shute: It seems that a dialogue between the art of data and the science of data is going to be center stage at Strata NYC, and there will be much discussion about storytelling with data.

Is that observation correct or is there something else going on there?

Edd Dumbill: No, I think that’s a great characterization. For the Summit, the core realization for me has been that when you have these tools for getting value from data, and when you can drive what you’re doing by data, then, once you have the technology in place, the biggest consequences are actually human ones, organizational ones, and strategic ones.

So what the Summit is really looking at is how, across a variety of industries and governments, and within disciplines inside those, the amount of data and the ease with which it can be communicated and mined are changing the way industry is shaped.

Tish Shute: Also, I noticed that the Strata Summit schedule (Sept. 20th & 21st), and even the Strata Conference (Sept. 22nd & 23rd), has more of an emphasis on pop culture: sports (baseball), dating (OKCupid), and Narrative Science all have a place on the schedule, for example.

Is this New York City’s culture, with its interest in media and marketing, being reflected, or is there something else going on? Has the data tool stack matured since the Strata Conference in Silicon Valley at the beginning of the year?


Edd Dumbill: Yes, there’s certainly a different flavor to the event because we’re in New York. And yes, the tool stack has matured, but it is by no means mature, and the maturity is only coming at the lowest level.

I think there are many years left in maturing the tool stack. But one of the beauties of big data is that once you have the data together, the algorithms to get value from it initially are pretty simple.

So focusing on the stories of success of being data-driven, particularly in the Summit, is important to us, because what people are saying is, “One, I’ve got data. Two, what do I do with it?” We don’t need to make the argument that data is important anymore. But we do need to demonstrate what you can do with it.

The data isn’t necessarily big; it’s just there. It’s about having an analytical approach to your business that complements your intuition and complements your vision.

“One of the most powerful ways of presenting data to people is in a story,” Edd Dumbill

Tish Shute: Yes, I can see the emphasis in the schedule on how to tell meaningful stories with data. Narrative Science seems to be doing something very interesting in turning data into stories?

Edd Dumbill: Yes. What they do absolutely fascinates me. There’s this kind of hierarchy, a sort of chain of needs, right now, where business is going, “We need data scientists. Find me data scientists. Train me data scientists. Hire me data scientists.” And the data scientists are all going, “I need visualization. I’ve got this data, and I now need to turn it back into a story that’s going to be useful to people, or provide interfaces that are going to help people understand and explore it,” because it doesn’t scale to need an interpreter between the data and the results all the time.

You need to be able to present it in a way that means something to people.

People can look at a graph and get many things out of it, or maybe nothing at all if they are not used to reading graphs. But particularly for digesting certain kinds of high-level summaries and results, if you can put the data back into prose, it becomes very accessible to people.

Tish Shute: Natural Language Generation from data really opens up so many possibilities.

Edd Dumbill: Yes, it’s interesting. I think it’s a very novel use. A lot of people would consider the end result of processing their data to be a spreadsheet or a graph.

But if you turn that back into a story, I think there’s a lot of potential for helping executives understand what’s going on. It makes it possible to use language to understand the results.

Tish Shute: I am really excited to see that stories, data design and visualization, and the way we experience data are as much a part of The Strata Summit and The Strata Conference as some of the more hardcore big data and analytics challenges.

Edd Dumbill: Yes. We are definitely ramping up on visualization, and I think that’s going to become more important. Having a fundamental grasp of how to use graphics and charts is still incredibly core to what we’re saying. But I’m also interested in ways that go beyond that, because at least 50% of the point of visualization is to help people understand the dynamics of the data, to really augment their senses with the results of the computation.

You know, some of our best leaders, the ones who know how to ask the right questions of the data, have the sort of indefinable fingertip feel for numbers that you get when you live around them for a while. And anything we can do with interfaces to accelerate this is going to be very beneficial, whether that means flying visually through the data or hearing it in natural language.

Tish Shute: Have I missed anything on the schedule regarding visualization? VisualizingData.com published an ideal schedule from the data visualization perspective. But have you added anything recently?

Edd Dumbill: Well, there’s one event that isn’t actually listed on the schedule yet, on Tuesday night. There’s a venue called EyeBeam in New York, and we’re having a visualization showcase there that evening. There will be things to walk around and see, and then a few talks from some of the most interesting companies doing viz and working on viz approaches. It’s not up on the schedule yet, but it will be in addition. It gives a nice focus to Tuesday night.

Tish Shute: Oh, that’s super awesome. I’ll definitely go to that.

Tish Shute: I am very interested in mobile social communications and augmented reality, especially augmented reality that feels different, not just looks different, as Kevin Slavin puts it.

I am excited to see people thinking about data not just in terms of visualization, but also in other ways, so that we can feel it through our secondary senses as well (see Mike Kuniavsky’s talk at ARE2011, “Somatic Data Perception”).

Edd Dumbill: Yes, absolutely. That is where we see this going. I will be incredibly depressed if I’m still looking at the world through a glowing rectangle in 10 years’ time.

Tish Shute: Yes, it would be! I am looking forward to seeing the new data startups too.

Edd Dumbill: Yes, there are a variety of interesting startups that I feel are particularly important in the data space. Take Media Sift and Data Sift, for example: Data Sift is doing a lot of real-time processing on the Twitter firehose. They provide real-time analytics on Twitter, which I think is very important.

Tish Shute: In terms of using data to provision mobile experiences, real time is massively important, isn’t it?

Edd Dumbill: Absolutely. Yes.

Tish Shute: But real time data is still a big challenge, isn’t it?

Edd Dumbill: Yes. Right now, our focus on real time is probably at the technology level. People are building out the frameworks, with companies like Media Sift and Data Sift creating parts of the experience.

And yes, our Where 2.0 conference will be focused more on the mobile experience.

Tish Shute: On mobile experiences, I am very excited about Infochimps and their new geo APIs; sensor data is becoming such a big part of the picture now. And the Kinect has also opened up a whole set of possibilities for the future of sensor data!

Edd Dumbill: Yeah. I still think Kinect is probably one of the most exciting things going down because of the democratization of that kind of capability. Interesting things happen when the sensors become cheap, right?

Imagine having a Kinect-sensor equivalent alongside the little camera in your iPad. That becomes extremely interesting, because everybody has it with them and can do things based on it.

So what always fascinates me is the point when something becomes cheap and hackable.

Tish Shute: And if Kinect went mobile, that would be exciting?

Edd Dumbill: I think it’s entirely likely in the next couple years, yes.

The more sensors we can instrument our mobile and personal devices with, the more it will result in novel uses beyond anything we ever dreamed of.

Tish Shute: There was a lot of hoo-ha about Color when they launched this year. They were unable to capture a user base, but if they had, privacy issues might have come to the fore, because they were collecting more sensor data than any other app, right?

We are still waiting to see a breakthrough app in that area, one that uses all the phone sensors in ways that really enhance the user experience rather than just serving the aims of data mining, aren’t we?

Edd Dumbill: Yes. I think this is one of those areas where, in parallel, we’re really learning about the social and privacy implications of this kind of technology. It seems to me the focus has also shifted away from the tech in the second half of the year. Frankly, everybody is getting kind of freaked out about the amount of data that’s being mined and about what’s acceptable use of it.

But on a slightly more prosaic level, there are some rather fabulous things being done. If you look at the Google Maps navigation experience on an Android phone, for instance, there are some very practical applications of sensors collecting traffic data, with a variety of other augmentations going into it, to actually do something useful.

So maybe we’d like to think we carry our sixth sense around with us in our pocket, and maybe we will. But we certainly do in our cars right now, with all the automatic rerouting and so on. That’s slightly more prosaic, but I think a lot more significant as a pattern for how this can be applied.


Tish Shute: One of the startups that really excited me in February at Strata Santa Clara was Singly, with the Locker Project. They are thinking really innovatively about putting people at the center of their data.

I am looking forward to seeing that work come to fruition. And while I’m enjoying Google+, it seems we are just sort of holding up our hands and saying, “Well, there’s only one business model for data, and that is a centralized Fort Knox,” aren’t we? Or is there something I’m missing?

Edd Dumbill: You’re right. I think Google+, for instance, is less a walled garden than a hedged garden. There is a certain barrier there, but I think it’s more about the fact that you need to put certain barriers up to create a decent user experience in the first place. To be honest, I think user experience is one of the BIG problems with open data, and with private data.

There’s a reason we are not all writing PGP encrypted emails to each other, right? Because it’s so hard to make a UI for encryption that’s safe. Most people don’t use passwords properly. And I think a lot of the same user experience considerations come into this whole data thing.

Facebook can get away with anything they want because, well, have you ever tried using their privacy settings? Google, I think, has tried harder than anybody to address this issue, using sensible defaults and making the explanations clear. And they probably succeeded for a geeky tech audience.

So I honestly think Locker’s biggest challenge with that kind of approach is UI, and conveying the concept to users so they can understand it.

But it’s certainly a very useful contribution to this conversation.

I think there are parallels in blogging, actually. That’s a case where people have information they want to disseminate. Do you choose to do it on your own website, setting everything up, publishing for yourself and hosting for yourself, so you have complete control? Or do you cede control to Blogger or Tumblr for convenience, knowing that you are being monetized somehow and that you’re playing in somebody else’s walled garden without that control?

I haven’t really developed that thought much, but I think there’s something in following it along and seeing where it actually leads.

But, you know, there is a whole technical challenge as well.

I really like the idea of being able to give permission to people. Being able to say, “I’m engaging you to do X, Y, Z in return for such and such. That seems like a good bargain to me. Giving up my data is a decent bargain for the services I’m getting back.” That’s generally the contract we make with people in real life anyway.

That’s another reason Google+ is a promising approach. At least in their rhetoric, they’re saying, “We’re trying to model this on the economy of real-life interactions.”

Tish Shute: Yes. Any movement towards saying, well, “I’m not just collecting your data randomly, I’m collecting this data because I want to give something back to you that will enhance your interactions,” definitely feels like an improvement, doesn’t it?

Edd Dumbill: Yes. I think that bargain is clear. I’m just fascinated by who could be trusted, and I do actually wonder whether, rather than everything necessarily being decentralized as Lockers suggest, there might be a variety of interoperating, trusted identity brokers: people we would actually trust. Banks, right? We do that right now. Banks are pretty much our identity brokers. Who knows?

Tish Shute: I think that is where the Locker Project is going with Singly, isn’t it? Singly is the trusted broker for the Lockers, right?

Edd Dumbill: Yes. Now the question is whether you trust a startup with that, or whether you’re going to trust… I mean, who knows? Trust levels are at an all-time low with everybody right now. People in America won’t trust the government. I think Google are probably one of the most trusted brokers out there online.

Tish Shute: Perhaps. That’s interesting, isn’t it?

Edd Dumbill: I did write a piece speculating that Google may become a sort of central broker of social information, and a platform.

Tish Shute: Oh, yes, “Google+ is the social backbone”, a very thought-provoking piece! It deserves an interview of its own!

But back to the Strata schedule! I notice you have DePodesta doing the Moneyball talk, right? What’s the 2011 twist on Moneyball?

Edd Dumbill: I think the twist is that a lot more people can play now, really, which is why we’re having Strata in the first place. Ten years ago the people doing this kind of stuff were McDonald’s and Walmart and sports teams: wherever there was large money, they could afford to gather the data, and maybe try making decisions based on it.

Well, we’re now in a very instrumented society where every business, every person has instrumented data about their interactions. I think the kind of resistance and dynamics and opinions that Moneyball brought up are the ones that people are going to be facing again right now as they seek to be more data-driven in what they’re doing.

It’s also very interesting to ask, 10 years on: what do you think? You’ve had 10 years of this, of sabermetrics and so on. Has your view matured? Have you softened?

What I’m endlessly and ultimately fascinated by is, where does this fit in the decision process and in the organization tree? Where does it mesh with vision?

Steve Jobs achieved it perfectly. He had vision and all kinds of things for his products. But Apple succeeded through relentless operational efficiency. Absolutely relentless in their suppliers, their supply chain, and their manufacturing lines, down to the last detail. They are an utterly data-driven, process-driven organization that at the same time melds that with vision, design values and quality. That’s a case where it worked together.

I’m eager to try and tease it out, figure out how that really works and how those things come together.

Tish Shute: And that’s another thread I see being explored at Strata NYC. It’s not human versus machine, or machine trumps human; it’s human with machine. This is another theme, isn’t it?

Edd Dumbill: Exactly. We all operate by feedback loops. Really, what machines are doing is enabling us to get better-quality data in a tighter feedback loop.

Tish Shute: One feedback loop that we’re finding machines very useful for is understanding how we feel. I think that’s really interesting.

Edd Dumbill: Yes. I’m very fascinated by all the quantified-self stuff and where that can take us. At the end of the day, we have a very personal little organization to deal with, which is ourselves.


Quid: Building Software and Mathematical Solutions to Simplify Complex Decisions

Tish Shute: Yes! But the thing is, we don’t understand ourselves in isolation, do we? I am definitely going to attend the session by Sean Gourley, CTO of Quid, on semantic clustering analysis. It seems like sentiment analysis is going big-time now, doesn’t it?

Edd Dumbill: Yes. Sentiment analysis is actually becoming a checkbox feature in databases now. The latest release of Greenplum has it built in. It’s that level of feature people want, because social data is so important. Of course, a lot of this is being driven by marketing and advertising.

Tish Shute: Yes, but even in marketing, data storytelling has been taking some interesting and quirky turns, hasn’t it?

Edd Dumbill: Yes, absolutely. I think there’s a lot of interesting research ahead of us there as well.


OKCupid Trends

Tish Shute: OkCupid is a very interesting example of data story telling that leverages our desire to know ourselves, and ourselves in relation to others.

Edd Dumbill: Yes. They’re an example of a shift that’s happening in the PR industry, actually: companies are understanding that telling marketing stories with data is very, very compelling. OkCupid really used that to punch well above their weight. Of course, they got acquired as a direct result of that and of the profile it gave them.

Tish Shute: I know OKCupid got acquired by Match.com, but you were saying they hit above their weight by using this analysis? How did that work?

Edd Dumbill: I think a lot of it’s down to their blog. They analyzed these things and published them on their blog. It got a lot of attention and generated a lot of media stories, which brought them to Match.com’s attention. There are millions of... well, a large number of dating sites. But they differentiated themselves through the smart use of their data.

Tish Shute: Data and games is an area I am very interested in. Zynga changed the game with game analytics and social games. And now we are seeing Rovio partner with Medio for analytics (see “Green pigs and data”). But I noticed that you don’t have games as a strong theme on the schedule?

Edd Dumbill: To be honest, I think you’ll see more of that on the West Coast. It’s not that we’re not interested. I just feel that the center of gravity for that topic is probably back on the West Coast at the moment.

Tish Shute: So what’s after Zynga in terms of game analytics? A nice easy question!

Edd Dumbill: Sure. Let me predict the future for you.

Tish Shute: Yes please do!

Edd Dumbill: I don’t know, to be honest. One of the very interesting things about games is that they help us understand the real world through modeling and playing around. I’m highly fascinated to see more of those things played out through real-life actors. There have been some examples out of SCVNGR and whatnot. If any of those techniques can really start to make their way into mobile technology, that’s one interesting thing.

What lessons can we take from what we’ve actually learned in game analytics that are reproducible and useful elsewhere?

Gamification is a bit of a trend right now. I am slightly skeptical, but I am fascinated by a lot of the systems that are having game elements added to them. So the second question is: if you’re adding games to things like losing weight, saving money, or writing a book (I’ve seen that too), what can you apply from the analytics world on top of that, to learn about those systems and tweak them?

I don’t have that good an answer for you. My own background isn’t steeped in that. But I am aware that there’s probably a lot of progress in games that has yet to be applied anywhere else.

Zynga and whatnot are kind of a space race to monetize that, aren’t they? Space races generate technologies that can be applied in a variety of places.

What are the spinouts of game analytics that we can actually use elsewhere?

“These Bloom Instruments aren’t merely games or graphics. They’re new ways of seeing what’s important.”


Cartagr.am by Bloom

Tish Shute: Last February, at Strata, I was very struck by the new work by Ben Cerveny and Bloom on “pop cultural instruments for data expression” (also see Ben Cerveny’s talk at ARE2011).

Edd Dumbill: Yeah. I love it every time visualization comes onto a tablet. There’s an interesting back channel there.

And Google has taken this to the extreme, to their great advantage. When you read an e-book, or interact with a visualization on a tablet, there’s the potential for it to learn from your interactions.

If you read an e-book, and the book is instrumented and sends data back, then the book can read you at the same time as you’re reading it. That kind of collective intelligence can then be harnessed.

So what if Bloom’s pop culture visualizations were instrumented so that they knew how people were using them? What could they learn from that, about the quality of the visualization, or about what’s interesting in the data, feeding that back at the same time?

One of the fundamental principles, I think, even of Web 2.0, and definitely of this era of big data we’re in, is that the secondary signals, the exhaust from any electronic product, can be incredibly valuable.

We know that every time you use Google you are probably part of at least one experiment they are running to find the optimal variant and optimize their product. How can we generalize that out?

Tish Shute: I agree. This is at the core of the art, science and business of data. I hear your phone ringing, but do I have time for one more quick question?

Edd Dumbill: Oh yes.

Tish Shute: It sort of follows on from my previous question. The relationship between crowdsourced intelligence and machine intelligence has played a huge role in making data work and solving real-world problems; Crowd Flower is one example.

Where are we now with combining the crowdsourcing power of, for example, Crowd Flower and Mechanical Turk with machine intelligence? Is there anything new going on here?

Edd Dumbill: What we’re actually starting to do is learn where to apply these tools. We’re reaching the point of understanding what crowdsourcing is for, how to better design crowdsourcing tasks, and how to find innovative uses for them.

One of the things I am particularly excited about is that Natala Menezes, who was at Amazon working on Mechanical Turk, has now moved to a company called GigWalk, which is a mobile Turk platform.

So if you want to assign tasks that depend on people being in particular places and being able to do particular things, this is a platform for turking based on that, which I think is fascinating. That’s definitely a new approach.

Tish Shute: Yes, GigWalk is awesome! I saw that Photosynth is partnering with GigWalk. That is interesting, perhaps a step towards strong AR! (See Read Write World; Blaise Aguera Y Arcas’s work on Photosynth was big news at ARE2011.)

Edd Dumbill: Natala will be talking about GigWalk. I think the session is called quirky crowdsourcing. I want to call it Quirky Turks.

Tish Shute: [laughs] I like that.

categories: Ambient Findability, Android, Augmented Reality, Big Data, data science, Hadoop, Instrumenting the World, Linked Data, mobile augmented reality, mobile meets social, Mobile Reality, Mobile Technology, New Interfaces, online privacy, Open Data, privacy and online identity, Real Time Big data, social gaming, ubiquitous computing, websquared