We are in the age of unearthing and uncovering data, and only just at the beginning of the age of processing data and dealing with it (see my interview with Anselm Hook, Part 2 upcoming). O’Reilly’s Strata Conference 2011 will explore “the change brought to technology and business by data science, pervasive computing, and new interfaces.” It is, perhaps, one of the most important events of 2011.
Data is driving a revolution much as coal, oil, and steel powered the industrial revolution. And the world-changing insight from Karl Marx that “the industrial revolution polarized the world into two groups: those who own the means of production and those who work on them,” is taking on new life, as Alistair Croll, co-chair of Strata 2011, points out in his post, “Who Owns Your Data?”
“The important question isn’t who owns the data. Ultimately, we all do. A better question is, who owns the means of analysis? Because that’s how, as Brand suggests, you get the right information in the right place. The digital divide isn’t about who owns data — it’s about who can put that data to work.”
Strata is where a vanguard will meet, not only to discuss this revolution’s futures, but to define how to create, handle, and build the platforms and experiences that will harness the data. My flight is booked! (Also check out BigDataCamp, which takes place the night before Strata.)
The picture opening this post is from Michael EdgeCumbe’s Fall 2010: ITP Winter Show Project. The project explores ways to intuitively get the feel of what is going on with big data sets using “the gestural manipulation and stereoscopic visualization of complex data to create a meditative state for data analysis.” Michael’s project will be part of the Science Fair at Strata. For more on Michael’s work see Noise Derived. I also have a number of the interesting new interface sessions at Strata in my schedule.
The daily Strata Gems on O’Reilly Radar are a great place to get a gestalt of some of the Strata themes, and this post by Edd Dumbill, program chair for Strata, Three key data trends for 2011, looks at the year ahead. This week, I got the chance to ask Edd a few of the questions that I will have on my mind at Strata – see his responses below.
If you have been reading Ugotrade, you will know I am interested in our mobile social augmented futures and there is no question in my mind that these will be unleashed by our new capacities to work with data (see my post here).
Data is the how.
The pic above is from “Secrets of BackType’s Data Engineers.” This post on ReadWriteHack by Pete Warden, an ex-Apple engineer and founder of OpenHeatMap, really lives up to its title. Check it out if you want to know how “three guys (the BackType team) with only seed funding process a hundred million messages a day?”
I asked on Quora, “What would be the most important developments for Augmented Reality in 2011,” Michal Avny, Strategist & Real Time search expert, wrote:
“AR strongly relies on localized personalized real time information.
Having a stream of tweets based on keyword search, location or circle of friends doesn’t really make the AR experience; it is the processed real time relevant information that will make AR useful and intensify the experience.
In 2011 Real Time search and Social Search will drastically change to provide the infrastructure required.”
I followed up on Michal’s Quora answer with some more questions – see below in this post.
“2. A wave of actionable, important data APIs opened up, enabling useful non-gimmicky AR apps for the first time. Think geoloqi.com , or the work Max Ogden has done with Portland civic data. Plus of course face.com , email providers and calendar providers, etc.”
Amber Case, one of the founders of Geoloqi, is on the programming committee of Strata and will be speaking. Be sure to catch her session, Posthumans, Big Data and New Interfaces, and if you haven’t already seen it, Amber’s TED talk is a must see.
Geographic proximity is a powerful filter, as are route and time. But clearly social proximity, social relevance, and shared tastes are also key dimensions for location based experiences (see my convo with Schuyler of Simple Geo, upcoming).
While the whole business of location based search and curation of augmented mobile social experiences is still, for the most part, uncharted terrain, the danger of key points of control being only really accessible to elite players looms large. I asked Sophia Parafina, a pioneer in the open geo space, for some thoughts on real-time local/geosearch and geomessaging, and the future of openness & big data (see Sophia’s response below).
This is another question I’m following: Is the market ready yet for P2P cloud computing? It is one of those questions that we seem to have been asking in various forms for a very long while now, but without a major shift in sight. The pic above is from The Cloud Made Open Source “Invisible” This Year. But, perhaps, we are at the point when open p2p clouds will find a place in the market because of their potential importance in real time social search and discovery. Borislav Agapiev, Search Entrepreneur and founder of Vast.com, writes on Quora:
“I believe a P2P cloud is ideally suited for social & real-time search and discovery.
Consider MapReduce, a very interesting and popular paradigm for distributed computing. MapReduce is very much about bringing computation to data i.e. doing computation at nodes (map) and then aggregating results through network (reduce).
It is very clear now that user attention data (what they click on) is very valuable for search and discovery, yet a centralized model relies upon uploading all that to a single location and then doing a supposed local MapReduce. Clearly, MapReduce could be done across the network, without any centralized uploads.
In addition to the efficiency argument raised here, it is even more important to consider privacy issues. Uploading massive amounts of user attention data to a centralized location is not something that is going to make users warm and fuzzy as we are increasingly seeing.
In a P2P cloud, there is no big brother watching over anyone, all computation and data storage is done in the cloud, fragmented in many, many small encrypted pieces ala BitTorrent.”
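To make Borislav’s point about running MapReduce across the network a little more concrete, here is a minimal sketch in Python. It is an illustration only, not his design: the peers, their click logs, and the function names are all hypothetical, the "network" is simulated inside a single process, and there is no encryption or fragmentation of data. The idea it shows is simply that each peer summarizes its own attention data (map) and only the summaries are merged (reduce), so raw click logs are never uploaded to one place.

```python
from collections import Counter
from functools import reduce

# Hypothetical click logs held locally on three peers. In the scenario
# described above, these raw logs never leave the node that recorded them.
PEER_LOGS = {
    "peer_a": ["example.org/a", "example.org/b", "example.org/a"],
    "peer_b": ["example.org/b", "example.org/c"],
    "peer_c": ["example.org/a"],
}


def map_local(clicks):
    """Map step, run on the peer: turn raw clicks into per-URL counts."""
    return Counter(clicks)


def reduce_counts(left, right):
    """Reduce step: merge two partial summaries into one."""
    return left + right


def distributed_click_counts(peer_logs):
    """Combine per-peer summaries; only summaries cross the (simulated) network."""
    partials = [map_local(clicks) for clicks in peer_logs.values()]
    return reduce(reduce_counts, partials, Counter())


totals = distributed_click_counts(PEER_LOGS)
print(totals.most_common())
```

Note that even aggregated counts can reveal something about individual users, so a sketch like this addresses the efficiency argument more directly than the privacy one.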
Picture above from Brynn Marie Evans, “It takes two to tango: review of my social search panel”
The Delta of Now – Transforming Search into a Social Democratic Act
Picture of Maneki Neko “beckoning” cats from Journeyetc
New ecologies of human and machine intelligence are beginning to change basic social structures – see the Future of Work (Biewald and Chirayath Janah 2010). Projects like Swift River use search and machine mining to filter streams on topics of interest, which can then be curated by human beings. This may be extended to the curation of real-time data streams and the employment of machine learning algorithms based upon the explicit relationships.
Augmented mobile social experiences are a new frontier in which ideas and practices from a number of fields collide, including: ambient findability (Morville 2005), urban psychogeography, narrative structures, ambient games and devices, 4d (time-space), explorations of place and memory, enchanted objects and people (Kuniavsky 2010), and designed animism (Laurel 2010), to mention just a few.
Mobile local interaction presents an opportunity to invert the search pyramid and to transform search into a social, democratic act (see my interview with Anselm Hook upcoming). Up until now search has been predicated around a very narrow revenue model. Google has an implicit model of a B2C – business to consumer brokerage. We are only just beginning to get a glimpse of the disruptive potential of C2C – consumer to consumer brokerages. Mobile local C2C brokerages that allow us to transact in a trustworthy way over our local geography in close to real time (Hook 2010) have the potential to enable new forms of social organization. Bruce Sterling’s short story about a networked gift economy, Maneki Neko, is a brilliant glimpse at the disruptive potential of such re-imaginings.
Augmented experiences that shift or change a person’s situated geolocal experience of social reality, and change our relationship to the people and the place by augmenting engagement in, and reputation through, socially driven consumer tie ins and game dynamics, like Foursquare and Gowalla, are beginning to emerge, as Kati London pointed out in her excellent keynote at Web 2.0 Expo. And, while the integration of mobile local interaction and an augmented view that shifts our geolocal experience visually will involve creative solutions to some well churned mobile, tracking, mapping and registration challenges, the exploration and development of new dimensions through which we can filter and create trusted and meaningful augmented mobile social experiences is vital, whether you are considering a mobile screen, map, camera view, or futuristic HUDs and gestural interfaces.
Talking with Edd Dumbill
Picture from O’Reilly Community.
Tish Shute: First, congratulations on Strata! On the Strata homepage there is a quote from Jason Hoffman:
“My gut feeling is that we’re going to look back at the upcoming Strata Conference like we do at the Web 2.0 Conference in 2004/2005.”
—Jason Hoffman, CTO/Founder, Joyent, Inc.
Why do you think Jason’s comparison might be prescient?
Edd Dumbill: Web 2.0 is a development that ran through every brand that has a web presence and radically changed the way business is done for many companies and brands.
Strata will have a similar impact: every business has data, every business collects an increasing amount of data. This data is the new oil – a valuable raw material that when refined or combined creates value and opportunity.
Tish Shute: The rise of real time was one of your three key data trends for 2011. Hadoop is bringing the capacity to work with big data to more than just a few elite players. But the challenge is still real time. You mention we will be seeing a hybrid approach to real time and batch MapReduce processing. Will we hear more about these approaches to real time at Strata? And, what do you see as the most important conversations on real time data analytics emerging at Strata?
You point out “open source projects and cloud infrastructure means developers can evaluate and learn to love technologies without requiring support or approval from above.” What are the most exciting developments on the horizon for open source tools?
Edd Dumbill: Here are some projects worth watching, in the key areas of real time, cluster management and Hadoop.
* Cassandra and MongoDB — NoSQL databases that will prove vital for anybody with real time big data needs
* Mesos — a compute cluster management tool, modeled after that which powers Google
* Hadoop ecosystem’s continuing maturation, especially HBase and Hive.
Tish Shute: Do you think the market is ready for p2p cloud computing?
Edd Dumbill: The market is emerging for decentralized and distributed cloud computing, and P2P technologies are one way of achieving that. The key trends will be moving computation nearer the data sets or nearer the point of user consumption of the result.
P2P is a difficult model for anybody wanting to commercialize a service, so I think it will tend to form part of a hybrid solution.
Tish Shute: We have seen enormous strides in our ability to work with giant unstructured databases recently. Do you think, perhaps, that the dream of a web of linked data – “a web of data that can be processed directly and indirectly by machines,” will be attained through brute force – i.e. through our ability to harness the power of massively parallel processing, as much as by Semantic Web approaches focused on machine readable metadata? [Also see my question on Quora, “Is this a good approach (www.dist-systems.bbn.com/people/…) to use Hadoop to build a scalable, distributed triple store?”]
Edd Dumbill: I’ve been an observer of the SW for over a decade and I tend to believe that on the web, data means to you whatever meaning you give it as the consumer. With that model, the links are made by the consumer rather than sitting out there explicitly. Some links become de facto standards, and some very few become web standards.
I think the actuality will be a mix of both explicitly stated metadata and that which is inferred. The Semantic Web is a great framework for certain operations, especially interoperable exchange of metadata. A great many more private meanings, never intended to be shared, will be created by consuming software.
There’s no question that machines will learn how to process most of the Web. Furthermore, machines will learn how to process most of the physical world we’re in. And that by the end of this decade.
Talking with Sophia Parafina
Picture of Sophia at Where 2.0
Tish Shute: Sophia you have worked in the trenches for a long time now to support the growth of open geo data. What do you hope to see emerge in 2011 in the field of geo-data?
Sophia Parafina: Better support for displaying and handling location data across multiple apps. Fred Wilson recently blogged about content-shifting, he talks about overcoming content silos across devices. We’ve worked very hard to reduce data silos via formats, but devices are creating their own silos. I would like to see a standard method for sending geo data and geo information to mobile devices.
Producing content for mobile is different from producing content for a computer browser. Web 2.0 produced a lot of infrastructure for browser based interfaces, but in mobile devices that gap has been filled with apps which is fragmenting how data is handled by various devices. What is even more interesting in the mobile space is that devices can push data back that contains location, user updates, photos and even sensor data. If mobile data standardizes, it could lead to browser based applications and stem the continued fragmentation of the mobile application market.
Sophia Parafina: In the near future I think we’ll see startups providing curated data + API, and in response we will also see companies that provide a single interface across multiple data providers. We saw this when everyone released a mapping API and companies such as Mapufacture provided a single interface across multiple APIs.
We will see a resurgence in data providers repackaging the 2010 US Census data in different ways to respond to market segments. Some of this will be open data, but all of it will be provided through an API instead of a file. Additionally, we’ll see more data from outside the US.
Tish Shute: What are the biggest obstacles to having the open geodata sets available that we need to enable mobile local interactions and social augmented experiences?
Sophia Parafina: Licensing for both crowd sourced data and private curated open data will become an issue. We recently saw VLC, the open source video player, pulled from the Apple app store because of licensing issues. Also, licensing of content by geography will be problematic, limiting searches by geographical location. In addition, how will licensing of data that is updated by crowd sourcing work?
Multiple APIs for accessing data sources. The current trend for each provider to create an API for their data sets will result in data silos – there needs to be a single sign-on equivalent for requesting data.
Size of data on the wire. The current models for delivering data are based on broadband connections. However, as mobiles increasingly become the way people use the web, the data needs to be sized accordingly. This also goes for mobile interfaces. Have you tried to shop on a mobile device, or buy a train or plane ticket? It’s frustrating and error prone. There is a large untapped market of people who only use the Internet on mobile devices.
Tish Shute: You pointed me to this link in Strata Gems re “an interesting and pertinent (also a competitor to GeoLoqi),” – the Android Tasker app. What do these emerging services bring to the table in terms of the next generation of location based services?
Sophia Parafina: This app lets your device interact with the environment. I think that this is a great way of using the sensors on existing platforms to increase interaction and to implement ambient findability. The basic premise of Tasker is that some action happens in response to an event in an application, time, date, location, event, or gesture. Tasker has defined 180 actions that can occur based on any number or combination of events. This can provide a basic vocabulary for interaction between the user and the device and, more importantly, between users. Tasker also can use Android script plugins, which lowers the bar to creating your own ambient application.
Programs such as Tasker can provide a way for people to interact with social networks beyond sending messages. People can use their mobile devices to interact with their surroundings without having to interact with the device.
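The event-to-action premise Sophia describes can be sketched as a small rule table. This is a rough illustration in Python under my own assumptions, not Tasker’s actual configuration format or API: the `Rule` class, the context keys (`place`, `hour`, `headphones`), and the action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A snapshot of what the device currently knows (hypothetical keys).
Context = Dict[str, object]


@dataclass
class Rule:
    name: str
    condition: Callable[[Context], bool]  # does the current context match?
    action: Callable[[Context], str]      # what to do when it does


def run_rules(rules: List[Rule], context: Context) -> List[str]:
    """Return the result of every action whose condition matches the context."""
    return [rule.action(context) for rule in rules if rule.condition(context)]


RULES = [
    Rule(
        name="silence at work",
        condition=lambda ctx: ctx.get("place") == "office"
        and 9 <= ctx.get("hour", -1) < 17,
        action=lambda ctx: "set ringer to silent",
    ),
    Rule(
        name="music on headphones",
        condition=lambda ctx: ctx.get("headphones") is True,
        action=lambda ctx: "open music player",
    ),
]

fired = run_rules(RULES, {"place": "office", "hour": 10, "headphones": False})
print(fired)
```

Because conditions are plain functions over a context, a rule can combine location, time, and sensor state, which is the "combination of events" idea in a very reduced form.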
Tish Shute: We have had many conversations about emerging ideas of geo-search, geo-messaging and geo-fencing. What are the most interesting developments in these areas and what do you see on the horizon for 2011?
Sophia Parafina: The map will fade into the background and become less important. Display of information will be context aware, and that includes location. For example, let’s say I make a grocery list; when I’m at the grocery store, the list will just pop up without the need for me to find the app that has the list. Or reminders or offers pop up when you are near a place at a certain time. Let’s say you need to buy a present for a birthday party for a child: you could send out a request that you are looking for an item, and retailers could offer “on the spot” discounts if you are in the area.
Geo-search, geo-messaging, and geo-fencing are geared towards mobile devices, so I expect to see them soon as part of apps. Building generic applications that implement geo* will fail because that sort of information is useful only within a context. Geo* apps are solutions looking for a problem. The killer mobile app will use these functions transparently to reduce the cognitive load of the user who is busy moving around in the world.
User data gathered from multiple web applications will become consolidated profiles that will be used for context aware applications. For example, there could be a service which matches prices of items that you have shopped for on the web, so for example the service would have access to your cookies, know your favorite retailers, things you have shopped for, your location and activity patterns (when you are at home, work, restaurant). When you are in the vicinity of a brick and mortar retailer with the same or similar items, the service can send you an alert to match the price of the item you found online. So your digital life will become more closely linked with your day to day activities.
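The grocery list example above comes down to a geofence check: is the device within some radius of a known place? Here is a minimal sketch in Python, assuming circular fences and the haversine great-circle distance. The coordinates, the 75 metre radius, and the names `GEOFENCES` and `triggered` are made up for illustration; a real app would more likely rely on the platform’s own location services.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical fence: a circle around a grocery store, tied to what to show.
GEOFENCES = [
    {"name": "grocery store", "lat": 40.7300, "lon": -73.9950,
     "radius_m": 75, "show": "grocery list"},
]


def triggered(lat, lon, fences=GEOFENCES):
    """Return the items to surface for every fence containing the point."""
    return [
        fence["show"]
        for fence in fences
        if haversine_m(lat, lon, fence["lat"], fence["lon"]) <= fence["radius_m"]
    ]


print(triggered(40.7302, -73.9950))  # a few tens of metres from the store
print(triggered(40.7400, -73.9950))  # roughly a kilometre away
```

The time-based reminders and "on the spot" offers would add further conditions on top of this distance test.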
Talking with Michal Avny
“One of my takeaways from #w2s is that #quora points to future of augmented mobile social experiences – a search filter for experience! #AR”
In your view what are the biggest challenges for location Q&A to emerge as a search filter for location based experiences?
Michal Avny: The biggest location Q&A challenges yet to be conquered are immediacy (real time dynamic data), relevancy (strong personalized filters) and user experience (simplified interface).
Location Q&A enables different use cases. The most prominent are Follow (follow places, topics and friends to learn about a location), Interact (meet new people based on common interests), Plan ahead (plan a trip, night out or a shopping day by asking and searching for local information) and On-site (check for recommendations, friends, deals, events and traffic nearby).
Unlike Follow, Interact and Plan ahead, which can be added to existing Q&A platforms (such as Quora) by attending to location specifics, as they share similar characteristics, the on-site mode introduces a completely different experience: first and foremost, it requires immediate attention. It is real time based and the nature of the data is dynamic. Traffic updates, current events, nearby friends: all of that changes constantly. Posting a location question on-site implies the response should be in real time (e.g. best kid friendly restaurant); the normal Q&A response latency wouldn’t work.
Strong relevancy filters are required to accommodate for the overwhelming flood of information. Moreover, some of the data should be filtered by user behavior and preferences, check in notifications (type of relation), restaurant recommendations (type of food, price level, etc), shopping deals (commercial categories) and more.
Mobile experience requires ease of use and simplicity. A new Q&A interface and query language that allows for posting questions should be defined, as well as a coherent summarized response interface. Users on the go should not have to post lengthy questions, browse through tens of results or search for the right service, but instead use a simple intuitive tool.
Tish Shute: Real-time location based search is in its infancy. Real time questions can be answered using different services such as Yelp, TripAdvisor, Waze, Foursquare, IMDb and more. But what are the challenges to moving forward with aggregating these sources and then into “locals” that are able to process and deal with vast amounts of information?
Michal Avny: Using some of the leading location services to answer questions is sufficient to start with.
In order to provide broad coverage (worldwide) and reliable information, aggregation of the different services is required for instance to normalize product and service rank, aggregate classified, and more. This is quite challenging as there is no one standard available.
When location Q&A user base is big enough, I foresee a tendency to rely more on ‘locals’ input as the base of information. As the platform grows, communities will be formed with different cultures, relationships and trust levels, making the information more valuable and customizable. Some of the challenges I already mentioned are implementing filters, query language and interfaces to enable using the vast amounts of real time data in a mobile environment. More of the challenges lying ahead are integrating the ‘locals’ data with location based services as they are integral components of the Q&A ecosystem. Merging trust levels and relationships while adhering to different privacy guidelines is a challenge yet to be explored. (This should be discussed in more detail under the protocols topic).
It is quite evident that Quora is now facing growing pains and is struggling to maintain its character. Same as with Quora, it will also be a challenge to support and maintain the ecosystem while allowing for massive scale-up.
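One small piece of the aggregation problem Michal mentions, normalizing ranks across services that have no common standard, can be sketched in a few lines. This Python example assumes each service reports a rating on its own known scale and rescales it to the range 0 to 1 before averaging. The service names and scales are hypothetical placeholders, not the actual rating systems of the services named above.

```python
# Hypothetical rating scales per service: (minimum, maximum).
SCALES = {
    "service_a": (1, 5),
    "service_b": (0, 10),
    "service_c": (0, 100),
}


def normalize(service, rating):
    """Rescale a raw rating to the range 0..1 using the service's own scale."""
    low, high = SCALES[service]
    return (rating - low) / (high - low)


def combined_score(ratings):
    """Average the normalized ratings for one venue.

    `ratings` maps a service name to the raw rating it reported.
    """
    scores = [normalize(service, rating) for service, rating in ratings.items()]
    return sum(scores) / len(scores)


score = combined_score({"service_a": 4, "service_b": 8, "service_c": 90})
print(round(score, 3))
```

A plain average treats every service as equally trustworthy; weighting by the number of reviews or by trust level would be a natural next step, which is where the ‘locals’ and trust questions raised above come in.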
Tish Shute: I have been very interested in exploring protocols that will be enablers to micro local interaction and mobile social interaction for AR – particularly the XMPP extensions and operational transform work of Google Wave (now Apache Wave), and PubSub protocols like PubSubHubbub and Erlang based RabbitMQ. We are beginning to see protocols emerging that could enable new real time local services. What do you think are some of the most valuable use cases for “locals” that this new generation of real time protocols can enable?
Michal Avny: AR is about interacting with digital information; the AR ecosystem is composed of layers and components such as devices, platforms, browsers, applications and content. For the different components to interact new protocols, security guidelines, and privacy policies must be in place. A standard will enable local vendors and service providers to publish specials, deals, updates and events for any application to broadcast, identify people and places by proximity (without having to use the same application or device), local recommendations will be shared by services, devices will be able to interact, location based platforms, such as Q&A, will have access to vast breadth of information, geo aware devices will provide consistent experience globally, and much more.
Tish Shute: What do you think are the biggest challenges to going mainstream for this emerging field of real time social discovery?
Michal Avny: The biggest challenge is building towards real time, geo-aware, localized, personalized ambient data. Discovery is in its infancy, location social based Best, Top, and Trending lists with some basic filtering options are available, and this is great as people are getting accustomed to information surrounding them. To some degree it can intensify the AR experience, for instance suggest the most popular dish in a restaurant, or map the best coffee shops nearby, but it is customized at best by friend recommendations and depends on the coverage and broadness of the specific discovery service.
There is a need for the next generation of discovery, customized geo social aware discovery that filters the vast amount of real time data by learning user preferences and behavior (built on top of the much needed local social real time open protocol)
Tish Shute: Who are your favorite startups/upstarts in the field of real time search and why?
Michal Avny: My6Sense - My6sense provides a sharper and better way to experience your information from feeds you subscribe to (Social Networks, News, RSS feeds, etc.). It’s personal – Content is ranked based on what’s relevant to you. It learns what’s valuable to you by translating your consumption behavior into a personalized ranking function.
My6Sense – because it is a personalized prediction filter, a critical foundation for AR
Topsy – Topsy is realtime search powered by the social web that finds the most relevant conversations happening online. The site’s underlying technology examines popular links as well as the influence of each person citing a link. Topsy augments traditional search engines by finding information that people are talking about.
Topsy – because its ranking is based on retweets and influencers, a great social experience
Collecta – Collecta is a real-time search engine for the social web. It monitors the update streams of popular realtime blogs and sites like Twitter, WordPress, and Flickr, and shows results as they happen. Results can be filtered by status updates, comments, stories, or photos. The entire engine is built around the XMPP standard, which pushes out data on a continual basis, so that for every search you end up watching a stream that keeps updating itself.
Collecta – because it is built around XMPP, a real time experience