Recently, Qualcomm announced an SDK for vision based augmented reality – currently in private beta and opening to the public this fall. The Qualcomm augmented reality (AR) bonanza will launch with a $200,000 developer challenge and an SDK that will put vision based augmented reality into the hands of developers without licensing fees.
This is a big step forward for augmented reality and a very important move by an industry giant to support the rapidly evolving AR industry. Innovation at all levels of the AR stack, particularly at the hardware level (CPU/GPU optimization), is vital if the full vision of augmented reality – media tightly registered to physical space – is to take center stage. Vision based AR takes mobile AR beyond compass/GPS based AR post-its, which are only loosely connected to the world (but the staple of most current AR apps), towards the holy grail of AR – markerless tracking with the whole world as the platform.
Click on the image above or see here for a video demo of an AR version of the Rock’em Sock’em Robots game. Mattel, one of the first companies working with the SDK, demoed AR Rock’em Sock’em at the Uplinq 2010 conference (see Chris Cameron’s ReadWriteWeb write-up on Uplinq 2010).
The Qualcomm AR stack, which reaches from the metal to developer APIs, will give Android developers an important edge in AR development. And, when vision based AR starts getting integrated with visual search capabilities, and combined with cool tools like Unity, we will start to see the augmented world get really interesting.
Visual search is already an area of AR getting a lot of attention, with Google Goggles, Point and Find, Japan’s NTT DoCoMo set to launch “chokkan nabi,” or “intuitive navigation,” in September, and the recent partnership between Layar and Kooaba. Metaio’s mobile augmented reality platform Junaio is already integrated with Kooaba’s computer vision capabilities.
And, of course, I am particularly excited about including open distributed real time communications for AR in this stack, which is why I asked a group of developers who have been inputting into the ARWave project if they had questions for Jay Wright, Qualcomm. Thank you Yohan Baillot, Gene Becker, Anselm Hook, Patrick O’Shaughnessey, Thomas Wrobel, Markus Strickler, and Davide Carnovale for your input. [Note: see my upcoming post, about the future of ARWave and real time distributed communications for AR following this Google announcement.]
Jay Wright “is responsible for developing and driving Qualcomm’s augmented reality commercialization strategy.” He “handles partnerships with leading innovators in industry and academia and leads Qualcomm’s efforts in enabling augmented reality within the mobile ecosystem.” In the interview below, Jay very generously answers our questions in detail.
A key contributor of questions for this interview is Yohan Baillot. Yohan is working on a full vision of AR – integrating computer vision, visual search, open distributed real time communications and AR eyewear. Yohan Baillot is founder of Simulation3D, a consulting and system integration company specializing in interactive visualization systems and eyewear-based AR systems. (I hope to bring you an interview with Yohan soon!).
Qualcomm was the title sponsor for are2010, Augmented Reality Event, and played a vital role in making this event an historic gathering of the talent and creative minds at the heart of the emerging AR industry. Watch out for the videos of the are2010 sessions to be posted at the end of August. My are2010 co-chair, Ori Inbar, is preparing them to go online while kicking his newly funded start up, Ogmento, into high gear! Ogmento is also one of the start ups pioneering vision based AR.
Metaio (with Total Immersion, one of the first augmented reality companies) has played a key role in bringing a vision component to smart phone augmented reality apps with their Unifeye mobile SDK. Junaio, Metaio’s own mobile augmented reality platform, has gone beyond location based AR with “junaio glue” – “the camera’s eye is now able to identify objects and “glue” object specific real-time, dynamic, social and 3D information onto the object itself” (see my upcoming interview with Metaio founder, Thomas Alt). Also, recently, Layar – who continue to innovate at a breathtaking pace – announced a partnership with the computer vision company Kooaba.
Both Maarten Lens-FitzGerald, Layar, and Thomas Alt, Metaio, when I spoke to them recently, saw the Qualcomm SDK as a very positive development for AR, and they look forward to exploring its capabilities and integrating it where appropriate with their AR tools. See more about Layar’s upcoming visit to the US here – August 10th NYC, and August 12th SF. Also save the date, Sept 27th, Munich, for InsideAR, Metaio’s upcoming conference.
It is clear that vision based AR will be driving the next wave of AR apps. And, as Maarten and Thomas both pointed out, it will be interesting to see which use cases capture the imagination of users the most. Having more tools freely available to AR developers will certainly be a boost to creativity. And, Qualcomm’s SDK is going to give Android developers, in particular, a big opportunity to take the lead.
Interview with Jay Wright, Director, Business Development, Qualcomm
Tish Shute: Before I start with questions on the new Qualcomm vision based augmented reality SDK, I want to briefly look ahead to what many people feel is vital for the full realization of augmented reality – head mounted displays, or more specifically, comfortable, sexy AR eye wear. Is Qualcomm going to be involved in the development of augmented eye wear and wearable displays?
Jay Wright: I think there’s some core technology that needs to come together so we can have what we think needs to be a see-through head mounted display with a decent field of view. And that looks like something that is quite possibly further than a three to five year horizon.
Tish Shute: Gene Becker asked some interesting general questions about the Qualcomm AR initiatives. He said, “I’m unclear exactly what Qualcomm’s goal is.” It would be interesting to hear from you the Qualcomm view, from the top down.
Jay Wright: Our largest revenue stream comes from sales of chipsets. And we see augmented reality as a technology that drives demand for increasing amounts of processing power. So we want to create demand for chips, higher-end chips, and augmented reality does that. Specifically vision based augmented reality because it is so computationally intensive.
Tish Shute: Yes. And I think that is why people are very excited by the Qualcomm SDK. It is not only the first free toolkit for developers to build vision apps from, isn’t it? There’s been nothing freely available before this, has there? But also Qualcomm is paying attention to the complete AR stack to support vision based AR development, from the chips to game/app development tools like Unity.
Jay Wright: That’s really the goal. We’re not here to be in the augmented reality applications business. Qualcomm’s role in the ecosystem has been to serve as an enabler. And that’s what we want to do with augmented reality: provide the enabling technology that allows the entire ecosystem to flourish.
Augmented Reality has a number of attributes that make it a great fit for Qualcomm’s core competencies. It’s very computationally intensive, algorithmically complex, requires tight integration of hardware and software, and benefits from tight integration of multiple hardware components. And that’s the kind of problem we like here, where we can apply our core competence of really optimizing complex systems for performance, while at the same time minimizing power consumption.
And as you know Tish, mobile AR is really extremely power sensitive. We sometimes talk about it as a battery’s worst nightmare. It’s roughly equivalent to playing a 3D game and recording a video all at the same time.
Whenever there is something that takes a lot of power, that’s a definite opportunity for us to optimize it.
Tish Shute: Right. One of the core businesses is chips, right, but for Qualcomm there’s basically a lot of profit in licensing. When I talked to the developer community about the Qualcomm SDK, developers’ first question was, “What’s the licensing? What’s this going to cost us in the long run to develop on this SDK re licensing?” And they had all different takes on this. So everyone had different ideas about what your approach to licensing might or might not be. Could you clarify the approach to licensing, as I think this is a core concern for developers?
Jay Wright: Anytime you see something for free, you kind of say, “Hey, what’s the hook?” So yes, it’s definitely a logical question. Our intent is not to generate licensing revenue from application developers using the SDK. So the SDK will be made available free of charge for development, and it will also be free of charge for developers to deploy applications.
Tish Shute: Now, this is another question. You also include not just image recognition capabilities but Unity in the package you are offering developers. Unity products usually involve a license. They do have some free products too, I think. But how does this work? And how do you separate your part from their part, or don’t you?
Jay Wright: That’s a good question. What we’re trying to do with the platform is incorporate it into tools that people already know how to use. So we’re actually going to have the SDK support two different tool chains. One of them is the Android SDK and NDK. And then the other one is Unity.
We’re working with Unity to create an extension to the Unity environment that will be available as part of the Unity installer when you install Unity from the Unity website. Developers will still be paying whatever license fees are associated with Unity’s products on their existing pricing schedule.
Tish Shute: One of Thomas Wrobel’s questions is whether developers can just use the image recognition without Unity. Your answer is yes, you can work with the computer vision component of the SDK separate from Unity?
Jay Wright: Yes, you can.
Tish Shute: Good because we would like to build a completely open Android client for ARWave, and not tie it to Unity unless people choose to. He’s using the open Android JPCT 3D engine, which he’s adapting for AR. So he could actually use the part of the SDK that does image recognition and association with that, right?
Jay Wright: That’s correct. You are not required to use Unity. Unity is just one option for building the application.
Tish Shute: Great! That’s very good. But I’m sure many developers are going to jump on the chance to use Unity. But I mean it’s nice to be flexible because it’s so early for AR that people have different ideas and new use cases coming up all the time. I think it’s excellent you’ve divided that.
Another of Thomas’s questions was, “Can developers use their own positioning data sharing solution?” He’s really talking about AR blips.
Jay Wright: With data sharing solutions, I am assuming that by data he is referring to augmentation data or graphics?
Tish Shute: Yes, and I’ll ask him to elaborate. But, at the moment, everyone is using different ideas for POI, aren’t they?
Jay Wright: Yes. So let me answer it this way, Tish. The goal with our platform is to make it just as easy for a developer to create 3D content for the real world as it is for a game world or a virtual world. So all we’re really trying to do is provide the computer vision piece that makes the real world look like a bunch of geometric surfaces and potentially some meta data that is associated with this so you know what you are looking at.
So that means from a developer’s perspective, you are still doing all of the 3D content, all of the animations, all of the game logic, all of the rendering. You are still doing that all yourself. So if you think about doing an AR game, you are doing everything you used to do, except you are not creating a virtual terrain. You are just going to map it in the real world.
So if you want to do a browser that is doing POIs, your POI data, or augmentation, or meta data, or whatever it is, that can be in your application, it can be in the cloud, it can be wherever you want to put it. We’re not putting any constraints on what that content is or where it’s stored.
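[Note: to make that division of labor concrete, here is a minimal sketch in Python. It assumes the vision component reports the pose of a tracked surface as a 4x4 matrix (target coordinates to camera coordinates), which is a common convention in AR but is my assumption here, not something Jay describes. The function name `transform_point` and the sample pose are hypothetical and are not part of the Qualcomm SDK. The application keeps its own content and simply places it using the pose.]

```python
def transform_point(pose, point):
    """Apply a 4x4 row-major pose matrix (target -> camera) to a 3D point.

    `pose` is assumed to come from a tracker; `point` is a piece of the
    application's own 3D content, expressed in the target's coordinates.
    """
    x, y, z = point
    vec = (x, y, z, 1.0)  # homogeneous coordinates
    out = [sum(pose[row][col] * vec[col] for col in range(4)) for row in range(4)]
    return tuple(out[:3])


# Hypothetical pose: the tracked surface sits 0.5 units in front of the
# camera with no rotation. A real tracker would update this every frame.
pose = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

# A corner of some app-defined content, 0.1 units along the target's x axis.
content_corner = (0.1, 0.0, 0.0)
print(transform_point(pose, content_corner))
```

[Where the content itself comes from, the app bundle or the cloud, does not change this step, which matches Jay's point that the platform puts no constraints on it.]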
Tish Shute: Right, and that’s what I hoped for. And I think that does answer the question. People are interested to know how far Qualcomm is going with this. For instance, Gene Becker asked: “do they see a business at a certain level in the AR stack?” As you said AR development basically feeds into the core business of chip development, right? But does Qualcomm also see some new business models developing?
Jay Wright: I think it’s foreseeable that Qualcomm could identify other business opportunities down the line. But we’re certainly not there today. Today, our motivation for the investment in AR is to create technology that is going to advance the chipset business.
Tish Shute: When the news came out about Qualcomm’s support of a game development studio at Georgia Tech at the same time as the SDK I think I wondered what was the scope of Qualcomm’s interest [for more on using Unity for AR development see Vision-Based Augmented Reality Technical Super Session video from Uplinq 2010]. For example, I am interested to know how the Qualcomm initiative in developing an AR stack connects to the effort to introduce an AR browser based on web standards, i.e., the Kharma/Kamra KML/HTML Augmented Reality Mobile Architecture from Blair MacIntyre and the Georgia Tech team (image below)? Are you supporting the open standards based browser development too?
Jay Wright: Blair is going to continue to work on the browser effort. And it’s our expectation that he will use our SDK and technologies for vision pieces of the browser effort where appropriate. So they are certainly not mutually exclusive. I would just think about our technology as one element of what may be used in that browser, as I expect it would be an element of what any other app developer would put in their application, whether it be browser, or game, or whatever.
Tish Shute: Yes. Now, this is an interesting question, which is sort of connected…I’m trying to keep some form of narrative for this! It follows from the question about Blair’s web-based standards browser. A few people have asked me why we haven’t heard more from Qualcomm in all these various standard discussions that are starting to come up. I mean is it just too early, or are you too busy, or what?
Jay Wright: No, let me explain. The type of standards that have come up so far have been around how HTML should be extended for geo-browser type applications. And while that’s interesting, I think the standards efforts that Qualcomm would be more likely to be associated with in the near term are those related to APIs that are hardware accelerated.
So one of the things that we are in the process of doing right now, Tish – because as you know, Qualcomm is a company that adheres to standards and strives to produce a leading implementation of those standards on our hardware and software – is we are in the process of determining what API set within the existing SDK should be standardized.
Tish Shute: Right.
Now, my next question is, “Who are the other players at this level of the AR stack in the standards conversation? Who else is working at that level?” Obviously, the AR Lab in Graz was, but now they are Qualcomm, right?
Jay Wright: They are still independent. Qualcomm is the exclusive industrial partner of the Christian Doppler Handheld AR LAB in Graz.
Tish Shute: Does this compete with, say, the work that other AR start ups are doing?
Jay Wright: Our intent is not to compete with companies that have done augmented reality technology. Our intent is to enable the entire ecosystem. So we would like to work with both Metaio and Total Immersion to find ways that they can benefit from our technology. That would be the hope – that our technology can kind of lift and float all boats in the ecosystem.
Tish Shute: There are not many implementations of vision based AR right now? I mean obviously Microsoft is doing stuff because they have Georg Klein now, right, and there is Google Goggles, Total Immersion, Metaio, and it will be interesting to see where Layar’s partnership with Kooaba will lead?
Jay Wright: Yes. I think there are relatively few commercial implementations of vision based AR stacks.
Tish Shute: One of Patrick O’Shaughnessey’s questions is about what features are going to be in the vision component, very specifically. Patrick O’Shaughnessey, Patched Reality, working with Circ.us, Edelman, and Metaio, used the Unifeye SDK to do a vision based AR app for Ben and Jerry’s that’s been getting all the attention lately. He was a speaker at are2010.
He very specifically wants to know what features will be included in the computer vision component. He says, “I’m most interested in understanding what features are going to be in the vision component. Is it marker based?” Well I know it’s more than marker based. I saw some of it in Chris Cameron’s ReadWriteWeb write-up on Uplinq 2010. Is it “NFT? PTAM? other? Also, are you integrating any backend services?” That is an interesting question!
Jay Wright: So let’s get to the features on the client side, the vision based features. There’s support for, what AR aficionados would know as natural feature targets, or image based targets. And we use those to represent, obviously, 2D planar surfaces.
The other thing that we are trying to do to set expectations, Tish, about where these can be used is to let people know that they work best in what we’re calling near-field environments. So the idea isn’t that you use the system to create a large scale AR system that can recognize buildings indoors and outdoors. It’s the idea where I can recreate 3D experiences that take place on surfaces that are in my immediate field of view, whether that be on the table in front of me, or on the floor, or on the wall, or on the shelf.
Also, when you talk about near field experiences, there are some other constraints that are implied. Like, if it’s in front of me and in my immediate field of view, it is probably going to be pretty well lit. And lighting, of course, is an important requirement.
So we’ll support these natural feature targets, or image targets. And we also have support for sort of a hybrid marker image type. It’s something called a frame marker, which has kind of a black border with some dots on it.
Click on the image above or here to view Vision-Based Augmented Reality Technical Super Session video from Uplinq 2010
Jay Wright: So there’s this additional type. And the reason for this additional hybrid marker type is it has a lower computational requirement than a natural feature target. So the idea is these things can be used as game pieces or elements of play where I want to have a large number of them detected and tracked simultaneously.
So you can have, for example, one big natural feature target that serves as a game board or game surface, and you can use these other things as smaller game pieces. And when you put them out, different types of content can appear on them and do different things.
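[Note: the board-plus-pieces idea can be sketched as follows. This is a minimal illustration in Python, not the SDK's API: the `detections` list format, the `PIECE_CONTENT` table, and the `content_to_draw` function are all hypothetical names I made up. The choice to draw nothing when the board is not in view is also just an assumption of this sketch.]

```python
# Hypothetical content table: frame-marker ID -> what the app draws on it.
PIECE_CONTENT = {0: "red robot", 1: "blue robot", 2: "referee"}


def content_to_draw(detections, board_name="game_board"):
    """Decide what to render for one camera frame.

    `detections` is a list of (kind, identifier) pairs, where kind is
    "image_target" (the natural feature target used as the board) or
    "frame_marker" (a cheap-to-track game piece).
    """
    if ("image_target", board_name) not in detections:
        return []  # assumption of this sketch: no board in view, no game
    drawn = ["board scenery"]
    for kind, identifier in detections:
        if kind == "frame_marker" and identifier in PIECE_CONTENT:
            drawn.append(PIECE_CONTENT[identifier])
    return drawn


# One frame in which the board and two pieces are tracked simultaneously.
frame = [("image_target", "game_board"), ("frame_marker", 1), ("frame_marker", 0)]
print(content_to_draw(frame))
```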
Tish Shute: Yes, that’s nice! And the other thing I noticed was the virtual buttons. How well developed is that?
Jay Wright: The idea behind virtual buttons is, in addition to supporting augmentation, we want to support interaction. And we think there are going to be different types of user interaction with augmented reality content. It may be hand tracking and finger tracking, but another compelling form we’ve identified so far is the ability for me to touch particular surfaces and have an event fire within the application.
So virtual buttons are rectangular areas on image targets that a developer can define, and they serve as buttons. So you can create a target that is a game board, for example, and define certain regions. And when the user covers that region with his hand, like pushing a button, your application can detect that event and take some action.
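[Note: here is a minimal sketch in Python of how an application might model this. Jay only says that buttons are rectangular regions on an image target and that covering one fires an event. The occlusion test below, which treats a button as pressed when most of the target's known feature points inside its rectangle are no longer visible, is my own assumption about one plausible mechanism, and `VirtualButton`, `pressed_buttons`, and the 0.7 threshold are hypothetical rather than part of the Qualcomm SDK.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualButton:
    """A rectangular region defined by the developer, in target coordinates."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def pressed_buttons(buttons, expected_features, visible_features, threshold=0.7):
    """Return the names of buttons whose region is mostly covered.

    Assumption of this sketch: a region counts as covered when at least
    `threshold` of the feature points expected inside it are not visible.
    """
    visible = set(visible_features)
    pressed = []
    for button in buttons:
        inside = [p for p in expected_features if button.contains(*p)]
        if not inside:
            continue  # no features in the region, so coverage cannot be judged
        hidden = sum(1 for p in inside if p not in visible)
        if hidden / len(inside) >= threshold:
            pressed.append(button.name)
    return pressed


# A 10 x 10 game board with one known feature point per unit cell.
features = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
start = VirtualButton("start", 0, 0, 2, 2)
reset = VirtualButton("reset", 8, 8, 2, 2)

# Simulate a hand covering the corner of the board near the origin.
visible = [p for p in features if not (p[0] < 3 and p[1] < 3)]
print(pressed_buttons([start, reset], features, visible))
```

[The application would then react to each returned name, for example by starting the game when "start" is covered.]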
Tish Shute: Nice! And what is the documentation on these capabilities that is offered by Qualcomm? For example, Yohan Baillot, who is interested in integrating eyewear-based AR systems with smartphones, asked: How deep does this go? Will there be full documentation on Snapdragon for people who want to work at that level? Is there a chip SDK?
Jay Wright: Qualcomm’s model is to work with providers of the operating systems and deliver functionality of the chip through the operating system. So many operating systems APIs will take advantage of functionality that’s in the chip. But there is no separate chip SDK per se.
Tish Shute: I suppose that does come up a little bit with one of Anselm Hook’s questions, because there is some overlap with Google Goggles here, isn’t there, in terms of what you’re doing, right? Are you going to work closely with Google Goggles?
Jay Wright: Google Goggles is performing what we’ve described as ‘visual search’. So the idea is you take a picture, send it to the cloud and identify it and the results come back. I think if we see Google Goggles go in a direction where there’s an AR experience, that would be a good area for us to collaborate with Google.
Tish Shute: Anselm Hook is very interested in having some kind of open standard around this physical tagging of the world, right, – the physical world as a platform. But I suppose that’s down the road but is there a plan to start talking about open standards here – visual search with image recognition? That’s a very powerful combination. (see my interview with Anselm Hook here).
Jay Wright: I think it is. And we’re very interested to hear from developers and others that have ideas about how they would want to integrate with the functionality that we have to best enable those kinds of combined experiences.
Tish Shute: Well, I know Anselm has a lot of very important ideas on that.
Jay Wright: I’d be very interested in hearing those because we want to do everything we can to enable the maximum number of applications and best user experience for anything that people want to do.
Tish Shute: Let’s go back to some specific questions about the platform, right? For example Yohan Baillot asked, “Is arbitrary image/tag recognition supported? Is the tag / image specifiable by user? Is face recognition supported?” Not yet, face recognition, right?
Jay Wright: Not yet.
Tish Shute: What are the plans with that?
Jay Wright: I think we’ve identified it as an interesting area and something that there’s some interest in, but have not made a decision on a particular technology direction.
Tish Shute: You’ve answered some of these but 3D model based vision tracking. Yohan’s question was, “Is 3D model based vision tracking supported (that is recover the pose of the camera using a known 3D model and a 2D camera view of this model)?”
Jay Wright: That’s something we’re looking at very closely, but again, don’t have a plan, or don’t have a future date for.
Tish Shute: And you said with the natural landmark tracking that’s not supported, right?
Jay Wright: I don’t know if I know what that means, Tish. But we don’t have any APIs that provide compass or GPS functionality other than already exists in the operating system. So if you want to take advantage of the compass or other sensors, you can absolutely do that, but the SDK does not currently provide anything different or anything more than already exists in the OS.
Tish Shute: This is an interesting question, “Is Snapdragon offloading some processing to the GPU, if any?”
Jay Wright: Certainly rendering functionality that utilizes OpenGL is being offloaded to the GPU. We’re currently in the process of determining multiple methods for offloading functionality between both symmetric and heterogeneous cores on Snapdragon, which would include the GPU, the apps processor, and DSPs.
Tish Shute: No one has truly solved optimizing the GPU/CPU for mobile AR yet have they?
Jay Wright: That really gets to the heart of the optimization here. Which pieces ought to be operating on which cores and when, and why? And that’s something that we’re looking at very closely.
Tish Shute: Right. The only AR that is truly 3D media tightly registered to the physical world has been done for military and medical use (and that has often been with a locked-off camera!). But to take mobile AR to the next level I think many developers would like access to the CPU/GPU, for example a developer interested in the future of eyewear like Yohan?
Jay Wright: We’re very interested in hearing what kinds of tools developers would like to see.
Tish Shute: What is the best forum for discussing feature specifics?
Jay Wright: To provide feature requests to us?
Tish Shute: Yes. And discuss them.
Jay Wright: If people go to qdev.net/AR there’s an application up there for the private beta program. So if people do have ideas about features or other things they would like to see, they’re welcome to submit [their requests and ideas] there.
Tish Shute: I also have some questions about the specifics of the competition. Some people are a little confused about some things. Yohan asked, “What is the expected form of the project? Lab demonstration? Specific capability? Complete end to end system?”
Jay Wright: The only requirement is that they submit an Android application that we can then get running on a device. So if it has a backend component or backend server that it works against, great. If it does, it does. If it doesn’t, it doesn’t. But that’s really it. There’s no limit to the application category. It can be a game, it can be a museum tour, it can be a children’s learning game or learning experience. It can really be anything. The idea is we want to find experiences for which AR delivers some unique value. We’ll be announcing more specifics about the competition in the near future.
Tish Shute: Right, because some people weren’t sure about the Unity being separated whether it was biased towards games. And it’s not really, is it?
Jay Wright: Unity is a bias toward just rapid development for 3D, I think. It’s most commonly associated with games, but there are also a lot of Unity customers that use it for medical simulations and other types of applications that aren’t really games at all.
Tish Shute: Yes. It’s very flexible, I know. You did bring up the backend services again. Are you thinking of offering any of that?
Jay Wright: There is a backend tool that we offer. And the backend tool is what you use to generate your targets. So if you want to create or use a particular image for a target in your application, you upload it to our target management application, and then it will evaluate that target and tell you how well it will work. So as you know, certain images are more likely to be recognizable than others. And so there’s metrics in that application that will give you some feedback.
And then you can download your target resource from the website that you can then incorporate into your application project.
Tish Shute: So this is available at the moment to people who are in the private beta and not to…you know, all of this information and documentation, right?
Jay Wright: That’s correct.
Tish Shute: So that’s an incentive. Now, just to encourage people to submit to the private beta is the other thing that people seem confused about. In one part you say 25 developers. And some people have thought that meant it was limited to 25 individuals. And some people have like maybe four people on their team, so they were going, “Well, are we going to be accepted because we have four developers, or do we count as one because we are all working at the same project?”
Jay Wright: It’s just 25 companies.
Tish Shute: OK. I think we’ve gone through the questions. Just to clarify and maybe give some incentive for people to apply to the private beta…the big advantage of getting in the private beta, aside from getting a month’s start on the competition, is that you get a chance to input, right?
Jay Wright: Yes. A chance to provide feedback, get early access to the technology. And then we are also providing a free HTC phone.
Tish Shute: Oh, yes. I forgot the phone. Yes, right. In the requirements, though, you basically seem to be asking for sort of a full app…some people get reticent about delivering their full application plan, right?
Jay Wright: Yes. I understand that. People should just reveal what they are comfortable talking about. Just so you understand the constraint on this end, this is early technology and we’re trying to understand exactly what the support requirement is going to be. And we have limited support resources at this time, so we want to make sure that we can focus the resources that we have on folks that are really going to use the technology and have a sound plan to actually build something. So that’s really the motivation behind limiting the size of the private beta.
Tish Shute: OK. Yes, it’s good to reiterate that. We’re down to the last question that I have, and then I’ll ask you if there is anything that I missed. You say you are partnering with Mattel. Who are the developers? Because I mean Mattel isn’t an augmented reality development team.
Jay Wright: Mattel used a subcontractor, Aura Interactive.
Tish Shute: Nice. But that’s your only partner that I saw, right? Why Mattel?
Jay Wright: Well, to launch a new technology, companies will often find showcase partners to demonstrate compelling uses of it. And we thought Mattel and the Rock’em Sock’em™ toy was a great example of combining augmented reality with an existing toy.
Tish Shute: And I think people agree with you on Rock’em Sock’em (see Chris Cameron’s RWW post).
Jay Wright: And there’s other showcase partners and applications that we will continue to work on to kind of spur the ecosystem and show what is possible.
Tish Shute: OK. Now, is there anything I’ve left out that you think? What’s the core of this narrative that we need to get across, and if I’ve left anything out that is a key piece?
Jay Wright: I think you’ve done an excellent job of covering all the bases, Tish.
Tish Shute: [laughs]
Jay Wright: I think the important overriding message to get across is that we really see ourselves in an enablement role here, and that we are trying to provide… we’d like to provide fundamental technology that helps all developers build content for the real world.