Google’s DeWitt Clinton, in a comment on my Get In the Cloud piece, asserts that both Google App Engine and Amazon EC2/S3 are already lockin-free by my definition. That’s not quite consistent with the word I’m hearing on the street. I’d appreciate testimony and pointers from others, because this is a really important issue.
Reproducing DeWitt for convenience:
...you can already do that in both the case of Amazon’s services and App Engine. Sure, in the case of EC2 and S3 you'll need to find a new place to host the image and a new backend for the data, but Amazon isn't trying to stop you from doing that. (Actually not sure about the AMI format licensing, but I assumed it was supposed to be open.)
In App Engine’s case people can run the open source userland stack (which exposes the API you code to) on other providers any time they want, and there are plenty of open source bigtable implementations to choose from. Granted, bulk export of data is still a bit of a manual process, but it is doable even today and we’re working to make it even easier.
Are you saying that lock-in is avoided only once the alternative hosts exist?
But how does Amazon or Google facilitate that, beyond getting licensing correct and open sourcing as much code as they can? Obviously we can't be the ones setting up the alternative instances. (Though we can cheer for them, like we did when we saw the App Engine API implemented on top of EC2 and S3.)
To address one of his points: Yes, I think that being lockin-free in theory is much less interesting than having actual concrete commercial alternatives.
But in principle, is DeWitt correct? Would either of Google or Amazon like to say so, officially?
Comment feed for ongoing:
From: Mark Carey (Oct 15 2008, at 10:04)
If people adopt a standard API using something like Eucalyptus, then I think that will go a long way toward achieving what you are talking about:
http://eucalyptus.cs.ucsb.edu/
EUCALYPTUS (Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems) is an open-source software infrastructure for implementing "cloud computing" on clusters. The current interface to EUCALYPTUS is compatible with Amazon's EC2 interface, but the infrastructure is designed to support multiple client-side interfaces. EUCALYPTUS is implemented using commonly available Linux tools and basic Web-service technologies, making it easy to install and maintain.
From: James Aylett (Oct 15 2008, at 10:42)
It isn't lock-in-free if you have to re-implement on top of different APIs, which is my main concern (there are BigTable equivalents, but is there an open source equivalent that's even API-compatible, let alone ABI-compatible?). However you can virtualise on top of EC2, and then move your virtual services; I believe there are companies doing this (in fact I'm certain, but I can't remember which). I'm more suspicious of people using the GAE userland stack, because I was under the impression that wasn't really intended for deployment; if I'm wrong then Google has probably done all they can here.
I wouldn't want to trust either cloud or other virtualised services unless I could move my virtual image somewhere else. And I'd want to read a success story of someone having actually done it rather than hand-waving saying it's possible. My feeling is we're getting close to this atop Google and Amazon, but we're not quite there yet...
From: Boris Mann (Oct 15 2008, at 11:05)
I think the other production-grade host(s) DO need to exist in order for you to not be locked in (assuming you are a software layer guy that doesn't know how to replicate all of Google's or Amazon's infrastructure).
Otherwise, the starting currency (assuming the software stack is available -- others will point out EUCALYPTUS for cloning Amazon, Google is a little trickier) would be a container full of servers (probably 2 containers if you want them redundant in 2 data centers).
From: Vincent Janelle (Oct 15 2008, at 11:23)
It depends more on how you implement your application. If you use SQL servers (with the new Elastic Block Storage, or something like Hadoop, which can interface directly with S3), don't use paid AMIs (which charge a premium on top of Amazon's per-hour fees), and can re-implement or design your storage so that S3 is replaceable (or just continue to use it and pay for traffic), then you aren't locked in to EC2.
This of course goes out the window the second you depend on simpledb/sqs/s3/etc/etc without some sort of abstraction, or just simply can't afford the hardware to implement storage/CPU usage yourself.
'Clouds' like Google App Engine are interesting in that they offer a bit of magic to stop you from having to scale resources yourself (like you do with EC2, or colocation), but they require porting effort on the data-storage side if you want to migrate away.
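Vincent's abstraction point can be sketched concretely (all names here are hypothetical, not any real library's API): if application code talks only to a backend-neutral blob interface, swapping S3 for your own storage is a matter of writing one new backend class.

```python
import os

class BlobStore:
    """Backend-neutral blob interface. An S3BlobStore would implement
    the same two methods on top of Amazon's API; the application only
    ever sees put/get."""
    def put(self, key, data):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError

class FileBlobStore(BlobStore):
    """Local-disk backend, useful for testing or for migrating
    off S3 onto hardware you control."""
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, key, data):
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()

# Application code depends only on the BlobStore interface:
store = FileBlobStore("/tmp/blobstore-demo")
store.put("greeting.txt", b"hello, cloud")
```

The point is not the two trivial methods but the seam: the moment you call a provider's storage API directly from application code, Vincent's "goes out the window" scenario applies.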
From: James Urquhart (The Wisdom of Clouds) (Oct 15 2008, at 11:58)
Tim,
I'm with you. If there are no alternative infrastructures on the open market for Amazon and/or Google, then the barrier to entry must be too high. This is to Google and Amazon's advantage, and is therefore a form of lock-in.
Now, I must say there is a rumor out there that one of the open source "private cloud" platforms aims to be compatible with both. How successfully that can be done remains to be seen, however, both technically and from a business perspective.
From: Geir Magnusson Jr (Oct 15 2008, at 13:18)
Tim :
"To address one of his points: Yes, I think that being lockin-free in theory is much less interesting than having actual concrete commercial alternatives."
Pragmatically I agree, but recent history makes me cautious, as those commercial alternatives may be beholden somehow to the original. I think that won't matter to most people, though.
Portability is going to be an interesting issue in "the cloud".
At 10gen, we're working on a POC framework to let people run AppEngine apps on 10gen. (We haven't yet limited query returns to 1000 objects, but give us time... ;)
We hope we can provide an alternative, and one in which the full stack is available under open source licenses. Clearly the core tech underlying AppEngine is fantastic: BigTable, for all of its limitations for people used to RDBMSs, is well tested and incredibly scalable. But I'll bet we won't see the technology as open source for quite a while.
We'll announce the POC when it's ready - until then, people can try the platform now. (http://www.10gen.com/)
geir
From: Nick Johnson (Oct 15 2008, at 13:31)
In my mind, 'lock in' implies some attempt to actively prevent people from switching. A lack of alternatives isn't the same as lock in, especially when the company has gone out of their way to make the interfaces used accessible to developers.
Disclaimer: I work for Google, and my 20% job is App Engine. My opinions are my own and not Google's, etc etc.
From: Niall Kennedy (Oct 15 2008, at 13:48)
Inside or outside the cloud, scalable applications will code as close to the parent system as possible. Are you optimizing for MySQL 5.1 on an InnoDB engine with the latest third-party patches applied? You can raise a new instance in another location, but your lock-in is really the cost of moving away from the tightly bound infrastructure. ORMs and other abstraction layers can be thrown away once you've moved past the prototype stage and start counting throughput.
Google App Engine uses SQLite on localhost to mimic the functionality of Google's system, but it's not built for scale. You'll also have to move image handling out of the Picasa API before porting out to a PIL-based system.
Are there viable BigTable clones on the market into which you could easily import a data dump from App Engine? HBase isn't there yet but is showing progress (and Microsoft just green-lighted Powerset's continued contribution to the project). Aster? Maybe.
Even if you could grab all your data out of App Engine in XML or other forms you'll be massaging it to a new system. Not necessarily a bad thing, but I don't want to see the platform holding back new feature development to make sure mysqldump works everywhere.
From: Ian Sollars (Oct 15 2008, at 14:21)
If I've understood your line of reasoning, being practically lock-in-free essentially means coding against a spec that's implemented by multiple vendors, e.g. servlets, as opposed to simply having the spec out there.
Following on from what James Urquhart said, I think that although the big up-front investment in infrastructure for building a service compatible with (e.g.) the EC2 or AppEngine API is practically equivalent to lock-in *at the moment*, it's not, in my view, equivalent to real vendor lock-in, e.g. proprietary file formats, DRM, etc.
I think the 'practical lock-in' will disappear with time, because (a) AppEngine & EC2 are way ahead of the market, which must inevitably catch up, and (b) as a corollary, if the rest of the hosting market doesn't follow suit, they're going to find their lunch eaten.
New entrants are going to find competing on price & uptime difficult given the edge Google & Amazon have (I'll bet that even if Rackspace decided to clone the Amazon API, they'd not be able to offer the same prices and meet costs), so of course there'll have to be other value-adds: different jurisdictions, locality, support, Ken Church's condo model, or something else.
From: Mano Marks (Oct 15 2008, at 17:34)
Disclaimer: I work for Google as a Developer Advocate for App Engine. So it is my job to answer this kind of question.
I can say that for us, at Google, data and code portability are important. What Google offers is an SDK that is compatible with the live App Engine implementation. Here's AppDrop's proof-of-concept story of running App Engine on EC2: http://appdrop.com/.
I think people can differ on the definition of lock-in, but personally I feel that the bar is set pretty high if you require multiple additional platforms to support the exact same API, at least as far as cloud computing goes. Obviously, one of the problems both Google and Amazon are trying to solve is that it is difficult and expensive to run your own data center, so we are working on providing alternatives. That means any other services have to start their own data centers, or developers have to be willing to create their own, a pretty high bar all around.
So, bottom line, I think DeWitt is right. Great post! As with many of these kinds of conversations, I think the discussion is just as important as the answer.
Thanks,
Mano
From: Jacek (Oct 15 2008, at 21:33)
I believe what you're looking for here is commoditization: (from wikipedia) "transformation of the market for a unique, branded product into a market based on undifferentiated products". In particular, successful standardization leads to commoditization. (Perhaps we can even define unsuccessful standardization as that which does not lead to commoditization in its particular space.)
What cloud providers can do to ensure zero-barrier-to-exit is standardize, probably on APIs, maybe on ABI. Standardization should be done through a third party with a known process of arriving at a stable specification. Otherwise, as open as many specs may be, the alternative vendors are stuck behind the original provider. That causes lock-in as long as new (and useful) features appear in new versions of the spec that are not yet supported by the alternatives.
Of course, standardization takes time, and cloud approaches may not be known well enough yet even to start useful standardization. And there may be no standards body respected in this particular area, dunno.
So, to answer DeWitt, the Googles and Amazons can do more than just document their APIs and open up the licensing: they can give up the particular IP and form standards. If they create good standards, other implementations may follow and the lock-in may go away.
From: Doug Cutting (Oct 16 2008, at 09:00)
From my experience with Hadoop, I can say that Amazon provides little lock-in. I was able to get Hadoop running on AWS in a trivial amount of time with proprietary APIs confined to a few installation and configuration scripts.
Google's AppEngine does not provide lock-in in theory, since the APIs are open, but there are not (yet) other scalable implementations of those APIs. So in practice you are locked in today, but long term you're probably not, API-wise.
But, as I said in your previous message, API lock-in may not be the problem this round. Rather, if your partners' data is already in Amazon, then there's tremendous incentive for you to put your data there too, so that you can efficiently collaborate. (Inter-AWS bandwidth is free. You only pay when you leave their cloud.)
If applications are data-intensive and they involve outsourcing sub-components to other cloud services, then a critical mass of services in one cloud gives that cloud an insurmountable advantage. Applications which host themselves outside of that cloud would operate at a disadvantage. This imbalance would only grow.
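Doug's economics can be put in back-of-the-envelope form (the $/GB rate below is purely illustrative, not Amazon's actual price sheet): transfers between services inside the same cloud cost nothing, while leaving the cloud is billed per gigabyte, so data accumulates where your partners already are.

```python
def transfer_cost_usd(gigabytes, leaving_cloud, egress_rate_per_gb=0.17):
    """Intra-cloud transfer is free; egress is billed per GB.
    The default rate is a made-up placeholder for illustration."""
    return gigabytes * egress_rate_per_gb if leaving_cloud else 0.0

inside = transfer_cost_usd(5000, leaving_cloud=False)   # 5 TB, stays in the cloud
outside = transfer_cost_usd(5000, leaving_cloud=True)   # 5 TB, moved to a rival
```

At any nonzero egress rate the asymmetry is the same: collaborating in-cloud is free, and every byte of migration out has a price tag, which is exactly the gravity effect Doug describes.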
From: Wes Felter (Oct 17 2008, at 21:11)
Accepting that it's possible and admitting that we want it, is a commercial EC2 clone viable? Does anyone have an incentive to create it?
From: Luka Marinko (Oct 19 2008, at 05:43)
Scalable applications need to be aware of, and adapt to, the infrastructure (cloud) they are running on.
And unless someone goes and literally clones Google's or Amazon's data center, you will not be able to move to another cloud without some work.
But if you are not writing the next facebook/amazon/digg/(insert huge site) killer, chances are that you will not have to change much.
And besides, as long as you can get at your data, you can migrate. Sure, migrating large amounts of data from BigTable to an RDBMS doesn't sound like much fun, but it's doable. (And everybody uses a data abstraction layer, right?)
But even today, with PHP and a regular host, migration is only easy as long as you have a few machines' worth of infrastructure. As soon as you have memcached farms, SANs, etc., migrating to other providers (colocation, etc.) is not automatic or easy.
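The "as long as you can get to your data you can migrate" claim reduces to a store-neutral export/import loop. A minimal sketch (where `fetch_page` and `insert_row` are hypothetical stand-ins for whatever the source and destination stores actually expose):

```python
import json

def export_entities(fetch_page, out_path):
    """Stream every entity out of the source store as JSON lines.
    fetch_page(cursor) -> (list_of_dicts, next_cursor_or_None),
    mirroring the paged reads most datastores force on you."""
    with open(out_path, "w") as f:
        cursor = None
        while True:
            rows, cursor = fetch_page(cursor)
            for row in rows:
                f.write(json.dumps(row) + "\n")
            if cursor is None:
                return

def import_entities(in_path, insert_row):
    """Replay the dump into the destination store, one row at a time."""
    count = 0
    with open(in_path) as f:
        for line in f:
            insert_row(json.loads(line))
            count += 1
    return count

# Demo: an in-memory "source" with two pages, and a list as "destination".
pages = {None: ([{"id": 1}], "p2"), "p2": ([{"id": 2}], None)}
dest = []
export_entities(lambda c: pages[c], "/tmp/dump.jsonl")
n = import_entities("/tmp/dump.jsonl", dest.append)
```

The loop is trivial; the real migration cost Luka points at is everything the dump doesn't carry: indexes, schema semantics, and the throughput of replaying it at scale.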