Monday, November 30, 2009
How to store images larger than 1 megabyte in Google App Engine
Over the summer, Google App Engine raised its limits for web requests and responses from 1MB to 10MB, but kept the maximum size of any single database element at 1MB. If you try to exceed this, you'll get a MemoryError. You can find a fair amount of grief and woe and gnashing of teeth and wearing of sackcloth and ashes about this online.
Which is kind of surprising, because it's not that hard to break files up into chunks and store those chunks in the database separately. Here's what I did today for my current project, which stores data - including photos - uploaded from smartphones:
First, we have to receive the uploaded image. Our uploads are two-phase - first data, then a photo - for various reasons. The data upload includes the image's file name; the photo upload is a basic form/multipart POST with exactly one argument (the filename) and its value (the file).
So, in "main.py":
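from google.appengine.ext import webapp
import ec

class SaveImage(webapp.RequestHandler):
    def post(self):
        entryHandler = ec.EntryHandler()
        # The POST carries exactly one argument: its name is the file name, its value is the file.
        for arg in self.request.arguments():
            file = self.request.get(arg)
            response = entryHandler.saveImage(arg, file)
            self.response.out.write(response)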
and in "ec.py":
import logging
from google.appengine.ext import db

# Entry (the model for the data upload) is defined elsewhere and not shown here.
class ImageChunk(db.Model):
    entryRef = db.ReferenceProperty(Entry)
    chunkIndex = db.IntegerProperty()
    chunk = db.BlobProperty()

class EntryHandler:
    def saveImage(self, fileName, file):
        results = Entry.all().filter("photoPath =", fileName).fetch(1)
        if len(results) == 0:
            logging.warning("Error - could not find the entry associated with image name " + fileName)
            return "Failed"
        else:
            MaxBTSize = 1000000
            entry = results[0]
            marker = 0
            while marker * MaxBTSize < len(file):
                # A slice that runs past the end of the string stops at the end,
                # so the final, shorter chunk needs no special case.
                data = file[MaxBTSize * marker:MaxBTSize * (marker + 1)]
                chunk = ImageChunk(entryRef=entry, chunkIndex=marker, chunk=db.Blob(data))
                chunk.put()
                marker += 1
            logging.info("Successfully received image " + fileName)
            return "Successfully received image " + fileName
Pretty basic stuff: we chop the image up at each 1,000,000-byte mark, and put each chunk into its own ImageChunk DB object.
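The slicing logic can be tried out on its own, without App Engine. What follows is a minimal sketch in plain Python, not the code from ec.py: split_into_chunks and MAX_CHUNK_SIZE are names made up for illustration. It also shows that the last, shorter chunk doesn't need a special case, since a Python slice that runs past the end of the data just stops at the end.

```python
MAX_CHUNK_SIZE = 1000000  # same 1,000,000-byte mark as MaxBTSize (hypothetical name)


def split_into_chunks(data, chunk_size=MAX_CHUNK_SIZE):
    """Return a list of (index, piece) pairs covering all of data, in order."""
    chunks = []
    index = 0
    while index * chunk_size < len(data):
        start = index * chunk_size
        # A slice that overruns the end is clamped, so the last (short)
        # piece needs no special handling.
        chunks.append((index, data[start:start + chunk_size]))
        index += 1
    return chunks


if __name__ == "__main__":
    fake_image = b"x" * 2500000  # stand-in for a 2.5 MB JPEG
    for index, piece in split_into_chunks(fake_image):
        print(index, len(piece))
```

Each (index, piece) pair corresponds to one ImageChunk row: the index plays the role of chunkIndex and the piece is what goes into the blob.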
Then, when we need to retrieve the image, in 'main.py':
class ShowImageWithKey(webapp.RequestHandler):
    def get(self):
        key = self.request.get('entryKey')
        entryHandler = ec.EntryHandler()
        image = entryHandler.getImageByEntryKey(key)
        if image is not None:
            self.response.headers['Content-Type'] = 'image/jpeg'
            self.response.out.write(image)
and in 'ec.py':
    # (a method of EntryHandler)
    def getImageByEntryKey(self, key):
        chunks = db.GqlQuery("SELECT * FROM ImageChunk WHERE entryRef = :1 ORDER BY chunkIndex", key).fetch(100)
        if len(chunks) == 0:
            return None
        image = ""
        for chunkRow in chunks:
            image += chunkRow.chunk
        return image
Since db.Blob is a subtype of str, that's all you have to do. I don't understand why some people are so upset about this: it's mildly annoying that I had to write the above, but hardly crippling. At least with JPEGs, which is what we use. (But I don't see why any other file type would be more difficult; they're ultimately all just a bunch of bytes). Could hardly be easier ... well, until App Engine rolls out their large file service.
(eta, Dec 14: which came out today! Meaning you can now disregard all the above and just use the new Blobstore instead.)
(eta, Dec 16: mmm, maybe not. Looked at the Blobstore in detail today, and it's really best suited for browser projects, not app or web-service stuff. The API for the blobs is very limited, and you can only access them via one-time-only URLs that App Engine puts in your HTML. You could scrape that, granted, but that's a pain in the ass, no less inelegant than the image-chunking solution above. It's experimental and subject to change, too. I think I'll hold out until its API improves.)
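A footnote on the retrieval code: the only thing that really matters when putting the image back together is that the chunks are joined in chunkIndex order, which is what the ORDER BY is for. Here is a minimal sketch of that step in plain Python, with no datastore involved. The function reassemble and the (index, piece) tuples are hypothetical stand-ins for the ImageChunk rows, not part of the project code.

```python
def reassemble(chunks):
    """Join (index, piece) pairs back into one byte string, in index order."""
    ordered = sorted(chunks, key=lambda pair: pair[0])
    return b"".join(piece for _, piece in ordered)


if __name__ == "__main__":
    # Deliberately out of order, as rows could be without the ORDER BY.
    rows = [(2, b"CC"), (0, b"AA"), (1, b"BB")]
    print(reassemble(rows))
```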
Labels: AppEngine, BigTable, chunking, chunks, Images, JPEG, JPG, limit, MemoryError, python, size
Wednesday, November 4, 2009
to infinity, and beyond!
I am pleased to report that my pet-project iPhone app, iTravelFree, has passed the stern inspection of Apple's App Store and is now available for download worldwide. For app links, a screenshot-laden tutorial, and help and FAQ files, see here: www.wetravelright.com.
(Yeah, crappy URL, I know, but all the good ones were taken.)
Since this is my tech blog let me wax about its architecture a bit. The iPhone app is pretty straightforward: basically, it's a bunch of TableViewControllers, many of which include WebViews, along with a MapViewController, all pointing to a bunch of CoreData records. Nothing extraordinarily fancy by any means.
The server side is more interesting: it's a Google App Engine service, written in Python, that fetches, caches, and parses Wikitravel pages for the app. This gives me a single point of access to the data flow, lets me do things like convert addresses to lat/long locations, and cuts down on bandwidth for both Wikitravel (thanks to the caching) and the phone app (thanks to the parsing and stripping out of extraneous info).
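To make the fetch-cache-parse idea a bit more concrete, here's a rough sketch in plain Python. It is not the actual service code: PageCache, fetch_page and strip_extras are hypothetical names, the cache is just an in-memory dict with a time-to-live rather than anything App Engine provides, and the "parsing" is a placeholder that only drops blank lines.

```python
import time


class PageCache:
    """Tiny in-memory cache: fetch a page at most once per ttl seconds."""

    def __init__(self, fetch_page, ttl_seconds=3600, clock=time.time):
        self._fetch_page = fetch_page  # callable: page name -> raw text
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # page name -> (fetched_at, parsed text)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]  # still fresh: no upstream request
        parsed = strip_extras(self._fetch_page(name))
        self._entries[name] = (self._clock(), parsed)
        return parsed


def strip_extras(raw_text):
    """Placeholder parser: drop blank lines and surrounding whitespace."""
    lines = (line.strip() for line in raw_text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    calls = []

    def fake_fetch(name):  # stands in for a real HTTP request
        calls.append(name)
        return "  Paris \n\n  See: Eiffel Tower  \n"

    cache = PageCache(fake_fetch)
    print(cache.get("Paris"))
    print(cache.get("Paris"))
    print(len(calls))
```

The point of the shape is that the phone only ever sees the already-stripped text, and the upstream site is only hit when the cached copy has expired.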
The general architecture - phone app plus App Engine service - is actually really powerful and easy to work with. Basically, it's a distributed version of the classic Model-View-Controller architecture, where the phone is the view, the App Engine service is the controller, and whatever data you're accessing is the model. This lets you do all the heavy-lifting computation on the server side, which is where it belongs, and keep the phone (and its puny processor) focused almost purely on the UI.
I do have some reservations about the BigTable data store that App Engine uses, but they don't apply to projects like this, with relatively simple storage requirements and no data mining.
I wrote it in, hrmm, about six weeks all told, starting in July. (Obviously it's been much more than six weeks since then, but I had full-time work starting August so could only work on this in fits and spurts on the side.)
Anyway - the app is in pretty good shape, but there's more work to be done on the server side, so it's still basically in beta test. Take a look, download it, play around, and let me know what you think -
Labels: AppEngine, Apple, AppStore, BigTable, iPhone, iTravel, iTravelFree, python, Wikitravel
Monday, June 1, 2009
zipme, baby
So I've finished the first crude version of my middleware and uploaded it to AppEngine. Looks like I'm going to have to upload the indexes by hand, though - I ran it through its paces before uploading it, but got a NeedIndexError: no matching index found when I tried to run it on appspot.com. Oh well. No biggie. I'll do that tomorrow and then go back to my Android app.
But mostly I wanted to tell you about a fun little app called zipme. AppEngine doesn't let you directly examine the source of the files you've uploaded. However, someone named "manatlan" wrote "zipme", a single python file that you add to your root directory so that you can subsequently download the entirety of your source code, zipped, from AppEngine. See here. (It's configured so you have to be logged in as admin, in case you don't wanna show your source to the world...)
eta: spoke too soon - the indexes are now up n' running. However, JavaScript form handling is not. Well, this is why you deploy early and deploy often, so that you don't get bit by it at the last minute. Goin' on a bug hunt, brb...
etaa: in case you're curious, I realized after five minutes that the culprit was neither AppEngine deployment nor my code: it was the NoScript in my browser.
Labels: AppEngine, python, zipme
Saturday, May 30, 2009
a satisfied customer
I strongly endorse (at least thus far) the GAEUnit Google AppEngine unit-testing framework, available (for free) here, and insanely easy to use.
Also, don't be a total dummkopf like me and name your initial test file "unittest.py". Insert headdesk sounds here. Fortunately I realized the problem after a mere five minutes of staring at bewildering error messages.
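In case it isn't obvious why the name matters: a file called unittest.py sitting in your project directory can get imported in place of Python's standard unittest module, so anything that expects the real one falls over. A quick way to check which file a module was actually loaded from, as a minimal sketch (the helper name module_origin is made up for illustration):

```python
import importlib


def module_origin(name):
    """Import the named module and return the path it was loaded from."""
    module = importlib.import_module(name)
    # Built-in modules have no __file__, hence the fallback.
    return getattr(module, "__file__", "(built-in)")


if __name__ == "__main__":
    # If this prints a path inside your own project rather than the
    # Python installation, a local file is shadowing the standard module.
    print(module_origin("unittest"))
```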
Labels: AppEngine, gaeunit, python, testing
Wednesday, May 27, 2009
On cloud computing nine. Well, maybe three.
I've finished the bare-bones functionality of the Android app I'm working on, and have gone back to the AppEngine middleware. The purpose of the app, in case you're curious, is to make it easy to update WikiTravel from your phone, complete with location data and/or a picture.
AppEngine is remarkably easy to work with. And as my friend Martin pointed out, it's now available in both Java and Python (although the Java is still in "Early Look" status.) I'm working in Python for the sake of variety, and also because the advantages of Java in this context are not immediately obvious.
Anyway, I don't have as much Python experience as Java, but that hardly matters, because it's all very straightforward. I'm having a permissions problem getting the location of an IFrame, which means I may have to make things a little more annoying for the user than I'd like, but that's a cross-site browser-security issue (the same-origin policy), not a development issue.
The basics of a web application - getting data from the database and request, displaying it to the page, and saving it as and when needed - are all perfectly straightforward; so much so that I'm not even going to bother posting any code here, for once, because none of it seems particularly interesting. Which is a good thing. It means you can focus on what you want to do, unlike the bad old days, where you spent a hefty fraction of your time worrying about how you were going to do it.
(I'm sure I'll hit such a wall at some point, and fear not, when I do I will whine about it at logorrheic length.)
AppEngine also gives you lots of freebies. Sending emails easily, for one. Scalability and data integrity, for two. Goodbye, J2EE deployment descriptors; that heavy lifting now happens pretty much behind the scenes, although you do have to group objects affected by single database transactions together in advance.
You also get automatic seamless user handling, so long as you use Google Accounts as your userbase. Caveat: it's easy to link to a login page, but I haven't quite worked out how to integrate a login form into your own pages. Even so, this is pretty brilliant. It means your site comes with all the user headaches - login, logout, password reminder, sending them emails, etc. - pre-handled, saving you time and grief. It also means that if you use it, which I am for the sake of convenience, you lock yourself even further into Google's infrastructure, and expand the tentacular remit of Google Accounts. Good thing they're not evil, eh?
Labels: AppEngine, Google, Java, python
Friday, May 22, 2009
Well, that was easy.
So I created a new Google AppEngine project in Python: a very simple one, which just takes a particular HTTP POST request, stores its values to the datastore, and displays them on-screen. Then I added an "upload" function to my Android app's DbHelper class, and connected the latter to the former.
I expected the debugging to be messy and lengthy. But whaddaya know? All I had to do was add the INTERNET permission to my AndroidManifest.xml, and correct the URL that I was pointing to (I'm running the AppEngine app locally; the Android emulator has a special IP address, 10.0.2.2, to connect to its host machine) and poof, amazingly, It Just Worked.
Here's the Android code, in case anyone needs an example:
public boolean uploadNote(long rowId, String location, String title, String comments, String picturePath) {
    try {
        DefaultHttpClient httpclient = new DefaultHttpClient();
        httpclient.getParams().setParameter("http.useragent", Util.AppName);
        HttpPost httpost = new HttpPost(Util.wtwSite); // temporarily 10.0.2.2:8080/sendUpdate
        Log.i(""+this, "Preparing to post to "+Util.wtwSite);
        List<NameValuePair> paramList = new LinkedList<NameValuePair>();
        paramList.add(new BasicNameValuePair("email", Util.GetUserEmail()));
        paramList.add(new BasicNameValuePair("title", title));
        paramList.add(new BasicNameValuePair("location", location));
        paramList.add(new BasicNameValuePair("comments", comments));
        paramList.add(new BasicNameValuePair("image", picturePath));
        UrlEncodedFormEntity entity = new UrlEncodedFormEntity(paramList,
                HTTP.DEFAULT_CONTENT_CHARSET);
        httpost.setEntity(entity);
        HttpResponse response = httpclient.execute(httpost);
        Log.i(""+this, "Sent POST, got " + response.getStatusLine());
        entity.consumeContent();
        markUploaded(rowId);
        return true;
    }
    catch (IOException ex)
    {
        Log.e(""+this, "Could not upload note with row "+rowId+ " due to "+ex, ex);
        return false;
    }
}
I'm not even going to bother posting the server code, as it's so simple; 70 lines of Python, and 30 lines of HTML.
You'll note at the moment I just upload a picture URL, rather than actual data, but I'm going to move to doing the latter eventually.
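For reference, the request body that uploadNote sends is ordinary URL-encoded form data, so the receiving end has little to do beyond decoding five fields. Here's a minimal sketch of that decoding in plain Python 3. This isn't the App Engine server code mentioned above: parse_note is a hypothetical helper, the sample values are invented, and only the field names are taken from the Android code.

```python
from urllib.parse import parse_qs, urlencode

NOTE_FIELDS = ("email", "title", "location", "comments", "image")


def parse_note(body):
    """Decode a URL-encoded POST body into a dict of the note's fields."""
    parsed = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; take the first value, default to "".
    return {field: parsed.get(field, [""])[0] for field in NOTE_FIELDS}


if __name__ == "__main__":
    body = urlencode({
        "email": "someone@example.com",
        "title": "Night market",
        "location": "25.03,121.56",
        "comments": "Great food & cheap",
        "image": "/sdcard/pic.jpg",
    })
    print(parse_note(body))
```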
Labels: Android, AndroidManifest, AppEngine, Java, python, upload