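Technical Notes

Tuesday, August 12, 2008

Uploading from a spreadsheet to GAE

Based on demo/guestbook.py bundled with the GAE SDK, I wrote a script that uploads
data from spreadsheet.google.com to the datastore on the GAE cloud.

Reference file: guestbook.py

For records of this size, I settled on reading 150 records at a time and putting (writing) about 20 to 30 records at a time.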

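Usage

First, read in the spreadsheet data.
- The target this time is DOW stock price data, about 960 records.
- In the SDK environment all of the spreadsheet data could be read at once, but on the
  cloud I had to split the data across multiple sheets of about 100 rows each (200 rows caused an error).
- Select "read" and specify the sheet key and the sheet number.
- One sheet's worth of data is read into Memcache.

Next, write the loaded data to the datastore on the GAE cloud.
- Select "put".
- Writing 100 rows at once causes an error, so in this case I split the data into chunks of
  about 20 rows and wrote them in several passes, advancing an offset each time.
- The offset is incremented automatically, so just click [Sign Guest] repeatedly.
- Once one sheet's worth of data has been uploaded, clear Memcache.
- Specify the next sheet number and repeat the same steps.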
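
The offset handling described above can be sketched roughly as follows. This is only a minimal illustration, not the actual upload script: put_next_batch, cached_rows and the list standing in for the datastore are hypothetical names, and a batch size of 20 is taken from the note above.

```python
def put_next_batch(rows, offset, batch_size, put):
    """Write one batch starting at offset and return the offset for the next call."""
    batch = rows[offset:offset + batch_size]
    if batch:
        put(batch)
    return offset + len(batch)


# Hypothetical stand-ins: 100 rows held in the cache, and a list acting as the datastore.
cached_rows = ["row %d" % i for i in range(100)]
datastore = []

offset = 0
clicks = 0  # each loop iteration corresponds to one click on [Sign Guest]
while offset < len(cached_rows):
    offset = put_next_batch(cached_rows, offset, 20, datastore.extend)
    clicks += 1

print("%d clicks, %d rows written" % (clicks, len(datastore)))
```

Because the function returns the next offset, the caller only has to store that one number between requests.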

Monday, August 11, 2008

Memcache API

http://code.google.com/appengine/docs/memcache/

Japanese translation: http://d.hatena.ne.jp/technohippy/20080717#1216393318

memcache.add( key, value, time=xx, min_compress_len=0)

time: once the configured expiration time (in seconds) has passed, the lookup is performed again. From the documentation: Optional expiration time, either relative number of seconds from current time (up to 1 month), or an absolute Unix epoch time. By default, items never expire, though items may be evicted due to memory pressure. Float values will be rounded up to the nearest whole second.

set()
set_multi()
get()
get_multi()
delete()
delete_multi()
add()
replace()
incr()
decr()
flush_all()
get_stats()
ex.
{'hits': 34, 'items': 5, 'bytes': 45587, 'oldest_item_age': 1800, 'misses': 21, 'byte_hits': 946479}
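
To make the effect of the time argument concrete, here is a small sketch of the "look up again after expiry" behaviour. It deliberately does not use the App Engine memcache client: TinyCache is a hypothetical in-memory stand-in that only imitates add() and get() with a relative expiry in seconds, and cached_lookup and load_rows are made-up helper names.

```python
import time as _time


class TinyCache(object):
    """In-memory stand-in for illustration; NOT the App Engine memcache client."""

    def __init__(self, clock=_time.time):
        self._clock = clock
        self._items = {}  # key -> (value, expiry timestamp or None)

    def get(self, key):
        """Return the cached value, or None if it is missing or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._items[key]
            return None
        return value

    def add(self, key, value, time=0):
        """Store value only if key is not cached; time is in seconds, 0 means no expiry."""
        if self.get(key) is not None:
            return False
        expires = self._clock() + time if time else None
        self._items[key] = (value, expires)
        return True


def cached_lookup(cache, key, compute, seconds):
    """Return the cached value, recomputing it once the entry has expired."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.add(key, value, time=seconds)
    return value


# Usage with a fake clock so that the expiry is deterministic.
now = [1000.0]
cache = TinyCache(clock=lambda: now[0])
calls = []


def load_rows():
    calls.append(1)  # count how often the expensive lookup really runs
    return ["row 1", "row 2"]


first = cached_lookup(cache, "sheet1", load_rows, 60)   # miss: runs load_rows
second = cached_lookup(cache, "sheet1", load_rows, 60)  # hit: served from the cache
now[0] += 61                                            # the entry is now stale
third = cached_lookup(cache, "sheet1", load_rows, 60)   # expired: runs load_rows again
```

The real service may also evict items earlier under memory pressure, so code should always be prepared for get() to return nothing.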

Monday, August 4, 2008

Debug

quick and dirty: not pretty, but simple (for the development environment)
* Risky on the cloud, because the output is written to the log as an error

import sys
print >>sys.stderr, "xxxxxxx"

The proper way is:

import logging
logging.debug("xxxxxxx")

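from google.appengine.ext import webapp
import google.appengine.ext.webapp.util  # makes webapp.util.run_wsgi_app available

def main():
    # Set the logging level in the main function
    # See the section on Requests and App Caching for information on how
    # App Engine reuses your request handlers when you specify a main function
    logging.getLogger().setLevel(logging.DEBUG)
    # MainPage and Guestbook are request handler classes defined elsewhere in the app
    application = webapp.WSGIApplication([('/', MainPage),
                                          ('/sign', Guestbook)],
                                         debug=True)
    webapp.util.run_wsgi_app(application)

if __name__ == '__main__':
    main()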
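
To see what setLevel() changes, the following sketch uses only the standard logging module, with no App Engine imports. ListHandler and the logger name "guestbook_sketch" are hypothetical; the handler just collects records in a list so the result can be inspected.

```python
import logging


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs instead of writing them anywhere."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))


logger = logging.getLogger("guestbook_sketch")
logger.propagate = False  # keep the sketch isolated from the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.WARNING)
logger.debug("hidden")         # dropped: DEBUG is below WARNING

logger.setLevel(logging.DEBUG)
logger.debug("visible")        # kept now that the level is DEBUG
logger.error("also visible")   # ERROR passes at either level

for level, message in handler.messages:
    print("%s: %s" % (level, message))
```

Unlike the print >>sys.stderr approach, each record carries its own level, so debug output can be filtered without touching the call sites.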

References
・http://code.google.com/appengine/articles/logging.html
・http://code.google.com/appengine/docs/python/logging.html
・http://groups.google.com/group/google-appengine/browse_thread/thread/a67752ac402bb21e/345e203a5bdd0750?lnk=gst&q=debug+#345e203a5bdd0750
・Printing tracebacks to the console with Django middleware
http://yamashita.dyndns.org/blog/django-middleware-traceback/

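Wednesday, July 23, 2008

UnicodeEncodeError and webapp.RequestHandler

Inside
class MainPage(webapp.RequestHandler):
calling
str.decode("utf-8",'ignore')
appears to raise
UnicodeEncodeError
whereas the same call made elsewhere, without going through webapp.RequestHandler,
raises no error.

I still do not fully understand UnicodeEncodeError. One likely explanation: in Python 2, calling decode() on a value that is already a unicode object first encodes it implicitly with the 'ascii' codec, and that step fails for Japanese characters.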

Notes

UnicodeEncodeError: 'ascii' codec can't encode characters in position 422-424: ordinal not in range(128)

This error occurs when a Unicode string is encoded into a byte string using the 'ascii' encoding, which only handles character codes in range(128), that is 0 through 127; Japanese characters therefore raise UnicodeEncodeError.
http://www.okisoft.co.jp/esc/cygwin-15a.html
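
The error described in this note can be reproduced outside App Engine. The sketch below is only an illustration with a made-up sample string: it encodes a string containing CJK characters with the 'ascii' codec, which fails, and then with UTF-8, which works.

```python
# Two non-ASCII (CJK) characters followed by plain ASCII text.
text = u"\u682a\u4fa1 DOW"

try:
    text.encode("ascii")  # 'ascii' only covers code points 0 through 127
    failed = False
except UnicodeEncodeError as error:
    failed = True
    bad_range = (error.start, error.end)  # slice of the characters that could not be encoded

utf8_bytes = text.encode("utf-8")            # UTF-8 can represent every character
round_trip = utf8_bytes.decode("utf-8")      # decoding restores the original string
ascii_only = text.encode("ascii", "ignore")  # 'ignore' silently drops what does not fit
```

Encoding explicitly with UTF-8 at the boundary avoids relying on the implicit 'ascii' default; 'ignore' hides the error but also loses the Japanese text.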

Looking into webapp (part 1) - Google App Engine
http://d.hatena.ne.jp/hamatsu1974/20080422/1208802967

Thursday, July 17, 2008

DeadlineExceededError

Looking into how to handle DeadlineExceededError.

Example of handling it:
http://stage.vambenepe.com/archives/category/implementation

Result: with reference to google/appengine/runtime/apiproxy.py,

from google.appengine import runtime
from google.appengine.runtime import apiproxy_errors
from google3.apphosting.runtime import _apphosting_runtime___python__apiproxy

I handled it by importing these and catching the exception:

except runtime.DeadlineExceededError:

File "/base/python_lib/versions/1/google/appengine/runtime/apiproxy.py", line 161, in Wait
  rpc_completed = _apphosting_runtime___python__apiproxy.Wait(self)
DeadlineExceededError

In addition to DeadlineExceededError, there is also CancelledError.

File "/base/python_lib/versions/1/google/appengine/runtime/apiproxy.py", line 189, in CheckSuccess
  raise self.exception
CancelledError: The API call datastore_v3.Count() was explicitly cancelled.
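
One way to live with the deadline is to catch the exception, remember how far the work got, and resume on the next request. The sketch below shows only that control flow under stated assumptions: DeadlineExceededError here is a local stand-in class, not the one from google.appengine.runtime, and put_until_deadline and make_flaky_put are hypothetical helpers that simulate a request running out of time after a fixed number of writes.

```python
class DeadlineExceededError(Exception):
    """Local stand-in for the runtime's DeadlineExceededError (illustration only)."""


def put_until_deadline(rows, start, put_one):
    """Write rows from index start; return the index to resume from next time."""
    done = start
    try:
        for row in rows[start:]:
            put_one(row)
            done += 1
    except DeadlineExceededError:
        pass  # stop cleanly; the caller retries from `done` in a later request
    return done


def make_flaky_put(store, budget):
    """Simulate one request that can only afford `budget` writes."""
    calls = [0]

    def put_one(row):
        if calls[0] >= budget:
            raise DeadlineExceededError()
        calls[0] += 1
        store.append(row)

    return put_one


rows = list(range(7))
store = []
resume = 0
requests = 0
while resume < len(rows):
    resume = put_until_deadline(rows, resume, make_flaky_put(store, 3))
    requests += 1
```

In the real runtime the exception can arrive at any point in the handler, so the resume index should only be advanced after a write is known to have completed, as done here.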

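Tuesday, July 8, 2008

Key Limitations: 500 bytes

Setting a key_name makes the key longer by that amount.
Setting a parent prepends the parent's key to the entity's own key.
If the parent has a parent of its own, the grandchild's key becomes longer still.
The limit is 500 bytes.

It follows that when an entity has a parent, and that parent in turn has a parent,
an ancestor can be used as a query condition with "ancestor is":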
b = db.GqlQuery("select * from Blog where ancestor is :1 ", granpa.key())
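
The relationship between parents, key length and "ancestor is" can be pictured with a toy model. This is not how the datastore actually encodes keys: here a key is simply a tuple of (kind, name) pairs, and child_path, is_ancestor and path_size are hypothetical helpers whose only purpose is to show that a child's key starts with its parent's key and therefore keeps growing.

```python
def child_path(parent_path, kind, name):
    """A child's path is the parent's path with one more (kind, name) pair appended."""
    return parent_path + ((kind, name),)


def is_ancestor(ancestor_path, path):
    """True when path starts with ancestor_path (the idea behind 'ancestor is')."""
    return path[:len(ancestor_path)] == ancestor_path


def path_size(path):
    """Rough size measure for the toy model; the real encoding differs."""
    return sum(len(kind) + len(name) for kind, name in path)


grandpa = child_path((), "Blog", "grandpa")
parent = child_path(grandpa, "Blog", "parent")
child = child_path(parent, "Blog", "child")
other = child_path((), "Blog", "other")

sizes = [path_size(p) for p in (grandpa, parent, child)]
```

Every extra level of ancestry adds to the size of the descendant's key, which is why deep hierarchies with long key_name values approach the 500 byte limit sooner.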

Tools for storing data: Keys

* Key corresponds to the Bigtable row for an Entity
* Bigtable accessible as a distributed hashtable
* Get() by Key: Very fast! No scanning, just copying data

* Limitations:
  o Only one ID or key_name per Entity
  o Cannot change ID or key_name later
  o 500 bytes

Monday, July 7, 2008

Building Scalable Web Applications/Building a Blog


from google.appengine.ext import db

class BlogIndex(db.Model):
 max_index = db.IntegerProperty(required=True,default=0)

class BlogEntry(db.Model):
 index = db.IntegerProperty(required=True)
 title = db.StringProperty(required=True)
 body  = db.TextProperty(required=True)


def post_entry(blogname, title, body):
 def txn():
   blog_index = BlogIndex.get_by_key_name(blogname)
   if blog_index is None:
     blog_index = BlogIndex(key_name=blogname)
   new_index = blog_index.max_index
   blog_index.max_index += 1
   blog_index.put()
   new_entry = BlogEntry(key_name=blogname + str(new_index),parent=blog_index, index=new_index,title=title, body=body)
   new_entry.put()
 db.run_in_transaction(txn)

def get_entry(blogname, index):
 entry = BlogEntry.get_by_key_name(blogname + str(index),parent=db.Key.from_path('BlogIndex',blogname))
 return entry


# POSTS_PER_PAGE is assumed to be a module-level constant (the page size).
def get_entries(start_index):
 extra = None
 if start_index is None:
   entries = BlogEntry.gql(
       'ORDER BY index DESC').fetch(POSTS_PER_PAGE + 1)
 else:
   start_index = int(start_index)
   entries = BlogEntry.gql(
       'WHERE index <= :1 ORDER BY index DESC',
       start_index).fetch(POSTS_PER_PAGE + 1)
 # One extra row is fetched; if it exists, it marks the start of the next page.
 if len(entries) > POSTS_PER_PAGE:
   extra = entries[-1]
   entries = entries[:POSTS_PER_PAGE]
 return entries, extra
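
The paging trick in get_entries (fetch one row more than a page and use the extra row as the start of the next page) can be tried without the datastore. In this sketch a plain list of integer indexes stands in for the BlogEntry query; get_page and the page size of 3 are assumptions made for the example.

```python
POSTS_PER_PAGE = 3  # assumed page size for this sketch


def get_page(indexes, start_index=None, per_page=POSTS_PER_PAGE):
    """Return (page, extra): extra is the index the next page starts at, or None."""
    ordered = sorted(indexes, reverse=True)          # ORDER BY index DESC
    if start_index is not None:
        ordered = [i for i in ordered if i <= start_index]  # WHERE index <= :1
    fetched = ordered[:per_page + 1]                 # fetch(per_page + 1)
    extra = None
    if len(fetched) > per_page:
        extra = fetched[-1]
        fetched = fetched[:per_page]
    return fetched, extra


all_indexes = range(8)  # entries with index 0 through 7

page1, next1 = get_page(all_indexes)
page2, next2 = get_page(all_indexes, next1)
page3, next3 = get_page(all_indexes, next2)
```

Passing the extra index back as start_index means no offset is needed, so each page costs the same regardless of how deep into the list it is.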

SQL and GQL

is null	from google.appengine.ext import db
	r = db.GqlQuery("SELECT * FROM model WHERE property = :1", None) # None is the Python null. / see
count(*)	r.count()
like	r = db.GqlQuery("SELECT * FROM model WHERE property >= :1 and property < :2 ",
	                search_key, urllib.unquote(search_key).decode("utf8") + u"\uFFFD")
update	see
!=	!= # see
date	yymm = '2008-05-10 22:22:22' # see
	ydate = datetime.datetime.strptime(yymm, '%Y-%m-%d %H:%M:%S')
date search	from datetime import *
	import datetime
	d1 = datetime.datetime.strptime('2008-06-01', '%Y-%m-%d')
	d2 = d1 + timedelta(days=10)
	r = db.GqlQuery("select * from model where date >= :1 and date < :2 ", d1, d2)
	for rr in r:
	    # rr is an entity, so format its date property rather than rr itself
	    print rr.date.strftime('%Y-%m-%d %H:%M:%S')
	see: Dates and Times
	datastore viewer http://localhost:8080/_ah/admin/datastore?kind=StockSum&order=-nikkei_max&order_type=float&num=100&start=0
reference	see (back-references), Many-to-many Join
key, key_name, id	Key names and IDs cannot be used like property values
	NG: select * from Greeting where key = "xxxx"
	NG: select * from Greeting where id = xxx
	OK: r = Greeting.get(db.Key.from_path('Greeting', id)) # or key_name
	OK: r = db.get("agpoZWxsb3dvcmxkcgsLEgRCbG9nGNQBDA")
	    key = r.key()
	    id = r.key().id()
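
The "like" and "date search" rows both rely on a half-open range (>= lower bound, < upper bound). The sketch below applies the same comparisons to plain Python lists instead of a GqlQuery, so it only illustrates the range logic; prefix_match and the sample data are made up for the example.

```python
import datetime


def prefix_match(values, prefix):
    """Emulate LIKE 'prefix%' with a range: prefix <= value < prefix + U+FFFD."""
    upper = prefix + u"\ufffd"
    return [v for v in values if prefix <= v < upper]


names = [u"dow", u"dow jones", u"down", u"nikkei", u"do"]
matches = prefix_match(names, u"dow")

# Date search: everything from d1 (inclusive) up to d2 (exclusive).
d1 = datetime.datetime.strptime("2008-06-01", "%Y-%m-%d")
d2 = d1 + datetime.timedelta(days=10)
stamps = [
    datetime.datetime(2008, 5, 31, 23, 59),
    datetime.datetime(2008, 6, 1),
    datetime.datetime(2008, 6, 10, 12, 0),
    datetime.datetime(2008, 6, 11),
]
in_range = [s for s in stamps if d1 <= s < d2]
```

Note that this only covers prefix matches; a pattern such as '%dow%' has no equivalent range and cannot be expressed this way.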