Technical Notes: June 2008

Monday, June 30, 2008

Model.get_or_insert(key_name, **kwds)

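As the name says, it gets the entity or inserts it.
If the entity does not exist, it calls r.put() for you.
key_name cannot be omitted. A key_name must not start with a digit.
If it is called concurrently, ~~two records might end up being created.~~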

r = Greeting.get_or_insert("Key001", Title="test",  xxx = xxx, .... )

Model.get_or_insert(key_name, **kwds)

Get or create an entity of the model's kind with the given key name, using a single transaction. The transaction ensures that if two users attempt to get-or-insert the entity with the given name simultaneously, then both users will have a model instance that refers to the entity, regardless of which process created it.

Arguments:

key_name: The name for the key of the entity
**kwds: Keyword arguments to pass to the model class if an instance with the specified key name doesn't exist. The parent argument is required if the desired entity has a parent.

The method returns an instance of the model class that represents the requested entity, whether it existed or was created by the method. As with all datastore operations, this method can raise a TransactionFailedError if the transaction could not be completed.

http://code.google.com/appengine/docs/datastore/modelclass.html#Model_get_or_insert
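
The semantics quoted above can be pictured with a small in-memory sketch. It does not use the App Engine SDK: FakeStore and its lock are hypothetical stand-ins for the datastore and its transaction, used only to show that concurrent callers end up holding the same single entity.

```python
import threading

class FakeStore(object):
    """Hypothetical in-memory stand-in for the datastore (not the real SDK)."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # plays the role of the transaction

    def get_or_insert(self, key_name, **kwds):
        # Look up and create under one lock, so a concurrent caller
        # cannot create a second entity with the same key name.
        with self._lock:
            if key_name not in self._entities:
                self._entities[key_name] = dict(kwds)
            return self._entities[key_name]

store = FakeStore()
results = []

def worker(title):
    results.append(store.get_or_insert("Key001", Title=title))

threads = [threading.Thread(target=worker, args=("test%d" % i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every caller holds the same single entity, whichever thread created it.
print(len(store._entities), all(r is results[0] for r in results))
```

Only the first caller's keyword arguments are used; later callers simply get the existing entity back, which matches the description of **kwds above.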

I had the impression that long key_names were not allowed, but I was wrong.
Still, a key_name cannot be used as a search condition in a where clause.

A key_name is stored as a Unicode string (with str values converted as ASCII text). A key_name must not start with a number.

Tip: Key names and IDs cannot be used like property values in queries. However, you can use a named key, then store the name as a property. You could do something similar with numeric IDs by storing the object to assign the ID, getting the ID value using obj.key().id(), setting the property with the ID, then storing the object again.

http://code.google.com/appengine/docs/datastore/keysandentitygroups.html

Adding a child to a parent


p = db.GqlQuery("select * from Oya_model")[0]
r = Ko_model.get_or_insert("Key001",parent=p,att1=200801,...)

Looking up the parent from the child; if a parent is found, print its kind and key


pp = db.GqlQuery("select * from Ko_model")
for p in pp:
  if p.parent():
    print p.parent().kind(), p.parent_key()

I finally understand what Model.kind() means.
http://code.google.com/appengine/docs/datastore/modelclass.html#Model_kind

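Sunday, June 29, 2008

The sentiment behind "don't add support for other languages"

At first I thought, "Python rather than PHP?", but I have come to agree with the title below, or rather,
I feel I somehow understand the sentiment.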

Please don't add support for other languages.
http://groups.google.com/group/google-appengine/t/93d7fff1f3cfec95?hl=en

Trying out the Google App Engine Helper for Django

1. Using the Google App Engine Helper for Django
Following
http://mars.shehas.net/~tmatsuo/misc/appengine_helper_for_django-ja.html
I downloaded appengine_helper_for_django-r30.zip from
http://code.google.com/p/google-app-engine-django/

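2. I considered extracting it under my test helloworld folder, but that must not be done (the archive contains its own app.yaml and other files).

Extracted to C:\google\appengine_helper_for_django

3. On UNIX: ln -s /path/to/google_appengine .google_appengine

appengine_helper_for_django needs access to the GAE SDK.
I thought Windows also had an equivalent of ln -s, so I checked how to do it, but fsutil only supports links to files, not folders. Various tools seem to have been written for this, but
I do not really trust symbolic-link-like mechanisms on Windows.
In the end I physically copied C:\Program Files\Google\google_appengine.

4. Start the server (the .py extension is associated with c:\python25\python.exe)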


C:\google\appengine_helper_for_django>manage.py runserver
WARNING:root:Loading the SDK from the 'google_appengine' subdirectory is now deprecated!
WARNING:root:Please move the SDK to a subdirectory named '.google_appengine' instead.
WARNING:root:See README for further details.
WARNING:root:Could not read datastore data from c:\docume~1\xxx\locals~1\temp\django_google-app-engine-django.datastore
WARNING:root:Could not read datastore data from c:\docume~1\xxx\locals~1\temp\django_google-app-engine-django.datastore.history
WARNING:root:Could not initialize images API; you are likely missing the Python "PIL" module. ImportError: No module named PIL
INFO:root:Server: appengine.google.com
INFO:root:Checking for updates to the SDK.
INFO:root:The SDK is up to date.
WARNING:root:Could not read datastore data from c:\docume~1\xxx\locals~1\temp\django_google-app-engine-django.datastore
WARNING:root:Could not read datastore data from c:\docume~1\xxx\locals~1\temp\django_google-app-engine-django.datastore.history
WARNING:root:Could not initialize images API; you are likely missing the Python "PIL" module. ImportError: No module named PIL
INFO:root:Running application google-app-engine-django on port 8080: http://localhost:8080

5. It seems to have started, so access http://localhost:8080/

Welcome to Django
It worked!
Congratulations on your first Django-powered page.

Of course, you haven't actually done any work yet. Here's what to do next:

    * If you plan to use a database, edit the DATABASE_* settings in settings/settings.py.
    * Start your first app by running python settings/manage.py startapp [appname]. 

You're seeing this message because you have DEBUG = True in your Django settings file and you haven't configured any URLs. Get to work!

6. Run manage.py startapp polls. The console still shows the warnings, but the following is created.
C:\google\appengine_helper_for_django\polls
polls/
__init__.py
models.py
views.py

7. Create the models
polls/models.py


from django.db import models

# Create your models here.
from appengine_django.models import BaseModel
from google.appengine.ext import db

class Poll(BaseModel):
  question = db.StringProperty()
  pub_date = db.DateTimeProperty('date published')

class Choice(BaseModel):
  poll = db.ReferenceProperty(Poll)
  choice = db.StringProperty()
  votes = db.IntegerProperty()

8. Add polls to INSTALLED_APPS in settings.py


INSTALLED_APPS = (
'appengine_django',
'django.contrib.auth',
'polls',
#    'django.contrib.contenttypes',
#    'django.contrib.sessions',
#    'django.contrib.sites',
)

9. manage.py shell


>>> class Poll(BaseModel):
...    question = db.StringProperty()
...    pub_date = db.DateTimeProperty('date published')
...
Traceback (most recent call last):
  File "", line 1, in 
  File "C:\google\appengine_helper_for_django\appengine_django\models.py", line 109, in __new__
    new_class._meta = ModelOptions(new_class)
  File "C:\google\appengine_helper_for_django\appengine_django\models.py", line 49, in __init__
    model_module = sys.modules[cls.__module__]
KeyError: '__console__'
>>>

I need to take a proper look at Django. So far I have only used its template feature.
I did look at "An example of using Django on top of App Engine" a while ago, though.

Friday, June 27, 2008

Key.from_path(kind, id), key_name is not unique in Kind

http://code.google.com/appengine/docs/datastore/keyclass.html#Key_from_path

When you know the id (or key_name) of an entity of some kind, you can write
  greeting = db.get(db.Key.from_path('Greeting', int(id)))
and you can additionally specify its parent in the path.

k = Key.from_path('User', 'Boris', 'Address', 9876)
 User: the kind name of the parent; Boris is the key_name (= id) within that kind

If the parents differ, the same key_name can be registered any number of times.
In other words, a key_name is not necessarily unique within a kind.

Note that key_name is written in lowercase.
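
As a rough illustration of why this works, a key can be thought of as the whole path from the root entity down, not just the last kind and name. The from_path function below is a hypothetical pure-Python imitation that returns plain tuples; it is not the SDK's Key.from_path.

```python
def from_path(*args):
    """Build a hypothetical key as a tuple of (kind, id_or_name) pairs."""
    if len(args) % 2 != 0:
        raise ValueError("path must consist of kind/id pairs")
    return tuple(zip(args[0::2], args[1::2]))

k1 = from_path('User', 'Boris', 'Address', 'home')
k2 = from_path('User', 'Alice', 'Address', 'home')

# Same kind ('Address') and same key_name ('home'), but different parents,
# so the full keys differ and both entities can coexist.
print(k1 != k2)
print(k1[-1] == k2[-1])
```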


NewSection = Section(parent=band, Title="Net", URL="News21", Type="News",key_name="k2")
NewSection.put()

"where ANCESTOR IS :1", oya_key

http://googleappengine.blogspot.com/2008/04/posted-by-ken-ashcraft-software.html

Avoid large entity groups. Any two entities that share a common ancestor belong to the same entity groups. All writes to an entity group are sequential, so large entity groups can bog down popular apps quickly if there are a lot of writes to that group. Instead, use small, localized groups in your design.

There do seem to be bugs such as #490, though.


NewBand = Band(Name="Test", URL="Test", Description="A test description")
NewBand.put()

NewSection = Section(parent=NewBand, Title="News", URL="News", Type="News")
NewSection.put()

If parent is set when the child is put(), you can search for it by specifying the parent's key
in ANCESTOR IS.


oya_key = db.GqlQuery("select * from Band")[0].key()
ko = db.GqlQuery("select * from Section where ANCESTOR IS :1", oya_key )
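
One way to picture ANCESTOR IS is as a prefix test on the key path. The sketch below assumes keys are modelled as tuples of (kind, name) pairs; is_ancestor and the sample keys are made up for illustration and are not SDK calls.

```python
def is_ancestor(ancestor_key, key):
    """True if ancestor_key is a prefix of key (keys are tuples of (kind, name) pairs)."""
    return key[:len(ancestor_key)] == ancestor_key

band = (('Band', 'b1'),)
sections = [
    (('Band', 'b1'), ('Section', 'k1')),
    (('Band', 'b1'), ('Section', 'k2')),
    (('Band', 'b2'), ('Section', 'k1')),   # same key_name, different parent
]

# Roughly what "select * from Section where ANCESTOR IS :1" selects.
children = [s for s in sections if is_ancestor(band, s)]
print(len(children))
```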

http://groups.google.com/group/google-appengine/browse_thread/thread/ad5175429f2f61f2
App works locally, but fails when uploaded
x = Section.all().ancestor(self).filter("URL = ", SectionURL).get()

Property Names

The name of a Property is an alias.


from google.appengine.ext import db

class MyModel(db.Model):
  obj_key = db.StringProperty(name="key11")
  content = db.StringProperty()

r = db.GqlQuery("select * from MyModel")
for rr in r:
  print rr.content , rr.obj_key, key11
---
test None test

Transactions

Docs > Datastore API > Transactions

from google.appengine.ext import db

class Accumulator(db.Model):
  counter = db.IntegerProperty()

# Create one entity to increment.
a = Accumulator()
a.counter = 0
a.put()

def increment_counter(key, amount):
  obj = db.get(key)
  obj.counter += amount
  obj.put()

q = db.GqlQuery("SELECT * FROM Accumulator")
acc = q.get()
db.run_in_transaction(increment_counter, acc.key(), 5)
r = db.GqlQuery("select * from Accumulator")

for rr in r:
  print rr.counter
----
5
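
The all-or-nothing behaviour can be imitated in plain Python. This is only a sketch under my own assumptions: the run_in_transaction below is a hypothetical helper that takes the store as its first argument (the SDK function does not), and the dict stands in for the datastore.

```python
import copy

def run_in_transaction(store, func, *args):
    """Hypothetical helper: run func on a working copy, commit only on success."""
    working = copy.deepcopy(store)
    result = func(working, *args)   # an exception here leaves store untouched
    store.clear()
    store.update(working)
    return result

def increment_counter(store, key, amount):
    store[key]['counter'] += amount

def failing_update(store, key):
    store[key]['counter'] = -1
    raise RuntimeError("abort")

store = {'acc': {'counter': 0}}
run_in_transaction(store, increment_counter, 'acc', 5)
try:
    run_in_transaction(store, failing_update, 'acc')
except RuntimeError:
    pass

# The failed update was discarded, so only the increment is visible.
print(store['acc']['counter'])
```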

Thursday, June 26, 2008

Too Many Versions (403) The application already has the maximum number of versions.

2008-06-26 23:13:09,546 ERROR appcfg.py:1128 An unexpected error occurred. Aborting.
Rolling back the update.
Error 403: --- begin server output ---

Too Many Versions (403)
The application already has the maximum number of versions.
--- end server output ---

http://groups.google.com/group/google-appengine/browse_thread/thread/5bafb18014ca366a
It has been filed as an issue:
http://code.google.com/p/googleappengine/issues/detail?id=212&q=too%20many%20version&colspec=ID%20Type%20Status%20Priority%20Stars%20Owner%20Summary

Changing the application-id is risky, so I will wait and see for a while.

http://groups.google.com/group/google-appengine/browse_thread/thread/24870ee8ecdfa34d/116c890c63876919#116c890c63876919

Fixed, i have to manually delete all the other version by dashboard,
than it started to work again fine...

For a while I thought I would have to give up
on this application-id.

Deleting the old version entries brought it back to life.

Wednesday, June 25, 2008

ascending single-property indexes are not necessary


Uploading index definitions.
Error 400: --- begin server output ---
Creating a composite index failed: ascending single-property indexes are not necessary
--- end server output ---

Tuesday, June 24, 2008

no matching index found

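This error rarely occurred in the SDK environment, perhaps because I added data gradually.

In the cloud, perhaps because I bulk-uploaded the data, the "no matching index found" error occurs frequently.
I dealt with it by adding the indicated definitions to index.yaml and running update.

Monday, June 23, 2008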

DeadlineExceededError

Deletion can be done by manually reloading the page and running this over and over:


r = db.GqlQuery("select * from Stock limit 100")
for rr in r:
  rr.delete()

For a suffix match (like *key_word), I stepped through the data using offset.

r = db.GqlQuery("select * from Model limit 100 offset " + str(offset))
start = int(offset) + 100
self.response.out.write('<a href="/xxx?offset=%s">del</a>' % str(start))
for rr in r:
  if rr.trackback_url.find('</td></tr>') != -1:
    ・・・

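Looping in the program to advance offset automatically would not avoid DeadlineExceededError,
so it would be pointless.

This is also being discussed in "how should I clean up inside the datastore?"
http://groups.google.com/group/google-appengine/browse_thread/thread/da854121e242755a?hl=en
so perhaps a command like truncate table will be provided eventually.
There are so many restrictions that I am getting a little fed up.

Then again, with RDBMSs too, before truncate table (which needs no undo space) existed, Oracle and others would often refuse to delete data the way you wanted, reporting that there were not enough rollback segments on delete, and you had no choice but to delete a little at a time with a condition.
All the while thinking: I do not need rollback, just let me delete it all at once.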
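
The one-batch-per-request pattern described above can be sketched without the SDK. Here process_batch, the sample rows and the BATCH size of 100 are illustrative assumptions; each pass through the while loop stands for one separate request triggered by following the link, not for a loop inside a single request.

```python
BATCH = 100

def process_batch(rows, offset, predicate):
    """Handle one request's worth of rows; return (matches, next_offset or None)."""
    batch = rows[offset:offset + BATCH]
    matches = [r for r in batch if predicate(r)]
    next_offset = offset + BATCH if offset + BATCH < len(rows) else None
    return matches, next_offset

# Made-up sample data: every 50th row ends with the unwanted suffix.
rows = ["url%d%s" % (i, "</td></tr>" if i % 50 == 0 else "") for i in range(250)]

offset, found = 0, []
while offset is not None:          # each iteration stands for one separate request
    matches, offset = process_batch(rows, offset, lambda r: r.endswith("</td></tr>"))
    found.extend(matches)
print(len(found))
```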

untrusted app XXX cannot access app helloworld's data

BadRequestError: untrusted app XXX cannot access app helloworld's data

This occurred when I tried to bulk-upload the value (= Key) of a Reference Property:
・・・
('stock', datastore_types.Key),
・・・

Even if the upload had worked, it would be meaningless, so I commented it out.

max execution time

Whether it is put or fetch, there is a limit on what a single operation can handle.

bulkload: 100 entities
fetch: 1,000 entities
delete: also about 100 entities


> Is there any limit of execution time of scripts? If there is, how long
> is this max execution time for a function?

A few seconds, eight or ten seconds at the most.

> Is there any way to set the max_execution_time or functions like
> set_time_limit() in php?
I don't think so.

> If I want my script to execut in a long time to finish its job, maybe
> 20 mininuts, just looks like a service,  is there any way in App
> Engine ?

Nope.  Google App Engine is only designed for applications that can
respond to requests in under 300 ms or so.

http://groups.google.com/group/google-appengine/browse_thread/thread/08548c43d692af63?hl=en

Sunday, June 22, 2008

fetch() returns a maximum of 1000 results

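The SDK environment had no such restriction as
"even if there are more than 1,000 entities, a query returns at most 1,000 results",
so
I only noticed it once I finally loaded real data with the bulkloader.
I wondered whether something like
SELECT * FROM model limit 100 offset 1000
would give access to the data beyond 1,000, but that did not work.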

http://code.google.com/appengine/docs/datastore/gqlqueryclass.html

The query has performance characteristics that correspond linearly with the offset amount plus the limit.

Note: fetch() returns a maximum of 1000 results. If more than 1000 entities match the query, and either no limit is specified or a limit larger than 1000 is used, only the first 1000 results are returned by fetch().
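
One possible workaround, which is not something I tested in these notes, is to page by a sorted property instead of by offset: remember the last value seen and ask for values greater than it. The sketch below simulates that in memory; fetch, FETCH_LIMIT and the name property are assumptions for illustration only.

```python
FETCH_LIMIT = 1000

def fetch(rows, after=None, limit=FETCH_LIMIT):
    """Simulated fetch: rows are sorted by 'name', at most FETCH_LIMIT results."""
    matching = [r for r in rows if after is None or r['name'] > after]
    return matching[:min(limit, FETCH_LIMIT)]

rows = [{'name': 'item%05d' % i} for i in range(2500)]

seen, last = 0, None
while True:
    page = fetch(rows, after=last)
    if not page:
        break
    seen += len(page)
    last = page[-1]['name']   # resume after the last value already seen
print(seen)
```

This only works when the property used for paging is unique and sortable, which is an assumption about the data rather than anything the datastore guarantees.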

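In "Working with Google App Engine Models" from
"Google I/O session videos posted with slides",

storing counts seemed to be recommended,
which had caught my attention.
Still, I did not expect that unless the data is managed under conditions that stay within 1,000 results,
you cannot even tell how much data there is.

Friday, June 20, 2008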

Importing UTF-8 Data with Bulkloader

Importing UTF-8 Data
http://groups.google.com/group/google-appengine/browse_thread/thread/d4cf3013483220b5
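I tried various things with the thread above as a reference. With the pattern below I was able to load UTF-8 data, but
the type ends up fixed as str and I could not make it datastore_types.Text, so
UTF-8 data longer than 500 bytes needs separate consideration.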

myloader.py


#from google.appengine.ext import bulkload
import bulkload
・・・
class PersonLoader(bulkload.Loader):
 def __init__(self):
   # Our 'Person' entity contains a name string and an email
   bulkload.Loader.__init__(self, 'Person',
             [
#            ('name', str),
             ('name', lambda x: unicode(x,'utf-8')),
             ('email', datastore_types.Email),
             ])

Copy the whole bulkload folder into the project folder and modify __init__.py.
bulkload/__init__.py

・・・
   buffer = StringIO.StringIO(data.encode('utf-8'))
・・・

If I drop ('name', lambda x: unicode(x,'utf-8')) and add the following instead,
the Japanese text is stripped out cleanly and only the alphanumerics remain.

#     val =  unicode(val, errors='ignore')
     entity[name] = converter(val)
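
The difference between the two conversions can be reproduced outside App Engine. This is a small sketch in current Python, where bytes.decode plays the role of the Python 2 unicode() calls above; the sample string is made up.

```python
raw = u'日本語 abc'.encode('utf-8')   # bytes as they would arrive from the CSV file

# Decoding as UTF-8 keeps the Japanese text, like unicode(x, 'utf-8').
decoded = raw.decode('utf-8')

# Decoding as ASCII while ignoring errors silently drops every non-ASCII byte,
# which is what unicode(val, errors='ignore') does with the default encoding.
stripped = raw.decode('ascii', errors='ignore')

print(decoded == u'日本語 abc')
print(repr(stripped))
```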

bulkload/__init__.py

c:/Program Files/Google/google_appengine/google/appengine/ext/bulkload/
__init__.py


"""A mix-in handler for bulk loading data into an application.

For complete documentation, see the Tools and Libraries section of the
documentation.

To use this in your app, first write a script, e.g. bulkload.py, that
instantiates a Loader for each entity kind you want to import and call
bulkload.main(instance). For example:

person = bulkload.Loader(
  'Person',
  [('name', str),
   ('email', datastore_types.Email),
   ('birthdate', lambda x: datetime.datetime.fromtimestamp(float(x))),
  ])

if __name__ == '__main__':
  bulkload.main(person)

See the Loader class for more information. Then, add a handler for it in your
app.yaml, e.g.:

  urlmap:
  - regex: /load
    handler:
      type: 1
      path: bulkload.py
      requires_login: true
      admin_only: true

Finally, deploy your app and run bulkload_client.py. For example, to load the
file people.csv into a dev_appserver running on your local machine:

./bulkload_client.py --filename people.csv --kind Person --cookie ... \
                     --url http://localhost:8080/load

The kind parameter is used to look up the Loader instance that will be used.
The bulkload handler should usually be admin_only, so that non-admins can't use
the shell to modify your app's data. The bulkload client uses the cookie
parameter to piggyback its HTTP requests on your login session. A GET request
to the URL specified for your bulkload script will give you a cookie parameter
you can use (/load in the example above).  If your bulkload handler is not
admin_only, you may omit the cookie parameter.

If you want to do extra processing before the entities are stored, you can
subclass Loader and override HandleEntity. HandleEntity is called once with
each entity that is imported from the CSV data. You can return one or more
entities from HandleEntity to be stored in its place, or None if nothing
should be stored.

For example, this loads calendar events and stores them as
datastore_entities.Event entities. It also populates their author field with a
reference to the corresponding datastore_entites.Contact entity. If no Contact
entity exists yet for the given author, it creates one and stores it first.

class EventLoader(bulkload.Loader):
  def __init__(self):
    EventLoader.__init__(self, 'Event',
                         [('title', str),
                          ('creator', str),
                          ('where', str),
                          ('startTime', lambda x:
                            datetime.datetime.fromtimestamp(float(x))),
                          ])

  def HandleEntity(self, entity):
    event = datastore_entities.Event(entity.title)
    event.update(entity)

    creator = event['creator']
    if creator:
      contact = datastore.Query('Contact', {'title': creator}).Get(1)
      if not contact:
        contact = [datastore_entities.Contact(creator)]
        datastore.Put(contact[0])
      event['author'] = contact[0].key()

    return event

if __name__ == '__main__':
  bulkload.main(EventLoader())
"""

---
    bulkload.Loader.__init__(self, 'Greeting',
           [
             ('author',   datastore_types.users.User),
             ('content',  str),
             ('curl1',    str),
             ('cmapinfo', datastore_types.Text),
             ('date',     lambda x: datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S')),
             ('dateJST',  lambda x: datetime.datetime.strptime(x,'%Y-%m-%d %H:%M:%S')),
           ])

--

C:\google\bulkload>c:\google\bulkload\bulkload_client.py --filename greeting_test.csv --kind Greeting --url http://xxxx.appspot.com/load02

Thursday, June 19, 2008

model.properties()


kind =(Greeting(),Blog() )

for k in kind:

  print "DROP TABLE IF EXISTS `" + k.kind() +"`;"
  print "Create table `" + k.kind() + "` ("
  print "    `key`        char(36) not null,"
  print "    `id`        int,"
  print "    `key_name`    char(12),"
  p =  k.properties()
  for pp in p:
    attr = str(p[pp]).split('google.appengine.ext.db.')[1]
    attr = attr.split(' object')[0]
    sql_attr = attr
    if attr == "IntegerProperty"    : sql_attr = "int"
    if attr == "StringProperty"     : sql_attr = "varchar(500)"
    if attr == "ReferenceProperty"    : sql_attr = "char(36)"
    if attr == "DateTimeProperty"    : sql_attr = "datetime"
    if attr == "UserProperty"        : sql_attr = "varchar(128)"
    if attr == "TextProperty"        : sql_attr = "text"
    print "`" + pp + "`" + "    " + sql_attr + ",    #" + attr
  print ") #  remove last `,` !!!"
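
The chain of if statements and the "remove last comma" step can be avoided with a lookup table and join. This is a standalone sketch: create_table and its list of (name, property class name) pairs are hypothetical inputs, standing in for what k.properties() provides in the script above.

```python
SQL_TYPES = {
    'IntegerProperty': 'int',
    'StringProperty': 'varchar(500)',
    'ReferenceProperty': 'char(36)',
    'DateTimeProperty': 'datetime',
    'UserProperty': 'varchar(128)',
    'TextProperty': 'text',
}

def create_table(kind, properties):
    """properties: list of (name, property class name) pairs (hypothetical input)."""
    columns = ['`key` char(36) not null', '`id` int', '`key_name` char(12)']
    for name, prop_type in properties:
        # Unknown property types fall back to their own name, as in the script above.
        columns.append('`%s` %s' % (name, SQL_TYPES.get(prop_type, prop_type)))
    # join() puts commas only between columns, so no trailing comma to remove.
    return 'CREATE TABLE `%s` (\n    %s\n);' % (kind, ',\n    '.join(columns))

ddl = create_table('Greeting', [('content', 'StringProperty'), ('date', 'DateTimeProperty')])
print(ddl)
```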

Local or Cloud

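I was finally able to "Create an Application" in the cloud.
However, because I changed the application: entry in app.yaml and started the server,
the local datastore was reinitialized.
I have a backup, so I can go back to the state of two or three days ago, but...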

#!-*- coding:utf-8 -*-
import cgi
import wsgiref.handlers
import os

from google.appengine.ext import webapp
class MainPage(webapp.RequestHandler):
  def get(self):
    self.response.headers['Content-Type'] = 'text/html'
    self.response.out.write(u'Hello, webapp World! こんにちわ<br />' )
    self.response.out.write(os.environ["PATH_INFO"]   + "<br />")
    self.response.out.write(os.environ["TZ"]          + "<br />" )
    self.response.out.write(os.environ["SERVER_NAME"] + "<br />")

-----
Hello, webapp World! こんにちわ
/hw0
UTC
localhost

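Wednesday, June 18, 2008

The current state and future direction of Google App Engine

From "[Google Developer Day 2008] Reviewing Google App Engine" on http://codezine.jp/

Free up to 500 MB of storage and 2 GB/day of data transfer (said to be equivalent to 5 million page views/month)
The free service is planned to continue
A "distributed datastore" based on BigTable
It can be operated with the SQL-like GQL. There is no equivalent of SQL JOIN, so
it takes a little time to get used to, but you get the benefits of a distributed environment.

It certainly does take time to get used to.

Full-text search is not currently part of the database API, and there are no concrete plans
to support it yet
SSL is not supported for now, although some parts such as the fetch API make SSL connections
In Japan, it apparently cannot be tried because of a bug in authentication via mobile phone SMS (as of June 11).
Regarding this, Google says it should probably become usable in Japan starting this week

Thanks to that, I was finally able to sign up.

Tuesday, June 17, 2008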

Datastore Viewer Error

key ="agpoZWxsb3dvcmxkcgsLEgRCbG9nGPckDA"
r = db.get(key)

is OK.

But...
Google\google_appengine\google\appengine\ext\admin\__init__.py", line 542, in input_field
if len(sample_value) > 255 or sample_value.find('\n') >= 0:
TypeError: object of type 'NoneType' has no len()

Odd behaviour in admin interface
http://groups.google.com/group/google-appengine/browse_thread/thread/29cd28daea3c37d3?fwc=1

Monday, June 16, 2008

alter table ...

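TextProperty cannot be used in queries, so I tried to change it to StringProperty:
I modified the db.Model and
updated the values with None, but when I put new text data, it showed up
as a TextProperty again.

So
I added a new StringProperty instead, but I still do not understand when it takes effect.
At the very least, a put to the new property can complete with no error while, in reality,
not only the data but even the property itself has not been created.

Whenever you add a property, be sure to check it at
http://localhost:8080/_ah/admin/datastore/
and try entering a value for the added property there.

Updates take a long time, so if they come to nothing, that time is wasted.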

class TextProperty()

A long string.
Unlike StringProperty, a TextProperty value can be more than 500 bytes long. However, TextProperty values are not indexed, and cannot be used in filters or sort orders.

Saturday, June 14, 2008

Many-to-many JOIN

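A sample of a many-to-many JOIN: it is handled by creating a db.Model consisting only of Reference Properties, which corresponds to an intersection table.
Note that it uses back-references.
I do not think methods such as def books() are required. (At first they only confused me.)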

Google App Engine: [A Better] Many-to-many JOIN
http://blog.arbingersys.com/2008/04/google-app-engine-better-many-to-many.html

ER-Modeling with Google App Engine (updated)
http://daily.profeth.de/search/label/entity%20relationship%20model

Google App Engine Introduction 6: Number of search results
http://webdba.blogspot.com/2008/04/google-app-engine6.html

Conditional search with None

First of all,

 select * from Kawase where entry_date = None

raises an error.
None must always be passed as a bind variable.


entity = db.GqlQuery("select * from Kawase where  stock = :1", None)
for e in entity:
  print e.entry_date, e.stock

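However, this does not find every Kawase entity whose stock is None.
The query below returns entities whose stock prints as None, yet those records do not appear in the result list of the query above.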


entity = db.GqlQuery("select * from Kawase where entry_date = :1 ", e1 )
for e in entity:
  print e.entry_date, e.stock

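Cause:
 When a property has been added later and an entity is then updated with that property set to None,
it matches "select * from Kawase where stock = :1", None; other records (entities) do not seem to match this query even though they have no value stored for the property. (SDK 1.1.0)

Judging from
google/appengine/ext/admin/__init__.py
as well, the structure of data in Google's Datastore cannot be known without actually getting the data and inspecting it. There is nothing like the dictionary tables of an RDB. You cannot search for something that does not exist.
In other words, you cannot know in advance which properties a given db.Model consists of. You find out only by querying.

I am finally beginning to understand what the following passage in Docs > Datastore API > Entities and Models means.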
Unlike relational databases, the App Engine datastore does not require that all entities of a given kind have the same properties. The application can specify and enforce its data models using the model API.
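
The behaviour observed above can be modelled with plain dicts. This is my own mental model rather than the datastore's actual implementation: query_equals and the sample entities are hypothetical, and the point is only that an entity which never stored the property cannot match a filter on it, even a filter for None.

```python
entities = [
    {'entry_date': '2003-08-01', 'stock': None},    # property explicitly set to None
    {'entry_date': '2003-08-02'},                   # stored before 'stock' was added
    {'entry_date': '2003-08-03', 'stock': 'key1'},
]

def query_equals(rows, prop, value):
    """Simulated equality filter: only entities that actually store prop can match."""
    return [r for r in rows if prop in r and r[prop] == value]

matched = query_equals(entities, 'stock', None)
print([r['entry_date'] for r in matched])

# Reading the attribute in Python still shows None for the second entity,
# which is why the two result lists look inconsistent.
print([r.get('stock') for r in entities])
```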

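Friday, June 13, 2008

Joining two db.Models, and what are back-references?

Joining two db.Models is slow without a Reference Property.

Even with a Reference Property defined, if no value (Key) is stored in it, the join raises the following error (error handling is required).
AttributeError: 'NoneType' object has no attribute 'entry_date'

If the relationship is 1:1, putting everything into one large table-like model is the right approach after all.
However, updating while fetching one row at a time is slow.
Processing about 200 records took 15 minutes (in the Windows development environment).
Adding something afterwards by update, whether a Reference Property or the actual data you want to add, takes a very long time. Be prepared for that.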

from datetime import *
import datetime
from google.appengine.ext import db

class Stock(db.Model):
  nikkei_ave = db.FloatProperty()
  entry_date = db.DateTimeProperty()
  modified   = db.DateTimeProperty(auto_now=True)
  usd_jpy    = db.FloatProperty()
class Kawase(db.Model):
  author     = db.UserProperty()
  usd_jpy    = db.FloatProperty()
  entry_date = db.DateTimeProperty()
  modified   = db.DateTimeProperty(auto_now=True)
  stock      = db.ReferenceProperty(Stock)

start_time = datetime.datetime.today()
e1 = datetime.datetime.strptime( "2003-08-01" ,'%Y-%m-%d')
e2 = datetime.datetime.strptime( "2003-08-22" ,'%Y-%m-%d')

kawases = db.GqlQuery("SELECT * FROM Kawase where entry_date >=:1 and entry_date <:2 ", e1,e2 )
for kawase in kawases:
  stocks = db.GqlQuery("select * from Stock where entry_date = :1", kawase.entry_date )
  for stock in stocks:
    kawase.stock = stock.key()
    kawase.put()
end_time = datetime.datetime.today()
print end_time - start_time

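Adding a Reference Property that refers to Stock on the Kawase(db.Model) side, and storing the Key of the corresponding Stock(db.Model) entity in it, makes it easy to join Stock values from the Kawase side.
Note that an error occurs if the Reference Property is missing (not set).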

entity = db.GqlQuery("select * from Kawase")
for e in entity:
  try:
    print e.entry_date,e.usd_jpy, e.stock. entry_date, e.stock.nikkei_ave
  except AttributeError:
    print  e.entry_date,e.usd_jpy, None,None

Joining from the Stock side needs a two-level loop, because one Stock record (entity) may correspond to several Kawase records.

entity = db.GqlQuery("select * from Stock limit 10")
for e in entity:
  for k in e.kawase_set:
    print e.entry_date,e.nikkei_ave, k.entry_date, k.usd_jpy

To join Stock and Kawase on entry_date, I created a ReferenceProperty on the Kawase side. In master-detail terms Stock is the master, and a pseudo-property named kawase_set is created automatically so that the detail side can be reached from it.
Once you understand the mechanism, calling this a back-reference does seem apt.

Also note that back-references are slow.
The 10-row join above took 22 seconds. (On the SDK; Kawase 1,935 entities, Stock 1,842 entities)

As a rule, this means the master side has to be used with one record per page.

Docs > Datastore API > Entities and Models explains it as follows:
ReferenceProperty has another handy feature: back-references. When a model has a ReferenceProperty to another model, each referenced entity gets a property whose value is a Query that returns all of the entities of the first model that refer to it.
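
The idea that a back-reference is really a query, run again each time it is used, can be shown with ordinary classes. Everything here is a made-up stand-in: the Stock and Kawase classes are plain Python objects, all_kawase replaces the stored entities, and the numbers are sample values only.

```python
class Stock(object):
    def __init__(self, entry_date, nikkei_ave):
        self.entry_date, self.nikkei_ave = entry_date, nikkei_ave

class Kawase(object):
    all_kawase = []   # hypothetical stand-in for the stored Kawase entities

    def __init__(self, entry_date, usd_jpy, stock=None):
        self.entry_date, self.usd_jpy, self.stock = entry_date, usd_jpy, stock
        Kawase.all_kawase.append(self)

def kawase_set(stock):
    # The back-reference is itself a query: every Kawase whose reference is this Stock.
    return [k for k in Kawase.all_kawase if k.stock is stock]

s1 = Stock('2003-08-01', 9500.0)
Kawase('2003-08-01', 120.1, stock=s1)
Kawase('2003-08-01', 120.3, stock=s1)
Kawase('2003-08-02', 119.8)            # no reference set

for k in kawase_set(s1):
    print(s1.entry_date, s1.nikkei_ave, k.usd_jpy)
```

Seen this way, looping over ten Stock entities means running ten separate queries against Kawase, which fits the slowness noted above.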

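Wednesday, June 11, 2008

Inside Google's data centers

At this scale, server maintenance becomes a cutting-edge field that borders on biotechnology.

http://japan.cnet.com/special/story/0,2000056049,20374847,00.htm

To run servers at Google's scale, machines have to be treated as consumables. Server makers take pride in how resistant their high-end machines are to failure, but Google prefers to invest in fault-tolerant software.

A troublesome break-in period
In each cluster's first year, it is typical for 1,000 individual machine failures to occur. Hard drive failures number in the thousands.

Google really does love multicore machines. To us, a multicore machine is like a lot of small machines with a good interconnect, and it is relatively easy to use

Single-thread performance does not matter to Google at all. Google has plenty of problems that can be parallelized

The secret of Google's success
Dean described three core elements of Google's software: GFS (Google File System), BigTable, and the MapReduce algorithm.

All machine failures are handled by the GFS system, at least at the storage level

Google uses BigTable to give structure to all of its data.
Commercial databases from vendors such as Oracle and IBM do not suit Google.
One reason is that they cannot operate at the scale Google requires,
and even if they could, they would cost too much

Bigtable is a distributed storage system for managing structured data.
MapReduce is a programming model and an associated implementation
for processing and generating large data sets.

MapReduce is what lets Google put its data to practical use; the first version was written in 2003. For example, MapReduce can determine how many times a particular word appears in Google's search index, list the web pages on which a word appears, or list all the websites that link to a particular website.

Fault-tolerant software
Needless to say, MapReduce, like GFS, was developed to work around server problems.

Once, during maintenance work on a cluster of 1,800 servers, MapReduce's reliability was tested in earnest. When the workers unplugged 80 machines at a time, the remaining 1,720 machines picked up the slack. "It ran a little slower, but it all completed," Dean said.

Challenges ahead for next-generation data centers
Most companies are thinking about how to move jobs smoothly from one server to another, but Google's challenge is of a different order. Google wants to be able to move jobs from one data center to another, and automatically at that.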

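Friday, June 6, 2008

GAE bulk update

Since SDK 1.1.0, GQL also supports !=, but

it is treated the same way as <=, >= and so on, so if you use <= on a date, you can no longer use != in another condition.

Likewise, when <= is used, sorting with order by is limited to that property.

# Compared with an RDBMS there are many constraints, but perhaps this frees us
# from the odd troubles that come once data has piled up.

The query results can be filtered again in code, but carrying key() along as well is...
If the key is not needed and no related models are involved through references, it is doable.

In the end I dealt with it by adding a property and updating the data in a batch.
The batch update worked. It is nice that a property can be added
just by rewriting the model definition,
without anything like alter table add column, but properties added after data has accumulated
seem to be unstable, and for some reason the update in the main application does not work properly.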
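
The two restrictions noted above can be written down as a small pre-flight check. This is only a sketch of the rules as I understand them: check_query and its (property, operator) input format are hypothetical and not part of the SDK.

```python
INEQUALITY_OPS = ('<', '<=', '>', '>=', '!=')

def check_query(filters, order_by=None):
    """filters: list of (property, operator); returns a list of rule violations."""
    problems = []
    inequality_props = set(p for p, op in filters if op in INEQUALITY_OPS)
    if len(inequality_props) > 1:
        problems.append('inequality filters on more than one property: %s'
                        % ', '.join(sorted(inequality_props)))
    if order_by and inequality_props and order_by[0] not in inequality_props:
        problems.append('first sort order must be the inequality property')
    return problems

# <= on date plus != on category: rejected.
print(check_query([('date', '<='), ('category', '!=')]))
# A range on a single property, sorted by that property: accepted.
print(check_query([('date', '<='), ('date', '>=')], order_by=['date']))
```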

rr = db.GqlQuery("select * from Blog ")

for r in rr:
  print r.title, r.list_mode, r.open_mode
  r.list_mode = '0'
  r.put()

#db.put(rr)
-----

blogs = []
for b in blogs_tmp:
  if b.category != category:
    blogs.append({
      # key()      : b.key(),
      'author'    : b.author,
      'title'     : b.title,
      'content'   : b.content,
      'category'  : b.category,
    })

Wednesday, June 4, 2008

Trying the Spreadsheets Data API on GAE

C:\Python25\Lib\site-packages\gdata\atom
C:\Python25\Lib\site-packages\gdata\gdata
After copying these into the project folder, the import naturally works, but
as expected, around gd_client.ProgrammaticLogin()
it fails with an http error.

Traceback (most recent call last):
 File "C:\Program Files\Google\google_appengine\google\appengine\ext\webapp\__init__.py", line 499, in __call__
   handler.get(*groups)
 File "C:\google\helloworld\helloworld0.py", line 18, in get
   gd_client.ProgrammaticLogin()
 File "C:\google\helloworld\gdata\service.py", line 301, in ProgrammaticLogin
   content_type='application/x-www-form-urlencoded')
 File "C:\google\helloworld\atom\service.py", line 316, in HttpRequest
   connection.endheaders()
 File "C:\Python25\lib\httplib.py", line 860, in endheaders
   self._send_output()
 File "C:\Python25\lib\httplib.py", line 732, in _send_output
   self.send(msg)
 File "C:\Python25\lib\httplib.py", line 699, in send
   self.connect()
 File "C:\Python25\lib\httplib.py", line 1133, in connect
   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
AttributeError: 'module' object has no attribute 'socket'

Tuesday, June 3, 2008

Google Spreadsheets Data API

SQL and GQL

is null
  from google.appengine.ext import db
  r = db.GqlQuery("SELECT * FROM model WHERE property=:1", None)  # None is the Python null. / see

count(*)
  r.count()

like
  r = db.GqlQuery("SELECT * FROM model WHERE property >= :1 and property < :2 ", search_key, urllib.unquote(search_key).decode("utf8") + u"\uFFFD")

update
  see

!=
  !=  # see

date
  yymm = '2008-05-10 22:22:22'  # see
  ydate = datetime.datetime.strptime(yymm, '%Y-%m-%d %H:%M:%S')

date range search
  from datetime import *
  import datetime
  d1 = datetime.datetime.strptime('2008-06-01', '%Y-%m-%d')
  d2 = d1 + timedelta(days=10)
  r = db.GqlQuery("select * from model where date >=:1 and date <:2 ", d1, d2)
  for rr in r:
    print datetime.datetime.strftime(rr.date, '%Y-%m-%d %H:%M:%S')
  (Dates and Times)

datastore viewer
  http://localhost:8080/_ah/admin/datastore?kind=StockSum&order=-nikkei_max&order_type=float&num=100&start=0

reference
  see (back-references), Many-to-many Join

key, key_name, id
  Key names and IDs cannot be used like property values
  NG: select * from Greeting where key = "xxxx"
  NG: select * from Greeting where id = xxx
  OK: r = Greeting.get(db.Key.from_path('Greeting', id))  # or key_name
  OK: r = db.get("agpoZWxsb3dvcmxkcgsLEgRCbG9nGNQBDA")
  key = r.key()
  id = r.key().id()
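
The "like" entry above is a prefix match expressed as a range: everything from the search key up to the search key followed by u"\uFFFD". A plain-Python sketch of the same comparison, with made-up titles and a hypothetical prefix_range helper:

```python
def prefix_range(search_key):
    """Return (lower, upper) bounds so that lower <= value < upper matches the prefix."""
    return search_key, search_key + u"\uFFFD"

titles = [u"google", u"google app engine", u"gql", u"python"]
lower, upper = prefix_range(u"goo")

# Same shape as: WHERE property >= :1 and property < :2
print([t for t in titles if lower <= t < upper])
```

This covers only forward matching (like key_word*); a suffix or substring match cannot be expressed as a single range in this way.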