Knowledge Base
  • Introduction
  • Events and Actions
  • Action Panel
  • sending email
  • calling a SQL action from a client side js action
  • Accessing to translations form a server
  • Executing SQL statements from within an action
  • How to invoke a generic SQL statement defined through a SQL action
  • How to show a message dialog
  • checking for "undefined" values
  • How to add spaces to the right of a text
  • How to create a docx report and show it on the web browser Enterprise Edition only
  • How to get or set a value from the graphics control
  • How to invoke a generic SQL query defined through a business component
  • How to remove spaces to the left and right of a text
  • How to support multiple themes in a single application, accoding to a rule
  • How to set content to a Google Map linked to a grid or form
  • How to replace all occurences of a pattern from a text
  • Utility methods
  • Link auto login
  • Creation of a link for the first access of a new user without give the user a password and forcing
  • Forgot password
  • setting up default values from values coming from a filter panel
  • identifing the modified record after the alteration
  • enabling/disabling checkboxes in a grid
  • Filtering a Lookup
  • formatting a column
  • using checkboxes to select rows in grid
  • showing a summary row in grid
  • Disabling a toolbar button
  • Configuring grid exports
  • Adding filter conditions to a grid
  • Filtering the grid content from a tree
  • Filtering the tree content, starting from a filter panel linked to a grid
  • collapsing a panel
  • validating a lookup
  • accessing the authorizations set for a specific grid
  • How to design a web service
  • How to remotelly invoke an action or business component or perform a write operation through a Restf
  • how to feed a grid from a JS business component
  • converting a JS object to a JSON string
  • executing a query
  • passing parameters to a server side JS action
  • return value
  • scheduling and frequency
  • finding the right filter panel
  • checking out if a component has been defined
  • Deploying an application
  • Enquiring a table belonging to the Platform repository
  • Adding a where clause to a business component linked to grid
  • Integrating Mailchimp lists
  • Formatting a number as a currency value to use it inside an email template
  • sending email from a template
  • How to send an email
  • Error 'smtpHost' is empty
  • Linking two windows
  • How to open manually a window from another window
  • How to open manually a popup window
  • How to hide a panel in a window dinamically
  • How to manage folder panels
  • How to manage card panels
  • Predefined variables supported by Platform
  • Accessing the application parameters
  • Application Log
  • How to design a web service
  • How to import java classes in server
  • How to import java classes in server
  • How to dynamically set a value on a combo
  • 4WS.Platform
  • How to listen to events in a mobile HTML panel
  • Issues with HTTPS requests
  • How to manage row totals in grid
  • How to send to the UI a notification to execute code automatically
  • How to filter a chart by date interval
  • How to filter a grid by date interval
  • How to read a text or csv file and save data on the database
  • How to write text or csv files
  • Reading an xls file stored in the specified path
  • How to create a report with Jasper Report
  • How to customize the alert message content
  • Setting up a cluster
  • Uploading and downloading files
  • How to listen to user definition changes
  • How to auto-show a window from login
  • How to manage encrypted fields
  • How to change CSS settings for a grid row
  • Customizing a Tree Panel
  • How to execute complex queries on Google Datastore
  • Theme customization
  • Retrieve and send the log of a mobile app
  • Import Roles and Users
  • How to synchronize multiple Form panels in the same window
  • Anchor buttons
  • Properties of subpanels
  • Bulk import
  • How to display the data not found message in a grid
  • How to setup an LDAP based authentication
  • How to synchronize data from Datastore to BigQuery
  • How to synchronize data from Datastore to Google Spanner
  • How to synchronize data from Datastore to CloudSQL
  • Scrollable form list
  • How to setup SAML authentication
  • How to export data from BigQuery in streaming
  • Update Google Spreadsheet
  • How to setup OAuth2 authentication
Powered by GitBook
On this page
  • Aligning data structure between Datastore (master repository) and BigQuery
  • Waiting for updated data commited on BigQuery
  • Executing queries with aggregated functions
  • Involved costs

Was this helpful?

How to synchronize data from Datastore to BigQuery

PreviousHow to setup an LDAP based authenticationNextHow to synchronize data from Datastore to Google Spanner

Last updated 2 years ago

Was this helpful?

In case of a large number of data to read and analyze (complex queries), it is possible to synchronize data initially stored in an entity of Google Datastore and duplicate it on a corresponding table on BigQuery. Such a destination table must have exactly the same structure (field names and types) of the original one (entity on Datastore).

In order to do it:

  1. First, a BigQuery instance must be configured and a dataset defined. This will be used to host all BigQuery tables to create and enquiry.

  2. The application parameter GOOGLE -> "Synchronize Datastore data with" must be filled in with the BigQuery destination type. That means it is possible to synchronize data from Datastore entities only to this media.

  3. Use the "Duplicate Datastore object to BigQuery" command, available on the Data model list; this feature creates a new data model (having "big query" type) starting from the selected Datastore type model; the new object has the same name and field names/types. If you need to align structural changes applied to the Datastore original object to the new one, you can add additional fields to the destination object.

  4. Use the "Sync data to db" checkbox in the data model definition window for the Datastore object: once checked it, every insert/update/delete operation applied to the Datastore entity will be duplicated in the target relational table; there is not a latency on the Datastore side: the duplication operation is always performed on a separated thread.

  5. Optionally, you can also refine the sync rule and limit data to duplicate, using a filtering condition (e.g. companyId==00000)

  6. Optionally, you can change the default settings for the synchronization process, i.e. Standard and set it to Streaming.

In Standard mode, when inserting/updating records on Datastore, every operation is sent to GAE and enqueued. Operations are then processed sequentially using the DML BigQuery API. Such approach has the limitation of 5 records every 10 seconds and it has the perk of managing insert/update/delete operations in "near real time".

See for more details.

An alternative approach is setting the Streaming mode, where insert/update Datastore operations are changed to insert operations in the BigQuery table and these records cannot be changed (neither updated nor deleted) in real time. Delete operations on Datastore are ignored on BigQuery (not synchronized).

Operations in BigQuery are very fast and they do not require any more a GAE queue to process them.

Only after waiting 30 minutes these records can be removed.

See for more details.

An alternative to the steps 4-5 is using the "Export from Datastore to BigQuery" feature, where you can export the whole content (either as insert or update operations) and schedule this job to be execute on a regular basis, using a scheduled process.

Important note: it would be better to opt for the "Sync data to db" approach, rather than the export, since it synchronize data in real time (i.e. data will be available in terms of seconds on BigQuery).

However, in case of a large amount of data created on Datastore (e.g. tens of thousands records per day), the latency due to BigQuery writing (2-3 seconds per record with the Standard mode) could represent a limitation. In such a scenario, it can be helpful to add a scheduled process using the "Export from Datastore to BigQuery" feature as well, to duplicate a lot of data in a faster way: inserting multiple records as an export is faster than inserting single records through a real time synchronization.

A faster alternative is the Streaming mode described above.

Example of table management with Streaming mode

Let's have a table having 3 fields: ITEM_ID a text field used as a primary key, a QTY numeric field and a datetime field LAST_UPDATE used to store the most recent version of the ITEM_ID.

An insert operation on Datastore would lead to a similar insert operation on the BigQuery table.

After that, the current qty value would be retrieved through a SELECT QTY from the table having the most recent LAST_UPDATE, grouped by ITEM_ID and it would return the only record stored at that time.

Next, an update operation on Datastore would be converted in an insert operation on the BigQuery table.

After that, the current qty value would be retrieved through a SELECT QTY from the table having the most recent LAST_UPDATE, grouped by ITEM_ID and it would return the last inserted record, i.e. the one generated when updating Datastore.

Finally, a delete operation on Datastore would be ignored on the BigQuery side. It is up to the developer to include an additional operation to insert in BigQuery a new record having a qty set to 0. This can be accomplished using the method:

utils.importBigQueryThroughStreaming(
  datasetName,
  tableName,
  [{ itemCode: "...", qty: 0, lastUpdate: utils.getcurrentDateAndTime() }]
);

After that, the current qty value would be retrieved through a SELECT QTY from the table having the most recent LAST_UPDATE, grouped by ITEM_ID and it would return the last inserted record, i.e. the one with a qty set to 0.

Aligning data structure between Datastore (master repository) and BigQuery

In the BigQuery data model definition window, there is an "Align model from Datastore" button, which can be used to copy the Datastore entity definition (fields) to BigQuery and all related objects connected to it (business components, panels). Use it to quickly align the two models.

It also ALTER BigQuery table and add new fields coming from Datastore. This is helpful also when you export->import metadata from one environment to the other.

Waiting for updated data commited on BigQuery

When synchronizing data from Datastore to BigQuery, the data alignment is asynchrounous, since writing single records to BigQuery is a slow operation (it can takes several seconds).

In exceptional cases (which cannot become the daily rule), it is possible to force a synchronous writing of data on BigQuery, when saving data (in insert/update) on Datastore starting from a grid or form. In such a scenario, you have to pass forward to the standard Platform web services (setlist & setdetail) an additional parameter: "&realTime=true" to append to the URL generated when saving data from a grid/form. In order to do it, link a client-side js action to the "before saving data on insert/update" events. The bound action should contain something like:

additionalParams = "&realTime=true";

In this way, the sync operation on BigQuery would be synchronous and retrieved data on grid/form is fresh. Anyway, additional seconds are needed when saving data.

It would be way better to avoid automatic grid/form reloading when saving data on grid/form: this behavior is not needed, since grid/form are already updated behind the scenes with the fresh data coming from the server, through the JSON response.

Executing queries with aggregated functions

In case you need to execute a SQL query not only involving fields of the synchronized table but also aggregated functions like SUM, COUNT, etc. combined with GROUP BY keyword, you can di it using the already existing Platform features:

  • add a virtual field (e.g MY_SUM) to the BigQuery type model, one for each aggregated function; these fields with not be removed in case you "Align model with Datastore" (which only adds fields and never removes them)

  • change the SQL query in your server-side javascript business component and include the corresponding aggregated function and alias (SELECT SUM(*) AS MY_SUM from ...)

  • enable columns in your grid accordingly

Involved costs

BigQuery has a pricing based on (i) occupied space and (ii) number of records analyzed in a query per month. Whereas the first fee is relatively low for a few Gbytes (less than 2 euros per 100Gb), the second fee is proportional to

  • the number of queries per month

  • the amount of records analyzed for each query (analyzed and retrieved is different!)

  • the record size

Consequently, the larger is the record and the number of records to analyze, the higher are the involved costs: it is important to pay attention to these costs.

For a more detailed example, see

Performing large scale mutations in bigquery
Quotas and limits
Quotas and limits