Service Monitoring

Example of a distributed system:

In this scenario, there are two distinct applications, each installed on its own Platform server.

Let’s say that the first system gathers a list of sales from a set of points of sale through remote calls and saves them to its own data repository through a queue.

Later, these sales are moved to a second system, which has to collect, process, and save them.

It would be a good idea to define the same service code (the same identifier) for the “sales service” on both systems; this makes it easier to trace data that comes from the first system and is moved to the second one.

Platform automatically manages the “distributed transaction” behind the scenes, so that data originating in the first system can be traced with the same transaction id on the second system as well.
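The idea of a shared transaction id can be illustrated with a minimal sketch. It is not Platform's API: Platform handles this automatically, and every name here (LogEntry, SALES_SERVICE, first_system_collect, second_system_process, the in-memory log list) is a hypothetical stand-in chosen only to show the id travelling with the data.

```python
import uuid
from dataclasses import dataclass

# Hypothetical service code shared by both systems (an assumption for this sketch).
SALES_SERVICE = "SALES"


@dataclass
class LogEntry:
    system: str
    service_code: str
    transaction_id: str
    message: str


# In-memory stand-in for the logs written by the two systems.
log: list[LogEntry] = []


def first_system_collect(sales: list[dict]) -> dict:
    """First system: create a transaction id and attach it to the collected data."""
    transaction_id = str(uuid.uuid4())
    log.append(LogEntry("system-1", SALES_SERVICE, transaction_id,
                        f"collected {len(sales)} sales"))
    return {"transaction_id": transaction_id, "sales": sales}


def second_system_process(payload: dict) -> None:
    """Second system: reuse the transaction id that arrived with the data."""
    transaction_id = payload["transaction_id"]
    log.append(LogEntry("system-2", SALES_SERVICE, transaction_id,
                        f"processed {len(payload['sales'])} sales"))


payload = first_system_collect([
    {"point_of_sale": "A", "amount": 10.0},
    {"point_of_sale": "B", "amount": 4.5},
])
second_system_process(payload)
```

Both log entries end up with the same service code and the same transaction id, even though they were written by different systems.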

That makes it easier to search for problems arising anywhere in the distributed system, since it is possible to filter by datetime interval as well as by service code or transaction id.
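As a rough sketch of this kind of search, the following filters a list of log entries by datetime interval and, optionally, by service code or transaction id. The LogEntry fields, the filter_entries function, and the sample data are all assumptions made for illustration; they do not describe how Platform stores or queries its logs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogEntry:
    timestamp: datetime
    service_code: str
    transaction_id: str
    message: str


def filter_entries(entries: list[LogEntry],
                   start: datetime,
                   end: datetime,
                   service_code: Optional[str] = None,
                   transaction_id: Optional[str] = None) -> list[LogEntry]:
    """Keep entries inside [start, end] that match the optional criteria."""
    result = []
    for entry in entries:
        if not (start <= entry.timestamp <= end):
            continue
        if service_code is not None and entry.service_code != service_code:
            continue
        if transaction_id is not None and entry.transaction_id != transaction_id:
            continue
        result.append(entry)
    return result


# Hypothetical sample data: entries written by both systems.
entries = [
    LogEntry(datetime(2024, 1, 10, 9, 0), "SALES", "tx-1", "collected on system 1"),
    LogEntry(datetime(2024, 1, 10, 9, 5), "SALES", "tx-1", "saved on system 2"),
    LogEntry(datetime(2024, 1, 10, 9, 7), "EXPORT", "tx-2", "export interrupted"),
    LogEntry(datetime(2024, 1, 11, 8, 0), "SALES", "tx-3", "collected on system 1"),
]

day_start = datetime(2024, 1, 10)
day_end = datetime(2024, 1, 10, 23, 59, 59)

# Follow one transaction across both systems within a single day.
matches = filter_entries(entries, day_start, day_end, transaction_id="tx-1")
```

Because both systems use the same service code and transaction id, a single filter returns the entries written on either side.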

In any of the services described in the example above, a wide range of problems can occur and interrupt the normal execution flow:

authentication failed when invoking a remote web service
too many HTTP requests, so that requests are rejected
too many open database connections
database locks due to incorrect business logic working concurrently on the same set of data
export too slow or interrupted with errors
file system reading/writing errors
syntax errors in server-side JavaScript actions
execution errors in server-side JavaScript actions
business logic failures in server-side JavaScript actions, due to invalid input data or data inconsistent with existing data
out of memory errors
too many elements in queue

Moreover, when data is created in one application and sent to another, i.e. in a distributed system, something can go wrong in any of the components involved. It is therefore important to track the same data across the whole system, which introduces additional problems to take into account.

