
Writing Your First Caching Application


Summary: This tutorial shows how an application interacts with a GigaSpaces Data Grid, clustered in either a replicated, partitioned, master-local, or local-view topology. The application either actively reads data or registers for notifications.

This tutorial contains a client application that runs on GigaSpaces 6.6. You must have GigaSpaces 6.6 installed before proceeding.
You can download the product here.
Both Java and .NET implementations are provided:

One icon marks instructions relevant only for Java; another marks instructions relevant only for .NET.

Overview


Different applications might have different caching requirements. Some applications require on-demand loading from a remote cache, due to limited memory; others use the cache for read-mostly purposes; transactional applications need a cache that handles both write and read operations and maintains consistency.

To address these different requirements, GigaSpaces provides an In-Memory Data Grid that is policy-driven. Most of the policies do not affect the actual application code, but rather the way each Data Grid instance interacts with other instances. The policies allow the Data Grid to be configured in almost any topology; the most common topologies are predefined in the product and do not require editing policies.

In this tutorial, you will use GigaSpaces to implement a simple application that writes and retrieves user accounts from the GigaSpaces In-Memory Data Grid, clustered in the most common topologies - replicated, partitioned, master-local and local-view. The application will either actively read data or ask to be notified when data is written to or modified in the Data Grid.

GigaSpaces Data Grid - Basic Terms

  • Data Grid instance - an independent data storage unit, also called a cache. The Data Grid is comprised of all the Data Grid instances running on the network.

  • Space - a distributed, shared, memory-based repository for objects. A space runs in a space container - this is usually transparent to the developer. In GigaSpaces each Data Grid instance is implemented as a space, and the Data Grid is implemented as a cluster of spaces organized in one of several predefined topologies.

  • Grid Service Container - a generic container that can run one or more space instances (together with their space containers) and other services. This container is launched on each machine that participates in the Data Grid, and hosts the Data Grid instances.

  • Replication - a relationship in which data is copied between two or more Data Grid instances, with the aim of having the same data in some or all of them.

  • Synchronous replication - replication in which applications using the Data Grid are blocked until their changes are propagated to all Data Grid instances. This guarantees that everyone sees the same data, but reduces performance.
  • Asynchronous replication - replication in which changes are propagated to Data Grid instances in the background; applications do not have to wait for their changes to be propagated. Asynchronous replication does not negatively affect performance, but on the other hand, changes are not instantly available to everyone.
  • Partitioning - new data or operations on data are routed to one of several Data Grid instances (partitions). Each Data Grid instance holds a subset of the data, with no overlap. Partitioning is done according to an index field in the data - operations are routed to partitions based on the value of this field.

  • Topology - a specific configuration of Data Grid instances. For example, a replicated topology is a configuration in which some or all Data Grid instances replicate data between them. In GigaSpaces, Data Grid topologies are defined by cluster policies (explained in the following section).
  • Reading - one way to retrieve data from the Data Grid, used in this tutorial, is to call the space read operation, supplying a read template object that specifies what should be read.
  • Notifications - GigaSpaces allows applications to be notified when changes are made to objects in the Data Grid. Applications register in advance to be notified about specific events. When these events occur, a notification is triggered on the application, which delivers the actual data that triggered the event.
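The read-template idea above can be pictured in plain Java: null fields in the template act as wildcards, and a stored object matches when every non-null template field equals the corresponding object field. This is a conceptual sketch of the matching rule only, not the GigaSpaces API:

```java
// Conceptual sketch of space template matching: null template fields are
// wildcards; non-null fields must match exactly. Not the GigaSpaces API.
public class TemplateMatch {
    public static class Account {
        public String userName;   // null in a template means "any user"
        public Integer accountID; // null in a template means "any ID"

        public Account(String userName, Integer accountID) {
            this.userName = userName;
            this.accountID = accountID;
        }
    }

    // Returns true when every non-null field of 'template' equals the
    // corresponding field of 'candidate'.
    public static boolean matches(Account template, Account candidate) {
        if (template.userName != null && !template.userName.equals(candidate.userName)) {
            return false;
        }
        if (template.accountID != null && !template.accountID.equals(candidate.accountID)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Account stored = new Account("alice", 17);
        // Template with only accountID set: matches any account with ID 17.
        System.out.println(matches(new Account(null, 17), stored));   // true
        // Template with a different user name: no match.
        System.out.println(matches(new Account("bob", null), stored)); // false
    }
}
```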

GigaSpaces Clustering Concepts

In GigaSpaces, a cluster is a grouping of several spaces running in one or more containers. For an application trying to access data, the cluster appears as one space, but in fact consists of several spaces which may be distributed across several physical machines. The spaces in the cluster are also called cluster members.

A cluster group is a logical collection of cluster members, which defines how these members interact. The only way to define relationships between clustered spaces in GigaSpaces is to add them to a group and define policies. A cluster can contain several, possibly overlapping, groups, each of which defines relations between some of the cluster members - this provides great flexibility in cluster configuration.

A GigaSpaces cluster group can have one or more of the following policies:

  • Replication Policy - defines replication between two or more spaces in the cluster, and replication options such as synchronous/asynchronous and replication direction.
  • Load Balancing Policy - because user requests are submitted to the entire cluster, there is a need to distribute the requests between cluster members. The load balancing policy defines an algorithm according to which requests are routed to different members. For example, in a replicated topology, requests are divided evenly between cluster members; in a partitioned topology they are routed according to the partitioning key.
  • Failover Policy - defines what happens when a cluster member fails. Operations on the cluster member can be transparently routed to another member in the group, or to another cluster group.

A cluster schema is an XML file which defines a cluster - the cluster name, which spaces are included in the cluster, which groups are defined on them, and which policies are defined for each group. GigaSpaces provides predefined cluster schemas for all common cluster topologies. Each topology is a certain combination of replication, load balancing and failover policies.

Data Grid Topologies Shown in this Tutorial

Replicated (view diagram)
  Description: Two or more space instances with replication between them.
  Common use: Allowing two or more applications to work with their own dedicated data store, while working on the same data as the other applications.
  Options:
  • Replication can be synchronous (slower, but guarantees consistency) or asynchronous (fast, but less reliable, as it does not guarantee identical content).
  • Space instances can run within the application (embedded - allows faster read access) or as a separate process (remote - allows multiple applications to use the space, and easier management).
  • In this tutorial: two remote spaces, synchronous replication.

Partitioned (view diagram)
  Description: Data and operations are split between two spaces (partitions) according to an index field defined in the data. An algorithm, defined in the Load Balancing Policy, maps values of the index field to specific partitions.
  Common use: Allows the In-Memory Data Grid to hold a large volume of data, even if it is larger than the memory of a single machine, by splitting the data into several partitions.
  Options:
  • Several routing algorithms to choose from.
  • With/without a backup space for each partition.
  • In this tutorial: two spaces, hash-based routing, with backup.

Master-Local (view diagram)
  Description: Each application has a lightweight, embedded cache, which is initially empty. The first time data is read, it is loaded from a master cache to the local cache (lazy load); the next time the same data is read, it is served quickly from the local cache. Later on, data is either updated from the master or evicted from the cache.
  Common use: Boosting read performance for frequently used data. A useful rule of thumb is to use a local cache when over 80% of all operations are read operations.
  Options:
  • The master cache can be clustered in any of the other topologies: replicated, partitioned, etc.
  • In this tutorial: the master cache comprises two spaces in a partitioned topology.

Local-View (view diagram)
  Description: Similar to master-local, except that data is pushed to the local cache. The application defines a filter, using a space read template or an SQL query, and data matching the filter is streamed to the local cache from the master cache.
  Common use: Achieving maximal read performance for a predetermined subset of data.
  Options:
  • The master cache can be clustered in any of the other topologies: replicated, partitioned, etc.
  • In this tutorial: the master cache comprises two spaces in a partitioned topology.

The topologies above are provided in the GigaSpaces product as predefined cluster schemas. Schemas can be found inside the <GigaSpaces Root>/lib/JSpaces.jar, under the schemas/config directory. The schema names are:

  • Synchronous replication - sync_replicated-cluster-schema.xsl
  • Partitioned with backup - partitioned-sync2backup-cluster-schema.xsl
    The master-local and local-view topologies do not need their own schemas, because the local cache is defined on the client side.

Deploying the Data Grid


Now that you have a little background about the GigaSpaces Data Grid and the topologies used in this tutorial, the first step is to deploy the Data Grid.

To deploy the Data Grid instances, you will first launch two GigaSpaces Grid Service Containers (generic containers that can run Data Grid instances) on the same machine. Each container will host one cluster node. In real life, each cluster node usually runs on a different physical machine.

You will also start a Grid Service Manager that will manage the two Grid Service Containers.

Then, using the GigaSpaces Management Center (GS-UI), you will launch two spaces, clustered together according to one of the Data Grid topologies discussed above.

Start by choosing the Data Grid topology that interests you most, and launching it using the instructions below. After you start the client application and test this topology (as described in the following sections), you can return to this section, deploy another topology, and try it out as well.

To run the Grid Service Containers:

  1. Start the GS-UI, by executing <GigaSpaces Root>\bin\gs-ui.bat (or .sh).
  2. From the upper toolbar select Launch -> Local Service (GSM/GSC) -> Grid Service Manager to start a local (running in this machine) Grid Service Manager, which manages the containers.
  3. From the upper toolbar select Launch -> Local Service (GSM/GSC) -> Grid Service Container to start a local (running in this machine) Grid Service Container.
  4. Start another Grid Service Container by selecting Launch -> Local Service (GSM/GSC) -> Grid Service Container again.

To deploy the Data Grid:

  1. Inside the GS-UI, on the toolbar at the top, click the Launch Data Grid button to deploy a data grid.



    The following page showing the Data Grid attribute fields is displayed:



  2. In the Data Grid Name field, type the name myDataGrid as shown above. This name represents the Data Grid you are deploying in the GS-UI. This name will be given to all spaces in the cluster. Remember this space name - you will use it when running the client application and connecting to the Data Grid.
  3. In the Space Schema field, leave the space schema as default. This field allows you to specify whether the space instances in the cluster should be persistent (data automatically persisted to a database) or not. You will not use persistency in this tutorial.
  4. In this page of the wizard, you define the Data Grid topology by filling in the Cluster Info area. Do one of the following:
    • If you want to deploy the Data Grid in a replicated topology, from the Cluster schema drop-down menu, select the sync_replicated option. This option uses the sync_replicated-cluster-schema, which defines synchronous replication between all cluster members. This option refers to a single space, or a cluster of spaces (in one of several common topologies), with no backup.
      • Select the number of spaces (Data Grid instances) in your replicated cluster: deploy a cluster with 2 spaces by typing 2 into the Number of Instances field.
        The following shows the settings for the replicated topology:



    • If you want one of the other topologies (partitioned, master-local, or local-view), from the Cluster schema drop-down menu, select the partitioned option. This option refers to a single space with a backup, or a partitioned cluster of spaces with backups.
      • Select the number of partitions: specify two partitions by typing 2 into the Number of Instances field. Then specify one backup for each partition by typing 1 into the Number of backups field. Without backups, this option uses the partitioned-cluster-schema; when the partitioned cluster is deployed with backups, the cluster schema used is the partitioned-sync2backup-cluster-schema.
        The following shows the settings for the partitioned (with backup) topology:



    • For both topologies, you need to select a Grid Service Manager (GSM) for deployment, from the table at the bottom of the page.
      The table might include more than one Grid Service Manager. If so, look for the specific manager you launched - you can find it according to the Machine field (look for the machine on which you ran the Grid Service Manager). Click your Grid Service Manager to select it.



  5. Click Deploy to deploy the cluster. The deployment status is displayed (here, for the two replicated Data Grid instances):



    In the master-local and local-view topologies, the master cache can in principle be clustered in any topology - partitioned, replicated, etc. (or can be a single space). The master-local/local-view aspect of the topology is specified on the client side: when the client connects to the cluster or space (the master cache), it specifies if it wants to start a local cache and how this cache should operate.



    Depending on the type of deployment you performed, you should see that either two spaces (two replicated Data Grid instances) or four spaces (two Data Grid partitions with one backup each) were provisioned to the host running the Grid Service Containers.
  6. If this is not the first topology you are deploying, and you are already familiar with the client application, skip to Running Client, Testing Notifications and Verifying Topologies.

    You deployed the Data Grid using the GS-UI and its Deployment Wizard. An alternative is to start the cluster manually, by executing the gsInstance script (<GigaSpaces Root>\bin\gsInstance.bat or .sh). Manual deployment requires the use of Space URLs, which might take different arguments for different topologies.

    For more details on deploying a cluster manually, refer to Space URL.


The Client Application


In this tutorial, we provide a sample application that consists of the following components:

  • A Data Loader that writes data to the Data Grid.
  • A Simple Reader that reads data directly from the Data Grid (using spaces read).
  • A Notified Reader that registers for notifications on the Data Grid and is notified when data is written by the Data Loader.
    You can run one or more readers, of either or both types.
  • An Account object, defined as a POJO (Java) or PONO (.NET), which represents the data in the Data Grid. It has the following fields: userName, accountID and balance.

Getting Source Code and Full Client Package

The source code of all three components, and the scripts used to run them, is the same for all Data Grid topologies described above.

The full Java client package including execution scripts is included, together with other GigaSpaces examples and tutorials. Find the client package for this tutorial at <GigaSpaces Root>\examples\tutorials\datagrid\topologies.

The full .NET client package can be found at the following path: <GigaSpaces Root>\dotnet\examples\DataGrid. If you don't see this path, it is because the dotnet directory is initially zipped when you download the product. Extract the ZIP file in the dotnet directory into <GigaSpaces Root>\dotnet, then look for this tutorial's client package under <GigaSpaces Root>\dotnet\examples\DataGrid.

Client Operating Process (In Brief)

  1. When you run the Data Loader, it:
    • Connects to the Data Grid and clears all data from it.
    • Creates a new Account object, with a certain userName and accountID. The Account also has a balance (Java) or Balance (.NET) field, whose value is calculated as accountID*10 (Java) or AccountID*10 (.NET).
    • Writes 100 Account instances, with IDs 1 through 100, to the Data Grid, using JavaSpaces write.
  2. When you run a Simple Reader, it reads all the Account instances in the Data Grid, then reads them again every few seconds, until you close it.
  3. When you run a Notified Reader, it registers for notification on the Account class, and starts listening for notifications. When Account objects are written to the Data Grid, the Notified Reader immediately receives notifications from the Data Grid. The notifications include the Account objects themselves.
  4. If you run more Simple Readers or Notified Readers, they repeat step 2 or 3 above, respectively.

How the Client Application Connects to the Data Grid

The application connects to the space using the GigaSpaces SpaceFinder.find() (Java) or SpaceProxyProviderFactory.Instance.FindSpace(spaceUrl) (.NET) method. This is a method that accepts a space URL, discovers the space, and returns a proxy that allows the application to work with the space. The URL is usually not defined in the client application itself, but is supplied to it as an argument when it is started.

In this tutorial, we will use a space connection URL similar to the following:

jini://*/*/myDataGrid

  • This URL uses the Jini protocol, which enables dynamic discovery of the space (the client does not need to know which machines are participating in the Data Grid).
  • */*/myDataGrid specifies that the client wants to connect to a cluster in which all the spaces are called myDataGrid, regardless of which physical machines participate in the cluster.
  • useLocalCache is an additional parameter, not shown above, which launches a local cache in the connecting application. This is necessary for the master-local and local-view topologies.

The URL above is used by the application to connect to the space (a cluster of spaces in this case), so it is called a space connection URL. This should not be confused with a space start URL, a similar form of URL which can be used to start a space. In this tutorial, you will not use a space start URL, rather you will start the spaces using the GS-UI, as described below.
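The two URL forms used in this tutorial (the plain cluster URL, and the same URL with the useLocalCache parameter for the master-local and local-view topologies) can be composed with a small helper. The helper itself is purely illustrative and not part of the GigaSpaces API; the URL forms follow the text above:

```java
// Illustrative helper that composes the space connection URLs used in this
// tutorial. The URL forms follow the tutorial text; the helper itself is
// not part of the GigaSpaces API.
public class SpaceUrls {
    // Plain cluster connection: jini://*/*/<spaceName>
    public static String clusterUrl(String spaceName) {
        return "jini://*/*/" + spaceName;
    }

    // Connection that also starts a local cache (master-local / local-view).
    public static String localCacheUrl(String spaceName) {
        return clusterUrl(spaceName) + "?useLocalCache";
    }

    public static void main(String[] args) {
        System.out.println(clusterUrl("myDataGrid"));    // jini://*/*/myDataGrid
        System.out.println(localCacheUrl("myDataGrid")); // jini://*/*/myDataGrid?useLocalCache
    }
}
```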

How Notifications Work

In a GigaSpaces Data Grid, applications can ask to be notified when changes are made to objects in the Data Grid. A request for notification has two components, a template and a mask:

  • The template specifies the class type and attribute values the application is interested in.
  • The mask (also called NotifyActionType in Java or DataEventType in .NET) specifies which events the application wants to be notified about - new data written to the Data Grid, data taken from the Data Grid, and so on.

GigaSpaces provides a mechanism that handles this process without requiring remote calls. In essence, the application registers its template and mask together with an event listener; when a matching operation occurs in the space, the listener's callback method is invoked and receives the object that triggered the event.
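The register-and-callback flow described above can be modeled in plain Java. The sketch below captures the pattern only (a template, an event mask, and a listener whose callback receives the triggering object); it is not the actual GigaSpaces API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Conceptual model of the notification mechanism: an application registers a
// template plus an event mask, and the grid invokes its callback with the
// object that triggered the event. Models the pattern only - not the
// GigaSpaces API.
public class NotifySketch {
    enum EventType { WRITE, TAKE } // the "mask" (cf. NotifyActionType / DataEventType)

    static class Registration {
        final String classType;          // the template: class the client cares about
        final EventType mask;            // which events to deliver
        final Consumer<Object> callback; // invoked with the triggering object
        Registration(String classType, EventType mask, Consumer<Object> callback) {
            this.classType = classType; this.mask = mask; this.callback = callback;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    void register(String classType, EventType mask, Consumer<Object> callback) {
        registrations.add(new Registration(classType, mask, callback));
    }

    // Called by the "grid" when an operation occurs; matching listeners are notified.
    void fire(String classType, EventType type, Object data) {
        for (Registration r : registrations) {
            if (r.classType.equals(classType) && r.mask == type) {
                r.callback.accept(data); // the notification delivers the actual data
            }
        }
    }

    public static void main(String[] args) {
        NotifySketch grid = new NotifySketch();
        List<Object> received = new ArrayList<>();
        grid.register("Account", EventType.WRITE, received::add);
        grid.fire("Account", EventType.WRITE, "Account#1"); // delivered
        grid.fire("Account", EventType.TAKE, "Account#2");  // mask does not match
        System.out.println(received); // [Account#1]
    }
}
```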

The Data - Defined as a POJO (Java) or PONO (.NET)

In this tutorial all the objects written to the space instances, which make up the Data Grid, are Plain Old Java Objects - POJOs (Java) or Plain Old .NET Objects - PONOs (.NET). This is in contrast to the tutorials in the Parallel Processing Track of this Quick Start Guide, in which objects written to the space implement the Entry class, as in the JavaSpaces standard.

To demonstrate use of POJOs (Java) or PONOs (.NET), the Account class is implemented with private fields, and with set/get methods (Java) or properties (.NET) for each field, which enable the space to read and write the field values.
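For example, a minimal Account class along these lines (a sketch using the field names given in this tutorial - userName, accountID, balance; the original listing may differ in detail):

```java
// A minimal POJO sketch of the Account class used in this tutorial: private
// fields with get/set methods so the space can read and write each value.
// Field names are from the tutorial; the original listing may differ.
public class Account {
    private String userName;
    private Integer accountID;
    private Double balance; // in the tutorial, set to accountID * 10

    public Account() {} // a no-argument constructor, as required for a POJO

    public String getUserName()            { return userName; }
    public void setUserName(String name)   { this.userName = name; }

    public Integer getAccountID()          { return accountID; }
    public void setAccountID(Integer id)   { this.accountID = id; }

    public Double getBalance()             { return balance; }
    public void setBalance(Double balance) { this.balance = balance; }
}
```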

Index Field for Partitioning

Inside the Account object, one of the data fields is defined as a routing index field for the purposes of partitioning. If this object is used in a Data Grid deployed in a partitioned topology, the routing index field is used to distribute data between the Data Grid instances, and to retrieve data from the relevant Data Grid instance when it is read.

In this tutorial, the routing index field is AccountID, and the partitioning algorithm is a hash. This means the operations on accounts are distributed evenly, based on the AccountID, between the Data Grid instances. You deployed two spaces (Data Grid instances), so all the operations on half the accounts - those with even IDs - go to one space, and all operations on the other half - with the odd IDs - go to the other space.
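The even/odd split described above can be pictured as the routing value's hash code modulo the number of partitions; for an Integer routing field the hash code is the value itself, so IDs split by parity. This is an illustration of the idea only - the actual routing implementation is internal to GigaSpaces:

```java
// Illustration of hash-based routing across partitions: for an Integer
// routing field, hashCode() returns the value itself, so with two partitions
// the accounts split by even/odd ID. The real GigaSpaces routing algorithm
// is internal to the product.
public class RoutingSketch {
    public static int partitionFor(Integer accountID, int partitions) {
        return Math.abs(accountID.hashCode()) % partitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(4, 2)); // 0 - even IDs go to one partition
        System.out.println(partitionFor(7, 2)); // 1 - odd IDs go to the other
    }
}
```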

AccountID is defined as the index field (Java) or property (.NET) inside the Account object - in Java, by placing annotations before the get/set methods; in .NET, by placing attributes before the property.

When using JDK 1.4, instead of using annotations, an Account.gs.xml file should be placed in a folder named config\mapping. The file should contain the following:

persist="false" replicate="false" fifo="false" >
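The fragment above is truncated. Based on the standard gs.xml mapping format, a complete Account.gs.xml would look roughly like the following; the class name and the routing element are assumptions based on this tutorial's use of accountID as the routing field, and only the persist/replicate/fifo attributes appear in the original fragment:

```xml
<!-- Sketch of a complete Account.gs.xml, reconstructed around the truncated
     fragment above. The routing element reflects this tutorial's use of
     accountID as the routing field. -->
<gigaspaces-mapping>
    <class name="Account"
           persist="false" replicate="false" fifo="false" >
        <routing name="accountID" />
    </class>
</gigaspaces-mapping>
```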

For more information on using a gs.xml file instead of annotations (in Java), refer to C++ Mapping File.


Running Client, Testing Notifications and Verifying Data Grid Topologies


Now that you have started the Data Grid topology of your choice, you can run the client application, described in the previous section, verify that the Notified Reader receives notifications, and then test that the Data Grid topology is functioning as expected (for example, that data is really being replicated between the spaces).

Before you begin - download and compile the client application:

  1. If you haven't done so already, extract the client application:
    If your <GigaSpaces Root>\dotnet folder contains a ZIP file, extract it.
  2. The client application package should appear at the following path:
    Java: <GigaSpaces Root>\examples\tutorials\datagrid\topologies
    .NET: <GigaSpaces Root>\dotnet\examples\DataGrid
  3. Compile the client's source files by executing \bin\compile.bat (or .sh) from the example folder.

Select the topology you deployed from the tabs below.

What's Next?

Try Another Tutorial
GigaSpaces XAP Help Portal

Further Reading


