Cassandra Space Persistency



Cassandra Space Persistency Support makes use of a new technology preview persistency API. As such, references to different APIs will point to the XAP 9.5 documentation. For further details about these APIs see Space Persistency.

Overview

The Apache Cassandra Project™ is a scalable multi-master database with no single points of failure. The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.

Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data in over 150 machines. Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime. Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.

Cassandra Space Data Source and Space Synchronization Endpoint

GigaSpaces comes with built-in implementations of the Space Data Source and the Space Synchronization Endpoint for Cassandra, called CassandraSpaceDataSource and CassandraSpaceSynchronizationEndpoint, respectively.

For further details about the persistency APIs used see Space Persistency.

Cassandra Space Data Source

Configuration

A Cassandra based implementation of the Space Data Source.

Library dependencies

The Cassandra Space Data Source uses the Cassandra JDBC Driver and the Hector Library for communicating with the Cassandra cluster.
Include the following in your pom.xml:

<!-- currently the cassandra-jdbc library is not in the central Maven repository -->
<repository>
    <id>org.openspaces</id>
    <name>OpenSpaces</name>
    <url>http://maven-repository.openspaces.org</url>
</repository>

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-clientutil</artifactId>
    <version>1.1.6</version>
</dependency>

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-thrift</artifactId>
    <version>1.1.6</version>
</dependency>

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-jdbc</artifactId>
    <version>1.1.2</version>
</dependency>

<dependency>
    <groupId>org.hectorclient</groupId>
    <artifactId>hector-core</artifactId>
    <version>1.1-2</version>
</dependency>

Setup

The Cassandra Space Data Source can be configured for a space that loads its data back from Cassandra once initialized and also asynchronously persists the data using a mirror (see Cassandra Space Synchronization Endpoint).

For more details about different configurations see Space Persistency.

CassandraSpaceDataSource Properties

Property | Description | Default
cassandraDataSource | A configured org.apache.cassandra.cql.jdbc.CassandraDataSource bean. Must be configured to use CQL 2.0.0. |
hectorClient | A configured HectorCassandraClient bean. See Hector Cassandra Client. |
minimumNumberOfConnections | The minimum number of JDBC connections to hold in the pool. | 5
maximumNumberOfConnections | The maximum number of JDBC connections to hold in the pool. If a connection is required and the pool is full, a new connection will be opened which will be closed shortly after its usage is completed. | 30
batchLimit | The underlying cassandra-jdbc implementation brings the entire result set in one batch. If paging is required, this parameter will control the maximum number of entries to fetch in each batch. (This parameter controls both initial data load and general cache miss queries.) | 10000
fixedPropertyValueSerializer | See Property Value Serializer. |
dynamicPropertyValueSerializer | See Property Value Serializer. |

Considerations

General limitations

  • Extended indexes are not supported. (If one is set on a property, it will be treated as a Basic index.)
  • All classes that belong to types that are to be introduced to the space during the initial metadata load must exist on the classpath of the JVM the Space is running on.
  • Unindexed properties cannot be queried.

Cache miss Query limitations

Supported queries:

  • id = 1234
  • name = 'John' AND age = 13
  • address.streetName = 'Liberty'

Unsupported queries:

  • age > 15
  • name = 'John' OR name = 'Jane'

Unsupported queries and queries on unindexed properties will result in a runtime exception.

Cassandra Space Synchronization Endpoint

Configuration

A Cassandra based implementation of the Space Synchronization Endpoint.

Library dependencies

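The Cassandra Space Synchronization Endpoint uses the Hector Library for communicating with the Cassandra cluster.
Include the Hector dependency in your pom.xml (the hector-core artifact listed under the Cassandra Space Data Source library dependencies above).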

Setup

The Cassandra Space Synchronization Endpoint can be configured within a mirror.

For more details about different configurations see Space Persistency.

CassandraSpaceSynchronizationEndpoint Properties

Property | Description
hectorClient | A configured HectorCassandraClient bean. See Hector Cassandra Client.
fixedPropertyValueSerializer | See Property Value Serializer.
dynamicPropertyValueSerializer | See Property Value Serializer.
flattenedPropertiesFilter | See Flattened Properties Filter.
columnFamilyNameConverter | See Column Family Name Converter.

Property Value Serializer

By default, when serializing object/document properties to column values, the following serialization logic is applied:

For fixed properties:

  • If the type of the value to be serialized matches a primitive type in Cassandra, it will be serialized as defined by the Cassandra primitive type serialization protocol.
  • Otherwise, the value will be serialized using the standard Java object serialization mechanism.

For dynamic properties:

It is possible to override this default behavior by providing a custom implementation of PropertyValueSerializer.
This interface is defined by two methods:

ByteBuffer toByteBuffer(Object value);
Object fromByteBuffer(ByteBuffer byteBuffer);

The behavior of overriding the serialization logic is different for fixed properties and dynamic properties:

  • Fixed properties will only be serialized by the custom serializer if their type does not match a primitive type in Cassandra.
  • Dynamic properties will always be serialized using the provided implementation. This means that the implementation must be able to handle primitive types such as Integer, Long, etc.

Overriding the property value serializers in the Cassandra Space Synchronization Endpoint must be followed by overriding the same serializers in the Cassandra Space Data Source. Failure to do so will prevent the Cassandra Space Data Source from properly deserializing values read from Cassandra.
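
As an illustration of the contract described above, here is a minimal sketch of a custom serializer that also handles primitive wrapper types, as a dynamic property serializer must. The PropertyValueSerializer interface is redeclared locally as a stand-in that mirrors the two methods listed above (the real interface ships with XAP), and TaggedValueSerializer with its one-byte type tag format is a hypothetical design, not the product's default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Local stand-in mirroring the two methods listed above; the real interface ships with XAP.
interface PropertyValueSerializer {
    ByteBuffer toByteBuffer(Object value);
    Object fromByteBuffer(ByteBuffer byteBuffer);
}

// Hypothetical serializer: writes a one-byte type tag followed by the encoded value.
class TaggedValueSerializer implements PropertyValueSerializer {
    private static final byte TAG_STRING = 0;
    private static final byte TAG_INT = 1;
    private static final byte TAG_LONG = 2;

    @Override
    public ByteBuffer toByteBuffer(Object value) {
        ByteBuffer buffer;
        if (value instanceof String) {
            byte[] bytes = ((String) value).getBytes(StandardCharsets.UTF_8);
            buffer = ByteBuffer.allocate(1 + bytes.length).put(TAG_STRING).put(bytes);
        } else if (value instanceof Integer) {
            buffer = ByteBuffer.allocate(1 + Integer.BYTES).put(TAG_INT).putInt((Integer) value);
        } else if (value instanceof Long) {
            buffer = ByteBuffer.allocate(1 + Long.BYTES).put(TAG_LONG).putLong((Long) value);
        } else {
            // Other types (and null) are out of scope for this sketch.
            throw new IllegalArgumentException("Unsupported value: " + value);
        }
        buffer.flip(); // switch the buffer from writing to reading
        return buffer;
    }

    @Override
    public Object fromByteBuffer(ByteBuffer byteBuffer) {
        ByteBuffer view = byteBuffer.duplicate(); // leave the caller's position untouched
        byte tag = view.get();
        switch (tag) {
            case TAG_STRING:
                byte[] bytes = new byte[view.remaining()];
                view.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            case TAG_INT:
                return view.getInt();
            case TAG_LONG:
                return view.getLong();
            default:
                throw new IllegalArgumentException("Unknown type tag: " + tag);
        }
    }
}
```

Whatever format is chosen, the same serializer has to be set on both the synchronization endpoint and the data source, since values written with one format cannot be read back with another.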

Flattened Properties Filter

Introduction

When a type is introduced to the Cassandra Space Synchronization Endpoint, the type's fixed properties are introspected and the final result is a mapping from the type's nested properties to column family columns.
The default behavior of this mapping is explained in the following example.
Consider the following simple POJO (these could also be a SpaceDocument's fixed properties):

// implementation omitted for brevity
@SpaceClass
public class Person {

    @SpaceId
    public Long getId() ...

    public String getName() ...

    public Address getAddress() ...

    ...

}

public class Address {

    public String getStreetName() ...

    public Long getStreetNumber() ...

}

By default, the fixed properties will be mapped to the Person column family in Cassandra like this:

Property | Column Name (and type)
person.id | (row key) (type: Long)
person.name | name (type: UTF8)
person.address.streetName | address.streetName (type: UTF8)
person.address.streetNumber | address.streetNumber (type: Long)

Notice how the address property was flattened: each of its nested properties is written as a separate column.
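
The default mapping shown above can be approximated with a short reflection sketch. This is not the endpoint's actual introspection code: FlatteningSketch, its getter-based discovery, and its notion of a "simple" type are assumptions made for illustration. The sketch also lists id like any other property, whereas the real mapping uses it as the row key, and it does not model the Cassandra column types.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of flattening nested properties into dotted column names.
class FlatteningSketch {
    static class Address {
        public String getStreetName() { return "Liberty"; }
        public Long getStreetNumber() { return 12L; }
    }

    static class Person {
        public Long getId() { return 1234L; }
        public String getName() { return "John"; }
        public Address getAddress() { return new Address(); }
    }

    // Assumed cap, chosen to match the nesting limit of 10 mentioned for the default filter.
    private static final int MAX_NESTING_LEVEL = 10;

    // Returns flattened property path -> Java type, sorted by path.
    static Map<String, Class<?>> flatten(Class<?> type) {
        Map<String, Class<?>> columns = new TreeMap<>();
        collect(type, "", 1, columns);
        return columns;
    }

    private static void collect(Class<?> type, String prefix, int level,
                                Map<String, Class<?>> columns) {
        for (Method method : type.getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && method.getParameterCount() == 0 && !name.equals("getClass");
            if (!isGetter) {
                continue;
            }
            String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            String path = prefix.isEmpty() ? property : prefix + "." + property;
            Class<?> valueType = method.getReturnType();
            if (isSimple(valueType) || level >= MAX_NESTING_LEVEL) {
                columns.put(path, valueType);                // stored as one column
            } else {
                collect(valueType, path, level + 1, columns); // flatten nested properties
            }
        }
    }

    private static boolean isSimple(Class<?> type) {
        return type.isPrimitive() || type == String.class || type == Boolean.class
                || type == Character.class || Number.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        flatten(Person.class).forEach(
                (path, valueType) -> System.out.println(path + " -> " + valueType.getSimpleName()));
    }
}
```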

Now suppose that a Person is written to the space as a SpaceDocument which also includes these dynamic properties:

  • String newName
  • Address newAddress

By default, dynamic properties are not flattened and are written as is to Cassandra. Moreover, their static type is not updated in the Column Family metadata and they are serialized using a custom serializer (see Property Value Serializer).

This is how they will be written to Cassandra:

Property | Column Name (and type)
person.newName | newName (type: Bytes)
person.newAddress | newAddress (type: Bytes)

Customization

It is possible to override the above behavior by providing a FlattenedPropertiesFilter implementation.
The implementation is used during type introspection when a type is first introduced to the synchronization endpoint, and whenever an entry of that type containing dynamic properties is written.

The interface is defined by a single method:

boolean shouldFlatten(PropertyContext propertyContext);

The return value indicates whether the currently introspected property should be flattened, that is, have its nested properties introspected as well (true), or be serialized as is (false).
In the example above, the default implementation, DefaultFlattenedPropertiesFilter, returns true if and only if the property is fixed and the current introspection nesting level does not exceed 10.

The PropertyContext contains the following details about the currently introspected property:

String getPath();
String getName();
Class<?> getType();
boolean isDynamic();
int getCurrentNestingLevel();
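
To make the filter contract concrete, here is a minimal sketch of a custom filter that flattens fixed properties up to a configurable depth and keeps selected paths serialized as a single column. Both interfaces are redeclared locally as stand-ins mirroring the methods listed above (the real types ship with XAP). SelectiveFlattenedPropertiesFilter and SimplePropertyContext are hypothetical names, and the sketch assumes getPath() returns a dot-separated path relative to the type, such as "address".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Local stand-ins mirroring the methods listed above; the real types ship with XAP.
interface PropertyContext {
    String getPath();
    String getName();
    Class<?> getType();
    boolean isDynamic();
    int getCurrentNestingLevel();
}

interface FlattenedPropertiesFilter {
    boolean shouldFlatten(PropertyContext propertyContext);
}

// Hypothetical filter: flattens fixed properties up to a configurable depth,
// except for paths that should stay serialized as a single column.
class SelectiveFlattenedPropertiesFilter implements FlattenedPropertiesFilter {
    private final int maxNestingLevel;
    private final Set<String> keepAsIs;

    SelectiveFlattenedPropertiesFilter(int maxNestingLevel, String... keepAsIs) {
        this.maxNestingLevel = maxNestingLevel;
        this.keepAsIs = new HashSet<>(Arrays.asList(keepAsIs));
    }

    @Override
    public boolean shouldFlatten(PropertyContext context) {
        return !context.isDynamic()
                && context.getCurrentNestingLevel() <= maxNestingLevel
                && !keepAsIs.contains(context.getPath());
    }
}

// Minimal context implementation, only for exercising the filter outside XAP.
class SimplePropertyContext implements PropertyContext {
    private final String path;
    private final String name;
    private final Class<?> type;
    private final boolean dynamic;
    private final int nestingLevel;

    SimplePropertyContext(String path, String name, Class<?> type,
                          boolean dynamic, int nestingLevel) {
        this.path = path;
        this.name = name;
        this.type = type;
        this.dynamic = dynamic;
        this.nestingLevel = nestingLevel;
    }

    @Override public String getPath() { return path; }
    @Override public String getName() { return name; }
    @Override public Class<?> getType() { return type; }
    @Override public boolean isDynamic() { return dynamic; }
    @Override public int getCurrentNestingLevel() { return nestingLevel; }
}
```

With new SelectiveFlattenedPropertiesFilter(2, "address"), a fixed property at path "address" is kept as one serialized column instead of being split into address.streetName and address.streetNumber, while other fixed properties are still flattened up to two levels deep.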

Column Family Name Converter

Due to implementation details of Cassandra regarding Column Families, there are certain limitations when converting a type name (e.g. com.example.data.Person) to a column family name. Among these limitations are a maximum length of 48 characters and characters that are invalid in a column family name (such as '.').
The behavior for converting a type name to a column family name when creating a column family is defined by the interface ColumnFamilyNameConverter.
This interface is defined by a single method:

String toColumnFamilyName(String typeName);

The default implementation is DefaultColumnFamilyNameConverter.
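
As an illustration of the two limitations mentioned above, here is a minimal sketch of a custom converter. The interface is redeclared locally as a stand-in mirroring the single method shown above (the real one ships with XAP). SanitizingColumnFamilyNameConverter and its strategy are hypothetical, and this is not a description of how DefaultColumnFamilyNameConverter behaves.

```java
// Local stand-in mirroring the single method shown above; the real interface ships with XAP.
interface ColumnFamilyNameConverter {
    String toColumnFamilyName(String typeName);
}

// Hypothetical converter: replaces invalid characters and enforces the 48 character limit.
class SanitizingColumnFamilyNameConverter implements ColumnFamilyNameConverter {
    private static final int MAX_LENGTH = 48;

    @Override
    public String toColumnFamilyName(String typeName) {
        // Replace every character other than letters, digits and '_' (for example '.').
        String sanitized = typeName.replaceAll("[^A-Za-z0-9_]", "_");
        if (sanitized.length() <= MAX_LENGTH) {
            return sanitized;
        }
        // Keep the tail of the name, which holds the simple class name.
        return sanitized.substring(sanitized.length() - MAX_LENGTH);
    }

    public static void main(String[] args) {
        ColumnFamilyNameConverter converter = new SanitizingColumnFamilyNameConverter();
        System.out.println(converter.toColumnFamilyName("com.example.data.Person"));
    }
}
```

Truncation can map two long type names that share the same tail to the same column family name, so a production converter would need a collision strategy, for instance appending a short hash of the full type name.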

Considerations

  • Collections and Maps are not flattened and are serialized as blobs using the Java object serialization mechanism.
  • Writing entries that only have their id property set is not supported; such entries will not be written to Cassandra.