Cassandra Space Persistency Support makes use of a new technology preview persistency API. As such, references to different APIs will point to the XAP 9.5 documentation. For further details about these APIs see Space Persistency.
The Apache Cassandra Project™ is a scalable multi-master database with no single points of failure. The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.
Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data in over 150 machines. Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime. Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
Cassandra Space Data Source and Space Synchronization Endpoint
GigaSpaces comes with built-in implementations of Space Data Source and Space Synchronization Endpoint for Cassandra, called CassandraSpaceDataSource and CassandraSpaceSynchronizationEndpoint, respectively.
For further details about the persistency APIs used see Space Persistency.
The Cassandra Space Data Source uses the Cassandra JDBC Driver and the Hector library to communicate with the Cassandra cluster.
Include the following in your pom.xml:
<!-- currently the cassandra-jdbc library is not in the central maven repository -->
<repository>
    <id>org.openspaces</id>
    <name>OpenSpaces</name>
    <url>http://maven-repository.openspaces.org</url>
</repository>

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-clientutil</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-thrift</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-jdbc</artifactId>
    <version>1.1.2</version>
</dependency>
<dependency>
    <groupId>org.hectorclient</groupId>
    <artifactId>hector-core</artifactId>
    <version>1.1-2</version>
</dependency>
Setup
The following example configures the Cassandra Space Data Source for a space that loads its data back from Cassandra once initialized and also persists data asynchronously using a mirror (see Cassandra Space Synchronization Endpoint):
<?xml version="1.0"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:os-core="http://www.openspaces.org/schema/core"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
       http://www.openspaces.org/schema/core
       http://www.openspaces.org/schema/9.5/core/openspaces-core.xsd">

    <bean id="propertiesConfigurer"
          class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>

    <bean id="cassandraDataSource" class="org.apache.cassandra.cql.jdbc.CassandraDataSource">
        <constructor-arg value="${cassandra.host}" />
        <constructor-arg value="${cassandra.port}" />
        <constructor-arg value="${cassandra.keyspace}" />
        <constructor-arg value="${cassandra.user}" />
        <constructor-arg value="${cassandra.password}" />
        <constructor-arg value="2.0.0" />
    </bean>

    <bean id="hectorClient"
          class="org.openspaces.persistency.cassandra.HectorCassandraClientFactoryBean">
        <!-- comma separated seed list -->
        <property name="hosts" value="${cassandra.host}" />
        <!-- cassandra rpc communication port -->
        <property name="port" value="${cassandra.port}" />
        <!-- keyspace name to work with -->
        <property name="keyspaceName" value="${cassandra.keyspace}" />
    </bean>

    <bean id="cassandraSpaceDataSource"
          class="org.openspaces.persistency.cassandra.CassandraSpaceDataSourceFactoryBean">
        <!-- configured above -->
        <property name="cassandraDataSource" ref="cassandraDataSource" />
        <!-- configured above -->
        <property name="hectorClient" ref="hectorClient" />
    </bean>

    <os-core:space id="space" url="/./dataSourceSpace"
                   space-data-source="cassandraSpaceDataSource"
                   schema="persistent"
                   mirror="true">
        <os-core:properties>
            <props>
                <!-- Use ALL IN CACHE, put 0 for LRU -->
                <prop key="space-config.engine.cache_policy">1</prop>
                <prop key="cluster-config.cache-loader.central-data-source">true</prop>
                <prop key="cluster-config.mirror-service.supports-partial-update">true</prop>
            </props>
        </os-core:properties>
    </os-core:space>

    <os-core:giga-space id="gigaSpace" space="space" />
</beans>
The minimum number of JDBC connections to hold in the pool.
5
maximumNumberOfConnections
The maximum number of JDBC connections to hold in the pool. If a connection is required and the pool is full, a new connection is opened and then closed shortly after its usage is completed.
30
batchLimit
The underlying cassandra-jdbc implementation brings the entire result set in one batch. If paging is required, this parameter controls the maximum number of entries to fetch in each batch. It applies to both the initial data load and general cache miss queries.
Extended indexes are not supported. If one is set on a property, it is treated as a basic index.
All classes that belong to types that are to be introduced to the space during the initial metadata load must exist on the classpath of the JVM the Space is running on.
Unindexed properties cannot be queried.
Cache Miss Query Limitations
Supported queries:
id = 1234
name = 'John' AND age = 13
address.streetName = 'Liberty'
Unsupported queries:
age > 15
name = 'John' OR name = 'Jane'
Unsupported queries and queries on unindexed properties will result in a runtime exception.
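The rule behind these examples (equality terms only, combined with AND) can be sketched in plain Java. The class below is a hypothetical illustration and not part of the GigaSpaces API: the `isSupported` name and the regular expression are assumptions made for this sketch, and it recognizes only quoted strings and integer literals as values.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a GigaSpaces class): classifies a query string
// according to the supported/unsupported examples listed above.
class CacheMissQueryCheck {
    // One equality term, e.g.  name = 'John'  or  address.streetName = 'Liberty'
    private static final Pattern EQUALITY_TERM = Pattern.compile(
            "\\s*[A-Za-z_][\\w.]*\\s*=\\s*('[^']*'|-?\\d+)\\s*");

    /** Returns true if the query consists only of equality terms joined by AND. */
    static boolean isSupported(String query) {
        // Split on AND; a term containing OR, >, < and so on fails the pattern.
        for (String term : query.split("(?i)\\s+AND\\s+")) {
            if (!EQUALITY_TERM.matcher(term).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] queries = {
            "id = 1234",
            "name = 'John' AND age = 13",
            "age > 15",
            "name = 'John' OR name = 'Jane'"
        };
        for (String q : queries) {
            System.out.println(q + " -> " + (isSupported(q) ? "supported" : "unsupported"));
        }
    }
}
```

A check like this could be run on the client side before issuing a query that may reach the data source, so that unsupported queries are rejected early instead of failing at runtime.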
The Cassandra Space Synchronization Endpoint uses the Hector library to communicate with the Cassandra cluster.
Include the following in your pom.xml
By default when serializing object/document properties to column values, the following serialization logic is applied:
For fixed properties:
If the type of the value to be serialized matches a primitive type in Cassandra it will be serialized as defined by the Cassandra primitive type serialization protocol.
Otherwise, the value will be serialized using the standard Java object serialization mechanism.
For dynamic properties:
All values will be serialized using the default dynamic property value serializer implementation: DynamicPropertyValueSerializer.
It is possible to override this default behavior by providing a custom implementation of PropertyValueSerializer.
This interface is defined by these 2 methods:
The behavior of overriding the serialization logic is different for fixed properties and dynamic properties:
Fixed properties will only be serialized by the custom serializer if their type does not match a primitive type in Cassandra.
Dynamic properties will always be serialized using the provided implementation. This means that a custom dynamic property serializer must be able to handle primitive types such as Integer and Long.
Overriding the property value serializers in the Cassandra Space Synchronization Endpoint must be followed by overriding the same serializers in the Cassandra Space Data Source. Failure to do so will prevent the Cassandra Space Data Source from properly deserializing values read from Cassandra.
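As a rough illustration of what a custom dynamic property serializer has to cope with, the following sketch round-trips Integer, Long and String values through a ByteBuffer using a one-byte type tag. The `ValueSerializerSketch` interface is a stand-in defined here for the example; its method names and signatures are assumptions and are not quoted from the PropertyValueSerializer interface.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stand-in for a two-method serializer contract (assumed shape, hypothetical name).
interface ValueSerializerSketch {
    ByteBuffer toByteBuffer(Object value);
    Object fromByteBuffer(ByteBuffer buffer);
}

// Writes a one-byte type tag followed by the value, so that primitive wrapper
// types can be restored with their original type on read.
class TaggedValueSerializer implements ValueSerializerSketch {
    private static final byte INT = 1;
    private static final byte LONG = 2;
    private static final byte STRING = 3;

    @Override
    public ByteBuffer toByteBuffer(Object value) {
        ByteBuffer buffer;
        if (value instanceof Integer) {
            buffer = ByteBuffer.allocate(1 + Integer.BYTES);
            buffer.put(INT).putInt((Integer) value);
        } else if (value instanceof Long) {
            buffer = ByteBuffer.allocate(1 + Long.BYTES);
            buffer.put(LONG).putLong((Long) value);
        } else if (value instanceof String) {
            byte[] bytes = ((String) value).getBytes(StandardCharsets.UTF_8);
            buffer = ByteBuffer.allocate(1 + bytes.length);
            buffer.put(STRING).put(bytes);
        } else {
            throw new IllegalArgumentException("Unsupported value: " + value);
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    @Override
    public Object fromByteBuffer(ByteBuffer buffer) {
        ByteBuffer copy = buffer.duplicate(); // leave the caller's position untouched
        byte tag = copy.get();
        switch (tag) {
            case INT:
                return copy.getInt();
            case LONG:
                return copy.getLong();
            case STRING:
                byte[] bytes = new byte[copy.remaining()];
                copy.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("Unknown type tag: " + tag);
        }
    }
}
```

Whatever encoding is chosen, the same logic has to be used on the read side, which is why the serializers configured on the synchronization endpoint and on the data source must match.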
Flattened Properties Filter
Introduction
When a type is introduced to the Cassandra Space Synchronization Endpoint, the type's fixed properties are introspected, and the final result is a mapping from this type's nested properties to column family columns.
The default behavior of this mapping is explained in the following example.
Consider the following simple POJO (could also be a SpaceDocument's fixed properties):
// implementation omitted for brevity
@SpaceClass
public class Person {
    @SpaceId
    public Long getId() ...
    public String getName() ...
    public Address getAddress() ...
    ...
}

public class Address {
    public String getStreetName() ...
    public Long getStreetNumber() ...
}
By default, the fixed properties will be mapped to the Person column family in Cassandra like this:
| Property | Column Name (and type) |
|---|---|
| person.id | (row key) (type: Long) |
| person.name | name (type: UTF8) |
| person.address.streetName | address.streetName (type: UTF8) |
| person.address.streetNumber | address.streetNumber (type: Long) |
Notice how the address property was flattened: each of its nested properties is stored as a separate column.
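The flattening idea can be sketched independently of Cassandra. In the example below, nested maps stand in for nested POJO properties, and the result maps dotted column names to leaf values. This is a simplified illustration under those assumptions, not the endpoint's actual introspection code; `FlattenSketch` and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: nested maps play the role of nested fixed properties.
class FlattenSketch {
    /** Flattens nested maps into dotted column names such as address.streetName. */
    static Map<String, Object> flatten(Map<String, Object> properties) {
        Map<String, Object> columns = new LinkedHashMap<>();
        flattenInto("", properties, columns);
        return columns;
    }

    private static void flattenInto(String prefix,
                                    Map<String, Object> properties,
                                    Map<String, Object> columns) {
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                // Nested property: recurse instead of writing it as a single column
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) entry.getValue();
                flattenInto(path, nested, columns);
            } else {
                columns.put(path, entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("streetName", "Liberty");
        address.put("streetNumber", 12L);

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("address", address);

        System.out.println(flatten(person).keySet());
    }
}
```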
Now suppose that a Person is written to the space as a SpaceDocument which also includes these dynamic properties:
String newName
Address newAddress
By default, dynamic properties are not flattened and are written as is to Cassandra. Moreover, their static type is not updated in the column family metadata, and they are serialized using a custom serializer (see Property Value Serializer).
This is how they will be written to Cassandra:
| Property | Column Name (and type) |
|---|---|
| person.newName | newName (type: Bytes) |
| person.newAddress | newAddress (type: Bytes) |
Customization
It is possible to override the above behavior by providing a FlattenedPropertiesFilter implementation.
The implementation is used during type introspection when a type is first introduced to the synchronization endpoint, and whenever an entry of that type is written that contains dynamic properties.
The return value indicates whether the currently introspected property should be serialized as is, or whether its nested properties should be introspected as well.
In the example above, the default implementation, DefaultFlattenedPropertiesFilter, returns true if and only if the property is fixed and the current introspection nesting level does not exceed 10.
The PropertyContext contains the following details about the currently introspected property:
String getPath();
String getName();
Class<?> getType();
boolean isDynamic();
int getCurrentNestingLevel();
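To make the decision rule concrete, here is a self-contained sketch of a filter. The `PropertyContextSketch` interface mirrors the accessors listed above, while `FlattenFilterSketch` and its `shouldFlatten` method are hypothetical names chosen for this example, as is the assumption that `getPath()` returns a dotted path such as `address.streetName`. The first filter follows the rule described for the default implementation; the second is a custom variant that keeps `address` as a single serialized value.

```java
// Mirrors the PropertyContext accessors listed above (stand-in type for the sketch).
interface PropertyContextSketch {
    String getPath();
    String getName();
    Class<?> getType();
    boolean isDynamic();
    int getCurrentNestingLevel();
}

// Hypothetical single-method filter contract; the method name is an assumption.
interface FlattenFilterSketch {
    boolean shouldFlatten(PropertyContextSketch context);
}

class FilterSketches {
    static final int MAX_NESTING_LEVEL = 10;

    // Rule described for the default: fixed property and nesting level of at most 10
    static final FlattenFilterSketch DEFAULT_LIKE =
            ctx -> !ctx.isDynamic() && ctx.getCurrentNestingLevel() <= MAX_NESTING_LEVEL;

    // Custom variant: same rule, but anything under "address" is serialized as is
    static final FlattenFilterSketch KEEP_ADDRESS_WHOLE =
            ctx -> DEFAULT_LIKE.shouldFlatten(ctx) && !ctx.getPath().startsWith("address");

    // Small factory so the filters can be exercised without a real introspector
    static PropertyContextSketch context(String path, Class<?> type, boolean dynamic, int level) {
        return new PropertyContextSketch() {
            public String getPath() { return path; }
            public String getName() { return path.substring(path.lastIndexOf('.') + 1); }
            public Class<?> getType() { return type; }
            public boolean isDynamic() { return dynamic; }
            public int getCurrentNestingLevel() { return level; }
        };
    }
}
```

With the custom variant, the `address` property from the earlier example would be written as one serialized column instead of `address.streetName` and `address.streetNumber`.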
Column Family Name Converter
Because of how Cassandra implements column families, there are certain limitations when converting a type name (e.g., com.example.data.Person) to a column family name. Among these limitations are a maximum length of 48 characters and characters that are invalid in a column family name (such as '.').
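One possible way to satisfy both constraints is sketched below. This is an illustrative assumption, not the converter shipped with the product: it treats only letters, digits and underscores as valid characters, and the truncate-and-hash strategy for long names is a choice made for this example.

```java
// Hypothetical converter sketch; the valid character set and the
// truncate-and-hash strategy are assumptions made for illustration.
class ColumnFamilyNameSketch {
    static final int MAX_LENGTH = 48;

    static String toColumnFamilyName(String typeName) {
        // Replace every character that is not a letter, digit or underscore
        String sanitized = typeName.replaceAll("[^A-Za-z0-9_]", "_");
        if (sanitized.length() <= MAX_LENGTH) {
            return sanitized;
        }
        // Too long: keep the head and append a hash of the full type name,
        // which reduces the chance that two truncated names collide.
        String hash = Integer.toHexString(typeName.hashCode());
        String head = sanitized.substring(0, MAX_LENGTH - hash.length() - 1);
        return head + "_" + hash;
    }

    public static void main(String[] args) {
        System.out.println(toColumnFamilyName("com.example.data.Person"));
    }
}
```

Whatever strategy is used, it has to be deterministic, since the same type name must always resolve to the same column family.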
The behavior for converting a type name to a column family name when creating a column family is defined by the interface ColumnFamilyNameConverter.
This interface is defined by 1 method: