Alvin's Big Data Notebook: Compare Apache Ignite with Hazelcast

1. Ignite vs. Spark

They serve different use cases.
Spark leans toward analytics and machine learning (ML), and focuses on MapReduce-style workloads.
Spark is a computation engine for analytics (OLAP), so it has to read data from an external store and move it into the engine before processing it.

Ignite supports both OLTP and OLAP, and it also supports SQL. It does not require data movement.
Ignite also provides full ACID transactions spanning memory and optional data sources.

2. Ignite vs. Hadoop

Hadoop is a batch-oriented data warehouse system.

Ignite is a real-time, transactional, in-memory data fabric focused on real-time processing of operational data.
Ignite can be deployed on top of Hadoop, with Hadoop serving as a "slower" data source.
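Putting an in-memory layer in front of a slower store follows a read-through pattern: the first read of a key goes to the slow source, and later reads are served from memory. The sketch below is a minimal, hypothetical illustration in plain Java, not Ignite's API; `ReadThroughCache` and the loader function (standing in for a read from Hadoop) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through layer; NOT Ignite's API.
// The loader function stands in for a slow read from Hadoop/HDFS (assumption).
class ReadThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Function<K, V> slowSource;
    private int slowReads = 0;

    ReadThroughCache(Function<K, V> slowSource) {
        this.slowSource = slowSource;
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            // Miss: load once from the slow source, then keep it in memory.
            value = slowSource.apply(key);
            slowReads++;
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Number of times the slow source was actually consulted.
    int slowReads() {
        return slowReads;
    }
}
```

Reading the same key twice hits the slow source only once; every later read is served from the in-memory map.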

Ignite grew out of GridGain, which is why the comparison below refers to GridGain. Hadoop breaks a job down into many tasks, while GridGain reverses that terminology: a GridGain task is broken down into many jobs.
The primary difference is that Hadoop is designed to work with large data sets (hundreds of TB in a single job) and GridGain is not. Each GridGain computation has a single reducer, which is given all of the values in a java.util.List. Therefore, a GridGain computation is limited to what can fit in a single JVM's heap.
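The task/job shape described above can be sketched in plain Java. This is a hypothetical illustration, not the GridGain or Ignite API: `CharCountTask`, `split`, `reduce`, and `execute` are invented names, and a local thread pool stands in for grid nodes. One task fans out into many jobs, each job returns a single value, and one reduce call receives every result in a single `java.util.List`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the GridGain-style shape; names are illustrative only.
class CharCountTask {
    // Split the task's argument into one job per word.
    static List<Callable<Integer>> split(String sentence) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String word : sentence.split("\\s+")) {
            jobs.add(() -> word.length()); // each job returns a single value
        }
        return jobs;
    }

    // The single reducer: every job result arrives together in one List.
    static int reduce(List<Integer> results) {
        int total = 0;
        for (int r : results) {
            total += r;
        }
        return total;
    }

    static int execute(String sentence) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stands in for grid nodes
        try {
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(split(sentence))) {
                results.add(f.get());
            }
            return reduce(results);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `reduce` holds all results at once, the total result size is bounded by one JVM heap, which is the limitation described above.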

In GridGain's system, each map and each reduce returns a single value. To support MapReduce-like semantics, where a map emits many key/value pairs, each value would need to be a list.
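To emulate Hadoop-style (key, value) emission under the one-value-per-job rule, each job can return a list of pairs, and the reducer groups them itself. The word-count sketch below is hypothetical: `ListValuedWordCount` and its methods are illustrative names, and the jobs run locally rather than on a grid.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a job may return only ONE value, so that value is a List of pairs.
class ListValuedWordCount {
    // One job per input line; its single return value is a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> job(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // The single reducer receives a List of those lists and groups by key itself.
    static Map<String, Integer> reduce(List<List<Map.Entry<String, Integer>>> results) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> pairs : results) {
            for (Map.Entry<String, Integer> p : pairs) {
                counts.merge(p.getKey(), p.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    // Runs one job per line locally (a real grid would run them remotely), then reduces.
    static Map<String, Integer> countAll(String... lines) {
        List<List<Map.Entry<String, Integer>>> results = new ArrayList<>();
        for (String line : lines) {
            results.add(job(line));
        }
        return reduce(results);
    }
}
```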

GridGain's framework does not sort the data between the maps and the reduce. However, since the data is passed to the reduce as a java.util.List, it is easy for the application to sort it as desired. Hadoop, by contrast, provides an automatic distributed sort of the data between the maps and the reduces.
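Sorting in the reducer is straightforward because the results arrive as one `List`. A minimal sketch, assuming each job returns a (name, score) result; `SortingReducer` and `Result` are hypothetical names, and the `record` syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: no framework sort happens between the jobs and the reduce,
// so the reducer sorts the List it is handed.
class SortingReducer {
    // One job result: a name and a score (illustrative shape).
    record Result(String name, int score) {}

    // Returns names ordered by score, highest first; ties broken by name.
    static List<String> reduce(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results); // copy: the input may be immutable
        sorted.sort(Comparator.comparingInt(Result::score).reversed()
                .thenComparing(Result::name));
        List<String> names = new ArrayList<>();
        for (Result r : sorted) {
            names.add(r.name());
        }
        return names;
    }
}
```

The sort happens in one JVM over the whole list, so it is cheap for small result sets but does not scale the way a distributed shuffle-and-sort does.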

GridGain does not support combiners or counters. Combiners are an optional pass that reduces the values output by each map, shrinking the amount of data that needs to be shuffled. Counters are user- and system-defined events that are counted and used to track the progress of a job as it runs.
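Without a combiner pass, one possible workaround (an assumption here, not a documented GridGain feature) is to do the combining inside each job, so it returns one entry per distinct key rather than one entry per occurrence. `LocalCombine` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the job aggregates locally before returning its single value,
// which plays the role a combiner would play in Hadoop.
class LocalCombine {
    static Map<String, Integer> combinedJob(String text) {
        Map<String, Integer> local = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                local.merge(word, 1, Integer::sum); // combine step, done inside the job
            }
        }
        return local; // one entry per distinct word, so less data reaches the reducer
    }
}
```

For the input "a b a a", the job returns two entries instead of four (word, 1) pairs.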

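GridGain's map method returns a Map<Task, Node> (in Hadoop terms; each entry pairs a GridGain job with the node it runs on), which gives it control over task locality. However, this locality is not a suggestion but an order: unlike Hadoop, where data locality is a scheduling preference, each job runs on the node it is mapped to.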
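The explicit placement can be sketched as a map step that returns a job-to-node assignment. This is a hypothetical model, not the real API: `StrictPlacement`, the job and node names, and the round-robin choice for splits whose owning node is not in the cluster are all assumptions. The decision is made by the task itself, and each job is then meant to run on exactly the node it was mapped to.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of task-decided placement; NOT the GridGain/Ignite API.
class StrictPlacement {
    // splitLocations: input split -> node holding that split's data (assumed known).
    // liveNodes: nodes currently in the cluster (must be non-empty).
    static Map<String, String> map(Map<String, String> splitLocations, List<String> liveNodes) {
        Map<String, String> assignment = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : splitLocations.entrySet()) {
            String node = e.getValue();
            if (!liveNodes.contains(node)) {
                // Owner not in the cluster: pick a live node round-robin (assumption).
                node = liveNodes.get(next++ % liveNodes.size());
            }
            // The job is bound to this node; the runtime does not move it elsewhere.
            assignment.put("job-" + e.getKey(), node);
        }
        return assignment;
    }
}
```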

3. Ignite vs. Hazelcast

Hazelcast mainly provides a data grid and a compute grid, while Ignite also offers a file system, streaming, and more.
Hazelcast has no full SQL support. Its off-heap memory is only available in the Enterprise edition, not in the open-source edition.

Both provide distributed in-memory data storage and key-value stores.

4. Ignite vs. Storm, Samza

Streaming is only one of Ignite's functional areas.
Even so, Ignite has better streaming and complex event processing (CEP) capabilities.

5. Ignite vs. ZooKeeper

ZooKeeper is a quorum-based coordination service used for resource management.
The data stored in ZooKeeper must be small; by default a single znode holds at most about 1 MB.
ZooKeeper becomes a bottleneck when a large volume of traffic goes to it.
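ZooKeeper commits each write only after a majority of the ensemble acknowledges it. The plain-Java arithmetic below (class and method names are hypothetical) shows what that costs: more servers means more acknowledgements per write, while fault tolerance grows only with every second server.

```java
// Majority-quorum arithmetic for an ensemble of n servers (n >= 1).
class Quorum {
    // Smallest number of servers forming a strict majority.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // How many servers can fail while a majority is still reachable.
    static int toleratedFailures(int n) {
        return n - majority(n);
    }
}
```

A 5-server ensemble needs 3 acknowledgements per write and survives 2 failures; a 4-server ensemble also needs 3 acknowledgements but survives only 1, the same as 3 servers.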

In Ignite, a read for a given key goes to a single primary node, and Ignite supports high-volume transactions.
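Routing each key to one primary node can be sketched as follows. This is a simplified, hypothetical model: the partition count and the round-robin partition-to-node assignment are illustrative, `PrimaryRouting` is an invented name, and Ignite's actual affinity function is more sophisticated.

```java
import java.util.List;

// Hypothetical sketch: key -> partition -> primary node; one node serves the read.
class PrimaryRouting {
    static final int PARTITIONS = 16; // illustrative value, not an Ignite setting

    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Simplified assignment: partitions spread round-robin over the nodes.
    static String primaryFor(Object key, List<String> nodes) {
        return nodes.get(partition(key) % nodes.size());
    }
}
```

A read never needs agreement from several servers; it is answered by whichever node owns the key's partition.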

References:
https://wiki.apache.org/incubator/IgniteProposal
http://wiki.apache.org/hadoop/HadoopVsGridGain

Monday 6 April 2015