Wednesday, January 18, 2017

MapR EcoSystem

https://www.mapr.com/products/product-overview/overview

Execution Engines
Batch
Tez

Spark
A fast and general engine for large-scale data processing
Cascading

Pig
A high-level scripting platform for Hadoop, often used for ETL. Pig scripts are compiled into MapReduce jobs. Use it when your processes are ETL-like.
MapReduce v1/v2
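Since several of the tools here (Pig, for one) end up generating MapReduce jobs, a rough sketch of the model itself may help. This is plain Python, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["pig generates mapreduce jobs", "hive generates jobs too"]))
```

On a real cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data over the network; the sketch only shows the shape of the three phases.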

ML, Graph
GraphX

MLlib

Mahout
A library for machine learning and predictive analytics.
SQL
Drill
A schema-free SQL query engine for Hadoop, NoSQL, and Cloud Storage. Doesn't use MapReduce.
Shark

Impala

Hive
SQL-like querying over data stored in Hadoop (it can also read HBase tables). It uses HiveQL. Suited to ad-hoc querying.
NoSQL & Search
Accumulo

Solr

HBase

Streaming
Storm
A free and open source distributed real-time computation system.
Spark Streaming


YARN
“Yet Another Resource Negotiator”, sometimes called MapReduce 2.0. Apache YARN decouples resource management from data processing in Hadoop.
Data Governance & Operations
Data Integration & Access
Hue

HttpFS

Flume
A log collector. Hadoop jobs run in batch and take time to complete, so they produce a large amount of log information about their progress.
Sqoop
Transfers bulk data between Hadoop and relational databases such as Oracle.
Security
Knox

Sentry

Workflow & Data Governance
Falcon

Oozie
A workflow scheduler for Hadoop jobs.
Provisioning & Coordination
Savannah

Juju

Zookeeper
A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
