Learning Big Data: MapR EcoSystem

Execution Engines	Batch	Tez
		Spark	A fast and general engine for large-scale data processing
		Cascading
		Pig	An ETL scripting platform for Hadoop. Its scripts (written in Pig Latin) are compiled into MapReduce jobs. Use it for ETL-like processes.
		MapReduce v1/v2
	ML, Graph	GraphX
		MLlib
		Mahout	A library for machine learning and predictive analytics.
	SQL	Drill	A schema-free SQL query engine for Hadoop, NoSQL, and Cloud Storage. Doesn't use MapReduce.
		Shark
		Impala
		Hive	SQL-like querying for data stored in Hadoop; it can also be used with HBase. It uses HiveQL. Suited to ad-hoc querying.
	NoSQL & Search	Accumulo
		Solr
		HBase
	Streaming	Storm	A free and open source distributed real-time computation system.
	Streaming	Spark Streaming
	YARN		“Yet Another Resource Negotiator”, sometimes called MapReduce 2.0. Apache YARN decouples resource management from data processing in Hadoop.
Data Governance & Operations	Data Integration & Access	Hue
		HttpFS
		Flume	A log collector. Hadoop jobs run in batch and take time to complete, so they produce a large amount of log information about job progress.
		Sqoop	Transfers bulk data between Hadoop and relational databases such as Oracle.
	Security	Knox
	Security	Sentry
	Workflow & Data Governance	Falcon
	Workflow & Data Governance	Oozie	A workflow scheduler for Hadoop jobs.
	Provisioning & Coordination	Savanna
		Juju
		Zookeeper	A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
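
Several rows in the table refer to MapReduce: Pig generates MapReduce jobs, and Drill is notable for not using it. As a rough illustration of what such a job does, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It is not the Hadoop API and runs in a single process; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["pig generates mapreduce jobs", "drill does not use mapreduce"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data over the network, which is part of why batch jobs take time to run.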
