
AI Tool and Library

Apache Spark: Unified engine for large-scale data analytics


https://spark.apache.org/

 

Quickstart (from spark.apache.org): install with pip ($ pip install pyspark, then run $ pyspark), or use the official Docker image ($ docker run -it --rm spark:python3 /opt/spark/bin/pyspark).
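The homepage snippet quoted above cuts off mid-example; a minimal, runnable sketch of the same idea is below. It assumes a local newline-delimited JSON file named logs.json whose records contain an age field.

from pyspark.sql import SparkSession

# Start a local Spark session (the pyspark shell already provides `spark`).
spark = SparkSession.builder.appName("quickstart").getOrCreate()

# Read newline-delimited JSON into a DataFrame and filter it.
df = spark.read.json("logs.json")
df.where("age > 21").show()

spark.stop()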

 

 

1. What is Apache Spark?

Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
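To make the "single-node machines or clusters" point concrete, here is a minimal sketch; the appName and master values are illustrative, and on a real cluster the master URL is normally supplied by spark-submit rather than hard-coded.

from pyspark.sql import SparkSession

# "local[*]" runs Spark on all cores of this machine; on a cluster the
# master would point at a standalone/YARN/Kubernetes cluster manager.
spark = (
    SparkSession.builder
    .appName("intro")
    .master("local[*]")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
print(df.count())  # an action, so Spark actually executes the job here

spark.stop()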

 

2. Key features

1) Batch/Streaming data

2) SQL Analytics (see the sketch after this list)

3) Data Science at scale

4) Machine learning
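As a concrete illustration of feature 2 (SQL Analytics), the following sketch registers a small batch DataFrame as a temporary view and queries it with standard SQL; the table and column names are made up for the example.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-analytics").getOrCreate()

# A tiny batch DataFrame standing in for a real table.
sales = spark.createDataFrame(
    [("2024-01-01", "KR", 100.0), ("2024-01-01", "US", 250.0), ("2024-01-02", "KR", 80.0)],
    ["day", "country", "amount"],
)
sales.createOrReplaceTempView("sales")

# Standard SQL over the distributed DataFrame.
spark.sql(
    "SELECT country, SUM(amount) AS total FROM sales GROUP BY country ORDER BY total DESC"
).show()

spark.stop()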

 

 

3. Why Spark?

Apache Spark is the most widely used engine for scalable computing.

Thousands of companies, including 80% of the Fortune 500, use Apache Spark.

Over 2,000 contributors from industry and academia have contributed to the open source project.

 

Apache Spark integrates with your favorite frameworks, helping to scale them to thousands of machines (one example is sketched below).
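One such integration is the pandas API on Spark (pyspark.pandas, bundled since Spark 3.2), which lets pandas-style code run on a cluster instead of a single machine; a minimal sketch, assuming PySpark 3.2 or later is installed:

import pyspark.pandas as ps

# pandas-style DataFrame whose operations are executed by Spark.
pdf = ps.DataFrame({"country": ["KR", "US", "KR"], "amount": [100.0, 250.0, 80.0]})

# Familiar pandas groupby/sum, scaled out by Spark under the hood.
print(pdf.groupby("country")["amount"].sum())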

 

 
