Apache Spark™ - Unified Engine for large-scale data analytics
Install with pip:

  $ pip install pyspark
  $ pyspark

Or use the official Docker image:

  $ docker run -it --rm spark:python3 /opt/spark/bin/pyspark
spark.apache.org
1. What is Apache Spark?
Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
2. Key features
1) Batch/Streaming data
2) SQL Analytics
3) Data Science at scale
4) Machine learning
3. Why Spark
The most widely used engine for scalable computing
Thousands of companies, including 80% of the Fortune 500, use Apache Spark.
Over 2000 contributors to the open source project from industry and academia.
Apache Spark integrates with your favorite frameworks, helping to scale them to thousands of machines.