Apache Spark: Reading Data

2023-11-23 12:55:31

Preparing for Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3

Reading Data - CSV Files

Technical Accomplishments:

Start working with the API documentation
Introduce the class SparkSession and other entry points
Introduce the class DataFrameReader
Read data from:
- CSV files without a schema
- CSV files with a schema

Spark API

Spark API Home Page

Search the web for "Spark API Latest", or for "Spark API x.x.x" to get a specific version.
Select the result titled "Spark API Documentation - Spark x.x.x Documentation - Apache Spark".
Which set of API documentation you use depends on the language you program in (Scala, Java, Python, or R).

Other Documentation:

Programming Guides for DataFrames, SQL, Graphs, Machine Learning, Streaming…
Deployment Guides for Spark Standalone, Mesos, YARN…
Configuration, Monitoring, Tuning, Security…

Here are some shortcuts:

Spark API (Python)

Select Spark Python API (Sphinx).
Look up the documentation for pyspark.sql.SparkSession:
- In the lower-left-hand corner, type SparkSession into the search field.
- Press [Enter].
- The search results should appear in the right-hand pane.
- Click pyspark.sql.SparkSession (Python class, in pyspark.sql module).
- The documentation should open in the right-hand pane.

SparkSession

Quick function review:

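createDataset(..)
createDataFrame(..)
emptyDataset(..)
emptyDataFrame(..)
range(..)
read(..)
readStream(..)
sparkContext(..)
sqlContext(..)
sql(..)
streams(..)
table(..)
udf(..)

This list follows the Scala API. In PySpark, SparkSession has no createDataset, emptyDataset, or emptyDataFrame, and read, readStream, sparkContext, streams, and udf are properties, so they are accessed without parentheses (for example, spark.read).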

DataFrameReader

Look up the documentation for DataFrameReader.

Quick function review:

csv(path)
jdbc(url, table, ..., connectionProperties)
json(path)
format(source)
load(path)
orc(path)
parquet(path)
table(tableName)
text(path)
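textFile(path)

textFile is part of the Scala/Java API, where it returns a Dataset[String]. PySpark's DataFrameReader has no textFile method; use text(path) instead, which returns a DataFrame with a single string column named value.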

Configuration methods:

option(key, value)
options(map)
schema(schema)
