In this post, we will be learning how to connect to a JDBC data-source using SparkSQL data frames. In case you are not familiar with SparkSQL
We know that DataFrame API was introduced in Spark 1.3 and that Spark RDDs are a unit of computation in Spark and doesn’t contain any schema information about the structure of the data. Dataframes combine RDDs with Schema and this small addition makes them very powerful.
For demonstration purpose, we will create a table with historical stock data in MySQL.
You can click here to download the dataset.
We know that DataFrame API was introduced in Spark 1.3 and that Spark RDDs are a unit of computation in Spark and doesn’t contain any schema information about the structure of the data. Dataframes combine RDDs with Schema and this small addition makes them very powerful.
For demonstration purpose, we will create a table with historical stock data in MySQL.
You can click here to download the dataset.
Creating a Table in MySQL:
data:image/s3,"s3://crabby-images/86784/8678442df203967dcb12e81ac343d8cfde80ea40" alt="creating table"
Loading the Data into MySQL using Load data in File Command:
data:image/s3,"s3://crabby-images/2e412/2e412388ffc93497769fc3e5abef8177038a2b9c" alt="load data"
Checking the Records:
data:image/s3,"s3://crabby-images/a0098/a0098eda04c69c6b2400cb35ca4185dd4de89c1f" alt="checking records"
Add MySQL connector/driver to the pyspark path shell using the driver-class-path argument.
To download MySQL connector Click here
Logging into pyspark Shell:
data:image/s3,"s3://crabby-images/d577e/d577ec76d0a53dbd82cc177f6ac709a417fe6f75" alt="logging to pyspark shell"
Creating the URL String with MySQL Login Details:
Note: The URL should contain three parts – 1. database URL 2. user name 3. password (in this case, I don’t have any root user password so I’m not mentioning it.)
data:image/s3,"s3://crabby-images/ddf1f/ddf1f70cde10410bf94ae9388ed79fb1c069266b" alt="checking url"
data:image/s3,"s3://crabby-images/ddf1f/ddf1f70cde10410bf94ae9388ed79fb1c069266b" alt="checking url"
Checking the Type of stock_data Variable:
data:image/s3,"s3://crabby-images/46846/46846cc591f12fdf461798d29b6fc5a6e03d86b2" alt="checking stock data"
We can access the data from the created dataframe in two ways:
- Using DataFrame methods.
- We can register the Dataframe as a Table and then query it.
Let’s look at each of them in detail.
Using DataFrame Methods:
To check the schema, we can use the PrintSchema method.
data:image/s3,"s3://crabby-images/7f96f/7f96f95e1a453cd852c0018ef4643923aeb735ab" alt="using Dataframe method"
Select the symbols from stock_data.
Output:
data:image/s3,"s3://crabby-images/d315e/d315e43a929c9626a149fef02ffee5c6a094e32f" alt="symbols for stock data"
data:image/s3,"s3://crabby-images/7f96f/7f96f95e1a453cd852c0018ef4643923aeb735ab" alt="using Dataframe method"
Select the symbols from stock_data.
data:image/s3,"s3://crabby-images/b3506/b3506776af2922a911ee4495f2a599e89f233605" alt="select symbols for stock data"
data:image/s3,"s3://crabby-images/d315e/d315e43a929c9626a149fef02ffee5c6a094e32f" alt="symbols for stock data"
Registering DataFrame as a Table:
First, register the DataFrame as a table so that we can query the table normally.
data:image/s3,"s3://crabby-images/5abc4/5abc43e0ba41dca2c80a1a7da49b441115101be6" alt="registering dataframe as a table"
Query 1: Select maximum volumes for stock symbol A.
data:image/s3,"s3://crabby-images/4e5b5/4e5b5107959e95ef730de8c873c1098d27d7374b" alt="query 1"
Output:
data:image/s3,"s3://crabby-images/0c85d/0c85d55a3b85dae47132dedb659e8cc983fc08b5" alt="output 1"
Query 2: Select maximum value of volumes for each stock symbol.
data:image/s3,"s3://crabby-images/d45f9/d45f9a087906f15aaf862ddc9961c32a30e51d8b" alt="query 2"
Output:
data:image/s3,"s3://crabby-images/fa275/fa275291899cd57ef1f414a792c310490d05178e" alt="output 2"
data:image/s3,"s3://crabby-images/5abc4/5abc43e0ba41dca2c80a1a7da49b441115101be6" alt="registering dataframe as a table"
Query 1: Select maximum volumes for stock symbol A.
data:image/s3,"s3://crabby-images/4e5b5/4e5b5107959e95ef730de8c873c1098d27d7374b" alt="query 1"
Output:
data:image/s3,"s3://crabby-images/0c85d/0c85d55a3b85dae47132dedb659e8cc983fc08b5" alt="output 1"
Query 2: Select maximum value of volumes for each stock symbol.
data:image/s3,"s3://crabby-images/d45f9/d45f9a087906f15aaf862ddc9961c32a30e51d8b" alt="query 2"
Output:
data:image/s3,"s3://crabby-images/fa275/fa275291899cd57ef1f414a792c310490d05178e" alt="output 2"
No comments:
Post a Comment