PySpark

Installation

pip install "toolfront[pyspark]"

Connection Parameters

Connect using the Database.from_pyspark() method, which accepts the parameters described after the example:

from toolfront import Database
from pyspark.sql import SparkSession

# Create a SparkSession, or reuse the active one if it already exists
session = SparkSession.builder.getOrCreate()

db = Database.from_pyspark(
    session=session,
    mode="batch"
)

revenue = db.ask("What's our total revenue this month?")

PARAMETER DESCRIPTION
session

A SparkSession instance.

TYPE: SparkSession DEFAULT: None

mode

Either "batch" or "streaming". Determines whether every source, sink, and query executed within this connection is interpreted as a batch workload or as a streaming workload.

TYPE: str DEFAULT: 'batch'

match_schema

Regex pattern to filter schemas. Mutually exclusive with match_tables.

TYPE: str DEFAULT: None

match_tables

Regex pattern to filter tables. Mutually exclusive with match_schema.

TYPE: str DEFAULT: None
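
The page does not say whether these patterns must match a whole name or only part of it. The sketch below uses only Python's standard re module, not ToolFront itself, to show why that distinction matters when writing a pattern for match_schema or match_tables. The table names and the filter_names helper are hypothetical, invented for illustration.

```python
import re

# Hypothetical table names, invented for this illustration.
tables = ["orders", "order_items", "employees", "old_orders"]


def filter_names(names, pattern, anchored=False):
    """Return the names that match `pattern`.

    anchored=False uses re.search (the pattern may match anywhere in the name).
    anchored=True uses re.fullmatch (the pattern must match the whole name).
    """
    regex = re.compile(pattern)
    test = regex.fullmatch if anchored else regex.search
    return [name for name in names if test(name)]


# Unanchored search also picks up "old_orders".
loose = filter_names(tables, r"order")

# Full-match semantics keep only names that start with "order".
strict = filter_names(tables, r"order.*", anchored=True)

# Explicit ^ and $ anchors give the strict result even under search semantics.
explicit = filter_names(tables, r"^order.*$")

print(loose)
print(strict)
print(explicit)
```

Writing the anchors explicitly, as in the last call, is one way to keep a pattern's meaning the same regardless of which matching semantics the library applies.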

**kwargs

Additional keyword arguments used to configure the SparkSession.

TYPE: Any DEFAULT: {}