
spark | simple expressive web framework

by perwendel | Java | Version: Current | License: Apache-2.0


kandi X-RAY | spark Summary

spark is a Java library typically used in Big Data and Spark applications. spark has no bugs, it has a build file available, it has a permissive license (Apache-2.0), and it has medium support. However, spark has 5 reported vulnerabilities. You can download it from GitHub or Maven.
Spark - a tiny web framework for Java 8.

Support

  • spark has a medium active ecosystem.
  • It has 9,141 stars and 1,525 forks. There are 411 watchers for this library.
  • It had no major release in the last 12 months.
  • There are 182 open issues and 544 closed issues. On average, issues are closed in 222 days. There are 59 open pull requests and 0 closed pull requests.
  • It has a neutral sentiment in the developer community.
  • The latest version of spark is current.

Quality

  • spark has 0 bugs and 0 code smells.

Security

  • spark has 5 vulnerability issues reported (1 critical, 2 high, 2 medium, 0 low).
  • spark code analysis shows 0 unresolved vulnerabilities.
  • There are 0 security hotspots that need review.

License

  • spark is licensed under the Apache-2.0 License. This license is Permissive.
  • Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

  • spark releases are not available. You will need to build from source code and install.
  • Deployable package is available in Maven.
  • Build file is available. You can build the component from source.
  • Installation instructions are not available. Examples and code snippets are available.

Top functions reviewed by kandi - BETA

kandi has reviewed spark and identified the functions below as its top functions. This is intended to give you an instant insight into the functionality spark implements, and to help you decide whether it suits your requirements; a short usage sketch follows the list.

  • Computes the canonical version of a path.
  • Starts the embedded Jetty server.
  • Checks whether a path matches a route.
  • Decodes an ISO 8601 path.
  • Resolves a class by its name.
  • Cleans the given path.
  • Concatenates two paths.
  • Creates an SSL socket connector.
  • Executes the request.
  • Appends a byte to the stream.
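
The internals above back spark's public routing API. As a rough illustration (not taken from the kandi review or the repository), the following minimal Java sketch shows where path matching, a before-filter, and request execution surface when declaring routes with spark-core; the class name, port, and route paths are arbitrary examples.

import static spark.Spark.*;

public class RouteDemo {
    public static void main(String[] args) {
        port(4567); // set the port explicitly; 4567 is also spark's default

        // A before-filter runs ahead of every matched request
        before((request, response) -> response.type("text/plain"));

        // Named path parameter; spark matches the request path against the route template
        get("/users/:id", (request, response) -> "user " + request.params(":id"));

        // A plain static route
        get("/ping", (request, response) -> "pong");
    }
}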

spark Key Features

A simple, expressive web framework for Java. Spark also has a Kotlin DSL: https://github.com/perwendel/spark-kotlin

<dependency>
    <groupId>com.sparkjava</groupId>
    <artifactId>spark-core</artifactId>
    <version>2.9.3</version>
</dependency>
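
With the spark-core dependency above on the classpath, routes are declared with static methods on spark.Spark. The snippet below is a minimal hello-world sketch in that spirit; the class name, route path, and response text are arbitrary examples rather than an official snippet.

import static spark.Spark.*;

public class HelloWorld {
    public static void main(String[] args) {
        // Declaring the first route starts the embedded Jetty server (default port 4567)
        get("/hello", (request, response) -> "Hello World");
    }
}

Running the class and requesting http://localhost:4567/hello should return the response body above.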

Why joining structure-identic dataframes gives different results?

df3 = df2.alias('df2').join(df1.alias('df1'), (F.col('df1.c1') == F.col('df2.c2')), 'full')
df3.show()

# Output
# +----+------+----+----+---+------+----+---+
# |  ID|Status|  c1|  c2| ID|Status|  c1| c2|
# +----+------+----+----+---+------+----+---+
# |   4|    ok|null|   A|  1|   bad|   A|  A|
# |null|  null|null|null|  4|    ok|null|  A|
# +----+------+----+----+---+------+----+---+
-----------------------
== Physical Plan ==
*(1) Project [ID#0L, Status#1, c1#2, A AS c2#6]
+- *(1) Scan ExistingRDD[ID#0L,Status#1,c1#2]
== Physical Plan ==
*(1) Project [ID#0L, Status#1, c1#2, A AS c2#6]
+- *(1) Filter (isnotnull(Status#1) AND (Status#1 = ok))
   +- *(1) Scan ExistingRDD[ID#0L,Status#1,c1#2]
== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, FullOuter, (c1#2 = A)
:- *(1) Project [ID#0L, Status#1, c1#2, A AS c2#6]
:  +- *(1) Filter (isnotnull(Status#1) AND (Status#1 = ok))
:     +- *(1) Scan ExistingRDD[ID#0L,Status#1,c1#2]
+- BroadcastExchange IdentityBroadcastMode, [id=#75]
   +- *(2) Project [ID#46L, Status#47, c1#48, A AS c2#45]
      +- *(2) Scan ExistingRDD[ID#46L,Status#47,c1#48]
+----+------+----+----+----+------+----+----+
|  ID|Status|  c1|  c2|  ID|Status|  c1|  c2|
+----+------+----+----+----+------+----+----+
|   4|    ok|null|   A|null|  null|null|null|
|null|  null|null|null|   1|   bad|   A|   A|
|null|  null|null|null|   4|    ok|null|   A|
+----+------+----+----+----+------+----+----+
== Physical Plan ==
*(1) Scan ExistingRDD[ID#98L,Status#99,c1#100,c2#101]
== Physical Plan ==
*(1) Filter (isnotnull(Status#124) AND (Status#124 = ok))
+- *(1) Scan ExistingRDD[ID#123L,Status#124,c1#125,c2#126]
df3 = df1.join(df2, (df1.c1 == df2.c2), 'full')

AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks'>

# Session 1: a newer pandas (1.3.4) writes the pickle
import pickle
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 6))

with open("dump_from_v1.3.4.pickle", "wb") as f:
    pickle.dump(df, f)

quit()

# Session 2: an older pandas (here the one under /opt/anaconda3) tries to load it back
import pickle

with open("dump_from_v1.3.4.pickle", "rb") as f:
    df = pickle.load(f)


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-ff5c218eca92> in <module>
      1 with open("dump_from_v1.3.4.pickle", "rb") as f:
----> 2     df = pickle.load(f)
      3 

AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py'>

Problems when writing parquet with timestamps prior to 1900 in AWS Glue 3.0

# Imports assumed for a standard Glue 3.0 job script
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()
# Get the current SparkConf that Glue has set
conf = sc.getConf()
# Add the legacy Parquet datetime rebase configurations
conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
# Restart the Spark context with the updated conf
sc.stop()
sc = SparkContext.getOrCreate(conf=conf)
# Create the Glue context with the restarted SparkContext
glueContext = GlueContext(sc)

Cannot find conda info. Please verify your conda installation on EMR

wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh  -O /home/hadoop/miniconda.sh \
    && /bin/bash ~/miniconda.sh -b -p $HOME/conda

echo -e '\n export PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc


conda config --set always_yes yes --set changeps1 no
conda config -f --add channels conda-forge


conda create -n zoo python=3.7 # "zoo" is conda environment name
conda init bash
source activate zoo
conda install python 3.7.0 -c conda-forge orca 
sudo /home/hadoop/conda/envs/zoo/bin/python3.7 -m pip install virtualenv
"spark.pyspark.python": "/home/hadoop/conda/envs/zoo/bin/python3",
"spark.pyspark.virtualenv.enabled": "true",
"spark.pyspark.virtualenv.type": "native",
"spark.pyspark.virtualenv.bin.path": "/home/hadoop/conda/envs/zoo/bin/",
"zeppelin.pyspark.python" : "/home/hadoop/conda/bin/python",
"zeppelin.python": "/home/hadoop/conda/bin/python"

How to set Docker Compose `env_file` relative to `.yml` file when multiple `--file` option is used?

  env_file:
    - ${BACKEND_BASE:-.}/.env

Read spark data with column that clashes with partition name

df= spark.read.json("s3://bucket/table/**/*.json")

renamedDF= df.withColumnRenamed("old column name","new column name")
-----------------------
from pyspark.sql import functions as F

Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
conf = sc._jsc.hadoopConfiguration()

s3_path = "s3://bucket/prefix"
file_cols = ["id", "color", "date"]
partitions_cols = ["company", "service", "date"]

# listing all files for input path
json_files = []
files = Path(s3_path).getFileSystem(conf).listFiles(Path(s3_path), True)

while files.hasNext():
    path = files.next().getPath()
    if path.getName().endswith(".json"):
        json_files.append(path.toString())

df = spark.read.json(json_files) # you can pass here the schema of the files without the partition columns

# renaming file column in if exists in partitions
df = df.select(*[
    F.col(c).alias(c) if c not in partitions_cols else F.col(c).alias(f"file_{c}")
    for c in df.columns
])

# parse partitions from filenames
for p in partitions_cols:
    df = df.withColumn(p, F.regexp_extract(F.input_file_name(), f"/{p}=([^/]+)/", 1))

df.show()

#+-----+----------+---+-------+-------+----------+
#|color| file_date| id|company|service|      date|
#+-----+----------+---+-------+-------+----------+
#|green|2021-08-08|baz|   abcd|    xyz|2021-01-01|
#| blue|2021-12-12|foo|   abcd|    xyz|2021-01-01|
#|  red|2021-10-10|bar|   abcd|    xyz|2021-01-01|
#+-----+----------+---+-------+-------+----------+
-----------------------
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DateType

# schema of json files
schema = StructType([
    StructField('id', StringType(), True),
    StructField('color', StringType(), True),
    StructField('date', DateType(), True)
])

df = sparkSession.read.text('resources') \
    .withColumnRenamed('date', 'partition_date') \
    .withColumn('json', F.from_json(F.col('value'), schema)) \
    .select('company', 'service', 'partition_date', 'json.*') \
    .withColumnRenamed('date', 'file_date') \
    .withColumnRenamed('partition_date', 'date')
{"id": "foo", "color": "blue", "date": "2021-12-12"}
{"id": "bar", "color": "red", "date": "2021-12-13"}
{"id": "kix", "color": "yellow", "date": "2021-12-14"}
{"id": "kaz", "color": "blue", "date": "2021-12-15"}
{"id": "dir", "color": "red", "date": "2021-12-16"}
{"id": "tux", "color": "yellow", "date": "2021-12-17"}
+-------+-------+----------+---+------+----------+
|company|service|      date| id| color| file_date|
+-------+-------+----------+---+------+----------+
|   abcd|    xyz|2021-01-01|kaz|  blue|2021-12-15|
|   abcd|    xyz|2021-01-01|dir|   red|2021-12-16|
|   abcd|    xyz|2021-01-01|tux|yellow|2021-12-17|
|   abcd|    xyz|2021-01-01|foo|  blue|2021-12-12|
|   abcd|    xyz|2021-01-01|bar|   red|2021-12-13|
|   abcd|    xyz|2021-01-01|kix|yellow|2021-12-14|
+-------+-------+----------+---+------+----------+

How do I parse xml documents in Palantir Foundry?

buildscript {
    repositories {
       // some other things
    }

    dependencies {
        classpath "com.palantir.transforms.python:lang-python-gradle-plugin:${transformsLangPythonPluginVersion}"
    }
}

apply plugin: 'com.palantir.transforms.lang.python'
apply plugin: 'com.palantir.transforms.lang.python-defaults'

dependencies {
    condaJars "com.databricks:spark-xml_2.13:0.14.0"
}

// Apply the testing plugin
apply plugin: 'com.palantir.transforms.lang.pytest-defaults'

// ... some other awesome features you should enable
from transforms.api import transform, Output, Input
from transforms.verbs.dataframes import union_many


def read_files(spark_session, paths):
    parsed_dfs = []
    for file_name in paths:
        parsed_df = spark_session.read.format('xml').options(rowTag="tag").load(file_name)
        parsed_dfs += [parsed_df]
    output_df = union_many(*parsed_dfs, how="wide")
    return output_df


@transform(
    the_output=Output("my.awesome.output"),
    the_input=Input("my.awesome.input"),
)
def my_compute_function(the_input, the_output, ctx):
    session = ctx.spark_session
    input_filesystem = the_input.filesystem()
    hadoop_path = input_filesystem.hadoop_path
    files = [hadoop_path + "/" + file_name.path for file_name in input_filesystem.ls()]
    output_df = read_files(session, files)
    the_output.write_dataframe(output_df)
<tag>
<field1>
my_value
</field1>
</tag>
from myproject.datasets import xml_parse_transform
from pkg_resources import resource_filename


def test_parse_xml(spark_session):
    file_path = resource_filename(__name__, "sample.xml")
    parsed_df = xml_parse_transform.read_files(spark_session, [file_path])
    assert parsed_df.count() == 1
    assert set(parsed_df.columns) == {"field1"}

docker build vue3 not compatible with element-ui on node:16-buster-slim

...
COPY package.json /home
RUN npm config set legacy-peer-deps true
RUN npm install --prefix /home

Why is repartition faster than partitionBy in Spark?

// repartition by the computed column, then write plain CSV
spark.range(1000).withColumn("partition", 'id % 100)
    .repartition('partition).write.csv("/tmp/test.csv")
// write with partitionBy on the same column, without an explicit repartition
spark.range(1000).withColumn("partition", 'id % 100)
    .write.partitionBy("partition").csv("/tmp/test2.csv")
-----------------------
# reading the XML and deriving a partition column from the device name
df = spark.read.format("xml") \
  .options(rowTag="DeviceData") \
  .load(file_path, schema=meter_data) \
.withColumn("partition", hash(col("_DeviceName")).cast("Long") % num_partitions) \
.repartition("partition") \
.write.format("json") \

# writing with partitionBy on that column
.write.format("json") \
.partitionBy("partition") \

# resulting directory layout, one folder per partition value
output_path + "\partition=0\"
output_path + "\partition=1\"
output_path + "\partition=99\"

# coalescing before a partitioned write
.coalesce(num_partitions) \
.write.format("json") \
.partitionBy("partition") \

# repartitioning by the column before a partitioned write
.repartition("partition") \
.write.format("json") \
.partitionBy("partition") \

Get difference between two version of delta lake table

import uk.co.gresearch.spark.diff.DatasetDiff

df1.diff(df2)
-----------------------
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.col

val lastVersion = DeltaTable.forPath(spark, PATH_TO_DELTA_TABLE)
    .history()
    .select(col("version"))
    .collect.toList
    .headOption
    .getOrElse(throw new Exception("Is this table empty ?"))
val addPathList = spark
    .read
    .json(s"ROOT_PATH/_delta_log/0000NUMVERSION.json")
    .where(s"add is not null")
    .select(s"add.path")
    .collect()
    .map(path => formatPath(path.toString))
    .toList
val removePathList = spark
    .read
    .json(s"ROOT_PATH/_delta_log/0000NUMVERSION.json")
    .where(s"remove is not null")
    .select(s"remove.path")
    .collect()
    .map(path => formatPath(path.toString))
    .toList
import org.apache.spark.sql.functions._
val addDF = spark
  .read
  .format("parquet")
  .load(addPathList: _*)
  .withColumn("add_remove", lit("add"))
val removeDF = spark
  .read
  .format("parquet")
  .load(removePathList: _*)
  .withColumn("add_remove", lit("remove"))
addDF.union(removeDF).show()


+----------+----------+
|updatedate|add_remove|
+----------+----------+
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
|      null|       add|
+----------+----------+
only showing top 20 rows

Community Discussions

Trending Discussions on spark
  • spark-shell throws java.lang.reflect.InvocationTargetException on running
  • Why joining structure-identic dataframes gives different results?
  • AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks'>
  • Problems when writing parquet with timestamps prior to 1900 in AWS Glue 3.0
  • NoSuchMethodError on com.fasterxml.jackson.dataformat.xml.XmlMapper.coercionConfigDefaults()
  • Cannot find conda info. Please verify your conda installation on EMR
  • How to set Docker Compose `env_file` relative to `.yml` file when multiple `--file` option is used?
  • Read spark data with column that clashes with partition name
  • How do I parse xml documents in Palantir Foundry?
  • docker build vue3 not compatible with element-ui on node:16-buster-slim
Trending Discussions on spark

QUESTION

spark-shell throws java.lang.reflect.InvocationTargetException on running

Asked 2022-Apr-01 at 19:53

When I execute run-example SparkPi, for example, it works perfectly, but when I run spark-shell, it throws these exceptions:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/C:/big_data/spark-3.2.0-bin-hadoop3.2-scala2.13/jars/spark-unsafe_2.13-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.13.5 (OpenJDK 64-Bit Server VM, Java 11.0.9.1)
Type in expressions to have them evaluated.
Type :help for more information.
21/12/11 19:28:36 ERROR SparkContext: Error initializing SparkContext.
java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at org.apache.spark.executor.Executor.addReplClassLoaderIfNeeded(Executor.scala:909)
        at org.apache.spark.executor.Executor.<init>(Executor.scala:160)
        at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
        at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
        at scala.Option.getOrElse(Option.scala:201)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
        at org.apache.spark.repl.Main$.createSparkSession(Main.scala:114)
        at $line3.$read$$iw.<init>(<console>:5)
        at $line3.$read.<init>(<console>:4)
        at $line3.$read$.<clinit>(<console>)
        at $line3.$eval$.$print$lzycompute(<synthetic>:6)
        at $line3.$eval$.$print(<synthetic>:5)
        at $line3.$eval.$print(<synthetic>)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:670)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1006)
        at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$1(IMain.scala:506)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:43)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:505)
        at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$3(IMain.scala:519)
        at scala.tools.nsc.interpreter.IMain.doInterpret(IMain.scala:519)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:501)
        at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216)
        at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
        at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$interpretPreamble$1(ILoop.scala:924)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at scala.tools.nsc.interpreter.shell.ILoop.interpretPreamble(ILoop.scala:924)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$3(ILoop.scala:963)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.shell.ILoop.echoOff(ILoop.scala:90)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$2(ILoop.scala:963)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.IMain.withSuppressedSettings(IMain.scala:1406)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$1(ILoop.scala:954)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
        at scala.tools.nsc.interpreter.shell.ILoop.run(ILoop.scala:954)
        at org.apache.spark.repl.Main$.doMain(Main.scala:84)
        at org.apache.spark.repl.Main$.main(Main.scala:59)
        at org.apache.spark.repl.Main.main(Main.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Illegal character in path at index 42: spark://DESKTOP-JO73CF4.mshome.net:2103/C:\classes
        at java.base/java.net.URI$Parser.fail(URI.java:2913)
        at java.base/java.net.URI$Parser.checkChars(URI.java:3084)
        at java.base/java.net.URI$Parser.parseHierarchical(URI.java:3166)
        at java.base/java.net.URI$Parser.parse(URI.java:3114)
        at java.base/java.net.URI.<init>(URI.java:600)
        at org.apache.spark.repl.ExecutorClassLoader.<init>(ExecutorClassLoader.scala:57)
        ... 67 more
21/12/11 19:28:36 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
        at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:173)
        at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
        at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:927)
        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2516)
        at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2086)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1442)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:2086)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:677)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
        at scala.Option.getOrElse(Option.scala:201)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
        at org.apache.spark.repl.Main$.createSparkSession(Main.scala:114)
        at $line3.$read$$iw.<init>(<console>:5)
        at $line3.$read.<init>(<console>:4)
        at $line3.$read$.<clinit>(<console>)
        at $line3.$eval$.$print$lzycompute(<synthetic>:6)
        at $line3.$eval$.$print(<synthetic>:5)
        at $line3.$eval.$print(<synthetic>)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:670)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1006)
        at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$1(IMain.scala:506)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:43)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:505)
        at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$3(IMain.scala:519)
        at scala.tools.nsc.interpreter.IMain.doInterpret(IMain.scala:519)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:501)
        at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216)
        at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
        at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$interpretPreamble$1(ILoop.scala:924)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at scala.tools.nsc.interpreter.shell.ILoop.interpretPreamble(ILoop.scala:924)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$3(ILoop.scala:963)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.shell.ILoop.echoOff(ILoop.scala:90)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$2(ILoop.scala:963)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.IMain.withSuppressedSettings(IMain.scala:1406)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$1(ILoop.scala:954)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
        at scala.tools.nsc.interpreter.shell.ILoop.run(ILoop.scala:954)
        at org.apache.spark.repl.Main$.doMain(Main.scala:84)
        at org.apache.spark.repl.Main$.main(Main.scala:59)
        at org.apache.spark.repl.Main.main(Main.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/12/11 19:28:36 WARN MetricsSystem: Stopping a MetricsSystem that is not running
21/12/11 19:28:36 ERROR Main: Failed to initialize Spark session.
java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at org.apache.spark.executor.Executor.addReplClassLoaderIfNeeded(Executor.scala:909)
        at org.apache.spark.executor.Executor.<init>(Executor.scala:160)
        at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
        at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
        at scala.Option.getOrElse(Option.scala:201)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
        at org.apache.spark.repl.Main$.createSparkSession(Main.scala:114)
        at $line3.$read$$iw.<init>(<console>:5)
        at $line3.$read.<init>(<console>:4)
        at $line3.$read$.<clinit>(<console>)
        at $line3.$eval$.$print$lzycompute(<synthetic>:6)
        at $line3.$eval$.$print(<synthetic>:5)
        at $line3.$eval.$print(<synthetic>)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:670)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1006)
        at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$1(IMain.scala:506)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:43)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:505)
        at scala.tools.nsc.interpreter.IMain.$anonfun$doInterpret$3(IMain.scala:519)
        at scala.tools.nsc.interpreter.IMain.doInterpret(IMain.scala:519)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:503)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:501)
        at scala.tools.nsc.interpreter.IMain.$anonfun$quietRun$1(IMain.scala:216)
        at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
        at scala.tools.nsc.interpreter.IMain.quietRun(IMain.scala:216)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$interpretPreamble$1(ILoop.scala:924)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at scala.tools.nsc.interpreter.shell.ILoop.interpretPreamble(ILoop.scala:924)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$3(ILoop.scala:963)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.shell.ILoop.echoOff(ILoop.scala:90)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$2(ILoop.scala:963)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.IMain.withSuppressedSettings(IMain.scala:1406)
        at scala.tools.nsc.interpreter.shell.ILoop.$anonfun$run$1(ILoop.scala:954)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.tools.nsc.interpreter.shell.ReplReporterImpl.withoutPrintingResults(Reporter.scala:64)
        at scala.tools.nsc.interpreter.shell.ILoop.run(ILoop.scala:954)
        at org.apache.spark.repl.Main$.doMain(Main.scala:84)
        at org.apache.spark.repl.Main$.main(Main.scala:59)
        at org.apache.spark.repl.Main.main(Main.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Illegal character in path at index 42: spark://DESKTOP-JO73CF4.mshome.net:2103/C:\classes
        at java.base/java.net.URI$Parser.fail(URI.java:2913)
        at java.base/java.net.URI$Parser.checkChars(URI.java:3084)
        at java.base/java.net.URI$Parser.parseHierarchical(URI.java:3166)
        at java.base/java.net.URI$Parser.parse(URI.java:3114)
        at java.base/java.net.URI.<init>(URI.java:600)
        at org.apache.spark.repl.ExecutorClassLoader.<init>(ExecutorClassLoader.scala:57)
        ... 67 more
21/12/11 19:28:36 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.lang.ExceptionInInitializerError
        at org.apache.spark.executor.Executor.stop(Executor.scala:333)
        at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
        at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2019)
        at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.util.Try$.apply(Try.scala:210)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException
        at org.apache.spark.shuffle.ShuffleBlockPusher$.<clinit>(ShuffleBlockPusher.scala:465)
        ... 16 more
21/12/11 19:28:36 WARN ShutdownHookManager: ShutdownHook '' failed, java.util.concurrent.ExecutionException: java.lang.ExceptionInInitializerError
java.util.concurrent.ExecutionException: java.lang.ExceptionInInitializerError
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
Caused by: java.lang.ExceptionInInitializerError
        at org.apache.spark.executor.Executor.stop(Executor.scala:333)
        at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
        at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2019)
        at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
        at scala.util.Try$.apply(Try.scala:210)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException
        at org.apache.spark.shuffle.ShuffleBlockPusher$.<clinit>(ShuffleBlockPusher.scala:465)
        ... 16 more

As far as I can see, it is caused by Illegal character in path at index 42: spark://DESKTOP-JO73CF4.mshome.net:2103/C:\classes, but I don't understand exactly what that means or how to deal with it.

How can I solve this problem?

I am using Spark 3.2.0, pre-built for Apache Hadoop 3.3 and later (Scala 2.13).

The JAVA_HOME, HADOOP_HOME, and SPARK_HOME environment variables are set.

ANSWER

Answered 2022-Jan-07 at 15:11

I faced the same problem; I think Spark 3.2 itself is the problem.

I switched to Spark 3.1.2 and it works fine.

Source https://stackoverflow.com/questions/70317481
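
For context (this note is not part of the original thread): the root cause quoted in the question is a plain java.net.URI parsing failure. The Windows class-output directory (C:\classes) is appended to the spark:// URL, and the backslash is not a legal character in a URI path. A minimal, self-contained sketch that reproduces the same URISyntaxException; the host name is copied from the log above purely for illustration:

import java.net.URI;
import java.net.URISyntaxException;

public class UriBackslashDemo {
    public static void main(String[] args) {
        // The Windows path suffix "C:\classes" injects a backslash, which
        // java.net.URI rejects as an illegal character in the path component.
        String url = "spark://DESKTOP-JO73CF4.mshome.net:2103/C:\\classes";
        try {
            new URI(url);
        } catch (URISyntaxException e) {
            // Prints the same "Illegal character in path at index 42: ..."
            // message that appears in the stack trace above.
            System.out.println(e.getMessage());
        }
    }
}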

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

Vulnerabilities

CVE-2020-9480 CRITICAL
In Apache Spark 2.4.5 and earlier, a standalone resource manager's master may be configured to require authentication (spark.authenticate) via a shared secret. When enabled, however, a specially-crafted RPC to the master can succeed in starting an application's resources on the Spark cluster, even without the shared key. This can be leveraged to execute shell commands on the host machine. This does not affect Spark clusters using other resource managers (YARN, Mesos, etc).
In all versions of Apache Spark, its standalone resource manager accepts code to execute on a 'master' host, that then runs that code on 'worker' hosts. The master itself does not, by design, execute user code. A specially-crafted request to the master can, however, cause the master to execute code too. Note that this does not affect standalone clusters with authentication enabled. While the master host typically has less outbound access to other resources than a worker, the execution of code on the master is nevertheless unexpected.
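
For reference, a minimal sketch (not taken from the advisory itself) of how the shared-secret authentication mentioned above is enabled through standard Spark configuration properties. The secret value below is a placeholder; in practice it should be injected through a secure channel rather than hard-coded.

import org.apache.spark.SparkConf;

public class AuthenticatedSparkConf {
    public static SparkConf build() {
        // spark.authenticate turns on shared-secret authentication for Spark's
        // internal RPC; spark.authenticate.secret supplies the shared key.
        // "change-me" is a placeholder only.
        return new SparkConf()
                .setAppName("secured-app")
                .set("spark.authenticate", "true")
                .set("spark.authenticate.secret", "change-me");
    }
}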

Install spark

You can download it from GitHub, Maven.
You can use spark like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the spark component as you would with any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.
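
To illustrate the "use it like any standard Java library" point, here is a minimal sketch of a Spark application; the route path is an arbitrary choice for the example:

import static spark.Spark.*;

public class HelloSpark {
    public static void main(String[] args) {
        port(4567);  // Spark's default port, set explicitly here for clarity
        // Register a GET route; the lambda receives the request and response
        // and returns the response body.
        get("/hello", (request, response) -> "Hello World");
    }
}

Running the main method starts the embedded Jetty server, and http://localhost:4567/hello then returns the response body.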

Support

For any new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, check and ask questions on the Stack Overflow community page.
