snappy | Snappy compression format in the Go programming language
Trending Discussions on snappy
QUESTION
I have a stage path as below
copy into table1 as (
select $1:InvestorID::varchar as Investor_ID from @company_stage/pbook/2022-03-10/Invor/part-00000-33cbc68b-69c1-40c0-943c-f586dfab3f49-c000.snappy.parquet
)
This is my S3 location: company_stage/pbook/2022-03-10/Invor
I need to make this dynamic:
I) the "2022-03-10" folder must change to the current date;
II) it must pick up all Parquet files in the folder automatically, without me specifying a filename. How can I achieve this?
ANSWER
Answered 2022-Mar-14 at 10:31. Here is one approach. Your stage shouldn't include the date as part of the stage name, because if it did, you would need a new stage every day. It is better to define the stage as company_stage/pbook/.
To make it dynamic, I suggest using the pattern
option together with the COPY INTO command. You could create a variable with the regex pattern expression using current_date(), something like this:
set mypattern = '\.*'||to_char(current_date(), 'YYYY-MM-DD')||'\.*';
Then use this variable in your COPY INTO command like this:
copy into table1 as (
select $1:InvestorID::varchar as Investor_ID from @company_stage/pbook/ pattern = $mypattern
)
Of course you can adjust your pattern matching as you see fit.
QUESTION
Is there a way to horizontally scroll only to start or specified position of previous or next element with Jetpack Compose?
ANSWER
Answered 2021-Aug-22 at 19:08. You can check the scrolling direction like so:
@Composable
private fun LazyListState.isScrollingUp(): Boolean {
    var previousIndex by remember(this) { mutableStateOf(firstVisibleItemIndex) }
    var previousScrollOffset by remember(this) { mutableStateOf(firstVisibleItemScrollOffset) }
    return remember(this) {
        derivedStateOf {
            if (previousIndex != firstVisibleItemIndex) {
                previousIndex > firstVisibleItemIndex
            } else {
                previousScrollOffset >= firstVisibleItemScrollOffset
            }.also {
                previousIndex = firstVisibleItemIndex
                previousScrollOffset = firstVisibleItemScrollOffset
            }
        }
    }.value
}
Of course, you will need to create a rememberLazyListState() and then pass it to the list as a parameter.
Then, based upon the scrolling direction, you can call lazyListState.scrollTo(lazyListState.firstVisibleItemIndex + 1)
in a coroutine (if the user is scrolling right), and appropriate calls for the other direction.
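For completeness, here is a minimal usage sketch of that idea, assuming a simple LazyRow of text items in the same file as the isScrollingUp() extension above. The composable name SnapToNeighborRow and the Button trigger are illustrative only (not from the original answer), and it uses LazyListState.animateScrollToItem as the suspending scroll call issued from a coroutine.

import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.lazy.LazyRow
import androidx.compose.foundation.lazy.items
import androidx.compose.foundation.lazy.rememberLazyListState
import androidx.compose.material.Button
import androidx.compose.material.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.rememberCoroutineScope
import kotlinx.coroutines.launch

@Composable
fun SnapToNeighborRow(labels: List<String>) {
    val listState = rememberLazyListState()        // remembered scroll state, passed to the list
    val scope = rememberCoroutineScope()           // scrolling is a suspend call
    val scrollingUp = listState.isScrollingUp()    // extension from the answer above (same file)

    Column {
        LazyRow(state = listState) {
            items(labels) { label -> Text(text = label) }
        }
        // Jump to the start of the previous or next item depending on the last scroll direction.
        Button(onClick = {
            val target = if (scrollingUp) {
                (listState.firstVisibleItemIndex - 1).coerceAtLeast(0)
            } else {
                listState.firstVisibleItemIndex + 1
            }
            scope.launch { listState.animateScrollToItem(target) }
        }) {
            Text("Snap to neighbour")
        }
    }
}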
QUESTION
In my application config I have defined the following property:
logging.file.name = application.logs
When I run my application, it creates two files, application.logs.0 and application.logs.0.lck, and the content of the file is as follows:
<record>
  <date>2022-02-16T12:55:05.656986Z</date>
  <millis>1645016105656</millis>
  <nanos>986000</nanos>
  <sequence>0</sequence>
  <logger>org.apache.catalina.core.StandardService</logger>
  <level>INFO</level>
  <class>org.apache.catalina.core.StandardService</class>
  <method>startInternal</method>
  <thread>1</thread>
  <message>Starting service [Tomcat]</message>
</record>
<record>
  <date>2022-02-16T12:55:05.671696Z</date>
  <millis>1645016105671</millis>
  <nanos>696000</nanos>
  <sequence>1</sequence>
  <logger>org.apache.catalina.core.StandardEngine</logger>
  <level>INFO</level>
  <class>org.apache.catalina.core.StandardEngine</class>
  <method>startInternal</method>
  <thread>1</thread>
  <message>Starting Servlet engine: [Apache Tomcat/9.0.48]</message>
</record>
It is not printing the logs properly, and I don't want the output in XML format.
My Dependency Tree:
[INFO] com.walmart.uss:trigger:jar:0.0.1-SNAPSHOT
[INFO] +- com.google.cloud:google-cloud-logging:jar:3.0.0:compile
[INFO] | +- com.google.guava:guava:jar:31.0.1-jre:compile
[INFO] | +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] | +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] | +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO] | +- org.checkerframework:checker-qual:jar:3.8.0:compile
[INFO] | +- com.google.errorprone:error_prone_annotations:jar:2.8.1:compile
[INFO] | +- com.google.j2objc:j2objc-annotations:jar:1.3:compile
[INFO] | +- io.grpc:grpc-api:jar:1.41.0:compile
[INFO] | +- io.grpc:grpc-context:jar:1.41.0:compile
[INFO] | +- io.grpc:grpc-stub:jar:1.41.0:compile
[INFO] | +- io.grpc:grpc-protobuf:jar:1.41.0:compile
[INFO] | +- io.grpc:grpc-protobuf-lite:jar:1.41.0:compile
[INFO] | +- com.google.api:api-common:jar:2.0.5:compile
[INFO] | +- javax.annotation:javax.annotation-api:jar:1.3.2:compile
[INFO] | +- com.google.auto.value:auto-value-annotations:jar:1.8.2:compile
[INFO] | +- com.google.protobuf:protobuf-java:jar:3.18.1:compile
[INFO] | +- com.google.protobuf:protobuf-java-util:jar:3.18.1:compile
[INFO] | +- com.google.code.gson:gson:jar:2.8.7:compile
[INFO] | +- com.google.api.grpc:proto-google-common-protos:jar:2.6.0:compile
[INFO] | +- com.google.api.grpc:proto-google-cloud-logging-v2:jar:0.92.0:compile
[INFO] | +- com.google.api:gax:jar:2.6.1:compile
[INFO] | +- io.opencensus:opencensus-api:jar:0.28.0:compile
[INFO] | +- com.google.api:gax-grpc:jar:2.6.1:compile
[INFO] | +- io.grpc:grpc-auth:jar:1.41.0:compile
[INFO] | +- com.google.auth:google-auth-library-credentials:jar:1.2.1:compile
[INFO] | +- io.grpc:grpc-netty-shaded:jar:1.41.0:compile
[INFO] | +- io.grpc:grpc-alts:jar:1.41.0:compile
[INFO] | +- io.grpc:grpc-grpclb:jar:1.41.0:compile
[INFO] | +- org.conscrypt:conscrypt-openjdk-uber:jar:2.5.1:compile
[INFO] | +- org.threeten:threetenbp:jar:1.5.1:compile
[INFO] | +- com.google.cloud:google-cloud-core-grpc:jar:2.2.0:compile
[INFO] | +- com.google.auth:google-auth-library-oauth2-http:jar:1.2.1:compile
[INFO] | +- com.google.http-client:google-http-client-gson:jar:1.40.1:compile
[INFO] | +- com.google.http-client:google-http-client:jar:1.40.1:compile
[INFO] | +- commons-logging:commons-logging:jar:1.2:compile
[INFO] | +- commons-codec:commons-codec:jar:1.15:compile
[INFO] | +- org.apache.httpcomponents:httpcore:jar:4.4.14:compile
[INFO] | +- io.opencensus:opencensus-contrib-http-util:jar:0.28.0:compile
[INFO] | +- io.grpc:grpc-core:jar:1.41.0:compile
[INFO] | +- com.google.android:annotations:jar:4.1.1.4:runtime
[INFO] | +- org.codehaus.mojo:animal-sniffer-annotations:jar:1.20:runtime
[INFO] | +- io.perfmark:perfmark-api:jar:0.23.0:runtime
[INFO] | +- com.google.cloud:google-cloud-core:jar:2.2.0:compile
[INFO] | \- com.google.api.grpc:proto-google-iam-v1:jar:1.1.6:compile
[INFO] +- org.springframework.boot:spring-boot-starter:jar:2.5.2:compile
[INFO] | +- org.springframework.boot:spring-boot:jar:2.5.2:compile
[INFO] | +- org.springframework.boot:spring-boot-autoconfigure:jar:2.5.2:compile
[INFO] | +- jakarta.annotation:jakarta.annotation-api:jar:1.3.5:compile
[INFO] | +- org.springframework:spring-core:jar:5.3.8:compile
[INFO] | | \- org.springframework:spring-jcl:jar:5.3.8:compile
[INFO] | \- org.yaml:snakeyaml:jar:1.28:compile
[INFO] +- org.springframework.boot:spring-boot-starter-test:jar:2.5.2:test
[INFO] | +- org.springframework.boot:spring-boot-test:jar:2.5.2:test
[INFO] | +- org.springframework.boot:spring-boot-test-autoconfigure:jar:2.5.2:test
[INFO] | +- com.jayway.jsonpath:json-path:jar:2.5.0:test
[INFO] | | \- net.minidev:json-smart:jar:2.4.7:compile
[INFO] | | \- net.minidev:accessors-smart:jar:2.4.7:compile
[INFO] | +- jakarta.xml.bind:jakarta.xml.bind-api:jar:2.3.3:compile
[INFO] | | \- jakarta.activation:jakarta.activation-api:jar:1.2.2:compile
[INFO] | +- org.assertj:assertj-core:jar:3.19.0:test
[INFO] | +- org.hamcrest:hamcrest:jar:2.2:test
[INFO] | +- org.junit.jupiter:junit-jupiter:jar:5.7.2:test
[INFO] | | +- org.junit.jupiter:junit-jupiter-api:jar:5.7.2:test
[INFO] | | | +- org.apiguardian:apiguardian-api:jar:1.1.0:test
[INFO] | | | +- org.opentest4j:opentest4j:jar:1.2.0:test
[INFO] | | | \- org.junit.platform:junit-platform-commons:jar:1.7.2:test
[INFO] | | +- org.junit.jupiter:junit-jupiter-params:jar:5.7.2:test
[INFO] | | \- org.junit.jupiter:junit-jupiter-engine:jar:5.7.2:test
[INFO] | | \- org.junit.platform:junit-platform-engine:jar:1.7.2:test
[INFO] | +- org.mockito:mockito-core:jar:3.9.0:test
[INFO] | | +- net.bytebuddy:byte-buddy:jar:1.10.22:compile
[INFO] | | +- net.bytebuddy:byte-buddy-agent:jar:1.10.22:test
[INFO] | | \- org.objenesis:objenesis:jar:3.2:compile
[INFO] | +- org.mockito:mockito-junit-jupiter:jar:3.9.0:test
[INFO] | +- org.skyscreamer:jsonassert:jar:1.5.0:test
[INFO] | | \- com.vaadin.external.google:android-json:jar:0.0.20131108.vaadin1:test
[INFO] | +- org.springframework:spring-test:jar:5.3.8:test
[INFO] | \- org.xmlunit:xmlunit-core:jar:2.8.2:test
[INFO] +- org.springframework.boot:spring-boot-starter-thymeleaf:jar:2.5.2:compile
[INFO] | +- org.thymeleaf:thymeleaf-spring5:jar:3.0.12.RELEASE:compile
[INFO] | | \- org.thymeleaf:thymeleaf:jar:3.0.12.RELEASE:compile
[INFO] | | +- org.attoparser:attoparser:jar:2.0.5.RELEASE:compile
[INFO] | | \- org.unbescape:unbescape:jar:1.1.6.RELEASE:compile
[INFO] | \- org.thymeleaf.extras:thymeleaf-extras-java8time:jar:3.0.4.RELEASE:compile
[INFO] +- org.springframework:spring-webmvc:jar:5.3.8:compile
[INFO] | +- org.springframework:spring-aop:jar:5.3.8:compile
[INFO] | +- org.springframework:spring-beans:jar:5.3.8:compile
[INFO] | +- org.springframework:spring-context:jar:5.3.8:compile
[INFO] | +- org.springframework:spring-expression:jar:5.3.8:compile
[INFO] | \- org.springframework:spring-web:jar:5.3.8:compile
[INFO] +- org.springframework.boot:spring-boot-starter-security:jar:2.5.2:compile
[INFO] | +- org.springframework.security:spring-security-config:jar:5.5.1:compile
[INFO] | | \- org.springframework.security:spring-security-core:jar:5.5.1:compile
[INFO] | | \- org.springframework.security:spring-security-crypto:jar:5.5.1:compile
[INFO] | \- org.springframework.security:spring-security-web:jar:5.5.1:compile
[INFO] +- org.springframework.data:spring-data-jpa:jar:2.5.2:compile
[INFO] | +- org.springframework.data:spring-data-commons:jar:2.5.2:compile
[INFO] | +- org.springframework:spring-orm:jar:5.3.8:compile
[INFO] | | \- org.springframework:spring-jdbc:jar:5.3.8:compile
[INFO] | +- org.springframework:spring-tx:jar:5.3.8:compile
[INFO] | +- org.aspectj:aspectjrt:jar:1.9.6:compile
[INFO] | \- org.slf4j:slf4j-api:jar:1.7.31:compile
[INFO] +- org.springframework.boot:spring-boot-starter-data-jpa:jar:2.5.2:compile
[INFO] | +- org.springframework.boot:spring-boot-starter-aop:jar:2.5.2:compile
[INFO] | | \- org.aspectj:aspectjweaver:jar:1.9.6:compile
[INFO] | +- org.springframework.boot:spring-boot-starter-jdbc:jar:2.5.2:compile
[INFO] | | \- com.zaxxer:HikariCP:jar:4.0.3:compile
[INFO] | +- jakarta.transaction:jakarta.transaction-api:jar:1.3.3:compile
[INFO] | +- jakarta.persistence:jakarta.persistence-api:jar:2.2.3:compile
[INFO] | +- org.hibernate:hibernate-core:jar:5.4.32.Final:compile
[INFO] | | +- org.jboss.logging:jboss-logging:jar:3.4.2.Final:compile
[INFO] | | +- org.javassist:javassist:jar:3.27.0-GA:compile
[INFO] | | +- antlr:antlr:jar:2.7.7:compile
[INFO] | | +- org.jboss:jandex:jar:2.2.3.Final:compile
[INFO] | | +- com.fasterxml:classmate:jar:1.5.1:compile
[INFO] | | +- org.dom4j:dom4j:jar:2.1.3:compile
[INFO] | | +- org.hibernate.common:hibernate-commons-annotations:jar:5.1.2.Final:compile
[INFO] | | \- org.glassfish.jaxb:jaxb-runtime:jar:2.3.4:compile
[INFO] | | +- org.glassfish.jaxb:txw2:jar:2.3.4:compile
[INFO] | | +- com.sun.istack:istack-commons-runtime:jar:3.0.12:compile
[INFO] | | \- com.sun.activation:jakarta.activation:jar:1.2.2:runtime
[INFO] | \- org.springframework:spring-aspects:jar:5.3.8:compile
[INFO] +- org.projectlombok:lombok:jar:1.18.12:provided
[INFO] +- com.h2database:h2:jar:1.4.190:runtime
[INFO] +- org.springframework.boot:spring-boot-starter-web:jar:2.5.2:compile
[INFO] | +- org.springframework.boot:spring-boot-starter-json:jar:2.5.2:compile
[INFO] | | +- com.fasterxml.jackson.datatype:jackson-datatype-jdk8:jar:2.12.3:compile
[INFO] | | +- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:jar:2.12.3:compile
[INFO] | | \- com.fasterxml.jackson.module:jackson-module-parameter-names:jar:2.12.3:compile
[INFO] | \- org.springframework.boot:spring-boot-starter-tomcat:jar:2.5.2:compile
[INFO] | +- org.apache.tomcat.embed:tomcat-embed-core:jar:9.0.48:compile
[INFO] | +- org.apache.tomcat.embed:tomcat-embed-el:jar:9.0.48:compile
[INFO] | \- org.apache.tomcat.embed:tomcat-embed-websocket:jar:9.0.48:compile
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.12:compile
[INFO] +- org.springframework.integration:spring-integration-core:jar:5.5.3:compile
[INFO] | +- org.springframework:spring-messaging:jar:5.3.8:compile
[INFO] | +- org.springframework.retry:spring-retry:jar:1.3.1:compile
[INFO] | \- io.projectreactor:reactor-core:jar:3.4.7:compile
[INFO] | \- org.reactivestreams:reactive-streams:jar:1.0.3:compile
[INFO] +- org.apache.commons:commons-text:jar:1.9:compile
[INFO] | \- org.apache.commons:commons-lang3:jar:3.12.0:compile
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.12.3:compile
[INFO] +- com.fasterxml.jackson.core:jackson-core:jar:2.12.3:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.12.3:compile
[INFO] +- org.springframework.boot:spring-boot-starter-actuator:jar:2.5.2:compile
[INFO] | +- org.springframework.boot:spring-boot-actuator-autoconfigure:jar:2.5.2:compile
[INFO] | | \- org.springframework.boot:spring-boot-actuator:jar:2.5.2:compile
[INFO] | \- io.micrometer:micrometer-core:jar:1.7.1:compile
[INFO] | +- org.hdrhistogram:HdrHistogram:jar:2.1.12:compile
[INFO] | \- org.latencyutils:LatencyUtils:jar:2.0.3:runtime
[INFO] +- org.apache.maven.plugins:maven-compiler-plugin:jar:3.8.1:compile
[INFO] | +- org.apache.maven:maven-plugin-api:jar:3.0:compile
[INFO] | | +- org.apache.maven:maven-model:jar:3.0:compile
[INFO] | | \- org.sonatype.sisu:sisu-inject-plexus:jar:1.4.2:compile
[INFO] | | \- org.sonatype.sisu:sisu-inject-bean:jar:1.4.2:compile
[INFO] | | \- org.sonatype.sisu:sisu-guice:jar:noaop:2.1.7:compile
[INFO] | +- org.apache.maven:maven-artifact:jar:3.0:compile
[INFO] | | \- org.codehaus.plexus:plexus-utils:jar:2.0.4:compile
[INFO] | +- org.apache.maven:maven-core:jar:3.0:compile
[INFO] | | +- org.apache.maven:maven-settings:jar:3.0:compile
[INFO] | | +- org.apache.maven:maven-settings-builder:jar:3.0:compile
[INFO] | | +- org.apache.maven:maven-repository-metadata:jar:3.0:compile
[INFO] | | +- org.apache.maven:maven-model-builder:jar:3.0:compile
[INFO] | | +- org.apache.maven:maven-aether-provider:jar:3.0:runtime
[INFO] | | +- org.sonatype.aether:aether-impl:jar:1.7:compile
[INFO] | | | \- org.sonatype.aether:aether-spi:jar:1.7:compile
[INFO] | | +- org.sonatype.aether:aether-api:jar:1.7:compile
[INFO] | | +- org.sonatype.aether:aether-util:jar:1.7:compile
[INFO] | | +- org.codehaus.plexus:plexus-interpolation:jar:1.14:compile
[INFO] | | +- org.codehaus.plexus:plexus-classworlds:jar:2.2.3:compile
[INFO] | | +- org.codehaus.plexus:plexus-component-annotations:jar:1.5.5:compile
[INFO] | | \- org.sonatype.plexus:plexus-sec-dispatcher:jar:1.3:compile
[INFO] | | \- org.sonatype.plexus:plexus-cipher:jar:1.4:compile
[INFO] | +- org.apache.maven.shared:maven-shared-utils:jar:3.2.1:compile
[INFO] | | \- commons-io:commons-io:jar:2.5:compile
[INFO] | +- org.apache.maven.shared:maven-shared-incremental:jar:1.1:compile
[INFO] | +- org.codehaus.plexus:plexus-java:jar:0.9.10:compile
[INFO] | | +- org.ow2.asm:asm:jar:6.2:compile
[INFO] | | \- com.thoughtworks.qdox:qdox:jar:2.0-M8:compile
[INFO] | +- org.codehaus.plexus:plexus-compiler-api:jar:2.8.4:compile
[INFO] | +- org.codehaus.plexus:plexus-compiler-manager:jar:2.8.4:compile
[INFO] | \- org.codehaus.plexus:plexus-compiler-javac:jar:2.8.4:runtime
[INFO] +- org.postgresql:postgresql:jar:42.2.23:compile
[INFO] +- junit:junit:jar:4.12:test
[INFO] | \- org.hamcrest:hamcrest-core:jar:2.2:test
[INFO] +- org.springframework.boot:spring-boot-loader:jar:2.5.6:compile
[INFO] +- com.google.cloud:google-cloud-dataproc:jar:2.2.2:compile
[INFO] | \- com.google.api.grpc:proto-google-cloud-dataproc-v1:jar:2.2.2:compile
[INFO] +- mysql:mysql-connector-java:jar:8.0.25:compile
[INFO] +- com.google.cloud:google-cloud-bigquery:jar:2.3.3:compile
[INFO] | +- com.google.cloud:google-cloud-core-http:jar:2.2.0:compile
[INFO] | +- com.google.api-client:google-api-client:jar:1.32.2:compile
[INFO] | +- com.google.oauth-client:google-oauth-client:jar:1.32.1:compile
[INFO] | +- com.google.http-client:google-http-client-apache-v2:jar:1.40.1:compile
[INFO] | +- com.google.http-client:google-http-client-appengine:jar:1.40.1:compile
[INFO] | +- com.google.api:gax-httpjson:jar:0.91.1:compile
[INFO] | +- com.google.http-client:google-http-client-jackson2:jar:1.40.1:compile
[INFO] | +- org.checkerframework:checker-compat-qual:jar:2.5.5:compile
[INFO] | \- com.google.apis:google-api-services-bigquery:jar:v2-rev20211017-1.32.1:compile
[INFO] +- org.apache.spark:spark-core_2.12:jar:3.1.0:compile
[INFO] | +- com.thoughtworks.paranamer:paranamer:jar:2.8:compile
[INFO] | +- org.apache.avro:avro:jar:1.8.2:compile
[INFO] | | +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] | | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] | | +- org.apache.commons:commons-compress:jar:1.8.1:compile
[INFO] | | \- org.tukaani:xz:jar:1.5:compile
[INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.8.2:compile
[INFO] | | \- org.apache.avro:avro-ipc:jar:1.8.2:compile
[INFO] | +- com.twitter:chill_2.12:jar:0.9.5:compile
[INFO] | | \- com.esotericsoftware:kryo-shaded:jar:4.0.2:compile
[INFO] | | \- com.esotericsoftware:minlog:jar:1.3.0:compile
[INFO] | +- com.twitter:chill-java:jar:0.9.5:compile
[INFO] | +- org.apache.xbean:xbean-asm7-shaded:jar:4.15:compile
[INFO] | +- org.apache.hadoop:hadoop-client:jar:3.2.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-common:jar:3.2.0:compile
[INFO] | | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | | +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | | +- org.eclipse.jetty:jetty-servlet:jar:9.4.42.v20210604:compile
[INFO] | | | | +- org.eclipse.jetty:jetty-security:jar:9.4.42.v20210604:compile
[INFO] | | | | \- org.eclipse.jetty:jetty-util-ajax:jar:9.4.42.v20210604:compile
[INFO] | | | +- javax.servlet.jsp:jsp-api:jar:2.1:runtime
[INFO] | | | +- commons-beanutils:commons-beanutils:jar:1.9.3:compile
[INFO] | | | +- org.apache.commons:commons-configuration2:jar:2.1.1:compile
[INFO] | | | +- com.google.re2j:re2j:jar:1.1:compile
[INFO] | | | +- org.apache.hadoop:hadoop-auth:jar:3.2.0:compile
[INFO] | | | | \- com.nimbusds:nimbus-jose-jwt:jar:9.10:compile
[INFO] | | | | \- com.github.stephenc.jcip:jcip-annotations:jar:1.0-1:compile
[INFO] | | | +- org.apache.curator:curator-client:jar:2.12.0:compile
[INFO] | | | +- org.apache.htrace:htrace-core4:jar:4.1.0-incubating:compile
[INFO] | | | +- org.apache.kerby:kerb-simplekdc:jar:1.0.1:compile
[INFO] | | | | +- org.apache.kerby:kerb-client:jar:1.0.1:compile
[INFO] | | | | | +- org.apache.kerby:kerby-config:jar:1.0.1:compile
[INFO] | | | | | +- org.apache.kerby:kerb-core:jar:1.0.1:compile
[INFO] | | | | | | \- org.apache.kerby:kerby-pkix:jar:1.0.1:compile
[INFO] | | | | | | +- org.apache.kerby:kerby-asn1:jar:1.0.1:compile
[INFO] | | | | | | \- org.apache.kerby:kerby-util:jar:1.0.1:compile
[INFO] | | | | | +- org.apache.kerby:kerb-common:jar:1.0.1:compile
[INFO] | | | | | | \- org.apache.kerby:kerb-crypto:jar:1.0.1:compile
[INFO] | | | | | +- org.apache.kerby:kerb-util:jar:1.0.1:compile
[INFO] | | | | | \- org.apache.kerby:token-provider:jar:1.0.1:compile
[INFO] | | | | \- org.apache.kerby:kerb-admin:jar:1.0.1:compile
[INFO] | | | | +- org.apache.kerby:kerb-server:jar:1.0.1:compile
[INFO] | | | | | \- org.apache.kerby:kerb-identity:jar:1.0.1:compile
[INFO] | | | | \- org.apache.kerby:kerby-xdr:jar:1.0.1:compile
[INFO] | | | +- org.codehaus.woodstox:stax2-api:jar:3.1.4:compile
[INFO] | | | +- com.fasterxml.woodstox:woodstox-core:jar:5.0.3:compile
[INFO] | | | \- dnsjava:dnsjava:jar:2.1.7:compile
[INFO] | | +- org.apache.hadoop:hadoop-hdfs-client:jar:3.2.0:compile
[INFO] | | | \- com.squareup.okhttp:okhttp:jar:2.7.5:compile
[INFO] | | | \- com.squareup.okio:okio:jar:1.6.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-yarn-api:jar:3.2.0:compile
[INFO] | | | \- javax.xml.bind:jaxb-api:jar:2.3.1:compile
[INFO] | | | \- javax.activation:javax.activation-api:jar:1.2.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-yarn-client:jar:3.2.0:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.2.0:compile
[INFO] | | | \- org.apache.hadoop:hadoop-yarn-common:jar:3.2.0:compile
[INFO] | | | +- javax.servlet:javax.servlet-api:jar:4.0.1:compile
[INFO] | | | +- org.eclipse.jetty:jetty-util:jar:9.4.42.v20210604:compile
[INFO] | | | +- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.12.3:compile
[INFO] | | | \- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.12.3:compile
[INFO] | | | \- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.12.3:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:3.2.0:compile
[INFO] | | | \- org.apache.hadoop:hadoop-mapreduce-client-common:jar:3.2.0:compile
[INFO] | | \- org.apache.hadoop:hadoop-annotations:jar:3.2.0:compile
[INFO] | +- org.apache.spark:spark-launcher_2.12:jar:3.1.0:compile
[INFO] | +- org.apache.spark:spark-kvstore_2.12:jar:3.1.0:compile
[INFO] | | \- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:compile
[INFO] | +- org.apache.spark:spark-network-common_2.12:jar:3.1.0:compile
[INFO] | +- org.apache.spark:spark-network-shuffle_2.12:jar:3.1.0:compile
[INFO] | +- org.apache.spark:spark-unsafe_2.12:jar:3.1.0:compile
[INFO] | +- javax.activation:activation:jar:1.1.1:compile
[INFO] | +- org.apache.curator:curator-recipes:jar:2.13.0:compile
[INFO] | | \- org.apache.curator:curator-framework:jar:2.13.0:compile
[INFO] | +- org.apache.zookeeper:zookeeper:jar:3.4.14:compile
[INFO] | | \- org.apache.yetus:audience-annotations:jar:0.5.0:compile
[INFO] | +- jakarta.servlet:jakarta.servlet-api:jar:4.0.4:compile
[INFO] | +- org.apache.commons:commons-math3:jar:3.4.1:compile
[INFO] | +- org.slf4j:jul-to-slf4j:jar:1.7.31:compile
[INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.31:compile
[INFO] | +- com.ning:compress-lzf:jar:1.0.3:compile
[INFO] | +- org.xerial.snappy:snappy-java:jar:1.1.8.2:compile
[INFO] | +- org.lz4:lz4-java:jar:1.7.1:compile
[INFO] | +- com.github.luben:zstd-jni:jar:1.4.8-1:compile
[INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.9.0:compile
[INFO] | | \- org.roaringbitmap:shims:jar:0.9.0:runtime
[INFO] | +- commons-net:commons-net:jar:3.1:compile
[INFO] | +- org.scala-lang.modules:scala-xml_2.12:jar:1.2.0:compile
[INFO] | +- org.scala-lang:scala-library:jar:2.12.10:compile
[INFO] | +- org.scala-lang:scala-reflect:jar:2.12.10:compile
[INFO] | +- org.json4s:json4s-jackson_2.12:jar:3.7.0-M5:compile
[INFO] | | \- org.json4s:json4s-core_2.12:jar:3.7.0-M5:compile
[INFO] | | +- org.json4s:json4s-ast_2.12:jar:3.7.0-M5:compile
[INFO] | | \- org.json4s:json4s-scalap_2.12:jar:3.7.0-M5:compile
[INFO] | +- org.glassfish.jersey.core:jersey-client:jar:2.33:compile
[INFO] | | +- jakarta.ws.rs:jakarta.ws.rs-api:jar:2.1.6:compile
[INFO] | | \- org.glassfish.hk2.external:jakarta.inject:jar:2.6.1:compile
[INFO] | +- org.glassfish.jersey.core:jersey-common:jar:2.33:compile
[INFO] | | \- org.glassfish.hk2:osgi-resource-locator:jar:1.0.3:compile
[INFO] | +- org.glassfish.jersey.core:jersey-server:jar:2.33:compile
[INFO] | | \- jakarta.validation:jakarta.validation-api:jar:2.0.2:compile
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.33:compile
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.33:compile
[INFO] | +- org.glassfish.jersey.inject:jersey-hk2:jar:2.33:compile
[INFO] | | \- org.glassfish.hk2:hk2-locator:jar:2.6.1:compile
[INFO] | | +- org.glassfish.hk2.external:aopalliance-repackaged:jar:2.6.1:compile
[INFO] | | +- org.glassfish.hk2:hk2-api:jar:2.6.1:compile
[INFO] | | \- org.glassfish.hk2:hk2-utils:jar:2.6.1:compile
[INFO] | +- io.netty:netty-all:jar:4.1.65.Final:compile
[INFO] | +- com.clearspring.analytics:stream:jar:2.9.6:compile
[INFO] | +- io.dropwizard.metrics:metrics-core:jar:4.1.24:compile
[INFO] | +- io.dropwizard.metrics:metrics-jvm:jar:4.1.24:compile
[INFO] | +- io.dropwizard.metrics:metrics-json:jar:4.1.24:compile
[INFO] | +- io.dropwizard.metrics:metrics-graphite:jar:4.1.24:compile
[INFO] | +- io.dropwizard.metrics:metrics-jmx:jar:4.1.24:compile
[INFO] | +- com.fasterxml.jackson.module:jackson-module-scala_2.12:jar:2.12.3:compile
[INFO] | +- org.apache.ivy:ivy:jar:2.4.0:compile
[INFO] | +- oro:oro:jar:2.0.8:compile
[INFO] | +- net.razorvine:pyrolite:jar:4.30:compile
[INFO] | +- net.sf.py4j:py4j:jar:0.10.9:compile
[INFO] | +- org.apache.spark:spark-tags_2.12:jar:3.1.0:compile
[INFO] | +- org.apache.commons:commons-crypto:jar:1.1.0:compile
[INFO] | \- org.spark-project.spark:unused:jar:1.0.0:compile
[INFO] +- org.apache.spark:spark-streaming_2.12:jar:3.1.0:compile
[INFO] +- org.apache.spark:spark-streaming-kafka-0-10_2.12:jar:3.1.0:compile
[INFO] | +- org.apache.spark:spark-token-provider-kafka-0-10_2.12:jar:3.1.0:compile
[INFO] | \- org.apache.kafka:kafka-clients:jar:2.7.1:compile
[INFO] +- org.apache.spark:spark-avro_2.12:jar:3.1.0:compile
[INFO] +- org.apache.spark:spark-sql-kafka-0-10_2.12:jar:3.1.0:compile
[INFO] | \- org.apache.commons:commons-pool2:jar:2.9.0:compile
[INFO] +- org.codehaus.janino:janino:jar:3.0.8:compile
[INFO] +- org.codehaus.janino:commons-compiler:jar:3.0.8:compile
[INFO] +- org.apache.spark:spark-sql_2.12:jar:3.1.0:compile
[INFO] | +- com.univocity:univocity-parsers:jar:2.9.0:compile
[INFO] | +- org.apache.spark:spark-sketch_2.12:jar:3.1.0:compile
[INFO] | +- org.apache.spark:spark-catalyst_2.12:jar:3.1.0:compile
[INFO] | | +- org.scala-lang.modules:scala-parser-combinators_2.12:jar:1.1.2:compile
[INFO] | | +- org.antlr:antlr4-runtime:jar:4.8-1:compile
[INFO] | | +- org.apache.arrow:arrow-vector:jar:2.0.0:compile
[INFO] | | | +- org.apache.arrow:arrow-format:jar:2.0.0:compile
[INFO] | | | +- org.apache.arrow:arrow-memory-core:jar:2.0.0:compile
[INFO] | | | \- com.google.flatbuffers:flatbuffers-java:jar:1.9.0:compile
[INFO] | | \- org.apache.arrow:arrow-memory-netty:jar:2.0.0:compile
[INFO] | +- org.apache.orc:orc-core:jar:1.5.12:compile
[INFO] | | +- org.apache.orc:orc-shims:jar:1.5.12:compile
[INFO] | | +- commons-lang:commons-lang:jar:2.6:compile
[INFO] | | +- io.airlift:aircompressor:jar:0.10:compile
[INFO] | | \- org.threeten:threeten-extra:jar:1.5.0:compile
[INFO] | +- org.apache.orc:orc-mapreduce:jar:1.5.12:compile
[INFO] | +- org.apache.hive:hive-storage-api:jar:2.7.2:compile
[INFO] | +- org.apache.parquet:parquet-column:jar:1.10.1:compile
[INFO] | | +- org.apache.parquet:parquet-common:jar:1.10.1:compile
[INFO] | | \- org.apache.parquet:parquet-encoding:jar:1.10.1:compile
[INFO] | \- org.apache.parquet:parquet-hadoop:jar:1.10.1:compile
[INFO] | +- org.apache.parquet:parquet-format:jar:2.4.0:compile
[INFO] | \- org.apache.parquet:parquet-jackson:jar:1.10.1:compile
[INFO] +- org.springframework.kafka:spring-kafka:jar:2.8.2:compile
[INFO] +- com.google.cloud:google-cloud-storage:jar:2.1.9:compile
[INFO] | \- com.google.apis:google-api-services-storage:jar:v1-rev20210918-1.32.1:compile
[INFO] \- za.co.absa:abris_2.12:jar:6.0.0:compile
[INFO] +- io.confluent:kafka-avro-serializer:jar:6.2.1:compile
[INFO] | +- io.confluent:kafka-schema-serializer:jar:6.2.1:compile
[INFO] | \- io.confluent:common-utils:jar:6.2.1:compile
[INFO] +- io.confluent:kafka-schema-registry-client:jar:6.2.1:compile
[INFO] | +- io.swagger:swagger-annotations:jar:1.6.2:compile
[INFO] | \- io.swagger:swagger-core:jar:1.6.2:compile
[INFO] | +- com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:jar:2.12.3:compile
[INFO] | \- io.swagger:swagger-models:jar:1.6.2:compile
[INFO] \- za.co.absa.commons:commons_2.12:jar:1.0.0:compile
My Spark integration with Spring Boot is causing the issue; I am not able to work out which dependency is causing it.
ANSWER
Answered 2022-Feb-16 at 13:12. According to this answer, https://stackoverflow.com/a/51236918/16651073, Tomcat falls back to default logging if it cannot resolve the location.
Can you try saving the property without the spaces?
Like this: logging.file.name=application.logs
QUESTION
It's my first Kafka program.
From a kafka_2.13-3.1.0
instance, I created a Kafka topic poids_garmin_brut
and filled it with this CSV:
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic poids_garmin_brut
kafka-console-producer.sh --broker-list localhost:9092 --topic poids_garmin_brut < "Poids(1).csv"
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
[...]
At any time, before or after running the program shown below, its content can be displayed with a kafka-console-consumer command:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic poids_garmin_brut --from-beginning
Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
" 14 Fév. 2022",
06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 13 Fév. 2022",
06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
" 12 Fév. 2022",
06:17,72.2 kg,0.0 kg,22.8,25.3 %,29.7 kg,3.6 kg,54.5 %,
" 11 Fév. 2022",
05:54,72.2 kg,0.1 kg,22.8,25.6 %,29.7 kg,3.5 kg,54.3 %,
" 10 Fév. 2022",
06:14,72.3 kg,0.0 kg,22.8,25.9 %,29.7 kg,3.5 kg,54.1 %,
" 9 Fév. 2022",
06:06,72.3 kg,0.5 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
" 8 Fév. 2022",
07:14,71.8 kg,0.7 kg,22.7,26.3 %,29.6 kg,3.5 kg,53.8 %,
Here is the Java program, based on the org.apache.kafka:kafka-streams:3.1.0 dependency, that extracts this topic as a stream:
package extracteur.garmin;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.slf4j.*;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import java.util.Properties;

@SpringBootApplication
public class Kafka {
    /** Logger. */
    private static final Logger LOGGER = LoggerFactory.getLogger(Kafka.class);

    public static void main(String[] args) {
        LOGGER.info("L'extracteur de données Garmin démarre...");

        /* The input CSV file data has this form:
           Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
           " 14 Fév. 2022",
           06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
           " 13 Fév. 2022",
           06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
        */

        // Create a stream with no key; the value is a string.
        StreamsBuilder builder = new StreamsBuilder();
        KStream stream = builder.stream("poids_garmin_brut");

        // This is Kafka's foreach, not a plain Java lambda. It is lazy.
        stream.foreach((key, value) -> {
            LOGGER.info(value);
        });

        KafkaStreams streams = new KafkaStreams(builder.build(), config());
        streams.start();

        // Close the Kafka stream when the VM shuts down, by having it call
        streams.close();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    /**
     * Startup properties.
     * @return configuration properties.
     */
    private static Properties config() {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "dev1");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.Void().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return config;
    }
}
But, while the logs don't seem to report any error during execution, my program never enters the stream.foreach, and therefore displays no content from that topic.
(In the log below I removed the dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088- prefix from [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088-StreamThread-1], for SO message length and readability, and org.apache.kafka becomes o.a.k.)
/usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -XX:TieredStopAtLevel=1 -noverify -Dspring.output.ansi.enabled=always -Dcom.sun.management.jmxremote -Dspring.jmx.enabled=true -Dspring.liveBeansView.mbeanDomain -Dspring.application.admin.enabled=true -javaagent:/opt/idea-IU-212.5284.40/lib/idea_rt.jar=41397:/opt/idea-IU-212.5284.40/bin -Dfile.encoding=UTF-8 -classpath /home/lebihan/dev/Java/garmin/target/classes:/home/lebihan/.m2/repository/org/slf4j/slf4j-api/1.7.33/slf4j-api-1.7.33.jar:/home/lebihan/.m2/repository/org/slf4j/log4j-over-slf4j/1.7.33/log4j-over-slf4j-1.7.33.jar:/home/lebihan/.m2/repository/ch/qos/logback/logback-classic/1.2.10/logback-classic-1.2.10.jar:/home/lebihan/.m2/repository/ch/qos/logback/logback-core/1.2.10/logback-core-1.2.10.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-web/2.6.3/spring-boot-starter-web-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter/2.6.3/spring-boot-starter-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot/2.6.3/spring-boot-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-autoconfigure/2.6.3/spring-boot-autoconfigure-2.6.3.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-logging/2.6.3/spring-boot-starter-logging-2.6.3.jar:/home/lebihan/.m2/repository/org/apache/logging/log4j/log4j-to-slf4j/2.17.1/log4j-to-slf4j-2.17.1.jar:/home/lebihan/.m2/repository/org/apache/logging/log4j/log4j-api/2.17.1/log4j-api-2.17.1.jar:/home/lebihan/.m2/repository/org/slf4j/jul-to-slf4j/1.7.33/jul-to-slf4j-1.7.33.jar:/home/lebihan/.m2/repository/jakarta/annotation/jakarta.annotation-api/1.3.5/jakarta.annotation-api-1.3.5.jar:/home/lebihan/.m2/repository/org/yaml/snakeyaml/1.29/snakeyaml-1.29.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-json/2.6.3/spring-boot-starter-json-2.6.3.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jdk8/2.13.1/jackson-datatype-jdk8-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.13.1/jackson-datatype-jsr310-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.13.1/jackson-module-parameter-names-2.13.1.jar:/home/lebihan/.m2/repository/org/springframework/boot/spring-boot-starter-tomcat/2.6.3/spring-boot-starter-tomcat-2.6.3.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-core/9.0.56/tomcat-embed-core-9.0.56.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-el/9.0.56/tomcat-embed-el-9.0.56.jar:/home/lebihan/.m2/repository/org/apache/tomcat/embed/tomcat-embed-websocket/9.0.56/tomcat-embed-websocket-9.0.56.jar:/home/lebihan/.m2/repository/org/springframework/spring-web/5.3.15/spring-web-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-beans/5.3.15/spring-beans-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-webmvc/5.3.15/spring-webmvc-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-aop/5.3.15/spring-aop-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-context/5.3.15/spring-context-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-expression/5.3.15/spring-expression-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-core/5.3.15/spring-core-5.3.15.jar:/home/lebihan/.m2/repository/org/springframework/spring-jcl/5.3.15/spring-jcl-5.3.15.jar:/home/lebihan/.m2/repository/org/apache/kafka/kafka-stream
s/3.1.0/kafka-streams-3.1.0.jar:/home/lebihan/.m2/repository/org/apache/kafka/kafka-clients/3.0.0/kafka-clients-3.0.0.jar:/home/lebihan/.m2/repository/com/github/luben/zstd-jni/1.5.0-2/zstd-jni-1.5.0-2.jar:/home/lebihan/.m2/repository/org/lz4/lz4-java/1.7.1/lz4-java-1.7.1.jar:/home/lebihan/.m2/repository/org/xerial/snappy/snappy-java/1.1.8.1/snappy-java-1.1.8.1.jar:/home/lebihan/.m2/repository/org/rocksdb/rocksdbjni/6.22.1.1/rocksdbjni-6.22.1.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.13.1/jackson-annotations-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.13.1/jackson-databind-2.13.1.jar:/home/lebihan/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.13.1/jackson-core-2.13.1.jar extracteur.garmin.Kafka
07:57:49.720 [main] INFO extracteur.garmin.Kafka - L'extracteur de données Garmin démarre...
07:57:49.747 [main] INFO o.a.k.streams.StreamsConfig - StreamsConfig values:
acceptable.recovery.lag = 10000
application.id = dev1
application.server =
bootstrap.servers = [localhost:9092]
buffered.records.per.partition = 1000
built.in.metrics.version = latest
cache.max.bytes.buffering = 10485760
client.id =
commit.interval.ms = 30000
connections.max.idle.ms = 540000
default.deserialization.exception.handler = class o.a.k.streams.errors.LogAndFailExceptionHandler
default.key.serde = class o.a.k.common.serialization.Serdes$VoidSerde
default.list.key.serde.inner = null
default.list.key.serde.type = null
default.list.value.serde.inner = null
default.list.value.serde.type = null
default.production.exception.handler = class o.a.k.streams.errors.DefaultProductionExceptionHandler
default.timestamp.extractor = class o.a.k.streams.processor.FailOnInvalidTimestamp
default.value.serde = class o.a.k.common.serialization.Serdes$StringSerde
max.task.idle.ms = 0
max.warmup.replicas = 2
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
num.standby.replicas = 0
num.stream.threads = 1
poll.ms = 100
probing.rebalance.interval.ms = 600000
processing.guarantee = at_least_once
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
replication.factor = -1
request.timeout.ms = 40000
retries = 0
retry.backoff.ms = 100
rocksdb.config.setter = null
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
state.cleanup.delay.ms = 600000
state.dir = /tmp/kafka-streams
task.timeout.ms = 300000
topology.optimization = none
upgrade.from = null
window.size.ms = null
windowed.inner.class.serde = null
windowstore.changelog.additional.retention.ms = 86400000
07:57:49.760 [main] INFO o.a.k.clients.admin.AdminClientConfig - AdminClientConfig values:
bootstrap.servers = [localhost:9092]
client.dns.lookup = use_all_dns_ips
client.id = admin
connections.max.idle.ms = 300000
default.api.timeout.ms = 60000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.790 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269788
07:57:49.793 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] Kafka Streams version: 3.1.0
07:57:49.793 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] Kafka Streams commit ID: 37edeed0777bacb3
07:57:49.800 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating restore consumer client
07:57:49.802 [main] INFO o.a.k.clients.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = none
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-restore-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = null
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class o.a.k.clients.consumer.RangeAssignor, class o.a.k.clients.consumer.CooperativeStickyAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.816 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269816
07:57:49.818 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating thread producer client
07:57:49.820 [main] INFO o.a.k.clients.producer.ProducerConfig - ProducerConfig values:
acks = -1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-producer
compression.type = none
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = true
interceptor.classes = []
key.serializer = class o.a.k.common.serialization.ByteArraySerializer
linger.ms = 100
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metadata.max.idle.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class o.a.k.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 2147483647
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class o.a.k.common.serialization.ByteArraySerializer
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.828 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269828
07:57:49.830 [main] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Creating consumer client
07:57:49.831 [main] INFO o.a.k.clients.consumer.ConsumerConfig - ConsumerConfig values:
allow.auto.create.topics = false
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = StreamThread-1-consumer
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = dev1
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = false
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 1000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [o.a.k.streams.processor.internals.StreamsPartitionAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 45000
socket.connection.setup.timeout.max.ms = 30000
socket.connection.setup.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.3
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class o.a.k.common.serialization.ByteArrayDeserializer
replication.factor = -1
windowstore.changelog.additional.retention.ms = 86400000
07:57:49.836 [main] INFO o.a.k.streams.processor.internals.assignment.AssignorConfiguration - stream-thread [StreamThread-1-consumer] Cooperative rebalancing protocol is enabled now
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka version: 3.0.0
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
07:57:49.840 [main] INFO o.a.k.common.utils.AppInfoParser - Kafka startTimeMs: 1644908269840
07:57:49.844 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] State transition from CREATED to REBALANCING
07:57:49.845 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Starting
07:57:49.845 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] State transition from CREATED to STARTING
07:57:49.845 [StreamThread-1] INFO o.a.k.clients.consumer.KafkaConsumer - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Subscribed to topic(s): poids_garmin_brut
07:57:49.845 [main] INFO o.a.k.streams.KafkaStreams - stream-client [dev1-d1c8ce47-6fbf-41b7-b8aa-e3d094703088] State transition from REBALANCING to PENDING_SHUTDOWN
07:57:49.846 [kafka-streams-close-thread] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Informed to shut down
07:57:49.846 [kafka-streams-close-thread] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] State transition from STARTING to PENDING_SHUTDOWN
07:57:49.919 [kafka-producer-network-thread | StreamThread-1-producer] INFO o.a.k.clients.Metadata - [Producer clientId=StreamThread-1-producer] Cluster ID: QKJGs4glRAy7besZxXNCrg
07:57:49.920 [StreamThread-1] INFO o.a.k.clients.Metadata - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Cluster ID: QKJGs4glRAy7besZxXNCrg
07:57:49.921 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Discovered group coordinator debian:9092 (id: 2147483647 rack: null)
07:57:49.922 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] (Re-)joining group
07:57:49.929 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Request joining group due to: need to re-join with the given member-id
07:57:49.929 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] (Re-)joining group
07:57:49.930 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Successfully joined group with generation Generation{generationId=3, memberId='StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c', protocol='stream'}
07:57:49.936 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] All members participating in this rebalance:
d1c8ce47-6fbf-41b7-b8aa-e3d094703088: [StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c].
07:57:49.938 [StreamThread-1] INFO o.a.k.streams.processor.internals.assignment.HighAvailabilityTaskAssignor - Decided on assignment: {d1c8ce47-6fbf-41b7-b8aa-e3d094703088=[activeTasks: ([0_0]) standbyTasks: ([]) prevActiveTasks: ([]) prevStandbyTasks: ([]) changelogOffsetTotalsByTask: ([]) taskLagTotals: ([]) capacity: 1 assigned: 1]} with no followup probing rebalance.
07:57:49.938 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Assigned tasks [0_0] including stateful [] to clients as:
d1c8ce47-6fbf-41b7-b8aa-e3d094703088=[activeTasks: ([0_0]) standbyTasks: ([])].
07:57:49.939 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Client d1c8ce47-6fbf-41b7-b8aa-e3d094703088 per-consumer assignment:
prev owned active {}
prev owned standby {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=[]}
assigned active {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=[0_0]}
revoking active {}
assigned standby {}
07:57:49.939 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] Finished stable assignment of tasks, no followup rebalances required.
07:57:49.939 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Finished assignment for group at generation 3: {StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c=Assignment(partitions=[poids_garmin_brut-0], userDataSize=52)}
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Successfully synced group in generation Generation{generationId=3, memberId='StreamThread-1-consumer-34c0df37-baeb-4582-bdfe-79ab9e2e410c', protocol='stream'}
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Updating assignment with
Assigned partitions: [poids_garmin_brut-0]
Current owned partitions: []
Added partitions (assigned - owned): [poids_garmin_brut-0]
Revoked partitions (owned - assigned): []
07:57:49.943 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Notifying assignor about the new Assignment(partitions=[poids_garmin_brut-0], userDataSize=52)
07:57:49.944 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamsPartitionAssignor - stream-thread [StreamThread-1-consumer] No followup rebalance was requested, resetting the rebalance schedule.
07:57:49.944 [StreamThread-1] INFO o.a.k.streams.processor.internals.TaskManager - stream-thread [StreamThread-1] Handle new assignment with:
New active tasks: [0_0]
New standby tasks: []
Existing active tasks: []
Existing standby tasks: []
07:57:49.950 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Adding newly assigned partitions: poids_garmin_brut-0
07:57:49.953 [StreamThread-1] INFO o.a.k.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=StreamThread-1-consumer, groupId=dev1] Found no committed offset for partition poids_garmin_brut-0
07:57:49.954 [StreamThread-1] INFO o.a.k.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Shutting down
[...]
Process finished with exit code 0
What am I doing wrong?
I'm running my Kafka instance and its Java program locally, on the same PC.
I've tried both the 3.1.0 and 2.8.1 versions of Kafka, and removed any traces of Spring from the Java program, without success.
I believe I'm facing a configuration problem.
ANSWER
Answered 2022-Feb-15 at 14:36. The following should work. The functional change is that streams.close() is no longer called right after streams.start(): in the original code that immediate call shuts the streams client down (hence the State transition from REBALANCING to PENDING_SHUTDOWN in the log) before any record is processed, so closing is left to the shutdown hook only.
LOGGER.info("L'extracteur de données Garmin démarre...");

/* The input CSV file data has this form:
   Durée,Poids,Variation,IMC,Masse grasse,Masse musculaire squelettique,Masse osseuse,Masse hydrique,
   " 14 Fév. 2022",
   06:37,72.1 kg,0.3 kg,22.8,26.3 %,29.7 kg,3.5 kg,53.8 %,
   " 13 Fév. 2022",
   06:48,72.4 kg,0.2 kg,22.9,25.4 %,29.8 kg,3.6 kg,54.4 %,
*/

// Create a stream with no key; the value is a string.
StreamsBuilder builder = new StreamsBuilder();
builder.stream("poids_garmin_brut")
       .foreach((k, v) -> {
           LOGGER.info(v.toString());
       });

KafkaStreams streams = new KafkaStreams(builder.build(), config());
streams.start();

// Close the Kafka stream when the VM shuts down, by having it call
//streams.close();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
OUTPUT
2022-02-15 20:05:54 INFO ConsumerCoordinator:291 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Adding newly assigned partitions: poids_garmin_brut-0
2022-02-15 20:05:54 INFO StreamThread:229 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] State transition from STARTING to PARTITIONS_ASSIGNED
2022-02-15 20:05:54 INFO ConsumerCoordinator:844 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Setting offset for partition poids_garmin_brut-0 to the committed offset FetchPosition{offset=21, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[LAPTOP-J1JBHQUR:9092 (id: 0 rack: null)], epoch=0}}
2022-02-15 20:05:54 INFO StreamTask:240 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] task [0_0] Initialized
2022-02-15 20:05:54 INFO StreamTask:265 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] task [0_0] Restored and ready to run
2022-02-15 20:05:54 INFO StreamThread:882 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] Restoration took 30 ms for all tasks [0_0]
2022-02-15 20:05:54 INFO StreamThread:229 - stream-thread [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING
2022-02-15 20:05:54 INFO KafkaStreams:332 - stream-client [dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b] State transition from REBALANCING to RUNNING
2022-02-15 20:05:54 INFO KafkaConsumer:2254 - [Consumer clientId=dev1-5e3fab76-51c7-41b5-aedf-99a4a071589b-StreamThread-1-consumer, groupId=dev1] Requesting the log end offset for poids_garmin_brut-0 in order to compute lag
2022-02-15 20:06:03 INFO Main:33 - Test22
2022-02-15 20:06:06 INFO Main:33 - Test23
QUESTION
I am working in the VDI of a company and they use their own artifactory for security reasons. Currently I am writing unit tests for a function that deletes entries from a delta table. When I started, I received an unresolved-dependencies error, because my spark session was configured in a way that it would load jars from maven. I was able to solve this issue by loading these jars locally from /opt/spark/jars. Now my code looks like this:
import os
import unittest

from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.functions import col


class TestTransformation(unittest.TestCase):
    @classmethod
    def test_ksu_deletion(self):
        self.spark = SparkSession.builder\
            .appName('SPARK_DELETION')\
            .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")\
            .config("spark.jars", "/opt/spark/jars/delta-core_2.12-0.7.0.jar, /opt/spark/jars/hadoop-aws-3.2.0.jar")\
            .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")\
            .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")\
            .getOrCreate()
        os.environ["KSU_DELETION_OBJECT"] = "UNITTEST/"
        deltatable = DeltaTable.forPath(self.spark, "/projects/some/path/snappy.parquet")
        deltatable.delete(col("DATE") < get_current())  # get_current() is a project helper, not shown here
However, I am getting the error message:
E py4j.protocol.Py4JJavaError: An error occurred while calling z:io.delta.tables.DeltaTable.forPath.
E : java.lang.NoSuchMethodError: org.apache.spark.sql.AnalysisException.(Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;)V
Do you have any idea what is causing this? I am assuming it has to do with the way I am configuring spark.sql.extensions and/or the spark.sql.catalog, but to be honest, I am quite a newbie in Spark. I would greatly appreciate any hint.
Thanks a lot in advance!
Edit: We are using Spark 3.0.2 (Scala 2.12.10). According to https://docs.delta.io/latest/releases.html, this should be compatible. Apart from the SparkSession, I trimmed down the subsequent code to
df = spark.read.parquet("Path/to/file.snappy.parquet")
and now I am getting the error message
java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.plans.logical.DeltaDelete has interface org.apache.spark.sql.catalyst.plans.logical.UnaryNode as super class
As I said, I am quite new to (Py)Spark, so please don't hesitate to mention things you consider completely obvious.
Edit 2: I checked the Python path I am exporting in the shell before running the code and I can see the following: Could this cause any problem? I don't understand why I do not get this error when running the code within pipenv (with spark-submit)
ANSWER
Answered 2022-Feb-14 at 10:18
It looks like you're using an incompatible version of the Delta Lake library. 0.7.0 was for Spark 3.0, but you're using another version - either lower or higher. Consult the Delta releases page to find the mapping between Delta versions & required Spark versions.
If you're using Spark 3.1 or 3.2, consider using the delta-spark Python package, which will install all necessary dependencies, so you just import the DeltaTable class.
Update: Yes, this happens because of the conflicting versions - you need to remove the delta-spark and pyspark Python packages, and install pyspark==3.0.2 explicitly.
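For reference, if you do later move to Spark 3.1 or 3.2 with the delta-spark package, the session setup no longer needs spark.jars pointing at local files. A minimal sketch, assuming delta-spark and a matching pyspark are installed (the app name is a placeholder, not taken from the question):

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

# configure_spark_with_delta_pip adds the matching delta-core artifacts to
# spark.jars.packages, so the Delta and Spark versions cannot drift apart.
builder = (
    SparkSession.builder
    .appName("unit-test-session")  # placeholder name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()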
P.S. Also, look onto pytest-spark package that can simplify specification of configuration for all tests. You can find examples of it + Delta here.
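If you prefer to stay with plain pytest instead, a session-scoped fixture in conftest.py achieves a similar effect. A rough sketch, reusing the same local jar paths as in the question (the fixture name and app name are placeholders):

import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    # One SparkSession shared by all tests; stopped when the test session ends.
    session = (
        SparkSession.builder
        .appName("delta-unit-tests")  # placeholder name
        .master("local[2]")
        .config("spark.jars", "/opt/spark/jars/delta-core_2.12-0.7.0.jar, /opt/spark/jars/hadoop-aws-3.2.0.jar")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )
    yield session
    session.stop()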
QUESTION
I'm working on exporting data from Foundry datasets in parquet format using various Magritte export tasks to an ABFS system (but the same issue occurs with SFTP, S3, HDFS, and other file based exports).
The datasets I'm exporting are relatively small, under 512 MB in size, which means they don't really need to be split across multiple parquet files, and putting all the data in one file is enough. I've done this by ending the previous transform with a .coalesce(1) to get all of the data in a single file.
The issues are:
- By default the file name is part-0000-<rid>.snappy.parquet, with a different rid on every build. This means that, whenever a new file is uploaded, it appears in the same folder as an additional file, and the only way to tell which is the newest version is by last-modified date.
- Every version of the data is stored in my external system; this takes up unnecessary storage unless I frequently go in and delete old files.
All of this is unnecessary complexity being added to my downstream system, I just want to be able to pull the latest version of data in a single step.
ANSWER
Answered 2022-Jan-13 at 15:27
This is possible by renaming the single parquet file in the dataset so that it always has the same file name, that way the export task will overwrite the previous file in the external system.
This can be done using raw file system access. The write_single_named_parquet_file function below validates its inputs, creates a file with a given name in the output dataset, then copies the file in the input dataset to it. The result is a schemaless output dataset that contains a single named parquet file.
Notes
- The build will fail if the input contains more than one parquet file; as pointed out in the question, calling .coalesce(1) (or .repartition(1)) is necessary in the upstream transform.
- If you require transaction history in your external store, or your dataset is much larger than 512 MB, this method is not appropriate, as only the latest version is kept, and you likely want multiple parquet files for use in your downstream system. The createTransactionFolders (put each new export in a different folder) and flagFile (create a flag file once all files have been written) options can be useful in this case.
- The transform does not require any Spark executors, so it is possible to use @configure() to give it a driver-only profile. Giving the driver additional memory should fix out-of-memory errors when working with larger datasets.
- shutil.copyfileobj is used because the 'files' that are opened are actually just file objects.
Full code snippet
example_transform.py
from transforms.api import transform, Input, Output

from . import utils  # the helper module shown below (utils.py)


@transform(
    output=Output("/path/to/output"),
    source_df=Input("/path/to/input"),
)
def compute(output, source_df):
    return utils.write_single_named_parquet_file(output, source_df, "readable_file_name")
utils.py
from transforms.api import Input, Output
import shutil
import logging

log = logging.getLogger(__name__)


def write_single_named_parquet_file(output: Output, input: Input, file_name: str):
    """Write a single ".snappy.parquet" file with a given file name to a transforms output, containing the data of the
    single ".snappy.parquet" file in the transforms input. This is useful when you need to export the data using
    magritte, wanting a human readable name in the output; when not using separate transaction folders this should cause
    the previous output to be automatically overwritten.

    The input to this function must contain a single ".snappy.parquet" file; this can be achieved by calling
    `.coalesce(1)` or `.repartition(1)` on your dataframe at the end of the upstream transform that produces the input.

    This function should not be used for large dataframes (e.g. those greater than 512 mb in size); instead
    transaction folders should be enabled in the export. This function can work for larger sizes, but you may find you
    need additional driver memory to perform both the coalesce/repartition in the upstream transform, and here.

    This produces a dataset without a schema, so features like expectations can't be used.

    Parameters:
        output (Output): The transforms output to write the single custom named ".snappy.parquet" file to, this is
            the dataset you want to export
        input (Input): The transforms input containing the data to be written to output, this must contain only one
            ".snappy.parquet" file (it can contain other files, for example logs)
        file_name: The name of the file to be written; ".snappy.parquet" will be automatically appended if not
            already there, and ".snappy" and ".parquet" will be corrected to ".snappy.parquet"

    Raises:
        RuntimeError: Input dataset must be coalesced or repartitioned into a single file.
        RuntimeError: Input dataset file system cannot be empty.

    Returns:
        void: writes the response to output, no return value
    """
    output.set_mode("replace")  # Make sure it is snapshotting

    input_files_df = input.filesystem().files()  # Get all files
    input_files = [row[0] for row in input_files_df.collect()]  # noqa - first column in files_df is path
    input_files = [f for f in input_files if f.endswith(".snappy.parquet")]  # filter non parquet files
    if len(input_files) > 1:
        raise RuntimeError("Input dataset must be coalesced or repartitioned into a single file.")
    if len(input_files) == 0:
        raise RuntimeError("Input dataset file system cannot be empty.")
    input_file_path = input_files[0]

    log.info("Initial output file name: " + file_name)
    # check for snappy.parquet and append if needed
    if file_name.endswith(".snappy.parquet"):
        pass  # if it is already correct, do nothing
    elif file_name.endswith(".parquet"):
        # if it ends with ".parquet" (and not ".snappy.parquet"), remove parquet and append ".snappy.parquet"
        file_name = file_name.removesuffix(".parquet") + ".snappy.parquet"
    elif file_name.endswith(".snappy"):
        # if it ends with just ".snappy" then append ".parquet"
        file_name = file_name + ".parquet"
    else:
        # if it doesn't end with any of the above, add ".snappy.parquet"
        file_name = file_name + ".snappy.parquet"
    log.info("Final output file name: " + file_name)

    with input.filesystem().open(input_file_path, "rb") as in_f:  # open the input file
        with output.filesystem().open(file_name, "wb") as out_f:  # open the output file
            shutil.copyfileobj(in_f, out_f)  # write the file into a new file
QUESTION
I'm fairly new to Delta and the lakehouse approach on Databricks. I have some questions, based on the following actions:
- I import some parquet files
- Convert them to delta (creating 1 snappy.parquet file)
- Delete one random row (creating 1 new snappy.parquet file).
- I check the content of both snappy files (version 0 of the delta table, and version 1), and they both contain all of the data, each one with its specific differences.
Does this mean delta simply duplicates data for every new version?
How is this scalable? or am I missing something?
ANSWER
Answered 2022-Feb-07 at 07:22
Yes, that's how Delta Lake works - when you modify data, it doesn't write only the delta; it takes the original file affected by the change, applies the changes, and writes the result back as a new file. But take into account that not all data is duplicated - only the data in the files that contain the affected rows. For example, you have 3 data files, and you're making changes to some rows that are in the 2nd file. In this case, Delta will create a new file with number 4 that contains the necessary changes plus the rest of the data from file 2, so you will have the following versions:
- Version 0: files 1, 2 & 3
- Version 1: files, 1, 3 & 4
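If you want to observe this behaviour yourself, a small sketch like the one below (the path is a placeholder, and a SparkSession with Delta enabled is assumed to exist as spark) lists the versions and the files the current version references:

from delta.tables import DeltaTable

path = "/tmp/demo_delta_table"  # placeholder path to an existing Delta table
dt = DeltaTable.forPath(spark, path)

# One row per version: WRITE, DELETE, etc., with metrics such as numCopiedRows.
dt.history().select("version", "operation", "operationMetrics").show(truncate=False)

# Parquet files referenced by the latest version; older files remain on storage
# for time travel until VACUUM removes them.
print(spark.read.format("delta").load(path).inputFiles())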
QUESTION
I am trying to run an OWL API Java program from the terminal and it crashes, while the exact same code runs fine when I run it in IntelliJ.
The exception that rises in my main code is this:
Exception in thread "main" java.lang.NoSuchMethodError: 'boolean org.semanticweb.owlapi.io.RDFResource.idRequiredForIndividualOrAxiom()'
at org.semanticweb.owlapi.rdf.rdfxml.renderer.RDFXMLRenderer.render(RDFXMLRenderer.java:204)
at org.semanticweb.owlapi.rdf.RDFRendererBase.render(RDFRendererBase.java:448)
at org.semanticweb.owlapi.rdf.RDFRendererBase.renderOntologyHeader(RDFRendererBase.java:441)
at org.semanticweb.owlapi.rdf.RDFRendererBase.render(RDFRendererBase.java:247)
at org.semanticweb.owlapi.rdf.rdfxml.renderer.RDFXMLStorer.storeOntology(RDFXMLStorer.java:51)
at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:142)
at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:106)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1347)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1333)
at com.stelios.JavaExplanations.main(JavaExplanations.java:112)
It seems as if calling idRequiredForIndividualOrAxiom() on an RDFResource object doesn't find the method that is inherited from the RDFNode class, but I have no clue why.
In order to post here, I kept only the saveOntology call in a minimal example, and the exception that is thrown is the same, with extra steps:
Exception in thread "main" java.lang.NoSuchMethodError: 'boolean org.semanticweb.owlapi.io.RDFResource.idRequiredForIndividualOrAxiom()'
at org.semanticweb.owlapi.rdf.rdfxml.renderer.RDFXMLRenderer.render(RDFXMLRenderer.java:204)
at org.semanticweb.owlapi.rdf.rdfxml.renderer.RDFXMLRenderer.render(RDFXMLRenderer.java:249)
at org.semanticweb.owlapi.rdf.RDFRendererBase.renderEntity(RDFRendererBase.java:298)
at org.semanticweb.owlapi.rdf.RDFRendererBase.render(RDFRendererBase.java:292)
at org.semanticweb.owlapi.rdf.RDFRendererBase.lambda$renderEntities$6(RDFRendererBase.java:285)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:1033)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.semanticweb.owlapi.rdf.RDFRendererBase.renderEntities(RDFRendererBase.java:285)
at org.semanticweb.owlapi.rdf.RDFRendererBase.renderInOntologySignatureEntities(RDFRendererBase.java:269)
at org.semanticweb.owlapi.rdf.RDFRendererBase.renderOntologyComponents(RDFRendererBase.java:253)
at org.semanticweb.owlapi.rdf.RDFRendererBase.render(RDFRendererBase.java:248)
at org.semanticweb.owlapi.rdf.rdfxml.renderer.RDFXMLStorer.storeOntology(RDFXMLStorer.java:51)
at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:142)
at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:106)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1347)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1333)
at com.stelios.JavaExplanations.main(JavaExplanations.java:47)
In both my original code and the minimal example I call java with: java -cp /home/stelios/java_explanations/target/java_explanations-1.0-SNAPSHOT-jar-with-dependencies.jar com.stelios.JavaExplanations
Here is the minimal example that repeats this behavior for me. This is the Java code:
package com.stelios;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.util.*;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.*;
import org.semanticweb.owlapi.model.*;
public class JavaExplanations {
public static void main(String[] args) throws OWLOntologyCreationException, FileNotFoundException, OWLOntologyStorageException {
String ontology1 = "/home/stelios/Desktop/huiyfgds/ONTO_ASRTD_hz162pai";
String ontology2 = "/home/stelios/Desktop/huiyfgds/ONTO_INFRD_hz162pai";
OWLOntologyManager ontology_manager = OWLManager.createOWLOntologyManager();
OWLOntology asserted_ontology = ontology_manager.loadOntologyFromOntologyDocument(new File(ontology1));
ontology_manager.saveOntology(asserted_ontology, new StreamDocumentTarget(new FileOutputStream(ontology2)));
}
}
This is the pom.xml in IntelliJ:
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.stelios.expl</groupId>
    <artifactId>java_explanations</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>net.sourceforge.owlapi</groupId>
            <artifactId>owlexplanation</artifactId>
            <version>5.0.0</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.owlapi</groupId>
            <artifactId>owlapi-distribution</artifactId>
            <version>5.1.9</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.owlapi</groupId>
            <artifactId>org.semanticweb.hermit</artifactId>
            <version>1.4.5.519</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.32</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-nop</artifactId>
            <version>1.7.32</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifestFile>src/main/resources/META-INF/MANIFEST.MF</manifestFile>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
        <resources>
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.java</include>
                </includes>
            </resource>
        </resources>
    </build>
</project>
I think that most probably it is some dependency/version error but I don't see how this can be. I package everything I need in the jar file I give as classpath, defining the wanted versions in pom.xml, and in this jar I can find only one org/semanticweb/owlapi/io/RDFResource.class file.
Reading this and this I thought about having 2 different versions of OWL API, as I had another .jar with OWL API version 3.4.9 in it, in the directory tree. I moved the file and rebuilt the maven package just to be sure, and (as expected) no change.
Other than the saveOntology() call, my original program is working as intended.
The only thing out of the ordinary is that IntelliJ is giving me a Plugin 'maven-assembly-plugin:' not found problem, which I haven't managed to solve in any way, and have been ignoring as it hasn't been an issue in any of the operations I have needed. (If you know how to solve it, of course, give me suggestions, but my main problem is the earlier mentioned exception.)
EDIT: Here is the mvn dependency:tree output.
[INFO] Scanning for projects...
[INFO]
[INFO] -----------------< com.stelios.expl:java_explanations >-----------------
[INFO] Building java_explanations 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ java_explanations ---
[INFO] com.stelios.expl:java_explanations:jar:1.0-SNAPSHOT
[INFO] +- net.sourceforge.owlapi:owlexplanation:jar:5.0.0:compile
[INFO] | +- net.sourceforge.owlapi:owlapi-api:jar:5.1.19:compile (version selected from constraint [5.0.0,5.9.9])
[INFO] | | \- javax.inject:javax.inject:jar:1:compile
[INFO] | +- net.sourceforge.owlapi:owlapi-tools:jar:5.1.19:compile (version selected from constraint [5.0.0,5.9.9])
[INFO] | \- net.sourceforge.owlapi:telemetry:jar:5.0.0:compile
[INFO] | \- net.sourceforge.owlapi:owlapi-parsers:jar:5.1.19:compile (version selected from constraint [5.0.0,5.9.9])
[INFO] +- net.sourceforge.owlapi:owlapi-distribution:jar:5.1.9:compile
[INFO] | +- net.sourceforge.owlapi:owlapi-compatibility:jar:5.1.9:compile
[INFO] | | \- net.sourceforge.owlapi:owlapi-apibinding:jar:5.1.9:compile
[INFO] | | +- net.sourceforge.owlapi:owlapi-impl:jar:5.1.9:compile
[INFO] | | +- net.sourceforge.owlapi:owlapi-oboformat:jar:5.1.9:compile
[INFO] | | \- net.sourceforge.owlapi:owlapi-rio:jar:5.1.9:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-core:jar:2.9.7:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-databind:jar:2.9.7:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.7:compile
[INFO] | +- org.apache.commons:commons-rdf-api:jar:0.5.0:compile
[INFO] | +- org.tukaani:xz:jar:1.6:compile
[INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.22:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-model:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-api:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-languages:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-datatypes:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-binary:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-n3:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-nquads:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-ntriples:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-rdfjson:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-jsonld:jar:2.3.2:compile
[INFO] | | +- org.apache.httpcomponents:httpclient:jar:4.5.2:compile
[INFO] | | | \- org.apache.httpcomponents:httpcore:jar:4.4.4:compile
[INFO] | | \- org.apache.httpcomponents:httpclient-cache:jar:4.5.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-rdfxml:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-trix:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-turtle:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-rio-trig:jar:2.3.2:compile
[INFO] | +- org.eclipse.rdf4j:rdf4j-util:jar:2.3.2:compile
[INFO] | +- com.github.jsonld-java:jsonld-java:jar:0.12.0:compile
[INFO] | | +- org.apache.httpcomponents:httpclient-osgi:jar:4.5.5:compile
[INFO] | | | +- org.apache.httpcomponents:httpmime:jar:4.5.5:compile
[INFO] | | | \- org.apache.httpcomponents:fluent-hc:jar:4.5.5:compile
[INFO] | | \- org.apache.httpcomponents:httpcore-osgi:jar:4.4.9:compile
[INFO] | | \- org.apache.httpcomponents:httpcore-nio:jar:4.4.9:compile
[INFO] | +- com.github.vsonnier:hppcrt:jar:0.7.5:compile
[INFO] | +- com.github.ben-manes.caffeine:caffeine:jar:2.6.1:compile
[INFO] | +- com.google.guava:guava:jar:22.0:compile (version selected from constraint [18.0,22.0])
[INFO] | | +- com.google.errorprone:error_prone_annotations:jar:2.0.18:compile
[INFO] | | +- com.google.j2objc:j2objc-annotations:jar:1.1:compile
[INFO] | | \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.14:compile
[INFO] | +- com.google.code.findbugs:jsr305:jar:3.0.2:compile (version selected from constraint [2.0.0,4))
[INFO] | \- commons-io:commons-io:jar:2.5:compile
[INFO] +- net.sourceforge.owlapi:org.semanticweb.hermit:jar:1.4.5.519:compile
[INFO] | +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] | +- org.apache.ws.commons.axiom:axiom-api:jar:1.2.14:compile
[INFO] | | +- org.apache.geronimo.specs:geronimo-activation_1.1_spec:jar:1.1:compile
[INFO] | | +- org.apache.geronimo.specs:geronimo-javamail_1.4_spec:jar:1.7.1:compile
[INFO] | | +- jaxen:jaxen:jar:1.1.4:compile
[INFO] | | +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
[INFO] | | \- org.apache.james:apache-mime4j-core:jar:0.7.2:compile
[INFO] | +- org.apache.ws.commons.axiom:axiom-c14n:jar:1.2.14:compile
[INFO] | +- org.apache.ws.commons.axiom:axiom-impl:jar:1.2.14:compile
[INFO] | | \- org.codehaus.woodstox:woodstox-core-asl:jar:4.1.4:compile
[INFO] | | \- org.codehaus.woodstox:stax2-api:jar:3.1.1:compile
[INFO] | +- org.apache.ws.commons.axiom:axiom-dom:jar:1.2.14:compile
[INFO] | +- dk.brics.automaton:automaton:jar:1.11-8:compile
[INFO] | +- gnu.getopt:java-getopt:jar:1.0.13:compile
[INFO] | \- net.sf.trove4j:trove4j:jar:3.0.3:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.7.22:compile
[INFO] +- org.slf4j:slf4j-nop:jar:1.7.32:compile
[INFO] \- org.apache.maven.plugins:maven-assembly-plugin:maven-plugin:3.3.0:compile
[INFO] +- org.apache.maven:maven-plugin-api:jar:3.0:compile
[INFO] | \- org.sonatype.sisu:sisu-inject-plexus:jar:1.4.2:compile
[INFO] | \- org.sonatype.sisu:sisu-inject-bean:jar:1.4.2:compile
[INFO] | \- org.sonatype.sisu:sisu-guice:jar:noaop:2.1.7:compile
[INFO] +- org.apache.maven:maven-core:jar:3.0:compile
[INFO] | +- org.apache.maven:maven-settings:jar:3.0:compile
[INFO] | +- org.apache.maven:maven-settings-builder:jar:3.0:compile
[INFO] | +- org.apache.maven:maven-repository-metadata:jar:3.0:compile
[INFO] | +- org.apache.maven:maven-model-builder:jar:3.0:compile
[INFO] | +- org.apache.maven:maven-aether-provider:jar:3.0:runtime
[INFO] | +- org.sonatype.aether:aether-impl:jar:1.7:compile
[INFO] | | \- org.sonatype.aether:aether-spi:jar:1.7:compile
[INFO] | +- org.sonatype.aether:aether-api:jar:1.7:compile
[INFO] | +- org.sonatype.aether:aether-util:jar:1.7:compile
[INFO] | +- org.codehaus.plexus:plexus-classworlds:jar:2.2.3:compile
[INFO] | +- org.codehaus.plexus:plexus-component-annotations:jar:1.5.5:compile
[INFO] | \- org.sonatype.plexus:plexus-sec-dispatcher:jar:1.3:compile
[INFO] | \- org.sonatype.plexus:plexus-cipher:jar:1.4:compile
[INFO] +- org.apache.maven:maven-artifact:jar:3.0:compile
[INFO] +- org.apache.maven:maven-model:jar:3.0:compile
[INFO] +- org.apache.maven.shared:maven-common-artifact-filters:jar:3.1.0:compile
[INFO] | \- org.apache.maven.shared:maven-shared-utils:jar:3.1.0:compile
[INFO] +- org.apache.maven.shared:maven-artifact-transfer:jar:0.11.0:compile
[INFO] +- org.codehaus.plexus:plexus-interpolation:jar:1.25:compile
[INFO] +- org.codehaus.plexus:plexus-archiver:jar:4.2.1:compile
[INFO] | +- org.apache.commons:commons-compress:jar:1.19:compile
[INFO] | \- org.iq80.snappy:snappy:jar:0.4:compile
[INFO] +- org.apache.maven.shared:file-management:jar:3.0.0:compile
[INFO] +- org.apache.maven.shared:maven-shared-io:jar:3.0.0:compile
[INFO] | +- org.apache.maven:maven-compat:jar:3.0:compile
[INFO] | \- org.apache.maven.wagon:wagon-provider-api:jar:2.10:compile
[INFO] +- org.apache.maven.shared:maven-filtering:jar:3.1.1:compile
[INFO] | \- org.sonatype.plexus:plexus-build-api:jar:0.0.7:compile
[INFO] +- org.codehaus.plexus:plexus-io:jar:3.2.0:compile
[INFO] +- org.apache.maven:maven-archiver:jar:3.5.0:compile
[INFO] +- org.codehaus.plexus:plexus-utils:jar:3.3.0:compile
[INFO] \- commons-codec:commons-codec:jar:1.6:compile
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.339 s
[INFO] Finished at: 2022-01-27T13:06:01+02:00
[INFO] ------------------------------------------------------------------------
Process finished with exit code 0
ANSWER
Answered 2022-Jan-31 at 10:43
As can be seen in the comments of the post, my problem is fixed, so I thought I'd collect a closing answer here so as not to leave the post pending.
The actual solution: As explained here nicely by @UninformedUser, the issue was that I had conflicting maven package versions in my dependencies. Bringing everything in sync with each other solved the issue.
Incidental solution: As I wrote in the comments above, specifically defining version 3.3.0 for the maven-assembly-plugin happened to solve the issue. But this was only by chance, as explained here by @Ignazio: just because the order of "assembling" things changed, overwriting the conflicting package.
Huge thanks to both for the help.
QUESTION
I have a Parquet file in AWS S3. I would like to read it into a Pandas DataFrame. There are two ways for me to accomplish this.
1)
import pyarrow.parquet as pq
table = pq.read_table("s3://tpc-h-parquet/lineitem/part0.snappy.parquet")  # takes 1 sec
pandas_table = table.to_pandas()  # takes 1 sec !!!
2)
import pandas as pd
table = pd.read_parquet("s3://tpc-h-parquet/lineitem/part0.snappy.parquet")  # takes 2 sec
I suspect option 2 is really just doing option 1 under the hood anyways.
What is the fastest way for me to read a Parquet file into Pandas?
ANSWER
Answered 2022-Jan-26 at 19:16
You are correct. Option 2 is just option 1 under the hood.
What is the fastest way for me to read a Parquet file into Pandas?
Both option 1 and option 2 are probably good enough. However, if you are trying to shave off every last bit of time, you may need to go one layer deeper, depending on your pyarrow version. It turns out that option 1 is actually also just a proxy, in this case to the datasets API:
import pyarrow.dataset as ds
dataset = ds.dataset("s3://tpc-h-parquet/lineitem/part0.snappy.parquet")
table = dataset.to_table(use_threads=True)
df = table.to_pandas()
For pyarrow versions >= 4 and < 7 you can usually get slightly better performance on S3 using the asynchronous scanner:
import pyarrow.dataset as ds
dataset = ds.dataset("s3://tpc-h-parquet/lineitem/part0.snappy.parquet")
table = dataset.to_table(use_threads=True, use_async=True)
df = table.to_pandas()
In pyarrow version 7 the asynchronous scanner is the default so you can once again simply use pd.read_parquet("s3://tpc-h-parquet/lineitem/part0.snappy.parquet")
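If you want to confirm which path is fastest in your own environment, a quick timing sketch (reusing the same placeholder S3 path, with credentials assumed to be configured) is usually enough:

import time

import pandas as pd
import pyarrow.dataset as ds

uri = "s3://tpc-h-parquet/lineitem/part0.snappy.parquet"

# Time the datasets API path.
start = time.perf_counter()
df_ds = ds.dataset(uri).to_table(use_threads=True).to_pandas()
print("pyarrow dataset path:", round(time.perf_counter() - start, 2), "s")

# Time the pandas wrapper path.
start = time.perf_counter()
df_pd = pd.read_parquet(uri)
print("pd.read_parquet path:", round(time.perf_counter() - start, 2), "s")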
QUESTION
I am getting the same error as this question, but the recommended solution of setting blocksize=None isn't solving the issue for me. I'm trying to convert the NYC taxi data from CSV to Parquet and this is the code I'm running:
import dask.dataframe as dd

ddf = dd.read_csv(
    "s3://nyc-tlc/trip data/yellow_tripdata_2010-*.csv",
    parse_dates=["pickup_datetime", "dropoff_datetime"],
    blocksize=None,
    dtype={
        "tolls_amount": "float64",
        "store_and_fwd_flag": "object",
    },
)

ddf.to_parquet(
    "s3://coiled-datasets/nyc-tlc/2010",
    engine="pyarrow",
    compression="snappy",
    write_metadata_file=False,
)
Here's the error I'm getting:
"ParserError: Error tokenizing data. C error: Expected 18 fields in line 2958, saw 19".
Adding blocksize=None helps sometimes, see here for example, and I'm not sure why it's not solving my issue.
Any suggestions on how to get past this issue?
This code works for the 2011 taxi data, so there must be something weird in the 2010 taxi data that's causing this issue.
ANSWER
Answered 2022-Jan-19 at 17:08
The raw file s3://nyc-tlc/trip data/yellow_tripdata_2010-02.csv contains an error (one too many commas). This is the offending line (middle) and its neighbours:
VTS,2010-02-16 08:02:00,2010-02-16 08:14:00,5,4.2999999999999998,-73.955112999999997,40.786718,1,,-73.924710000000005,40.841335000000001,CSH,11.699999999999999,0,0.5,0,0,12.199999999999999
CMT,2010-02-24 16:25:18,2010-02-24 16:52:14,1,12.4,-73.988956000000002,40.736567000000001,1,,,-73.861762999999996,40.768383999999998,CAS,29.300000000000001,1,0.5,0,4.5700000000000003,35.369999999999997
VTS,2010-02-16 07:58:00,2010-02-16 08:09:00,1,2.9700000000000002,-73.977469999999997,40.779359999999997,1,,-74.004427000000007,40.742137999999997,CRD,9.3000000000000007,0,0.5,1.5,0,11.300000000000001
Some of the options are:
- The on_bad_lines kwarg to pandas can be set to warn or skip (so this should also be possible with dask.dataframe); a sketch is shown after this list.
- Fix the raw file (knowing where the error is) with something like sed (assuming you can modify the raw files), or on the fly by reading the file line by line.
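For the first option, the extra keyword is passed straight through to pandas by dask.dataframe, so a sketch along these lines (assuming pandas >= 1.3, where on_bad_lines replaced the older error_bad_lines/warn_bad_lines flags) should get past the malformed row:

import dask.dataframe as dd

# Sketch only: warn about (or skip) malformed rows instead of failing the read.
ddf = dd.read_csv(
    "s3://nyc-tlc/trip data/yellow_tripdata_2010-*.csv",
    parse_dates=["pickup_datetime", "dropoff_datetime"],
    blocksize=None,
    dtype={
        "tolls_amount": "float64",
        "store_and_fwd_flag": "object",
    },
    on_bad_lines="warn",  # or "skip" to silently drop the offending rows
)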
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install snappy
Support