Gobblin ERROR: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record:

Chhaya Vankhede :

I am tring to ingest data from mysql table to hdfs. but it is giving me below error

IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582873318919_0] 504 - Processing record incurs an unexpected exception:

java.lang.RuntimeException: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record: 
{"id":"1","name":"abc","password":"abc","derivedwatermarkcolumn":"abc"}
at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:647)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    ... 22 more
IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582893709536_0] 567 - Task task_GobblinMySql_1582893709536_0 failed
java.lang.RuntimeException: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:505)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    ... 12 more


below is the record schema

IST INFO  [JobScheduler-0] org.apache.gobblin.source.jdbc.JdbcExtractor [demo_user_1582893709536_0] 361 - Schema:[

{"columnName":"id","dataType":{"type":"int"},"isWaterMark":false,"primaryKey":1,"length":0,"precision":10,"scale":0,"isNullabl
e":false,"format":"","comment":"","isUnique":false},

{"columnName":"name","dataType":"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNulla
ble":true,"format":"","comment":"","isUnique":false},

{"columnName":"password","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNulla
ble":true,"format":"","comment":"","isUnique":false},

{"columnName":"derivedwatermarkcolumn","dataType":{"type":"timestamp"},"isWaterMark":true,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNul
lable":false,"comment":"Default watermark column","isUnique":false}]

The datatype of the watermark derivedwatermarkcolumn is timestamp but in record it is string 'abc'.

The job and properties files are as below.

mysql.pull

# Job properties
job.name=GobblinMySql
job.group=MySql
job.description=Data pull from MySql
job.lock.enabled=False


# Extract properties
extract.namespace=demo
extract.table.type=snapshot_only
extract.table.name=user
extract.delta.fields=name,password
extract.primary.key.fields=id

# Property to consider the extract as full dump
extract.is.full=true

# Source properties
source.querybased.schema=demo
source.entity=user
source.querybased.extract.type=snapshot

mysql.properties

# Source properties - source class to extract data from Mysql Source
source.class=org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource

# Source properties
source.max.number.of.partitions=1
source.querybased.partition.interval=1
source.querybased.is.compression=false
source.querybased.watermark.type=timestamp

# Source connection properties
source.conn.driver=com.mysql.jdbc.Driver
source.conn.username=root
source.conn.password=root
source.conn.host=localhost
source.conn.port=3306
source.conn.timeout=1500

# Converter properties - Record from mysql source will be processed by the below series of converters
converter.classes=org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter

# date columns format
converter.avro.timestamp.format=YYYY-MM-DD HH:MM:SS
converter.avro.date.format=yyyy-MM-dd
converter.avro.time.format=HH:mm:ss

# Qualitychecker properties
qualitychecker.task.policies=org.apache.gobblin.policies.count.RowCountPolicy,org.apache.gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL

# Publisher properties
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher

What is causing this error in config file? Please help if anyone know.

alex :

Looks like the name of the watermark column comes from extract.delta.fields property. In your example, it's set to "name,password", so the name is treated as a watermark. Try setting it to "derivedwatermarkcolumn".

How I found this: I've looked through the code of MysqlSource class to find where the watermark was mentioned, and then used IntelliJ's inspector to find out where the data is coming from. You can get it through a context menu -> Analyze -> Analyze data flow to here.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=19388&siteId=1