Problem: the query fails with: return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Job failed with org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 57, hadoop03, executor 2): UnknownReason
Solution: the subquery that uses collect_set(named_struct(...)) was in the wrong position in the join order.
Method (the working query):
select
    ID,
    GH,
    XM,
    ZCXX,
    SFZB,
    DQZT,
    SFCZBC,
    LXDH,
    BGDD,
    BGDH,
    DZYX
from
(
    select
        ID,
        GH,
        XM
    from ods_jzgxx
    where dt='2024-04-10'
) ojzg
left join
(
    select
        JZGBH,
        JSGH,
        SFZB,
        DQZT,
        SFCZBC
    from ods_jzgzxxx
    where dt='2024-04-10'
) ojzgzxxx
on ojzg.ID=ojzgzxxx.JZGBH
left join
(
    select
        JSGH,
        LXDH,
        BGDD,
        BGDH,
        DZYX
    from ods_jzglxfs
    where JSGH is not null and dt='2024-04-10'
) ojzglxfs
on ojzg.GH=ojzglxfs.JSGH
left join
(
    select
        JZGBH,
        collect_set(named_struct('SHZC',SHZC,'SHNY',SHNY,'BFDW',BFDW)) ZCXX
    from
    (
        select
            JZGBH,
            SHZC,
            SHNY,
            BFDW
        from ods_jzgzcxx
        where dt='2024-04-10'
    ) tmp
    group by JZGBH
) ojzgzcxx
on ojzg.ID=ojzgzcxx.JZGBH;
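Cause: the order of the left joins matters here. If the subquery containing collect_set(named_struct(...)) is joined earlier in the chain, the query fails with the error above; it has to be the last join, as in the query above.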
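
For contrast, here is a sketch of the ordering that this note describes as failing: the same tables, columns, and date partition as the working query, with the collect_set(named_struct(...)) subquery moved to the first left join. This is only an illustration of the reordering. The original does not state the Hive or Spark versions, so whether this exact form reproduces the error on another cluster is an assumption.

```sql
-- Ordering reported in this note to fail on Hive on Spark:
-- the collect_set(named_struct(...)) subquery is joined FIRST instead of last.
select
    ID, GH, XM, ZCXX, SFZB, DQZT, SFCZBC, LXDH, BGDD, BGDH, DZYX
from
(
    select ID, GH, XM
    from ods_jzgxx
    where dt='2024-04-10'
) ojzg
left join
(
    -- aggregated subquery placed early in the join chain
    select
        JZGBH,
        collect_set(named_struct('SHZC',SHZC,'SHNY',SHNY,'BFDW',BFDW)) ZCXX
    from
    (
        select JZGBH, SHZC, SHNY, BFDW
        from ods_jzgzcxx
        where dt='2024-04-10'
    ) tmp
    group by JZGBH
) ojzgzcxx
on ojzg.ID=ojzgzcxx.JZGBH
left join
(
    select JZGBH, JSGH, SFZB, DQZT, SFCZBC
    from ods_jzgzxxx
    where dt='2024-04-10'
) ojzgzxxx
on ojzg.ID=ojzgzxxx.JZGBH
left join
(
    select JSGH, LXDH, BGDD, BGDH, DZYX
    from ods_jzglxfs
    where JSGH is not null and dt='2024-04-10'
) ojzglxfs
on ojzg.GH=ojzglxfs.JSGH;
```

Both orderings should be logically equivalent, since every join is a left join from ojzg on its own key. The difference reported here is only in whether the Spark job completes.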