EXPLAIN语句执行问题

Viewed 4

最近在做数据血缘的相关工作,对explain语句的解析有两个问题

  1. EXPLAIN VERBOSE语句在查询insert语句的时候,OLAP TABLE SINK输出的TUPLE ID对应的TupleDescriptor的tbl内容为null
    如SQL语句
EXPLAIN VERBOSE
INSERT into data_platform.test1 select
    t1.uid as ID,
    max(dt) as create_time,
    min(dt) as update_time
from data_platform.test t1
left join data_platform.ods_test_wh_1 t2 on t1.uid=t2.AGE
where uid > 10
group by uid
order by uid

获得的结果为

PLAN FRAGMENT 0
  OUTPUT EXPRS:
    id[#15]
    create_time[#16]
    update_time[#17]
  PARTITION: HASH_PARTITIONED: uid[#5]

  HAS_COLO_PLAN_NODE: true

  OLAP TABLE SINK
    TUPLE ID: 7
    RANDOM
    IS_PARTIAL_UPDATE: false
  5:VAGGREGATE (update finalize)(543)
  |  output: max(dt[#11])[#13], min(dt[#11])[#14]
  |  group by: uid[#10]
  |  cardinality=950,123
  |  final projections: uid[#12], create_time[#13], update_time[#14]
  |  final project output tuple id: 6
  |  distribute expr lists: uid[#10]
  |  tuple ids: 5 
  |  
  4:VHASH JOIN(533)
  |  join op: LEFT OUTER JOIN(PARTITIONED)[]
  |  equal join conjunct: (uid[#5] = expr_cast(AGE as INT)[#4])
  |  cardinality=2,999,986
  |  vec output tuple id: 4
  |  output tuple id: 4
  |  vIntermediate tuple ids: 3 
  |  hash output slot ids: 5 6 
  |  isMarkJoin: false
  |  final projections: uid[#7], dt[#8]
  |  final project output tuple id: 4
  |  distribute expr lists: uid[#5]
  |  distribute expr lists: expr_cast(AGE as INT)[#4]
  |  tuple ids: 2 1N 
  |  
  |----1:VEXCHANGE
  |       offset: 0
  |       distribute expr lists: 
  |       tuple ids: 1N 
  |    
  3:VEXCHANGE
     offset: 0
     distribute expr lists: 
     tuple ids: 2 

PLAN FRAGMENT 1

  PARTITION: RANDOM

  HAS_COLO_PLAN_NODE: false

  STREAM DATA SINK
    EXCHANGE ID: 03
    HASH_PARTITIONED: uid[#5]

  2:VOlapScanNode(502)
     TABLE: data_platform.test(test), PREAGGREGATION: ON
     PREDICATES: (uid[#5] > 10)
     partitions=1/1 (test)
     tablets=10/10, tabletList=99467678,99467694,99467710 ...
     cardinality=3000017, avgRowSize=0.0, numNodes=1
     pushAggOp=NONE
     tuple ids: 2 

PLAN FRAGMENT 2

  PARTITION: HASH_PARTITIONED: USER[#0]

  HAS_COLO_PLAN_NODE: false

  STREAM DATA SINK
    EXCHANGE ID: 01
    HASH_PARTITIONED: expr_cast(AGE as INT)[#4]

  0:VOlapScanNode(513)
     TABLE: data_platform.ods_test_wh_1(ods_test_wh_1), PREAGGREGATION: ON
     PREDICATES: ((CAST(AGE[#1] AS int) > 10) AND (__DORIS_DELETE_SIGN__[#2] = 0))
     partitions=1/1 (ods_test_wh_1)
     tablets=10/10, tabletList=167939628,167939632,167939636 ...
     cardinality=64, avgRowSize=0.0, numNodes=1
     pushAggOp=NONE
     final projections: CAST(AGE[#1] AS int)
     final project output tuple id: 1
     tuple ids: 0 

Tuples:
TupleDescriptor{id=0, tbl=ods_test_wh_1}
  SlotDescriptor{id=1, col=AGE, colUniqueId=1, type=tinyint, nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=2, col=__DORIS_DELETE_SIGN__, colUniqueId=2, type=tinyint, nullable=false, isAutoIncrement=false, subColPath=null}

TupleDescriptor{id=1, tbl=ods_test_wh_1}
  SlotDescriptor{id=4, col=null, colUniqueId=null, type=int, nullable=true, isAutoIncrement=false, subColPath=null}

TupleDescriptor{id=2, tbl=test}
  SlotDescriptor{id=5, col=uid, colUniqueId=0, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=6, col=dt, colUniqueId=1, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}

TupleDescriptor{id=3, tbl=null}
  SlotDescriptor{id=7, col=uid, colUniqueId=0, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=8, col=dt, colUniqueId=1, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=9, col=null, colUniqueId=null, type=int, nullable=true, isAutoIncrement=false, subColPath=null}

TupleDescriptor{id=4, tbl=null}
  SlotDescriptor{id=10, col=uid, colUniqueId=0, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=11, col=dt, colUniqueId=1, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}

TupleDescriptor{id=5, tbl=null}
  SlotDescriptor{id=12, col=uid, colUniqueId=0, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=13, col=null, colUniqueId=null, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=14, col=null, colUniqueId=null, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}

TupleDescriptor{id=6, tbl=null}
  SlotDescriptor{id=15, col=uid, colUniqueId=0, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=16, col=null, colUniqueId=null, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=17, col=null, colUniqueId=null, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}

TupleDescriptor{id=7, tbl=null}
  SlotDescriptor{id=18, col=id, colUniqueId=0, type=int, nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=19, col=create_time, colUniqueId=1, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}
  SlotDescriptor{id=20, col=update_time, colUniqueId=2, type=datetimev2(0), nullable=true, isAutoIncrement=false, subColPath=null}

此处查阅了fe-core的代码,其中org.apache.doris.nereids.glue.translator.PhysicalPlanTranslator#visitPhysicalOlapTableSink方法中olapTuple对象生成后未设置table成员,此处应可以直接设置参数中olapTableSink的olapTableSink.getTargetTable()方法获取table并赋值到olapTuple,所以此处是否为bug
2. 查阅fe-core的DorisParser.g4文件时,EXPLAIN应该支持plantype和level的解析

explain
    : explainCommand planType?
          level=(VERBOSE | TREE | GRAPH | PLAN)?
          PROCESS?
    ;
planType
    : PARSED
    | ANALYZED
    | REWRITTEN | LOGICAL  // same type
    | OPTIMIZED | PHYSICAL   // same type
    | SHAPE
    | MEMO
    | ALL // default type
    ;

但实际无法执行形如EXPLAIN MEMO PLAN这样的语句,会报语法错误,请问是什么原因

1 Answers

用EXPLAIN做血缘,解析太麻烦了。
用sqllineage 这个python库去解析doris sql表和字段的关系,就用mysql语法去解析就好!