Metastore Metadata Operations with Hive Schema Tool

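The problem with the Hive schema

Older Hive versions do not write a schema version number into the metastore. As a result, after upgrading to a newer Hive version, queries fail with: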

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

The log shows the underlying cause:

Caused by: MetaException(message:Version information not found in metastore. )

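In this situation you can set hive.metastore.schema.verification=false. With strict verification turned off, Hive only logs a warning about the missing version and records the version in the metastore instead of failing.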
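
The property above is normally set in hive-site.xml. As a minimal sketch, the helper below prints the corresponding `<property>` element for a chosen value; the function name `print_schema_verification_property` is hypothetical and not part of Hive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hive): prints a hive-site.xml <property>
# element for hive.metastore.schema.verification with the given value.
print_schema_verification_property() {
  local value="$1"
  # Only the two boolean literals are accepted.
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  cat <<EOF
<property>
  <name>hive.metastore.schema.verification</name>
  <value>${value}</value>
</property>
EOF
}
```

Call it with the value you have decided on and paste the output inside the `<configuration>` element of hive-site.xml.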

If you run into this during a version upgrade, however, you need Hive Schema Tool to fix the metastore schema properly.
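
Before reaching for the tool, it helps to confirm that the failure really is the missing-version case. The sketch below searches a log file for the MetaException message quoted above; the helper name `has_missing_schema_version` is hypothetical, and the location of your Hive log depends on your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the given log file contains the
# "Version information not found in metastore" message, non-zero otherwise.
has_missing_schema_version() {
  local log_file="$1"
  # -F matches the message as a fixed string, -q suppresses output.
  grep -qF 'Version information not found in metastore' "$log_file"
}
```

A possible use is `has_missing_schema_version "$log" && echo "schema version missing"`, where `$log` is the path to your own Hive log file.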

What is Hive Schema Tool

Hive ships Hive Schema Tool (schematool) for maintaining, repairing, and upgrading the metastore schema.

$ schematool -help
usage: schemaTool
 -dbType <databaseType>             Metastore database type
 -driver <driver>                   Driver name for connection
 -dryRun                            List SQL scripts (no execute)
 -help                              Print this message
 -info                              Show config and schema details
 -initSchema                        Schema initialization
 -initSchemaTo <initTo>             Schema initialization to a version
 -metaDbType <metaDatabaseType>     Used only if upgrading the system catalog for hive
 -passWord <password>               Override config file password
 -upgradeSchema                     Schema upgrade
 -upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
 -url <url>                         Connection url to the database
 -userName <user>                   Override config file user name
 -verbose                           Only print SQL statements
(Additional catalog related options added in the Hive 3.0.0 (HIVE-19135) release are below.)
 -createCatalog <catalog>       Create catalog with given name
 -catalogLocation <location>        Location of new catalog, required when adding a catalog
 -catalogDescription <description>  Description of new catalog
 -ifNotExists                       If passed then it is not an error to create an existing catalog
 -moveDatabase <database>                     Move a database between catalogs.  All tables under it would still be under it as part of new catalog. Argument is the database name. Requires --fromCatalog and --toCatalog parameters as well
 -moveTable  <table>                Move a table to a different database.  Argument is the table name. Requires --fromCatalog, --toCatalog, --fromDatabase, and --toDatabase 
 -toCatalog  <catalog>              Catalog a moving database or table is going to.  This is required if you are moving a database or table.
 -fromCatalog <catalog>             Catalog a moving database or table is coming from.  This is required if you are moving a database or table.
 -toDatabase  <database>            Database a moving table is going to.  This is required if you are moving a table.
 -fromDatabase <database>           Database a moving table is coming from.  This is required if you are moving a table.

The supported dbType values are derby, mysql, postgres, oracle, and mssql.
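
A wrapper script can reject an unsupported dbType before it ever calls schematool. This is a minimal sketch based only on the list above; `is_supported_db_type` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the argument is one of the dbType values
# listed above, 1 otherwise.
is_supported_db_type() {
  case "$1" in
    derby|mysql|postgres|oracle|mssql) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_db_type "$db_type" || exit 1` at the top of a script stops early on a typo such as `postgresql`.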

Using Hive Schema Tool

The following examples come from the official Hive Schema Tool documentation.

  1. Initialize the metastore, generating the schema in a derby database

    schematool -dbType derby -initSchema
    
  2. Show the metastore schema information

    schematool -dbType derby -info
    
  3. Upgrade the metastore schema to the current version; the upgradeSchemaFrom parameter gives the old Hive version

    schematool -dbType derby -upgradeSchemaFrom 0.10.0
    
  4. Preview the scripts that an upgrade to the current version would run, without executing them

    schematool -dbType derby -upgradeSchemaFrom 0.7.0 -dryRun
    
  5. Move a database's metadata from the hive catalog to the spark catalog

    schematool -moveDatabase db1 -fromCatalog hive -toCatalog spark
    
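  6. Move a Hive database and table to the spark catalog

    # Create the target database newdb if it does not already exist
    beeline ... -e "create database if not exists newdb";
    # Move the database newdb from the hive catalog to the spark catalog
    schematool -moveDatabase newdb -fromCatalog hive -toCatalog spark
    # Move table1 from db1 in the hive catalog to newdb in the spark catalog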
    schematool -moveTable table1 -fromCatalog hive -toCatalog spark  -fromDatabase db1 -toDatabase newdb
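
Steps 2 to 4 fit together as one routine: preview the upgrade scripts, run the upgrade, then check the result. The sketch below only prints that command sequence and executes nothing; `print_upgrade_plan` is a hypothetical helper, and it uses only the flags shown in the help output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) a cautious upgrade sequence:
# a dry run first, then the real upgrade, then -info to check the result.
print_upgrade_plan() {
  local db_type="$1" from_version="$2"
  if [ -z "$db_type" ] || [ -z "$from_version" ]; then
    echo "usage: print_upgrade_plan <dbType> <fromVersion>" >&2
    return 1
  fi
  echo "schematool -dbType ${db_type} -upgradeSchemaFrom ${from_version} -dryRun"
  echo "schematool -dbType ${db_type} -upgradeSchemaFrom ${from_version}"
  echo "schematool -dbType ${db_type} -info"
}
```

Calling `print_upgrade_plan derby 0.10.0` lists the three commands in order, so they can be reviewed before anything touches the metastore database.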
    

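Hive Schema Tool makes it easy to fix Hive metastore schema problems, and it can also move databases and tables from the hive catalog to the spark catalog, which makes it a genuinely useful operations tool.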
