ClickHouse-005建库建表以及数据导入导出测试

## 一、创建数据库 **语法:** ``` CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] [ENGINE = engine(...)] CREATE DATABASE testdb; //创建数据库 DROP DATABASE testdb; //删除数据库 ``` ## 二、建表 ``` CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] ( name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [compression_codec] [TTL expr1], name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [compression_codec] [TTL expr2], ... ) ENGINE = engine ``` ``` CREATE TABLE test_table( province String, province_name String, create_date date ) ENGINE = MergeTree(create_date, (province), 8192); ``` **如果直接这样执行会报错,有两种方法解决:** 1: 在每一行后面加右斜杠,比如: ``` [root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn CREATE TABLE test_table01( \ province String, \ province_name String, \ create_date date \ ) ENGINE = MergeTree(create_date, (province), 8192); ``` 2: 在登录的时候加 -m参数支持多行模式,比如: ``` clickhouse-client -m [root@tidb06 ~]# clickhouse-client -udefault -m --password=j780UJy9D2tn CREATE TABLE test_table( province String, province_name String, create_date date ) ENGINE = MergeTree(create_date, (province), 8192); ``` **对建表sql的格式要求说明:** ENGINE:是表的引擎类型,最常用的MergeTree。还有一个Log引擎也是比较常用。MergeTree要求有一个日期字段,还有主键。Log没有这个限制。 create_date:是表的日期字段,一个表必须要有一个日期字段。 province:是表的主键,主键可以有多个字段,每个字段用逗号分隔 8192:是索引粒度,用默认值8192即可。 ## 三、导入数据 ### 3.1:普通的CSV文件导入 ``` cat > test_table.csv << EOF WA,WA_NAME,2020-08-25 CA,CA_NAME,2020-09-25 OR,OR_NAME,2020-10-25 EOF ``` **–-导数:** ``` clickhouse-client --query "INSERT INTO testdb.test_table FORMAT CSV" < test_table.csv; ``` **--或者用管道的方式:** ``` cat test_table.csv | clickhouse-client --query “INSERT INTO testdb.test_table FORMAT CSV” ``` **--测试演示:** ``` [root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table FORMAT CSV" < test_table.csv; [root@tidb06 ~]# [root@tidb06 ~]# cat test_table.csv | clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table01 FORMAT CSV" [root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table limit 2"; WA WA_NAME 2020-08-25 CA CA_NAME 2020-09-25 [root@tidb06 ~]# [root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table01 limit 2"; WA WA_NAME 2020-08-25 CA CA_NAME 2020-09-25 ``` ### 3.2:特殊的CSV文件导入(包含回车换行,转义符等) **说明:下载clickhouse官方提供的测试log引擎的表数据进行测试** 这是Yandex.Metrica 日志收集的表分为点击hits和访问表vsits,可以作为日志埋点的重要参考数据 **数据文件下载连接:** ``` curl https://clickhouse-datasets.s3.yandex.net/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv curl https://clickhouse-datasets.s3.yandex.net/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv ``` **关于这2个表的数据的恢复的官方介绍地址:** https://clickhouse.tech/docs/zh/getting-started/example-datasets/metrica/ **--建表:** ``` CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log; [root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "CREATE TABLE testdb01.test_table02 ( id1 UInt32, id2 Float32, name1 String, name2 String, date1 Date, date2 DateTime) ENGINE = Log;" ``` **--导入测试数据:** ``` [root@tidb06 ~]# cat test_table3.csv 1,123.456,”abc 123”,” abc" "'123”,2020-08-26,2020-08-26 17:08:09 [root@tidb06 ~]# cat test_table3.csv| clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO testdb01.test_table02 FORMAT CSV" [root@tidb06 ~]# [root@tidb06 ~]# clickhouse-client -udefault --password=j780UJy9D2tn --query "select * from testdb01.test_table02 limit 2"; 1 123.456 ”abc 123” ” abc" "\'123” 2020-08-26 2020-08-26 17:08:09 [root@tidb06 ~]# ``` **导入官方提供的测试数据:** ``` [root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv real 0m11.354s user 0m17.004s sys 0m0.890s [root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv real 0m33.984s user 0m54.570s sys 0m2.363s ``` **数据导入完后,优化一下表** ``` [root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "OPTIMIZE TABLE tutorial.hits_v1 FINAL" real 0m20.816s user 0m0.017s sys 0m0.016s [root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "OPTIMIZE TABLE tutorial.visits_v1 FINAL" real 0m18.757s user 0m0.017s sys 0m0.016s ``` **优化完之后,查看下表数据量** ``` [root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.hits_v1" 8873898 real 0m0.189s user 0m0.018s sys 0m0.015s [root@tidb06 data1]# [root@tidb06 data1]# time clickhouse-client -udefault --password=j780UJy9D2tn -q "SELECT COUNT(*) FROM tutorial.visits_v1" 1676861 real 0m0.032s user 0m0.017s sys 0m0.014s ``` ### 4.导出数据: ``` [root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="select * from tutorial.visits_v1" > /data1/backup/tutorial.visits_v1.tsv real 0m13.066s user 0m11.303s sys 0m1.652s [root@tidb06 backup]# du -sh tutorial.visits_v1.tsv 2.5G tutorial.visits_v1.tsv ``` **4.1备份开启压缩:** ``` [root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="select * from tutorial.hits_v1" |gzip >/data1/backup/tutorial.hits_v1.tsv.gz real 3m49.556s user 4m1.825s sys 0m9.013s ``` [root@tidb06 backup]# du -sh tutorial.hits_v1.tsv.gz 1.3G tutorial.hits_v1.tsv.gz **4.2CTAS表快照:** ``` [root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query=" create table tutorial.hits_v2 as tutorial.hits_v1" real 0m0.088s user 0m0.017s sys 0m0.016s [root@tidb06 ~]# [root@tidb06 ~]# time clickhouse-client -udefault --password=j780UJy9D2tn --query="insert into table tutorial.hits_v2 select * from tutorial.hits_v1" real 0m26.225s user 0m0.024s sys 0m0.025s ``` **4.3远程拷贝表数据到本地库:** **授权tutorail 库给用户wujianwei,允许从172.16.0.246服务器来远程访问tidb06上clickhouse的tutorial库** **具体权限配置文件如下:** ``` [root@tidb06 users.d]# cat /etc/clickhouse-server/users.d/wujianwei_rw.xml c0952f7212b0161d07c6f45f00fdb73e17430f11172.16.0.246normal_2defaulttest008tutorial1 ``` ``` [root@tidb05 ~]# clickhouse-client --user=wujianwei -h 172.16.0.247 --password=j780UJy9D2tn --port=9000 --query "show databases;" test008 tutorial ``` **创建同样的表结构表:** ``` [root@tidb05 ~]# clickhouse-client -udefault -m tidb05 :) create database tutoria; CREATE TABLE tutorial.hits_v1 ( `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams` Nested( Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8 ) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192; ``` **远程copy数据:** ``` tidb05 :) insert into table tutorial.hits_v1 select * from remote ('172.16.0.247','tutorial.hits_v1','wujianwei','j780UJy9D2tn'); INSERT INTO tutorial.hits_v1 SELECT * FROM remote('172.16.0.247', 'tutorial.hits_v1', 'wujianwei', 'j780UJy9D2tn') Ok. 0 rows in set. Elapsed: 27.291 sec. Processed 8.87 million rows, 8.46 GB (325.15 thousand rows/s., 310.01 MB/s.) tidb05 :) select count(*) from tutorial.hits_v1; SELECT count(*) FROM tutorial.hits_v1 ┌─count()─┐ │ 8873898 │ └─────────┘ ``` 本次演示到此结束,欢迎一起交流和学习。
上一篇:NRF52832学习笔记(12)——UART串口使用


下一篇:MySql 8.0.11 客户端连接失败:2059 - Authentication plugin 'caching_sha2_password' cannot be loaded: ....