A summary of basic reindex usage, covering several ways to use reindex for Elasticsearch data migration.

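1. Simple reindex

source names the source index and dest names the target index. The host under remote must be an ip:port that has been added to the whitelist of the new cluster (reindex.remote.whitelist in elasticsearch.yml).

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1"
    },
    "dest": {
        "index": "index2"
    }
}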
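
As a rough sketch of driving the same request from a script, the Python below builds the body and posts it to the new cluster using only the standard library. The function names and the new_cluster_url parameter are made up for illustration, and the http://ip:port placeholder is carried over from the request above as-is.

```python
import json
import urllib.request


def build_remote_reindex_body(remote_host, source_index, dest_index):
    """Return the _reindex request body for a remote-to-local copy."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }


def post_reindex(new_cluster_url, body):
    """POST the body to <new_cluster_url>/_reindex and return the parsed JSON reply."""
    request = urllib.request.Request(
        new_cluster_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only the body is built here; post_reindex() needs a reachable cluster.
body = build_remote_reindex_body("http://ip:port", "index1", "index2")
print(json.dumps(body, indent=2))
```

The request is sent to the new (destination) cluster, which then pulls the data from the remote host.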

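2. Reindex only the documents missing from the target index

Set op_type to create so that only documents that exist in the old cluster but not in the target index are migrated. A document that already exists in the target raises a version conflict, and by default a conflict aborts the reindex (see section 4).

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1"
    },
    "dest": {
        "index": "index2",
        "op_type": "create"
    }
}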

3. Set the batch size

Set size inside source. It is the number of documents fetched and indexed per batch, and the default is 1000.

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1",
        "size": 2000
    },
    "dest": {
        "index": "index2"
    }
}
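
To get a feel for what the batch size means for a migration, this small calculation estimates how many batches a reindex needs. The document count is a made-up figure, and batch_count is a hypothetical helper, not part of any Elasticsearch client.

```python
import math


def batch_count(total_docs, batch_size=1000):
    """Number of batches needed to move total_docs documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_docs / batch_size)


total = 1_250_000  # hypothetical document count in index1

print(batch_count(total))        # 1250 batches with the default size of 1000
print(batch_count(total, 2000))  # 625 batches with "size": 2000
```

Larger batches mean fewer round trips to the remote cluster, but each batch has to fit in memory on the new cluster, so very large documents may call for a smaller size.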

4. Continue on conflicts

Set "conflicts": "proceed" together with "op_type": "create". Version conflicts are then counted instead of aborting the request.

POST _reindex
{
    "conflicts": "proceed",
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1"
    },
    "dest": {
        "index": "index2",
        "op_type": "create"
    }
}
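
The combined effect of op_type create and conflicts can be pictured with a toy model in Python. This is only an illustration of the semantics, not how Elasticsearch implements them: real reindexing works in batches, and the function and sample data here are invented for the example.

```python
def reindex_create(source_docs, dest_docs, conflicts="abort"):
    """Toy model of op_type=create.

    source_docs and dest_docs map document id -> document.
    Returns (created, version_conflicts) and mutates dest_docs.
    """
    created = 0
    version_conflicts = 0
    for doc_id, doc in source_docs.items():
        if doc_id in dest_docs:
            # op_type=create never overwrites an existing document.
            version_conflicts += 1
            if conflicts != "proceed":
                raise RuntimeError(f"version conflict on {doc_id!r}, aborting")
            continue
        dest_docs[doc_id] = doc
        created += 1
    return created, version_conflicts


old_cluster = {"1": {"name": "zs"}, "2": {"name": "ls"}, "3": {"name": "ww"}}
new_cluster = {"2": {"name": "ls (already written)"}}

print(reindex_create(old_cluster, new_cluster, conflicts="proceed"))  # (2, 1)
print(new_cluster["2"])  # the existing document is left untouched
```

With the default conflicts="abort", the same call would stop at document "2" instead of skipping it.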

5. Reindex only the documents that match a condition

Use a DSL query inside source to select the documents that need to be reindexed.

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1",
        "query": {
            "term": { "name": "zs" }
        }
    },
    "dest": {
        "index": "index2"
    }
}

6. Reindex only some fields of the source index

List the fields to reindex in _source.

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1",
        "_source": [ "column1", "column2" ]
    },
    "dest": {
        "index": "index2"
    }
}

7. Exclude fields you do not want to reindex

Use excludes inside _source to drop the fields you do not want to reindex.

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1",
        "_source": {
            "excludes": [ "column1", "column2" ]
        }
    },
    "dest": {
        "index": "index2"
    }
}
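
The difference between listing fields in _source and excluding them can be shown on a single sample document. The Python below is a simplified illustration that handles top-level field names only (no wildcards or nested paths); filter_source and the sample document are hypothetical.

```python
def filter_source(doc, includes=None, excludes=None):
    """Keep only `includes` (if given), then drop `excludes` (if given)."""
    if includes is None:
        kept = dict(doc)
    else:
        kept = {k: v for k, v in doc.items() if k in includes}
    for field in excludes or []:
        kept.pop(field, None)
    return kept


doc = {"column1": "a", "column2": "b", "column3": "c"}  # hypothetical document

print(filter_source(doc, includes=["column1", "column2"]))
# {'column1': 'a', 'column2': 'b'}
print(filter_source(doc, excludes=["column1", "column2"]))
# {'column3': 'c'}
```

An include list is the safer choice when the source index may gain new fields later; an exclude list copies anything that is not explicitly named.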

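8. Process data with a script during reindex

This is done with Painless, a simple and secure scripting language introduced in ES 5.x, which is also the default scripting language from 5.x onward.
This example converts a boolean written as True into true. The null check guards the same field that the script rewrites. From ES 5.6 on, the inline key is deprecated in favor of source.

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1"
    },
    "dest": {
        "index": "index2"
    },
    "script": {
        "inline": "if (ctx._source.column != null) { ctx._source.column = ctx._source.column.toString().toLowerCase(); }",
        "lang": "painless"
    }
}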
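
For readers less familiar with Painless, here is a Python rendering of what the script is meant to do to each document's _source: if the field is present, replace it with its lowercased string form. The helper name is made up, and the sample documents are illustrative.

```python
def lowercase_column(source):
    """Python equivalent of the script's intent, applied to one _source dict."""
    if source.get("column") is not None:
        source["column"] = str(source["column"]).lower()
    return source


print(lowercase_column({"column": "True", "id": 1}))  # {'column': 'true', 'id': 1}
print(lowercase_column({"id": 2}))                    # {'id': 2}
```

The result is the string "true" rather than a JSON boolean; a boolean field in the target mapping accepts that string form.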

9. Rename a field

This also uses a script. The example renames the name field to newName.

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1"
    },
    "dest": {
        "index": "index2"
    },
    "script": {
        "inline": "ctx._source.newName = ctx._source.remove(\"name\")",
        "lang": "painless"
    }
}
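
The rename relies on remove returning the removed value. A Python rendering of the same idea makes one edge case visible: a document that has no name field still ends up with a newName field holding null. The helper and sample documents are hypothetical.

```python
def rename_field(source, old="name", new="newName"):
    """Move source[old] to source[new]; a missing field becomes None (null)."""
    source[new] = source.pop(old, None)
    return source


print(rename_field({"name": "zs", "age": 20}))  # {'age': 20, 'newName': 'zs'}
print(rename_field({"age": 20}))                # {'age': 20, 'newName': None}
```

If explicit nulls are unwanted in the target index, one option is to guard the script with a check such as ctx._source.containsKey("name") before renaming.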

10. When clients are double-writing

Set "conflicts": "proceed" and "version_type": "external" so that a document with a lower version never overwrites one with a higher version.

POST _reindex
{
    "conflicts": "proceed",
    "source": {
        "remote": {
            "host": "http://ip:port"
        },
        "index": "index1"
    },
    "dest": {
        "version_type": "external",
        "index": "index2"
    }
}
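
The external version check can be pictured with another toy model: a document from the old cluster is written only when the target has no copy, or has a copy with a strictly lower version. This is a simplified sketch that assumes the version numbers on both sides are comparable; the function and the sample data are invented for illustration.

```python
def reindex_external(source_docs, dest_docs, conflicts="abort"):
    """Toy model of version_type=external.

    Both dicts map document id -> (version, document).
    Returns (written, version_conflicts) and mutates dest_docs.
    """
    written = 0
    version_conflicts = 0
    for doc_id, (version, doc) in source_docs.items():
        existing = dest_docs.get(doc_id)
        if existing is not None and existing[0] >= version:
            # The target already holds an equal or newer version: keep it.
            version_conflicts += 1
            if conflicts != "proceed":
                raise RuntimeError(f"version conflict on {doc_id!r}, aborting")
            continue
        dest_docs[doc_id] = (version, doc)
        written += 1
    return written, version_conflicts


old_cluster = {
    "1": (3, {"name": "zs"}),
    "2": (1, {"name": "ls"}),
    "3": (2, {"name": "ww"}),
}
new_cluster = {
    "1": (2, {"name": "zs-old"}),  # older than the source copy: overwritten
    "2": (4, {"name": "ls-new"}),  # newer than the source copy: kept
}

print(reindex_external(old_cluster, new_cluster, conflicts="proceed"))  # (2, 1)
print(new_cluster["2"])  # (4, {'name': 'ls-new'})
```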