Elasticsearch-RestAPI

wenking 12/19/2023 Elasticsearch

# Cluster

# Cluster health

### Check cluster health
GET http://localhost:9200/_cluster/health

# Index operations

# Querying indices

# List all indices

GET http://localhost:9200/_cat/indices?v

# Get the index mapping

GET http://localhost:9200/hotel/_mapping 
Content-Type: application/json

# Get the index settings

GET http://localhost:9200/hotel/_settings 
Content-Type: application/json

# Delete an index

DELETE http://localhost:9200/hotel

# Create an index

PUT http://localhost:9200/hotel
Content-Type: application/json

{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      },
      "location": {
        "type": "nested",
        "properties": {
          "address": {
            "type": "text"
          },
          "street": {
            "type": "text"
          },
          "city": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
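
The same request can be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_create_index_request` and the `BASE_URL` constant are illustrative assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumed base URL, taken from the examples in this note.
BASE_URL = "http://localhost:9200"


def build_create_index_request(index: str, mappings: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates `index` with `mappings`."""
    body = json.dumps({"mappings": mappings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Same mapping as the hotel example above.
hotel_mappings = {
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "keyword"},
        "location": {
            "type": "nested",
            "properties": {
                "address": {"type": "text"},
                "street": {"type": "text"},
                "city": {"type": "keyword"},
            },
        },
    }
}

request = build_create_index_request("hotel", hotel_mappings)
print(request.get_method(), request.full_url)

# To actually send it (requires a running cluster):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Serializing the body with `json.dumps` guarantees that it is well-formed JSON, which avoids mistakes such as a stray trailing comma.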

Some optional settings when creating an index:

{
  "mappings": {
    ....
  },
  "settings": {
    // Number of primary shards for the index
    "number_of_shards": 3,
    // Number of replica copies of each primary shard
    "number_of_replicas": 2,

    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",                  // use the standard tokenizer
          "filter": ["lowercase", "my_stopwords"],  // apply the lowercase filter and the custom stopword filter
          "char_filter": ["html_strip"]             // strip HTML in the pre-processing stage
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["and", "the", "a"]
        }
      }
    }
  }
}

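In Elasticsearch, an analyzer is the component that breaks the text of a field into individual tokens, which are then used for indexing and searching.

When configuring an analyzer, the following parameters are typically set:

tokenizer: splits the input text into one or more tokens.
filter: token filters applied after the tokenizer; they post-process the generated tokens, for example by lowercasing them, removing stopwords, or stemming.
char_filter: character filters applied before the tokenizer; they pre-process the characters of the input text, for example by stripping HTML tags or escaping special characters (html_strip, mapping).
language: for text in a specific language, Elasticsearch provides language analyzers that bundle language-specific tokenizers and filters.
type: the analyzer type, such as standard (the standard analyzer).
stopwords: when the stop filter is used, the list of stopwords given here is dropped during analysis.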
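
The order of the three stages (char_filter, then tokenizer, then filter) can be illustrated with a rough pure-Python simulation. This is only a sketch: the regular expressions below are crude stand-ins chosen for illustration and do not reproduce the real behavior of `html_strip` or the `standard` tokenizer.

```python
import re

# Same stopword list as the my_stopwords filter in the settings example.
STOPWORDS = {"and", "the", "a"}


def html_strip(text: str) -> str:
    """Char filter stage: crude stand-in that blanks out anything shaped like a tag."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Tokenizer stage: rough approximation that keeps runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens: list[str]) -> list[str]:
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop(tokens: list[str]) -> list[str]:
    """Token filter: drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


def analyze(text: str) -> list[str]:
    """Run char filter -> tokenizer -> token filters, in that order."""
    tokens = tokenize(html_strip(text))
    for token_filter in (lowercase, stop):
        tokens = token_filter(tokens)
    return tokens


print(analyze("<p>The Hotel and a <b>Pool</b></p>"))  # ['hotel', 'pool']
```

The filter order matters: because `lowercase` runs before `stop`, the capitalized "The" is lowercased first and then removed as a stopword.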

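# Modify an index

Elasticsearch does not allow the mapping of an existing field to be changed in place (new fields can be added to a mapping, but the type of an existing field cannot be altered). When a mapping has to change, the recommended approach is to create a new index with the desired mapping and copy the data into it with the reindex API.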

POST http://localhost:9200/_reindex
Content-Type: application/json

{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
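
The two steps of a mapping change (create the new index, then copy the data) can be sketched as a small request plan. The function name `plan_mapping_change` and the sample `name` field are hypothetical; the sketch only builds the requests and does not send them.

```python
import json


def plan_mapping_change(
    old_index: str, new_index: str, new_mappings: dict
) -> list[tuple[str, str, dict]]:
    """Return the (method, path, body) requests needed to move data to a new mapping."""
    return [
        # Step 1: create the new index with the desired mapping.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # Step 2: copy every document from the old index into the new one.
        (
            "POST",
            "/_reindex",
            {"source": {"index": old_index}, "dest": {"index": new_index}},
        ),
    ]


# Hypothetical new mapping: the `name` field becomes a text field.
steps = plan_mapping_change(
    "old_index", "new_index", {"properties": {"name": {"type": "text"}}}
)
for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Creating the destination index explicitly before reindexing matters: if it does not exist, the destination would typically be created with dynamically inferred mappings rather than the intended ones.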

# Data types

# Text types

text: a field for full-text search and analysis.
keyword: a field for exact values, sorting, and aggregations.

# Numeric types

long
integer
short
byte
double
float

# Range types

integer_range
float_range
long_range
double_range
date_range

# Other common types

date
boolean

# Other types

binary
geo_point
geo_shape
ip
Array: Elasticsearch has no dedicated array data type, but any field can hold an array of values. All elements of an array must be of the same type.
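
As an illustration of the array rule, the sketch below checks on the client side that every element of a list has the same Python type before a document is sent. The helper `is_homogeneous` and the `tags` and `opening_hours` fields are hypothetical, and the check is stricter than Elasticsearch itself, which may coerce some mixed values.

```python
def is_homogeneous(values: list) -> bool:
    """Return True when every element has the same Python type (empty lists pass)."""
    return len({type(value) for value in values}) <= 1


doc = {
    "name": "Shop A",
    # Hypothetical keyword field holding an array of values.
    "tags": ["electronics", "retail"],
    # Hypothetical integer_range field: a range is written as an object with bounds.
    "opening_hours": {"gte": 9, "lte": 18},
}

print(is_homogeneous(doc["tags"]))   # True
print(is_homogeneous([1, "two"]))    # False
```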

# Document operations

# Insert and update

# Insert a single document into an index

POST http://localhost:9200/hotel/_doc
Content-Type: application/json

{
  "id": "1",
  "name": "Shop A",
  "location": {
    "address": "123 Main St",
    "city": "New York",
    "state": "NY",
    "zip": "10001"
  },
  "category": "Electronics",
  "rating": 4.5
}

# Bulk-insert documents into an index

POST http://192.168.1.6:9200/_bulk
Content-Type: application/x-ndjson

{ "index": { "_index": "shops", "_id": "1" } }
{ "id": "1", "name": "Shop A", "location": { "address": "123 Main St", "city": "New York", "state": "NY", "zip": "10001" }, "category": "Electronics", "rating": 4.5 }
{ "index": { "_index": "shops", "_id": "2" } }
{ "id": "2", "name": "Shop B", "location": { "address": "456 Market St", "city": "San Francisco", "state": "CA", "zip": "94101" }, "category": "Clothing", "rating": 4.2 }
{ "index": { "_index": "shops", "_id": "3" } }
{ "id": "3", "name": "Shop C", "location": { "address": "789 Park Ave", "city": "Los Angeles", "state": "CA", "zip": "90001" }, "category": "Books", "rating": 4.8 }

curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @- << 'EOF'
{ "index": { "_index": "shops", "_id": "1" } }
{ "id": "1", "name": "Shop A", "location": { "address": "123 Main St", "city": "New York", "state": "NY", "zip": "10001" }, "category": "Electronics", "rating": 4.5 }
{ "index": { "_index": "shops", "_id": "2" } }
{ "id": "2", "name": "Shop B", "location": { "address": "456 Market St", "city": "San Francisco", "state": "CA", "zip": "94101" }, "category": "Clothing", "rating": 4.2 }
{ "index": { "_index": "shops", "_id": "3" } }
{ "id": "3", "name": "Shop C", "location": { "address": "789 Park Ave", "city": "Los Angeles", "state": "CA", "zip": "90001" }, "category": "Books", "rating": 4.8 }
EOF
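
Writing the NDJSON body by hand is error-prone, so it can be generated instead. The following is a minimal sketch; the helper name `build_bulk_body` and its `id_field` parameter are illustrative assumptions. It emits one action line followed by one source line per document, and ends the body with a newline, which the bulk API requires.

```python
import json


def build_bulk_body(index: str, docs: list[dict], id_field: str = "id") -> str:
    """Build an NDJSON bulk body: an `index` action line, then the document itself."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the final line to be terminated by a newline as well.
    return "\n".join(lines) + "\n"


docs = [
    {"id": "1", "name": "Shop A", "category": "Electronics", "rating": 4.5},
    {"id": "2", "name": "Shop B", "category": "Clothing", "rating": 4.2},
]
bulk_body = build_bulk_body("shops", docs)
print(bulk_body, end="")
```

Because `json.dumps` writes each object on a single line by default, every action and every document stays on its own line, as the format requires.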

# Update document data

Update a document by its ID (recommended; the request body replaces the whole document)

PUT http://localhost:9200/hotel/_doc/document_id 
Content-Type: application/json

{
  "field_name": "new_value",
  ...
}

Update documents matching a query (use with caution)

POST http://localhost:9200/hotel/_update_by_query
Content-Type: application/json

{
  "script": {
    "source": "if (ctx._source.field_name == 'old_value') { ctx._source.field_name = 'new_value' }"
  },
  "query": {
    "match": {
      "another_field": "some_value"
    }
  }
}
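
One way to make such a request safer is to pass the values through script `params` rather than embedding them in the script source as string literals. The sketch below builds the same body that way; the helper name `build_update_by_query` is an assumption, and `field_name` and `another_field` are the placeholders from the example above.

```python
import json


def build_update_by_query(old_value, new_value, query: dict) -> dict:
    """Build an _update_by_query body whose script reads its values from params."""
    return {
        "script": {
            "lang": "painless",
            # The values are referenced through params, not spliced into the source.
            "source": (
                "if (ctx._source.field_name == params.old_value) "
                "{ ctx._source.field_name = params.new_value }"
            ),
            "params": {"old_value": old_value, "new_value": new_value},
        },
        "query": query,
    }


body = build_update_by_query(
    "old_value", "new_value", {"match": {"another_field": "some_value"}}
)
print(json.dumps(body, indent=2))
```

Keeping the script source constant avoids quoting problems when a value contains a quote character, and the same script text can be reused with different values.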

# Delete

Delete a document by its ID

// document_id is the ID of the document, for example VtLEkIwB7hZsTptPW2Ap
DELETE http://localhost:9200/hotel/_doc/document_id

Delete documents matching a query

POST http://localhost:9200/hotel/_delete_by_query
Content-Type: application/json

{
  "query": {
    "match": {
      "field_name": "value"
    }
  }
}

# Search

# Search the documents of a given index

###
GET http://localhost:9200/hotel/_search
Content-Type: application/json

{
  "query": {
    "match_all": {}
  }
}

# Data migration

# `reindex`

Copy the data into another index

POST http://localhost:9200/_reindex
Content-Type: application/json

{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "target_index"
  }
}

# Snapshots

# Export data

Create a snapshot repository on the source cluster

PUT http://localhost:9200/_snapshot/my_repository
Content-Type: application/json

{
  "type": "fs",
  "settings": {
    "location": "/path/to/your/snapshot/repo",
    "compress": true
  }
}

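Before the repository can be created, the following setting must be present in the elasticsearch.yml configuration file (it is a static setting, so the node has to be restarted after changing it):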

path.repo: ["/path/to/your/snapshot/repo"]

Export the data into a snapshot

PUT http://localhost:9200/_snapshot/my_repository/my_snapshot?wait_for_completion=true
Content-Type: application/json

{
  "indices": "source_index",
  "ignore_unavailable": true,
  "include_global_state": false
}

# Import data

Create the snapshot repository on the target cluster

PUT http://localhost:9200/_snapshot/my_repository
Content-Type: application/json

{
  "type": "fs",
  "settings": {
    "location": "/path/to/your/snapshot/repo"
  }
}

Restore the data from the snapshot into the target cluster

POST http://localhost:9200/_snapshot/my_repository/my_snapshot/_restore
Content-Type: application/json

{
  "indices": "source_index",
  "include_aliases": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "target_index_$1"
}
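
The effect of `rename_pattern` and `rename_replacement` can be previewed locally before restoring. The sketch below approximates the renaming with Python's `re` module; the helper name `preview_restored_name` is an assumption, and since Elasticsearch applies Java regular expressions, the result is only a reasonable approximation for simple patterns like the one above.

```python
import re


def preview_restored_name(
    index_name: str, rename_pattern: str, rename_replacement: str
) -> str:
    """Approximate the restored index name for simple patterns."""
    # Java-style group references ($1) become Python-style ones (\g<1>).
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index_name)


print(preview_restored_name("source_index", "(.+)", "target_index_$1"))
# target_index_source_index
```

With the values from the request above, the whole original name is captured by `(.+)` and appended after the prefix, so `source_index` would be restored as `target_index_source_index`.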

# Testing

# Analyzers

Purpose: run an analyzer against input text to check how the text is tokenized

POST http://localhost:9200/_analyze
Content-Type: application/json

{
  "text": ["es真好学"],
  "analyzer": "pinyin"
}

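A custom analyzer defined in the settings of an index is not found by the cluster-level `_analyze` endpoint. To test such an analyzer, include the index name in the request path.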

POST http://localhost:9200/hotel/_analyze
Content-Type: application/json

{
  ...
}
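
The difference between the two endpoints comes down to the request path. The following sketch builds both variants; the helper name `build_analyze_request` is hypothetical, and it assumes the `hotel` index was created with a custom analyzer named `my_analyzer`, as in the settings example earlier in this note.

```python
from typing import Optional


def build_analyze_request(
    text: str, analyzer: str, index: Optional[str] = None
) -> tuple[str, str, dict]:
    """Return (method, path, body) for an _analyze call, optionally scoped to an index."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    return "POST", path, {"text": [text], "analyzer": analyzer}


# Analyzer available cluster-wide: no index in the path.
print(build_analyze_request("es真好学", "pinyin"))

# Custom analyzer defined in an index's settings: the index name is part of the path.
# Assumes the hotel index defines my_analyzer.
print(build_analyze_request("The Hotel and a Pool", "my_analyzer", index="hotel"))
```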
