Prometheus cluster + Alertmanager cluster + InfluxDB remote storage for highly available monitoring


Server A: 192.168.1.190 (Prometheus, Alertmanager)

Server B: 192.168.1.206 (Prometheus, Alertmanager, InfluxDB, nginx)

Basic HA + remote storage

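Building on the basic HA mode, this setup adds Remote Storage support so that monitoring data is saved to a third-party storage service.

This keeps the Prometheus service available while also making the data persistent: if a Prometheus Server goes down or loses its data, it can be restored quickly, and the Prometheus Server can also be migrated easily. The approach therefore suits users whose monitoring scale is modest but who want monitoring data persisted and the Prometheus Server to remain migratable.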
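As a rough illustration of how each Prometheus Server could be pointed at the InfluxDB instance on server B, the sketch below renders the remote-storage part of prometheus.yml. It assumes InfluxDB 1.x, which exposes Prometheus remote write and read endpoints under /api/v1/prom/, and it assumes a database named "prometheus" that has already been created in InfluxDB. The database name and the helper function names are illustrative and do not come from the original article.

```python
from urllib.parse import urlencode


def influx_prom_url(host: str, port: int, action: str, db: str) -> str:
    """Build the InfluxDB 1.x Prometheus endpoint URL for 'write' or 'read'."""
    if action not in ("write", "read"):
        raise ValueError("action must be 'write' or 'read'")
    return f"http://{host}:{port}/api/v1/prom/{action}?{urlencode({'db': db})}"


def remote_storage_fragment(host: str, port: int = 8086, db: str = "prometheus") -> str:
    """Render the remote_write / remote_read section of prometheus.yml as text.

    The database name is an assumption; it must exist in InfluxDB beforehand.
    """
    write_url = influx_prom_url(host, port, "write", db)
    read_url = influx_prom_url(host, port, "read", db)
    return (
        "remote_write:\n"
        f'  - url: "{write_url}"\n'
        "remote_read:\n"
        f'  - url: "{read_url}"\n'
    )


if __name__ == "__main__":
    # Server B from the layout above hosts InfluxDB on port 8086.
    print(remote_storage_fragment("192.168.1.206"))
```

The same fragment would go into prometheus.yml on both server A and server B, so that either instance can write to and read from the shared store.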

Install InfluxDB on server B using Docker

mkdir -p /data/influxdb

vi /data/l

version: '2'
services:
  influxdb:
    image: influxdb
    container_name: influxdb
    hostname: influxdb
    restart: always
    command: -config /etc/f
    ports:
      - "8086:8086"
      - "8083:8083"
    volumes:
      - /data/influxdb/conf:/etc/influxdb
      - /data/influxdb/data:/var/lib/influxdb/data
      - /data/influxdb/meta:/var/lib/influxdb/meta
      - /data/influxdb/wal:/var/lib/influxdb/wal
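The four bind mounts above all live under /data/influxdb on the host. A minimal sketch for preparing those directories before starting the container is shown below. The function name ensure_volume_dirs is hypothetical, and the directory names are simply read off the volumes list.

```python
import os

# Host-side directory names taken from the volumes section of the compose file.
VOLUME_DIRS = ("conf", "data", "meta", "wal")


def ensure_volume_dirs(base: str = "/data/influxdb") -> list:
    """Create the bind-mount source directories if missing and return their paths."""
    created = []
    for name in VOLUME_DIRS:
        path = os.path.join(base, name)
        # exist_ok=True makes the call safe to repeat on an existing layout.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Calling ensure_volume_dirs() with the default base needs write permission on /data; passing another base path is a convenient way to try it out without touching the real layout.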

vi /data/influxdb/f

### Welcome to the InfluxDB configuration file.

# The values in this file override the default values used by the system if

# a config option is not specified. The commented out lines are the configuration

# field and the default value used. Uncommenting a line and changing the value

# will change the value used at runtime when the process is restarted.

# Once every 24 hours InfluxDB will report usage data to usage.influxdata

# The data includes a random ID, os, arch, version, the number of series and other

# usage data. No data from user databases is ever transmitted.

# Change this option to true to disable reporting.

# reporting-disabled = false

# Bind address to use for the RPC service for backup and restore.

# bind-address = "127.0.0.1:8088"

###

### InfluxDB configuration tuning, version 1.6

###

###

### [meta]

###

### Controls the parameters for the Raft consensus group that stores metadata

### about the InfluxDB cluster.

###

[meta]

# Where the metadata/raft database is stored

# Metadata storage directory

dir = "/var/lib/influxdb/meta"

# Automatically create a default retention policy when creating a database.

# retention-autocreate = true

# If log messages are printed for the meta service

# logging-enabled = true

###

### [data]

###

### Controls where the actual shard data for InfluxDB lives and how it is

### flushed from the WAL. "dir" may need to be changed to a suitable place

### for your system, but the WAL settings are an advanced configuration. The

### defaults should work for most systems.

###

[data]

# The directory where the TSM storage engine stores TSM files.

# Data storage directory

dir = "/var/lib/influxdb/data"

# The directory where the TSM storage engine stores WAL files.

# WAL directory

wal-dir = "/var/lib/influxdb/wal"

# The amount of time that a write will wait before fsyncing. A duration

# greater than 0 can be used to batch up multiple fsync calls. This is useful for slower

# disks or when WAL write contention is seen. A value of 0s fsyncs every write to the WAL.

# Values in the range of 0-100ms are recommended for non-SSD disks.

# wal-fsync-delay = "0s"

# The type of shard index to use for new shards. The default is an in-memory index that is

# recreated at startup. A value of "tsi1" will use a disk based index that supports higher

# cardinality datasets.

# index-version = "inmem"

# Trace logging provides more verbose output around the tsm engine. Turning

# this on can provide more useful output for debugging tsm engine issues.

# trace-logging-enabled = false

# Whether queries should be logged before execution. Very useful for troubleshooting, but will

# log any sensitive data contained within a query.

# query-log-enabled = true

# Settings for the TSM engine

# CacheMaxMemorySize is the maximum size a shard's cache can

# reach before it starts rejecting writes.

# Valid size suffixes are k, m, or g (case insensitive, 1024 = 1k).

# Values without a size suffix are in bytes.

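# Maximum amount of data to cache; writes are cached first and then flushed to disk. For reference, 4294967296 bytes = 4g; the value set here is 8g.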

cache-max-memory-size = "8g"

# CacheSnapshotMemorySize is the size at which the engine will

# snapshot the cache and write it to a TSM file, freeing up memory

# Valid size suffixes are k, m, or g (case insensitive, 1024 = 1k).

# Values without a size suffix are in bytes.

# cache-snapshot-memory-size = "25m"

# CacheSnapshotWriteColdDuration is the length of time at

# which the engine will snapshot the cache and write it to

# a new TSM file if the shard hasn't received writes or deletes

# cache-snapshot-write-cold-duration = "10m"

# CompactFullWriteColdDuration is the duration at which the engine

# will compact all TSM files in a shard if it hasn't received a

# write or delete

# compact-full-write-cold-duration = "4h"

# The maximum number of concurrent full and level compactions that can run at one time. A

# value of 0 results in 50% of runtime.GOMAXPROCS(0) used at runtime. Any number greater

# than 0 limits compactions to that value. This setting does not apply

# to cache snapshotting.

# max-concurrent-compactions = 0

# The threshold, in bytes, when an index write-ahead log file will compact

# into an index file. Lower sizes will cause log files to be compacted more

# quickly and result in lower heap usage at the expense of write throughput.

# Higher sizes will be compacted less frequently, store more series in-memory,

# and provide higher write throughput.

# Valid size suffixes are k, m, or g (case insensitive, 1024 = 1k).

# Values without a size suffix are in bytes.

# max-index-log-file-size = "1m"

# The maximum series allowed per database before writes are dropped. This limit can prevent

# high cardinality issues at the database level. This limit can be disabled by setting it to

# 0.

max-series-per-database = 0

# The maximum number of tag values per tag that are allowed before writes are dropped. This limit

# can prevent high cardinality tag values from being written to a measurement. This limit can be

# disabled by setting it to 0.

max-values-per-tag = 0

# If true, then the mmap advise value MADV_WILLNEED will be provided to the kernel with respect to

# TSM files. This setting has been found to be problematic on some kernels, and defaults to off.

# It might help users who have slow disks in some cases.

# tsm-use-madv-willneed = false

###

### [coordinator]

###

### Controls the clustering service configuration.

###

[coordinator]

# The default time a write request will wait until a "timeout" error is returned to the caller.

write-timeout = "10s"

# The maximum number of concurrent queries allowed to be executing at one time. If a query is

# executed and exceeds this limit, an error is returned to the caller. This limit can be disabled

# by setting it to 0.

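# max-concurrent-queries sets the maximum number of queries that may run at the same time; 0 means unlimited.

# If the number of running queries exceeds this value, the following error is returned:

# ERR: max concurrent queries reached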

max-concurrent-queries = 0

# The maximum time a query is allowed to execute before being killed by the system. This limit

# can help prevent runaway queries. Setting the value to 0 disables the limit.

# query-timeout sets the timeout for a query. If a query runs longer than this, InfluxDB kills it and reports the following error:

# ERR: query timeout reached

# If continuous queries are configured, it is best not to set query-timeout: as the data volume grows, continuous queries take longer to produce their results, and a timeout would cause those results to fail to be generated.

query-timeout = "0"

# The time threshold when a query will be logged as a slow query. This limit can be set to help

# discover slow or resource intensive queries. Setting the value to 0 disables the slow query logging.

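# log-queries-after sets how long a query must run before it is logged as a slow query. A value of 0 means such queries are not logged.

# For example, with this option set to "1s", any query that runs longer than 1 second is marked as a slow query and written to the log.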

log-queries-after = "10s"

# The maximum number of points a SELECT can process. A value of 0 will make

# the maximum point count unlimited. This will only be checked every second so queries will not

# be aborted immediately when hitting the limit.

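# This parameter can be set when the number of points is under control.

# max-select-point sets how many points a single query may return; InfluxDB treats each record as one point, hence the name.

# 0 means unlimited. If a query returns more points than this value, InfluxDB kills it and reports the following error:

# ERR: max number of points reached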

max-select-point = 0

# The maximum number of series a SELECT can run. A value of 0 will make the maximum series

# count unlimited.

# max-select-series sets the maximum number of series a single InfluxDB query may process. If a query would process more series than this, InfluxDB refuses to run it and reports the following error:

# ERR: max select series count exceeded: <query_series_count> series

max-select-series = 0

# The maximum number of group by time buckets a SELECT can create. A value of zero will make the maximum

# number of buckets unlimited.

max-select-buckets = 0

###

### [retention]

###

### Controls the enforcement of retention policies for evicting old data.

###

[retention]

# Determines whether retention policy enforcement enabled.

# enabled = true

# The interval of time when retention policy enforcement checks run.

# check-interval = "30m"

###

### [shard-precreation]

###

### Controls the precreation of shards, so they are available before data arrives.

### Only shards that, after creation, will have both a start- and end-time in the

### future, will ever be created. Shards are never precreated that would be wholly

### or partially in the past.

[shard-precreation]

# Determines whether shard pre-creation service is enabled.

# enabled = true

# The interval of time when the check to pre-create new shards runs.

# check-interval = "10m"

# The default period ahead of the endtime of a shard group that its successor

# group is created.

advance-period = "10m"

###

### Controls the system self-monitoring, statistics and diagnostics.

###

### The internal database for monitoring data is created automatically if

### it does not already exist. The target retention within this database

### is called 'monitor' and is also created with a retention period of 7 days

### and a replication factor of 1, if it does not exist. In all cases

### this retention policy is configured as the default for the database.

[monitor]

# Whether to record statistics internally.

# store-enabled = true

# The destination database for recorded statistics

# store-database = "_internal"

# The interval at which to record statistics

# store-interval = "10s"

###

### [http]

###

### Controls how the HTTP endpoints are configured. These are the primary

### mechanism for getting data into and out of InfluxDB.

[http]

# Determines whether HTTP endpoint is enabled.

# enabled = true

# The bind address used by the HTTP service.

bind-address = ":8086"

# Determines whether user authentication is enabled over HTTP/HTTPS.

#auth-enabled = true

# The default realm sent back when issuing a basic auth challenge.

# realm = "InfluxDB"

# Determines whether HTTP request logging is enabled.

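# Defaults to true, which generates a large volume of HTTP request log entries. Disabling it is recommended; otherwise the log file grows in proportion to the data written, at roughly 1:1.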

log-enabled = false

# Determines whether the HTTP write request logs should be suppressed when the log is enabled.

# suppress-write-log = false

# When HTTP request logging is enabled, this option specifies the path where

# log entries should be written. If unspecified, the default is to write to stderr, which

# intermingles HTTP logs with internal InfluxDB logging.

# If influxd is unable to access the specified path, it will log an error and fall back to writing

# the request log to stderr.

# access-log-path = ""

# Determines whether detailed write logging is enabled.

# write-tracing = false

# Determines whether the pprof endpoint is enabled. This endpoint is used for

# troubleshooting and monitoring.

# pprof-enabled = true

# Enables a pprof endpoint that binds to localhost:6060 immediately on startup.

# This is only needed to debug startup issues.

# debug-pprof-enabled = false

# Determines whether HTTPS is enabled.

# https-enabled = false

# The SSL certificate to use when HTTPS is enabled.

# https-certificate = "/etc/ssl/influxdb.pem"

# Use a separate private key location.

# https-private-key = ""

# The JWT auth shared secret to validate requests using JSON web tokens.

# shared-secret = ""

# The default chunk size for result sets that should be chunked.

# Maximum number of rows shown on the query page

max-row-limit = 10000

# The maximum number of HTTP connections that may be open at once. New connections that

# would exceed this limit are dropped. Setting this value to 0 disables the limit.

# max-connection-limit = 0

# Enable http service over unix domain socket

# unix-socket-enabled = false

# The path of the unix domain socket.

# bind-socket = "/var/run/influxdb.sock"

# The maximum size of a client request body, in bytes. Setting this value to 0 disables the limit.

# max-body-size = 25000000

# The maximum number of writes processed concurrently.

# Setting this to 0 disables the limit.

# max-concurrent-write-limit = 0

# The maximum number of writes queued for processing.

# Setting this to 0 disables the limit.

# max-enqueued-write-limit = 0

# The maximum duration for a write to wait in the queue to be processed.

# Setting this to 0 or setting max-concurrent-write-limit to 0 disables the limit.

# enqueued-write-timeout = 0

###

### [ifql]

###

### Configures the ifql RPC API.

###

[ifql]

# Determines whether the RPC service is enabled.

# enabled = true

# Determines whether additional logging is enabled.

# log-enabled = true

# The bind address used by the ifql RPC service.

# bind-address = ":8082"

###

### [logging]

###

### Controls how the logger emits logs to the output.

[logging]

# Determines which log encoder to use for logs. Available options

# are auto, logfmt, and json. auto will use a more user-friendly

# output format if the output terminal is a TTY, but the format is not as

# easily machine-readable. When the output is a non-TTY, auto will use

# logfmt.

# format = "auto"

# Determines which level of logs will be emitted. The available levels

# are error, warn, info, and debug. Logs that are equal to or above the

# specified level will be emitted.

# level = "info"

# Suppresses the logo output that is printed when the program is started.

# The logo is always suppressed if STDOUT is not a TTY.

# suppress-logo = false

###

### [subscriber]

###

### Controls the subscriptions, which can be used to fork a copy of all data

### received by the InfluxDB host.

###

[subscriber]

# Determines whether the subscriber service is enabled.

# enabled = true

# The default timeout for HTTP writes to subscribers.

# http-timeout = "30s"

# Allows insecure HTTPS connections to subscribers. This is useful when testing with self-

# signed certificates.

# insecure-skip-verify = false

# The path to the PEM encoded CA certs file. If the empty string, the default system certs will be used

# ca-certs = ""

# The number of writer goroutines processing the write channel.

# write-concurrency = 40

# The number of in-flight writes buffered in the write channel.

# write-buffer-size = 1000

###

### [[graphite]]

###

### Controls one or many listeners for Graphite data.

###

[[graphite]]

# Determines whether the graphite endpoint is enabled.

# enabled = false

# database = "graphite"

# retention-policy = ""

# bind-address = ":2003"

# protocol = "tcp"

# consistency-level = "one"

# These next lines control how batching works. You should have this enabled

# otherwise you could get dropped metrics or poor performance. Batching

# will buffer points in memory if you have many coming in.

# Flush if this many points get buffered

# batch-size = 5000

# number of batches that may be pending in memory

# batch-pending = 10

# Flush at least this often even if we haven't hit buffer limit

# batch-timeout = "1s"

# UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.

# udp-read-buffer = 0

### This string joins multiple matching 'measurement' values providing more control over the final measurement name.

# separator = "."

### Default tags that will be added to all metrics. These can be overridden at the template level

### or by tags extracted from metric

# tags = ["region=us-east", "zone=1c"]

### Each template line requires a template pattern. It can have an optional

### filter before the template and separated by spaces. It can also have optional extra

### tags following the template. Multiple tags should be separated by commas and no spaces

### similar to the line protocol format. There can be only one default template.

# templates = [

# "*.app asurement",

# # Default template

# "server.*",

# ]

###

### [collectd]

###

### Controls one or many listeners for collectd data.

###

[[collectd]]

# enabled = false

# bind-address = ":25826"
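The size comments in the [data] section say that values accept the suffixes k, m, or g (case insensitive, 1024 = 1k) and that values without a suffix are in bytes. The sketch below works through that arithmetic for settings such as cache-max-memory-size. It follows the rule as described in the comments and is not InfluxDB's own parser; the function name parse_size is hypothetical.

```python
# Multipliers for the suffixes described in the config comments (1024 = 1k).
_SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size string such as "8g" or "25m" to a number of bytes."""
    text = value.strip().lower()
    if not text:
        raise ValueError("empty size value")
    if text[-1] in _SUFFIXES:
        # Suffix present: scale the numeric part by the matching multiplier.
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    # No suffix: the value is already in bytes.
    return int(text)
```

Under this rule parse_size("4g") gives 4294967296, the figure quoted in the cache comment, while the configured "8g" is twice that, 8589934592 bytes, so the host needs enough memory to accommodate a cache of that size.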

Published: 2024-09-22 02:01:40
