How to inspect the data Kafka stores on disk

In earlier notes we learned about some of Kafka's features and how to drive Kafka from Python, but we have not yet covered how to look at the data Kafka actually stores. In this note we will learn to inspect that data with command-line tools.

First, let's look at the topic's partition layout (on newer Kafka versions, replace --zookeeper localhost:2181 with --bootstrap-server localhost:9092):

/usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic nginx-logs
Topic:nginx-logs        PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: nginx-logs       Partition: 0    Leader: 0       Replicas: 0     Isr: 0

Leader: the broker id of the partition leader
Replicas: the broker ids that hold replicas of this partition
Isr: the in-sync replicas, i.e. the broker ids eligible to be elected leader

Next we go into the directory where Kafka stores this topic; on my machine it is /usr/local/kafka/logs/kafka/nginx-logs-0

You will see that the data consists of .index, .log, and .timeindex files.

The contents of these files can be dumped with kafka-run-class.sh from the bin directory of the Kafka installation.

Viewing the .index file:

../../../bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000000000000.index --print-data-log

The output shows each indexed offset and its byte position in the .log file:

......
offset: 1546 position: 1547281
offset: 1551 position: 1552092
offset: 1556 position: 1557085
offset: 1560 position: 1561398
offset: 1564 position: 1565669
offset: 1568 position: 1569935
offset: 1573 position: 1574858
offset: 1578 position: 1579745
offset: 1582 position: 1584156
offset: 1586 position: 1588553
offset: 1590 position: 1592966
offset: 1594 position: 1597382
offset: 1598 position: 1601877
offset: 1603 position: 1606706
offset: 1608 position: 1611517
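Note that the offsets above jump by 4 or 5: the .index file is a sparse index, with one entry written roughly every log.index.interval.bytes (4096 bytes by default). To find an arbitrary offset, Kafka binary-searches the index for the largest entry not greater than the target, then scans the .log file forward from that byte position. A minimal sketch of that lookup in Python, using entries copied from the dump above:

```python
import bisect

# (offset, position) pairs taken from the DumpLogSegments output above
index = [
    (1546, 1547281), (1551, 1552092), (1556, 1557085),
    (1560, 1561398), (1564, 1565669), (1568, 1569935),
]

def locate(index, target_offset):
    """Return the byte position to start scanning the .log file from:
    the position of the largest indexed offset <= target_offset."""
    offsets = [o for o, _ in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # target precedes the first entry; scan from the start
    return index[i][1]

# Offset 1558 has no index entry of its own; the search falls back to
# the entry for offset 1556 and scans forward from byte 1557085.
print(locate(index, 1558))  # -> 1557281... no: 1557085
```

(The function `locate` is my own illustration, not Kafka code; the real lookup lives in the broker's OffsetIndex class.)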

Viewing the .log file:

../../../bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000000000000.log --print-data-log

The output is the messages written by producers:

| offset: 1607 CreateTime: 1573205403831 keysize: -1 valuesize: 900 sequence: -1 headerKeys: [] payload: {"timestamp":"08/Nov/2019:17:30:02 +0800","log":{"offset":3002136,"file":{"path":"/usr/local/nginx/logs/access.log"}},"response_code":200,"ecs":{"version":"1.1.0"},"request":"/","@version":"1","input":{"type":"log"},"auth":"-","method":"GET","httpversion":"1.1","client":"49.233.176.23","response_time":17589,"tags":["hk-nginx-logs"],"user_agent":"\"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)\"","agent":{"type":"filebeat","version":"7.4.0","id":"fcf18d27-a7d6-48d1-8749-7eaabc70d6d9","ephemeral_id":"173f7435-086b-43ae-b953-cc5fa8925e9b","hostname":"iZj6c1iqplyxdp3km5vse6Z"},"host":{"os":{"version":"6.10 (Final)","family":"redhat","name":"CentOS","platform":"centos","kernel":"2.6.32-754.12.1.el6.x86_64","codename":"Final"},"containerized":false,"architecture":"x86_64","hostname":"iZj6c1iqplyxdp3km5vse6Z","name":"iZj6c1iqplyxdp3km5vse6Z"},"@timestamp":"2019-11-08T09:30:02.620Z"}
baseOffset: 1608 lastOffset: 1608 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 1611517 CreateTime: 1573205419161 size: 1003 magic: 2 compresscodec: NONE crc: 1720564619 isvalid: true
| offset: 1608 CreateTime: 1573205419161 keysize: -1 valuesize: 933 sequence: -1 headerKeys: [] payload: {"timestamp":"08/Nov/2019:17:30:14 +0800","log":{"file":{"path":"/usr/local/nginx/logs/access.log"},"offset":3002278},"response_code":200,"ecs":{"version":"1.1.0"},"request":"/retrieve___fd3d_1991_1.html","@version":"1","input":{"type":"log"},"auth":"-","method":"GET","httpversion":"1.1","client":"54.36.148.0","response_time":4986,"tags":["hk-nginx-logs"],"user_agent":"\"Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)\"","agent":{"type":"filebeat","id":"fcf18d27-a7d6-48d1-8749-7eaabc70d6d9","version":"7.4.0","ephemeral_id":"173f7435-086b-43ae-b953-cc5fa8925e9b","hostname":"iZj6c1iqplyxdp3km5vse6Z"},"host":{"os":{"version":"6.10 (Final)","family":"redhat","name":"CentOS","platform":"centos","kernel":"2.6.32-754.12.1.el6.x86_64","codename":"Final"},"containerized":false,"architecture":"x86_64","hostname":"iZj6c1iqplyxdp3km5vse6Z","name":"iZj6c1iqplyxdp3km5vse6Z"},"@timestamp":"2019-11-08T09:30:17.626Z"}
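The CreateTime field in the dump is a Unix timestamp in milliseconds. A quick way to turn it into a readable time, using the CreateTime value of offset 1608 above:

```python
from datetime import datetime, timezone

create_time_ms = 1573205419161  # CreateTime of offset 1608 in the dump above

# Kafka message timestamps are milliseconds since the Unix epoch
dt = datetime.fromtimestamp(create_time_ms / 1000, tz=timezone.utc)
print(dt.isoformat())
```

This lines up with the payload's own @timestamp (2019-11-08T09:30:17.626Z); CreateTime is stamped a moment later, when the broker received the record.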

Viewing the .timeindex file:

../../../bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000000000000.timeindex --print-data-log

The output shows timestamps and the corresponding offsets:

......
timestamp: 1573204946730 offset: 1541
timestamp: 1573205029744 offset: 1546
timestamp: 1573205147762 offset: 1551
timestamp: 1573205186786 offset: 1556
timestamp: 1573205186787 offset: 1558
timestamp: 1573205188782 offset: 1563
timestamp: 1573205188792 offset: 1565
timestamp: 1573205217794 offset: 1572
timestamp: 1573205220814 offset: 1578
timestamp: 1573205221801 offset: 1582
timestamp: 1573205223960 offset: 1584
timestamp: 1573205223983 offset: 1587
timestamp: 1573205342820 offset: 1603
timestamp: 1573205419161 offset: 1608
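The .timeindex is what lets Kafka answer "which offset was written at or after time T" (the query behind a consumer's offsets_for_times call). It is also sparse; the lookup finds the first entry whose timestamp is at or after the target. A sketch with entries copied from the dump above (the function name is my own, not Kafka's):

```python
import bisect

# (timestamp_ms, offset) pairs taken from the .timeindex dump above
time_index = [
    (1573205029744, 1546), (1573205147762, 1551),
    (1573205186786, 1556), (1573205186787, 1558),
    (1573205188782, 1563),
]

def offset_for_time(time_index, target_ts):
    """Return the offset of the first index entry whose timestamp is
    >= target_ts, or None if every entry is older than target_ts."""
    timestamps = [ts for ts, _ in time_index]
    i = bisect.bisect_left(timestamps, target_ts)
    if i == len(time_index):
        return None  # nothing in the index at or after this time
    return time_index[i][1]

print(offset_for_time(time_index, 1573205100000))  # first entry after this
```

In practice the broker uses this result only as a starting point and then scans the .log forward, since the index does not contain every message.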

The .index and .log files (together with the matching .timeindex) make up a segment. Segments are named by base offset: the partition's first segment starts at 0, and each subsequent segment file is named after the offset of the first message it contains, i.e. the previous segment's last offset + 1. The value is a 64-bit long, zero-padded to 20 digits. The log.segment.bytes setting caps the size of a .log file; once a segment exceeds it, a new segment is rolled.
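The naming rule is easy to reproduce: format the segment's base offset as a zero-padded 20-digit number (wide enough for any 64-bit long). A small illustration, with a helper name of my own choosing:

```python
def segment_name(base_offset, ext="log"):
    """Kafka names a segment file after its base offset,
    zero-padded to 20 digits."""
    return f"{base_offset:020d}.{ext}"

print(segment_name(0))     # the first segment of a partition
print(segment_name(1609))  # a segment whose first message has offset 1609
```

So if the segment above rolled after offset 1608, the next files would be 00000000000000001609.log, 00000000000000001609.index, and 00000000000000001609.timeindex.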

Content copyright notice: unless otherwise stated, articles on this site are original.

When reposting, please cite the source: https://sulao.cn/post/755.html
