In previous notes we learned about some of Kafka's features and how to operate Kafka from Python, but we have not yet covered how to inspect the data Kafka stores. In this note we will learn to examine Kafka data from the command line.
First, let's look at the topic's partition layout:
/usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic nginx-logs
Topic:nginx-logs	PartitionCount:1	ReplicationFactor:1	Configs:
	Topic: nginx-logs	Partition: 0	Leader: 0	Replicas: 0	Isr: 0
Leader: the broker id of the partition's leader replica
Replicas: the broker ids that hold replicas of this partition
Isr: the in-sync replicas, i.e. the broker ids eligible to become leader
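The per-partition line of the --describe output can also be pulled apart in code. A minimal Python sketch, just an illustration of the field layout shown above (the parser is my own helper, not part of any Kafka API):

```python
import re

def parse_partition_line(line):
    """Split one partition line of kafka-topics.sh --describe output
    into a dict, e.g. {'Topic': 'nginx-logs', 'Partition': '0', ...}."""
    # Every field has the shape "Key: value"; keys contain no spaces.
    return dict(re.findall(r"(\w+):\s*(\S+)", line))

line = "Topic: nginx-logs Partition: 0 Leader: 0 Replicas: 0 Isr: 0"
info = parse_partition_line(line)
print(info["Leader"], info["Replicas"], info["Isr"])  # -> 0 0 0
```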
Next, enter the directory where Kafka stores the topic's data; on my machine it is /usr/local/kafka/logs/kafka/nginx-logs-0.
You will find that the data files consist of .index, .log, and .timeindex files.
The contents of these files can be inspected with kafka-run-class.sh from the bin directory of the Kafka installation.
To view a .index file:
../../../bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000000000000.index --print-data-log
The output shows offsets and their byte positions within the log file:
......
offset: 1546 position: 1547281
offset: 1551 position: 1552092
offset: 1556 position: 1557085
offset: 1560 position: 1561398
offset: 1564 position: 1565669
offset: 1568 position: 1569935
offset: 1573 position: 1574858
offset: 1578 position: 1579745
offset: 1582 position: 1584156
offset: 1586 position: 1588553
offset: 1590 position: 1592966
offset: 1594 position: 1597382
offset: 1598 position: 1601877
offset: 1603 position: 1606706
offset: 1608 position: 1611517
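Notice that the index is sparse: it records an (offset, position) pair only every few messages, not one per message. To find an arbitrary offset, Kafka binary-searches the index for the greatest indexed offset not exceeding the target and then scans the .log file forward from that byte position. A small Python sketch of that lookup, using a few entries from the dump above (the helper itself is illustrative, not Kafka code):

```python
import bisect

# (offset, byte position) pairs, as dumped from the .index file above
index = [(1546, 1547281), (1551, 1552092), (1556, 1557085),
         (1560, 1561398), (1564, 1565669), (1568, 1569935)]

def locate(index, target_offset):
    """Return the byte position to start scanning from: the position of
    the greatest indexed offset <= target_offset."""
    offsets = [o for o, _ in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    if i < 0:
        return 0  # target precedes the first index entry: scan from file start
    return index[i][1]

# offset 1558 is not indexed; it falls between entries 1556 and 1560,
# so the scan starts at the position recorded for offset 1556
print(locate(index, 1558))  # -> 1557085
```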
To view a .log file:
../../../bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000000000000.log --print-data-log
The output contains the messages sent by producers:
| offset: 1607 CreateTime: 1573205403831 keysize: -1 valuesize: 900 sequence: -1 headerKeys: [] payload: {"timestamp":"08/Nov/2019:17:30:02 +0800","log":{"offset":3002136,"file":{"path":"/usr/local/nginx/logs/access.log"}},"response_code":200,"ecs":{"version":"1.1.0"},"request":"/","@version":"1","input":{"type":"log"},"auth":"-","method":"GET","httpversion":"1.1","client":"49.233.176.23","response_time":17589,"tags":["hk-nginx-logs"],"user_agent":"\"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)\"","agent":{"type":"filebeat","version":"7.4.0","id":"fcf18d27-a7d6-48d1-8749-7eaabc70d6d9","ephemeral_id":"173f7435-086b-43ae-b953-cc5fa8925e9b","hostname":"iZj6c1iqplyxdp3km5vse6Z"},"host":{"os":{"version":"6.10 (Final)","family":"redhat","name":"CentOS","platform":"centos","kernel":"2.6.32-754.12.1.el6.x86_64","codename":"Final"},"containerized":false,"architecture":"x86_64","hostname":"iZj6c1iqplyxdp3km5vse6Z","name":"iZj6c1iqplyxdp3km5vse6Z"},"@timestamp":"2019-11-08T09:30:02.620Z"}
baseOffset: 1608 lastOffset: 1608 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 1611517 CreateTime: 1573205419161 size: 1003 magic: 2 compresscodec: NONE crc: 1720564619 isvalid: true
| offset: 1608 CreateTime: 1573205419161 keysize: -1 valuesize: 933 sequence: -1 headerKeys: [] payload: {"timestamp":"08/Nov/2019:17:30:14 +0800","log":{"file":{"path":"/usr/local/nginx/logs/access.log"},"offset":3002278},"response_code":200,"ecs":{"version":"1.1.0"},"request":"/retrieve___fd3d_1991_1.html","@version":"1","input":{"type":"log"},"auth":"-","method":"GET","httpversion":"1.1","client":"54.36.148.0","response_time":4986,"tags":["hk-nginx-logs"],"user_agent":"\"Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)\"","agent":{"type":"filebeat","id":"fcf18d27-a7d6-48d1-8749-7eaabc70d6d9","version":"7.4.0","ephemeral_id":"173f7435-086b-43ae-b953-cc5fa8925e9b","hostname":"iZj6c1iqplyxdp3km5vse6Z"},"host":{"os":{"version":"6.10 (Final)","family":"redhat","name":"CentOS","platform":"centos","kernel":"2.6.32-754.12.1.el6.x86_64","codename":"Final"},"containerized":false,"architecture":"x86_64","hostname":"iZj6c1iqplyxdp3km5vse6Z","name":"iZj6c1iqplyxdp3km5vse6Z"},"@timestamp":"2019-11-08T09:30:17.626Z"}
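Each record line in the dump carries the message body after the payload: marker, so for JSON messages (like the nginx logs here) the payload can be cut out and parsed directly. A minimal Python sketch, with a sample line shortened from the dump above (the helper is my own, not a Kafka tool):

```python
import json

def payload_of(dump_line):
    """Extract and parse the JSON payload from one DumpLogSegments record line."""
    # Everything after the first "payload: " marker is the message body.
    _, _, raw = dump_line.partition("payload: ")
    return json.loads(raw)

line = ('| offset: 1607 CreateTime: 1573205403831 keysize: -1 valuesize: 900 '
        'sequence: -1 headerKeys: [] payload: '
        '{"response_code": 200, "request": "/", "client": "49.233.176.23"}')
msg = payload_of(line)
print(msg["response_code"], msg["request"])  # -> 200 /
```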
To view a .timeindex file:
../../../bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000000000000000.timeindex --print-data-log
The output shows timestamps and offsets:
......
timestamp: 1573204946730 offset: 1541
timestamp: 1573205029744 offset: 1546
timestamp: 1573205147762 offset: 1551
timestamp: 1573205186786 offset: 1556
timestamp: 1573205186787 offset: 1558
timestamp: 1573205188782 offset: 1563
timestamp: 1573205188792 offset: 1565
timestamp: 1573205217794 offset: 1572
timestamp: 1573205220814 offset: 1578
timestamp: 1573205221801 offset: 1582
timestamp: 1573205223960 offset: 1584
timestamp: 1573205223983 offset: 1587
timestamp: 1573205342820 offset: 1603
timestamp: 1573205419161 offset: 1608
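The .timeindex maps timestamps to offsets, which is what lets Kafka answer "find me the first message at or after time T" (e.g. a consumer's offsets-for-times lookup) without scanning the log. A small Python sketch of that search over a few entries from the dump above (illustrative only, not Kafka code):

```python
import bisect

# (timestamp, offset) pairs, as dumped from the .timeindex file above
tindex = [(1573204946730, 1541), (1573205029744, 1546),
          (1573205147762, 1551), (1573205186786, 1556)]

def offset_for_time(tindex, ts):
    """Return the offset of the first indexed entry whose timestamp >= ts,
    or None if every indexed timestamp is smaller."""
    stamps = [t for t, _ in tindex]
    i = bisect.bisect_left(stamps, ts)
    if i == len(tindex):
        return None
    return tindex[i][1]

# the first entry at or after this timestamp is (1573205029744, 1546)
print(offset_for_time(tindex, 1573205000000))  # -> 1546
```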
The .index, .log, and .timeindex files that share a base name make up a segment. Segment naming rule: the partition's first segment starts at 0, and each subsequent segment is named after its base offset, i.e. the offset of its first message (one past the last offset of the previous segment). The value is a 64-bit long rendered as a 20-digit number, left-padded with zeros. The log.segment.bytes parameter sets the maximum size of a log file; once a segment exceeds that size, a new segment file is rolled.
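The naming rule above is easy to reproduce; a minimal sketch (my own helper, mirroring the zero-padded format Kafka uses for segment files):

```python
def segment_filename(base_offset, suffix=".log"):
    """Kafka names a segment after the offset of its first message,
    zero-padded to 20 digits (wide enough for any 64-bit long)."""
    return f"{base_offset:020d}{suffix}"

print(segment_filename(0))     # -> 00000000000000000000.log
print(segment_filename(1608))  # -> 00000000000000001608.log
```

So if the current segment rolls after the message at offset 1607, the next segment's files would be named 00000000000000001608.log, .index, and .timeindex.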