1. Installation and Configuration
Item | Value |
---|---|
CPU | 32 logical CPUs, Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz |
Kernel version | Linux host 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 |
DPDK version | dpdk-stable-17.11.3 |
NIC model | Broadcom Limited BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01) |
Check whether HPET is supported. If it is not, the command prints nothing and HPET has to be enabled in the BIOS:
grep hpet /proc/timer_list
# ========================output info====================
Clock Event Device: hpet
set_next_event: hpet_legacy_next_event
shutdown: hpet_legacy_shutdown
periodic: hpet_legacy_set_periodic
oneshot: hpet_legacy_set_oneshot
resume: hpet_legacy_resume
Enable hugepage support (1 GB hugepages are recommended on 64-bit systems):
# Each page is 2 MB; reserve 1024 of them (2 GB in total)
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
Then make the hugepage memory available to DPDK by mounting hugetlbfs:
mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
vim /etc/fstab
# add the following entry so the mount persists across reboots
nodev /mnt/huge hugetlbfs defaults 0 0
Download the DPDK source code (version 17.11.3 is used here):
wget http://fast.dpdk.org/rel/dpdk-17.11.3.tar.xz
tar xJf dpdk-17.11.3.tar.xz
cd dpdk-stable-17.11.3/
Compile and install for the chosen target:
make install T=x86_64-native-linuxapp-gcc
# ========================format description====================
Here T stands for Target. A target name has the format:
ARCH-MACHINE-EXECENV-TOOLCHAIN
- ARCH = i686, x86_64, ppc_64
- MACHINE = native, ivshmem, power8
- EXECENV = linuxapp, bsdapp
- TOOLCHAIN = gcc, icc
After compilation, a target environment directory named x86_64-native-linuxapp-gcc is generated:
cd x86_64-native-linuxapp-gcc
ls
# ========================output info====================
app build include kmod lib Makefile
Load the UIO kernel module:
modprobe uio_pci_generic
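To use a NIC with DPDK, it must be bound to the uio_pci_generic module. DPDK provides the dpdk-devbind.py script in the usertools directory for this. First, check the current status before binding: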
[root:/home/dingfuxiao/dpdk-stable-17.11.3/usertools]$./dpdk-devbind.py --status
# ========================output info====================
Network devices using DPDK-compatible driver
============================================
0000:82:00.1 'BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller 16d7' drv=uio_pci_generic unused=bnxt_en
Network devices using kernel driver
===================================
0000:01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eth3 drv=ixgbe unused=uio_pci_generic
0000:01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eth2 drv=ixgbe unused=uio_pci_generic
0000:06:00.0 'I350 Gigabit Network Connection 1521' if=eth4 drv=igb unused=uio_pci_generic *Active*
0000:06:00.1 'I350 Gigabit Network Connection 1521' if=eth5 drv=igb unused=uio_pci_generic
0000:82:00.0 'BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller 16d7' if=eth1 drv=bnxt_en unused=uio_pci_generic
Other Network devices
=====================
<none>
Crypto devices using DPDK-compatible driver
===========================================
<none>
Crypto devices using kernel driver
==================================
<none>
Other Crypto devices
====================
<none>
Eventdev devices using DPDK-compatible driver
=============================================
<none>
Eventdev devices using kernel driver
====================================
<none>
Other Eventdev devices
======================
<none>
Mempool devices using DPDK-compatible driver
============================================
<none>
Mempool devices using kernel driver
===================================
<none>
Other Mempool devices
=====================
<none>
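In the capture above, 0000:82:00.1 (eth0 in the kernel log below) is already listed under the DPDK-compatible driver; on a freshly booted system no interface appears there. Now bind eth0 and eth1 to DPDK:
# Bring the interface down first, otherwise the bind fails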
ifconfig eth0 down
./dpdk-devbind.py --bind=uio_pci_generic eth0
./dpdk-devbind.py --status
ifconfig eth1 down
./dpdk-devbind.py --bind=uio_pci_generic eth1
./dpdk-devbind.py --status
# ========================output info====================
Network devices using DPDK-compatible driver
============================================
0000:82:00.0 'BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller 16d7' drv=uio_pci_generic unused=bnxt_en
0000:82:00.1 'BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller 16d7' drv=uio_pci_generic unused=bnxt_en
Network devices using kernel driver
===================================
0000:01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eth3 drv=ixgbe unused=uio_pci_generic
0000:01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' if=eth2 drv=ixgbe unused=uio_pci_generic
0000:06:00.0 'I350 Gigabit Network Connection 1521' if=eth4 drv=igb unused=uio_pci_generic *Active*
0000:06:00.1 'I350 Gigabit Network Connection 1521' if=eth5 drv=igb unused=uio_pci_generic
Other Network devices
=====================
<none>
Crypto devices using DPDK-compatible driver
===========================================
<none>
Crypto devices using kernel driver
==================================
<none>
Other Crypto devices
====================
<none>
Eventdev devices using DPDK-compatible driver
=============================================
<none>
Eventdev devices using kernel driver
====================================
<none>
Other Eventdev devices
======================
<none>
Mempool devices using DPDK-compatible driver
============================================
<none>
Mempool devices using kernel driver
===================================
<none>
Other Mempool devices
=====================
<none>
Both eth0 and eth1 are now driven by DPDK. Run dmesg | tail to view the kernel log messages:
dmesg | tail
# ========================output info====================
[ 40.303396] device-mapper: uevent: version 1.0.3
[ 40.303493] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-devel@redhat.com
[ 46.110282] systemd[1]: apt-daily.timer: Adding 3h 7min 32.471933s random time.
[ 115.004038] usb 1-1.4: USB disconnect, device number 3
[ 2833.197700] IPv6: ADDRCONF(NETDEV_UP): eth4: link is not ready
[ 2836.888038] igb 0000:06:00.0 eth4: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 2836.888221] IPv6: ADDRCONF(NETDEV_CHANGE): eth4: link becomes ready
[ 2883.084037] bnxt_en 0000:82:00.1 eth0: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive & transmit
[ 2885.403482] bnxt_en 0000:82:00.0 eth1: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive & transmit
[ 2990.589091] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
2. Running the hello world example
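Before compiling, RTE_SDK and RTE_TARGET must be exported as environment variables: RTE_SDK is the DPDK installation (source) directory, and RTE_TARGET is the DPDK target environment directory.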
export RTE_SDK=/home/dingfuxiao/dpdk-stable-17.11.3/
export RTE_TARGET=x86_64-native-linuxapp-gcc
Build a simple application from the examples directory:
cd examples/helloworld/
make
cd build/
./helloworld
# ========================output info====================
EAL: Detected 32 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL: probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:06:00.1 on NUMA socket 0
EAL: probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:82:00.0 on NUMA socket 1
EAL: probe driver: 14e4:16d7 net_bnxt
PMD: Broadcom Cumulus driver bnxt
PMD: 1.10.1:214.4.65
PMD: Driver HWRM version: 1.8.2
PMD: BNXT Driver/HWRM API mismatch.
PMD: Firmware API version is newer than driver.
PMD: The driver may be missing features.
PMD: bnxt found at mem c8210000, node addr 0x7ff520200000M
EAL: PCI device 0000:82:00.1 on NUMA socket 1
EAL: probe driver: 14e4:16d7 net_bnxt
PMD: 1.10.1:214.4.65
PMD: Driver HWRM version: 1.8.2
PMD: BNXT Driver/HWRM API mismatch.
PMD: Firmware API version is newer than driver.
PMD: The driver may be missing features.
PMD: bnxt found at mem c8200000, node addr 0x7ff520312000M
hello from core 1
hello from core 2
hello from core 3
hello from core 4
hello from core 5
hello from core 6
hello from core 7
hello from core 8
hello from core 9
hello from core 10
hello from core 11
hello from core 12
hello from core 13
hello from core 14
hello from core 15
hello from core 16
hello from core 17
hello from core 18
hello from core 19
hello from core 20
hello from core 21
hello from core 22
hello from core 23
hello from core 24
hello from core 25
hello from core 26
hello from core 27
hello from core 28
hello from core 29
hello from core 30
hello from core 31
hello from core 0
Possible errors and their fixes:
Running make install T=x86_64-native-linuxapp-gcc may fail with the error numa.h: No such file or directory. The fix is to install the libnuma-dev package:
apt-get install libnuma-dev
Other dependencies that may also need to be installed:
apt-get build-dep linux
apt-get install libnuma-dev
apt-get install linux-headers-amd64
3. Hello world explained
Whatever you learn, you start with hello world. The code is as follows:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <errno.h> /* errno: the last system error code */
#include <sys/queue.h>
#include <rte_memory.h>
#include <rte_memzone.h>
#include <rte_launch.h>
#include <rte_eal.h>
#include <rte_per_lcore.h>
#include <rte_lcore.h>
#include <rte_debug.h>
static int
lcore_hello(__attribute__((unused)) void *arg)
{
	unsigned lcore_id;
	lcore_id = rte_lcore_id();
	printf("hello from core %u\n", lcore_id);
	return 0;
}
int
main(int argc, char **argv)
{
	int ret;
	unsigned lcore_id;
	ret = rte_eal_init(argc, argv);
	if (ret < 0)
		rte_panic("Cannot init EAL\n");
	/* call lcore_hello() on every slave lcore */
	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
		rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
	}
	/* call it on master lcore too */
	lcore_hello(NULL);
	rte_eal_mp_wait_lcore();
	return 0;
}
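The hello world code is very simple. Step by step:
- First, call rte_eal_init to initialize the EAL (Environment Abstraction Layer). The EAL creates a thread on every slave core and pins it to that CPU.
- Use RTE_LCORE_FOREACH_SLAVE to iterate over the slave CPU cores assigned to DPDK, and call rte_eal_remote_launch on each of them to run the callback lcore_hello, which prints a message.
- Call lcore_hello on the master lcore as well.
- Call rte_eal_mp_wait_lcore to wait until all slave cores have finished, then exit.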
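Note: __attribute__((unused)) marks a function, variable, or parameter as possibly unused, which stops the compiler from emitting an "unused" warning for it.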
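API reference:
int rte_eal_init(int argc, char **argv)
Initializes the Environment Abstraction Layer (EAL). It must be executed on the master lcore only, as early as possible in the application's main() function, and it puts the slave lcores in the WAIT state. On success it returns the number of parsed EAL arguments (zero or more); on failure it returns -1.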
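int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned slave_id)
Launches a function on another lcore. Must be executed on the master lcore only. It sends a message to a slave lcore (identified by slave_id) that is in the WAIT state (this is true after the first call to rte_eal_init()). This can be checked by first calling rte_eal_wait_lcore(slave_id).
When the remote lcore receives the message, it switches to the RUNNING state and calls f with the argument arg. Once execution is done, it switches to the FINISHED state, and the return value of f is stored in a local variable that can be read with rte_eal_wait_lcore(). The function returns 0 on success, or -EBUSY if the remote lcore is not in the WAIT state.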
#define RTE_LCORE_FOREACH_SLAVE(i)
# expands to:
for (i = rte_get_next_lcore(-1, 1, 0); i < RTE_MAX_LCORE; i = rte_get_next_lcore(i, 1, 0))
Macro that iterates over all enabled lcores except the master lcore.
#define rte_panic(...)
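Reports an unrecoverable critical error and terminates execution abnormally. It displays the format string and its expanded arguments (printf-like); on Linux it also dumps the stack and calls abort(). The function never returns.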
void rte_eal_mp_wait_lcore(void)
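Waits until all lcores have finished their jobs. Must be executed on the master lcore only. It calls rte_eal_wait_lcore() for every lcore and ignores the returned values. After rte_eal_mp_wait_lcore() returns, the caller can assume that all slave lcores are in the WAIT state.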
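Environment Abstraction Layer (EAL):
The Environment Abstraction Layer provides a generic interface that hides environment-specific details from applications and libraries. Typical services provided by the EAL include:
- DPDK loading and launching: DPDK and the application are linked into a single process, which must be loaded by some means.
- Core affinity/assignment: the EAL provides mechanisms to assign execution units to specific cores and to create execution instances.
- System memory reservation: the EAL reserves different memory zones, for example physical memory areas for device interaction.
- PCI address abstraction: the EAL provides an interface to access the PCI address space.
- Trace and debug functions: logging, stack dumps, panic, and so on.
- Utility functions: spinlocks and atomic counters that standard libc does not provide.
- CPU feature identification: determines at runtime whether a particular feature (for example AVX) is supported, and whether the current CPU supports the feature set the binary was compiled for.
- Interrupt handling: interfaces to register and unregister callbacks for interrupt sources.
- Alarm functions: interfaces to set and cancel callbacks that run at a specified time.
In short, the EAL decouples upper-layer programs from direct access to low-level resources such as hardware and memory, and exposes those services through its own interfaces.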
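In a Linux user-space environment, a DPDK application runs as an ordinary user-space program that uses the pthread library. The PCI information of devices and their address space are discovered through the /sys kernel interface and through kernel modules such as uio_pci_generic or igb_uio. As described in the UIO documentation of the Linux kernel, the device's resources are then mapped into the application with mmap.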