Macvlan and ipvlan Explained

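This article briefly introduces the working modes of macvlan and ipvlan and the differences between them. To make the concepts concrete, it also walks through some hands-on commands.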

macvlan

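The macvlan discussed here is a network driver provided by the Linux kernel; it is not the same thing as the MAC-based VLAN feature found on traditional switches. Run lsmod | grep macvlan on the Linux command line to check whether the kernel has loaded the driver. If nothing is listed, load it with modprobe macvlan and check again.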
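The check-then-load step can be wrapped in a small helper. This is only a minimal sketch: the function name `module_loaded` and its optional second argument (a file in `/proc/modules` format, the file that `lsmod` formats) are illustrative assumptions, not part of any existing tool.

```shell
#!/bin/sh
# module_loaded NAME [LISTING]   (hypothetical helper, not a standard command)
# Succeeds if NAME appears in the first column of LISTING.
# LISTING defaults to /proc/modules, the file that lsmod formats.
# Returns 2 if LISTING cannot be read.
module_loaded() {
    mod="$1"
    listing="${2:-/proc/modules}"
    [ -r "$listing" ] || return 2
    # Exit status 0 when a matching first column was seen, 1 otherwise.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$listing"
}
```

As root, `module_loaded macvlan || modprobe macvlan` would then load the driver only when it is missing. Note that a driver compiled directly into the kernel, rather than built as a module, does not appear in this listing at all.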

The kernel source for the driver is at: /drivers/net/macvlan.c

Working modes

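  • Bridge: macvlan interfaces that share the same parent interface behave as if attached to one bridge and can reach each other at layer 2 (in testing, none of these macvlan interfaces could communicate with the parent interface itself).
  • VEPA: all traffic from the interfaces is sent out to the external switch and can reach other interfaces only by coming back from that switch (the switch must support hairpin, also called reflective relay).
  • Private: the interface does not communicate with other macvlan interfaces on the same parent, even when an external switch reflects the traffic back.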

The differences among the three modes are illustrated in the figure below:
[Figure: macvlan working modes]
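On the command line the mode is chosen with the `mode` keyword of `ip link add ... type macvlan`, written in lowercase. The helper below is a sketch that only prints the command for one of the three modes described here. The name `macvlan_add_cmd` is hypothetical, and iproute2 accepts further modes that this article does not cover.

```shell
#!/bin/sh
# macvlan_add_cmd PARENT NAME MODE   (hypothetical helper)
# Prints, but does not run, the ip command that would create the interface.
# Only the three modes discussed in this article are accepted.
macvlan_add_cmd() {
    parent="$1"; name="$2"; mode="$3"
    case "$mode" in
        bridge|vepa|private) ;;
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac
    echo "ip link add link $parent name $name type macvlan mode $mode"
}
```

For example, `macvlan_add_cmd enp0s8 macv1 vepa` prints the command for a VEPA-mode interface, which can be reviewed and then run as root.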

Hands-on experiment

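The experiment below creates two macvlan interfaces and places them in two separate network namespaces (netns), then verifies that the two macvlan interfaces can reach each other.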

First, create two macvlan interfaces in bridge mode, both with enp0s8 as the parent interface.

# ip link add link enp0s8 name macv1 type macvlan mode bridge
# ip link add link enp0s8 name macv2 type macvlan mode bridge

Check the result (note that each interface has its own MAC address):

# ip link 
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:31:1a:5d brd ff:ff:ff:ff:ff:ff
7: macv1@enp0s8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/ether 06:95:37:0c:83:36 brd ff:ff:ff:ff:ff:ff
8: macv2@enp0s8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/ether a2:e1:f1:e9:95:18 brd ff:ff:ff:ff:ff:ff
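To compare MAC addresses without reading the full listing, the `ip link` output can be reduced to name and MAC pairs. The filter below is a sketch; `link_macs` is a hypothetical name, and it relies only on the `N: name@parent:` header lines and `link/ether` fields visible in the output above.

```shell
#!/bin/sh
# link_macs   (hypothetical helper)
# Reads `ip link` output on stdin and prints "NAME MAC" for each Ethernet link.
link_macs() {
    awk '
        /^[0-9]+:/ {                 # header line, e.g. "7: macv1@enp0s8: <...>"
            name = $2
            sub(/:$/, "", name)      # drop the trailing colon
            sub(/@.*/, "", name)     # drop the "@parent" suffix
        }
        {
            # The MAC address is the field right after "link/ether".
            for (i = 1; i < NF; i++)
                if ($i == "link/ether") print name, $(i + 1)
        }
    '
}
```

Typical use is `ip link | link_macs`. For the macvlan interfaces created here, every line should show a different MAC address.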

Create the network namespaces, move the interfaces into them, and obtain IP addresses from a DHCP server:

# ip net add net-1
# ip net add net-2

# ip link set macv1 netns net-1
# ip link set macv2 netns net-2

# ip net exec net-1 dhclient macv1
# ip net exec net-2 dhclient macv2
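The macvlan setup so far can be collected into one script. The version below is a sketch under stated assumptions: it prints the commands by default (dry run) instead of executing them, the names `run`, `macvlan_lab`, `DRY_RUN` and `PARENT` are hypothetical, and the explicit `ip link set ... up` step is an addition that should be harmless if the DHCP client brings the interface up itself. It spells out `ip netns`, of which the `ip net` used in this article is an abbreviation.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"
PARENT="${PARENT:-enp0s8}"     # parent interface used in this article

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create net-1/net-2, one bridge-mode macvlan in each, and request addresses.
macvlan_lab() {
    for i in 1 2; do
        run ip netns add "net-$i"
        run ip link add link "$PARENT" name "macv$i" type macvlan mode bridge
        run ip link set "macv$i" netns "net-$i"
        run ip netns exec "net-$i" ip link set "macv$i" up
        run ip netns exec "net-$i" dhclient "macv$i"
    done
}
```

Calling `macvlan_lab` prints the ten commands for review. Deleting the namespaces afterwards with `ip netns del net-1` and `ip netns del net-2` also removes the macvlan interfaces inside them.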

macv1 cannot ping the address of the host interface enp0s8:

# ip net exec net-1 ping 10.0.2.15
PING 10.0.2.15 (10.0.2.15) 56(84) bytes of data.
^C
--- 10.0.2.15 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3008ms

It can, however, ping the address of macv2:

# ip net exec net-1 ping 10.0.2.17
PING 10.0.2.17 (10.0.2.17) 56(84) bytes of data.
64 bytes from 10.0.2.17: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 10.0.2.17: icmp_seq=2 ttl=64 time=0.052 ms
64 bytes from 10.0.2.17: icmp_seq=3 ttl=64 time=0.052 ms
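When such reachability checks are repeated, reading the packet-loss figure from the ping summary line is enough to tell the two cases apart. The filter below is a small sketch; `ping_loss` is a hypothetical name, and it assumes the summary format shown in the ping output in this article.

```shell
#!/bin/sh
# ping_loss   (hypothetical helper)
# Reads ping output on stdin and prints the packet-loss percentage
# taken from the "... N% packet loss ..." summary line.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

For example, `ip netns exec net-1 ping -c 3 -W 1 10.0.2.17 | ping_loss` prints the loss percentage for three probes to macv2.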

ipvlan

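Next comes the ipvlan interface. It is also a network driver in the Linux kernel; the code lives in the kernel directory /drivers/net/ipvlan/. The key difference is that macvlan locates the target macvlan device by MAC address, whereas ipvlan locates the target ipvlan device by IP address.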

A question worth thinking about: how does an ipvlan interface differ in effect from simply adding multiple IP addresses to eth0?

Working modes

To quote the kernel documentation on how the two modes differ:

  • l2: the slaves will RX/TX multicast and broadcast (if applicable) as well.
  • l3: the slaves will not receive nor can send multicast / broadcast traffic.

The figure below gives a visual picture:
[Figure: ipvlan working modes]

Hands-on experiment

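The experiments below each create two ipvlan interfaces and place them in two separate netns. The ipvlan interfaces can be configured in different subnets and can still reach each other through internal routing.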

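In addition, in L2 mode an ipvlan interface receives layer-2 broadcast and multicast packets, whereas in L3 mode it no longer handles them. The experiments verify this difference by capturing ARP packets with tcpdump, which should give a concrete feel for the two ipvlan modes.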

L3 mode experiment

Create two ipvlan interfaces, both in l3 mode. Note that their MAC addresses are identical to the MAC address of the parent interface:

# ip link add link enp0s3 ipvlan1 type ipvlan mode l3
# ip link add link enp0s3 ipvlan2 type ipvlan mode l3

# ip addr
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:d7:d6:a9 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
valid_lft 86193sec preferred_lft 86193sec
inet6 fe80::c6bb:4c9:37e8:7ac2/64 scope link noprefixroute
valid_lft forever preferred_lft forever
7: ipvlan1@enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 08:00:27:d7:d6:a9 brd ff:ff:ff:ff:ff:ff
8: ipvlan2@enp0s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 08:00:27:d7:d6:a9 brd ff:ff:ff:ff:ff:ff

Create the network namespaces and move the interfaces into them:

# ip net add net-1
# ip net add net-2

# ip link set ipvlan1 netns net-1
# ip link set ipvlan2 netns net-2

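Because the MAC addresses are identical, a DHCP server that identifies clients by MAC address cannot tell the interfaces apart, so dhclient cannot be used to assign the IP addresses. Bring the interfaces up and configure the addresses by hand:

# ip net exec net-1 ip link set ipvlan1 up
# ip net exec net-2 ip link set ipvlan2 up

# ip net exec net-1 ip addr add 10.0.2.18/24 dev ipvlan1
# ip net exec net-2 ip addr add 10.0.3.19/24 dev ipvlan2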

Add default routes:

# ip net exec net-1 route add default dev ipvlan1
# ip net exec net-2 route add default dev ipvlan2
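The ipvlan steps up to this point (create, move into a netns, address, route) can also be collected into one script that takes the mode as an argument, so the same sequence serves both the L3 and the L2 experiment. This is a sketch under stated assumptions: it prints the commands by default (dry run), the names `run`, `ipvlan_lab`, `DRY_RUN` and `PARENT` are hypothetical, the addresses are the ones from the L3 experiment, and `ip route add default dev ...` stands in for the net-tools `route add default dev ...` used above.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"
PARENT="${PARENT:-enp0s3}"     # parent interface used in this article

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# ipvlan_lab MODE   (MODE is l2 or l3)
ipvlan_lab() {
    mode="$1"
    case "$mode" in
        l2|l3) ;;
        *) echo "usage: ipvlan_lab l2|l3" >&2; return 1 ;;
    esac
    n=1
    # One address per namespace, deliberately in different subnets.
    for addr in 10.0.2.18/24 10.0.3.19/24; do
        run ip netns add "net-$n"
        run ip link add link "$PARENT" name "ipvlan$n" type ipvlan mode "$mode"
        run ip link set "ipvlan$n" netns "net-$n"
        run ip netns exec "net-$n" ip link set "ipvlan$n" up
        run ip netns exec "net-$n" ip addr add "$addr" dev "ipvlan$n"
        run ip netns exec "net-$n" ip route add default dev "ipvlan$n"
        n=$((n + 1))
    done
}
```

`ipvlan_lab l3` prints the twelve commands for the L3 setup; `ipvlan_lab l2` prints the same sequence with only the mode changed.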

The ipvlan interface in the other namespace can be pinged, even though the two interfaces are in different subnets:

# ip net exec net-2 ping 10.0.2.18
PING 10.0.2.18 (10.0.2.18) 56(84) bytes of data.
64 bytes from 10.0.2.18: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 10.0.2.18: icmp_seq=2 ttl=64 time=0.048 ms
^C
--- 10.0.2.18 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.048/0.060/0.072/0.012 ms

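Meanwhile, no ARP packets are captured on the peer interface. This shows that layer-2 broadcast and multicast are not processed and that the interface is working at L3: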

# ip net exec net-1 tcpdump -ni ipvlan1 -p arp

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ipvlan1, link-type EN10MB (Ethernet), capture size 262144 bytes

The IP address of the parent interface enp0s3 cannot be pinged:

# ip net exec net-1 ping 10.0.2.15
PING 10.0.2.15 (10.0.2.15) 56(84) bytes of data.
^C
--- 10.0.2.15 ping statistics ---
97 packets transmitted, 0 received, 100% packet loss, time 96106ms

L2 mode experiment

This is similar to the L3 mode experiment, so the steps are not explained in detail again:

# ip link add link enp0s3 ipvlan1 type ipvlan mode l2
# ip link add link enp0s3 ipvlan2 type ipvlan mode l2

# ip net add net-1
# ip net add net-2

# ip link set ipvlan1 netns net-1
# ip link set ipvlan2 netns net-2

# ip net exec net-1 ip link set ipvlan1 up
# ip net exec net-2 ip link set ipvlan2 up

# ip net exec net-1 ip addr add 10.0.2.18/24 dev ipvlan1
# ip net exec net-2 ip addr add 10.0.3.18/24 dev ipvlan2

# ip net exec net-1 route add default dev ipvlan1
# ip net exec net-2 route add default dev ipvlan2

# ip net exec net-2 ip link set lo up
# ip net exec net-1 ip link set lo up

Unlike L3 mode, ARP packets (layer-2 broadcasts) can now be captured:

# ip net exec net-1 tcpdump -ni ipvlan1 -p arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ipvlan1, link-type EN10MB (Ethernet), capture size 262144 bytes
21:06:27.461824 ARP, Request who-has 10.0.3.18 tell 10.0.2.18, length 28
21:06:27.461818 ARP, Request who-has 10.0.2.18 tell 10.0.3.18, length 28
21:06:27.461842 ARP, Reply 10.0.2.18 is-at 08:00:27:d7:d6:a9, length 28
21:06:27.461845 ARP, Reply 10.0.3.18 is-at 08:00:27:d7:d6:a9, length 28
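The difference between the two captures comes down to whether any ARP lines appear at all. The filter below counts them in saved tcpdump text output. It is a sketch; `count_arp` is a hypothetical name, and it assumes the one-line `ARP, Request ...` / `ARP, Reply ...` format shown above.

```shell
#!/bin/sh
# count_arp   (hypothetical helper)
# Reads tcpdump text output on stdin and prints the number of ARP lines.
count_arp() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c ' ARP, ' || true
}
```

With the capture text saved to a file, `count_arp < capture.txt` should print 0 for the L3 mode capture and 4 for the L2 mode capture shown above.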