@wido
Created June 7, 2021 13:31
BGP+EVPN+VXLAN with Apache CloudStack
#!/usr/bin/env bash
#
# Use BGP+EVPN for VXLAN with CloudStack instead of Multicast
#
# Place this file on all KVM hypervisors at /usr/share/modifyvxlan.sh
#
# More information about BGP and EVPN with FRR: https://vincent.bernat.ch/en/blog/2017-vxlan-bgp-evpn
#
DSTPORT=4789
# We bind our VXLAN tunnel IP(v4) on Loopback device 'lo'
DEV="lo"
usage() {
    echo "Usage: $0: -o <op>(add | delete) -v <vxlan id> -p <pif> -b <bridge name> (-6)"
}

localAddr() {
    local FAMILY=$1
    if [[ -z "$FAMILY" || $FAMILY == "inet" ]]; then
        ip -4 -o addr show scope global dev ${DEV} | awk 'NR==1 {gsub("/[0-9]+", "") ; print $4}'
    fi
    if [[ "$FAMILY" == "inet6" ]]; then
        ip -6 -o addr show scope global dev ${DEV} | awk 'NR==1 {gsub("/[0-9]+", "") ; print $4}'
    fi
}
addVxlan() {
    local VNI=$1
    local PIF=$2
    local VXLAN_BR=$3
    local FAMILY=$4
    local VXLAN_DEV=vxlan${VNI}
    local ADDR=$(localAddr ${FAMILY})
    echo "local addr for VNI ${VNI} is ${ADDR}"
    if [[ ! -d /sys/class/net/${VXLAN_DEV} ]]; then
        ip -f ${FAMILY} link add ${VXLAN_DEV} type vxlan id ${VNI} local ${ADDR} dstport ${DSTPORT} nolearning
        ip link set ${VXLAN_DEV} up
        sysctl -qw net.ipv6.conf.${VXLAN_DEV}.disable_ipv6=1
    fi
    if [[ ! -d /sys/class/net/$VXLAN_BR ]]; then
        ip link add name ${VXLAN_BR} type bridge
        ip link set ${VXLAN_BR} up
        sysctl -qw net.ipv6.conf.${VXLAN_BR}.disable_ipv6=1
    fi
    bridge link show|grep ${VXLAN_BR}|awk '{print $2}'|grep "^${VXLAN_DEV}\$" > /dev/null
    if [[ $? -gt 0 ]]; then
        ip link set ${VXLAN_DEV} master ${VXLAN_BR}
    fi
}

deleteVxlan() {
    local VNI=$1
    local PIF=$2
    local VXLAN_BR=$3
    local FAMILY=$4
    local VXLAN_DEV=vxlan${VNI}
    ip link set ${VXLAN_DEV} nomaster
    ip link delete ${VXLAN_DEV}
    ip link set ${VXLAN_BR} down
    ip link delete ${VXLAN_BR} type bridge
}
OP=
VNI=
FAMILY=inet
option=$@
while getopts 'o:v:p:b:6' OPTION
do
    case $OPTION in
    o)  oflag=1
        OP="$OPTARG"
        ;;
    v)  vflag=1
        VNI="$OPTARG"
        ;;
    p)  pflag=1
        PIF="$OPTARG"
        ;;
    b)  bflag=1
        BRNAME="$OPTARG"
        ;;
    6)
        FAMILY=inet6
        ;;
    ?)  usage
        exit 2
        ;;
    esac
done
if [[ "$oflag$vflag$pflag$bflag" != "1111" ]]; then
    usage
    exit 2
fi
lsmod|grep ^vxlan >& /dev/null
if [[ $? -gt 0 ]]; then
    modprobe=`modprobe vxlan 2>&1`
    if [[ $? -gt 0 ]]; then
        echo "Failed to load vxlan kernel module: $modprobe"
        exit 1
    fi
fi
#
# Add a lockfile to prevent this script from running twice on the same host
# this can cause a race condition
#
LOCKFILE=/var/run/cloud/vxlan.lock
(
    flock -x -w 10 200 || exit 1
    if [[ "$OP" == "add" ]]; then
        addVxlan ${VNI} ${PIF} ${BRNAME} ${FAMILY}
        if [[ $? -gt 0 ]]; then
            exit 1
        fi
    elif [[ "$OP" == "delete" ]]; then
        deleteVxlan ${VNI} ${PIF} ${BRNAME} ${FAMILY}
    fi
) 200>${LOCKFILE}
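
To see what the script would do for a given network without touching the host, the commands from addVxlan can be printed instead of executed. The following is only a sketch and not part of the gist: the helper name print_add_commands and the example VNI, bridge name and address in the usage line are assumptions for illustration, and the sysctl calls that disable IPv6 on the devices are left out for brevity.

```shell
#!/usr/bin/env bash
# Dry-run sketch (hypothetical helper, not part of modifyvxlan.sh):
# print the ip commands addVxlan would run, without executing them.
DSTPORT=4789

print_add_commands() {
    local vni=$1 bridge=$2 addr=$3 family=${4:-inet}
    local dev="vxlan${vni}"
    # VXLAN device bound to the local (loopback) address, MAC learning off
    echo "ip -f ${family} link add ${dev} type vxlan id ${vni} local ${addr} dstport ${DSTPORT} nolearning"
    echo "ip link set ${dev} up"
    # Bridge that the guest interfaces attach to
    echo "ip link add name ${bridge} type bridge"
    echo "ip link set ${bridge} up"
    # Enslave the VXLAN device to the bridge
    echo "ip link set ${dev} master ${bridge}"
}
```

For example, `print_add_commands 100 br100 10.255.255.32` prints the five commands for VNI 100; the real script skips the creation steps when the device or bridge already exists.
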
@KuasarCloud

Thanks Wido for sharing this. Can I ask you two specific things about it?

  1. My hypervisors are based on Ubuntu and the file modifyvxlan.sh already exists, but in a different path. Should I replace it or just copy yours to the path you suggest?
  2. Can I use one of the hypervisors as both route reflector and VTEP? Do you have an example BGP EVPN configuration file I can use?

Many thanks!

@wido
Author

wido commented Sep 1, 2021

> 1. My hypervisors are based on Ubuntu and the file modifyvxlan.sh already exists but in a different path, should I replace it or just copy in the path you suggest.

Just put the file in /usr/share and restart the CloudStack agent. It will detect the file there. (Also see the comments in the header of the file.)

> 2. Can I use one of the hyp as the route reflector and as vtep?, do you have a file example of the bgp evpn example I can use?

I'm not sure exactly what you mean. This is the relevant BGP configuration with FRR that I use:

router bgp 4200100145
..
 address-family ipv4 unicast
  network 10.255.255.32/32
  neighbor uplinks activate
  neighbor uplinks next-hop-self
  neighbor uplinks soft-reconfiguration inbound
  neighbor uplinks route-map upstream-v4-out out
  neighbor uplinks route-map upstream-v4-in in
 exit-address-family
 !
 address-family ipv6 unicast
  network 2a05:xxxx:xxxx:2::32/128
  neighbor uplinks activate
  neighbor uplinks soft-reconfiguration inbound
  neighbor uplinks route-map upstream-v6-in in
  neighbor uplinks route-map upstream-v6-out out
 exit-address-family
 address-family l2vpn evpn
  neighbor uplinks activate
  advertise-all-vni
 exit-address-family
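
Once FRR runs with a configuration like the one above, a couple of vtysh show commands can confirm that the EVPN session is up and that FRR has picked up the VXLAN devices created by the script. This is a sketch under the assumption that vtysh is installed on the hypervisor; the helper name evpn_status and the DRY_RUN switch are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a fixed set of FRR show commands through vtysh.
# Set DRY_RUN=1 to only print the commands instead of running them.
EVPN_SHOW_COMMANDS=(
    "show bgp l2vpn evpn summary"   # state of the EVPN BGP sessions
    "show evpn vni"                 # VNIs FRR knows about from the kernel
)

evpn_status() {
    local cmd
    for cmd in "${EVPN_SHOW_COMMANDS[@]}"; do
        if [[ "${DRY_RUN:-0}" == "1" ]]; then
            echo "vtysh -c '${cmd}'"
        else
            vtysh -c "${cmd}"
        fi
    done
}
```

Running `DRY_RUN=1 evpn_status` lists the commands; calling `evpn_status` as root on a hypervisor runs them.
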

@KuasarCloud

Thanks Wido, it's working well for me now. Just one thing: if I delete a machine and recreate it with the same IP, the BGP entry doesn't seem to update until I create another VXLAN.

@wido
Author

wido commented Oct 26, 2021

Do you mean the hypervisor (compute node)? Then that is expected. See the script: it takes the IP address on the loopback interface and sets it as the local address of the VXLAN device when that device is created.

Use a unique IP per host and do not change it afterwards.
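
To check which local address an existing VXLAN device was created with, the detailed link output can be compared with the address currently on the loopback. A small sketch, assuming the `ip -d link show` output contains a `local <address>` pair as recent iproute2 versions print it; the helper name vxlan_local_addr and the device name vxlan100 are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip -d link show <dev>` output on stdin and
# print the word that follows the "local" keyword, if there is one.
vxlan_local_addr() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "local") { print $(i + 1); exit } }'
}

# Usage on a hypervisor (device name is an example):
#   ip -d link show vxlan100 | vxlan_local_addr
#   ip -4 -o addr show scope global dev lo
# If the two addresses differ, the device still carries the old address.
```

Since the script only creates the device when it does not exist yet, a device created with an old address keeps it until the device is deleted and added again.
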

@hanisirfan

@wido may I ask whether the script still works with ACS 4.19? To be straight, I don't know in detail how VXLAN and EVPN work, and I'm now trying to implement them in my POC environment.

This is my FRR configuration on the host so far.

ip forwarding
ipv6 forwarding

interface ens3f0np0
    no ipv6 nd suppress-ra
exit

interface ens3f1np1
    no ipv6 nd suppress-ra
exit

router bgp 4200100005
    bgp router-id 10.0.118.1
    no bgp ebgp-requires-policy
    neighbor uplink peer-group
    neighbor uplink remote-as external
    neighbor ens3f0np0 interface peer-group uplink
    neighbor ens3f1np1 interface peer-group uplink
    address-family ipv4 unicast
        network 10.0.118.1/32
    exit-address-family
    address-family ipv6 unicast
        network 2407:e6c0:0:1::1/128
        neighbor uplink activate
        neighbor uplink soft-reconfiguration inbound
    exit-address-family
    address-family l2vpn evpn
        neighbor uplink activate
        neighbor uplink attribute-unchanged next-hop
        advertise-all-vni
    exit-address-family

I can see that I'm getting the EVPN routes from my leaf switches (but I still don't understand what RD, RT, etc. are).

I tried to use the script to create a VXLAN interface on cloudbr0 for the management network. Am I missing something?

[root@cmpt1 ~]# ./modifyvxlan.sh -o add -v 10027 -b cloudbr0
Usage: ./modifyvxlan.sh: -o <op>(add | delete) -v <vxlan id> -p <pif> -b <bridge name> (-6)
[root@cmpt1 ~]#

Thanks for the help :)
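
The usage output above is consistent with the script's argument check: it requires all four of -o, -v, -p and -b, and the command shown has no -p. The script assigns PIF but does not use it when creating the device, yet the option is still mandatory. The sketch below reuses the script's getopts string to report which required options are missing; the helper name missing_flags is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the required options (-o -v -p -b) that are
# absent from the given argument list, using the script's getopts string.
missing_flags() {
    local OPTIND=1 OPTION seen=""
    while getopts 'o:v:p:b:6' OPTION "$@"; do
        case $OPTION in
            o|v|p|b) seen+="$OPTION" ;;
        esac
    done
    local flag missing=()
    for flag in o v p b; do
        [[ "$seen" == *"$flag"* ]] || missing+=("-$flag")
    done
    echo "${missing[*]}"
}
```

For the command above, `missing_flags -o add -v 10027 -b cloudbr0` prints `-p`. Adding a value for -p (for example `-p lo`, which is an assumption here) gets past the usage check.
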

@wido
Author

wido commented May 16, 2024

Yes, this script should still work with 4.19, no problem at all.

Could you maybe ask this question on the CloudStack mailing list? I can get back to you there. You can cc me in the e-mail :-)

@hanisirfan

@wido I've posted a question on the mailing list. Thanks for checking it out later when you're free :)

I cc'ed you in there also.

@bradh352

bradh352 commented Oct 6, 2024

Has any attempt been made to integrate this into cloudstack proper? If so, what were the issues?

I can see this obviously needs to do things like disable multicast for EVPN where cloudstack enables it in the provided script, but I'd think that could be fairly easily resolved. Ideas for resolution would be some sort of additional setting, or duplicating the vxlan protocol in cloudstack into a new protocol like "vxlan-evpn" which will simply pass a flag into the modifyvxlan.sh script to alter behavior.

I'd be interested in taking this on, but I'd like to know what kind of prior feedback there may have been.

@wido
Author

wido commented Oct 8, 2024

Good suggestion! I have opened a Pull Request to at least add this script to the main repository: apache/cloudstack#9778

I have been using this script for five years now without any issues; it just works as expected.
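
The flag-based idea raised earlier in this thread could look roughly like the sketch below, which picks the `ip link add` arguments by mode. Everything here is hypothetical and not taken from CloudStack: the function name vxlan_link_args, the mode names, and the example group, interface and address values are assumptions. Only the EVPN branch mirrors the arguments used by the script in this gist.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a mode switch; not taken from CloudStack.
DSTPORT=4789

vxlan_link_args() {
    local mode=$1 vni=$2
    case $mode in
        evpn)
            # $3: local VTEP address; remote MACs come from BGP EVPN
            echo "type vxlan id ${vni} local $3 dstport ${DSTPORT} nolearning"
            ;;
        multicast)
            # $3: multicast group, $4: physical interface carrying it
            echo "type vxlan id ${vni} group $3 dev $4 dstport ${DSTPORT}"
            ;;
        *)
            echo "unknown mode: ${mode}" >&2
            return 1
            ;;
    esac
}
```

A caller would then run something like `ip link add vxlan100 $(vxlan_link_args evpn 100 10.255.255.32)`, with the mode taken from an extra option or network setting.
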

@cmachango

Hi @wido, thank you for the great work on the script, it works flawlessly. VMs in Isolated or L2 guest networks on different hypervisors can reach each other without issues. But I have bumped into a challenge and need guidance. When I create a guest network of type Shared with public IP addresses (with VLAN/VNI), the VMs can't reach the internet or other physical machines in the same public IP range. Before the change from the VLAN to the VXLAN isolation method it was working fine.

@wido
Author

wido commented Nov 18, 2024

What routers are you using? And they are the gateway for the shared network, correct? You would need to check if they learn the proper EVPN information through BGP. It seems that you might have an issue there.

This does not seem to be a script or CloudStack issue; it is probably a local networking (configuration) issue.

Could be many, many, many things.

@cmachango

I use FRR as a router and BGP route reflector, and VMs hosted on different KVM servers can communicate with each other regardless of their underlying network. Internet access only works with the Isolated network type, since those VMs are behind a VR. In a Shared network with public IP addresses, the VMs can still communicate with each other but they can't break out to the internet.

@wido
Author

wido commented Nov 18, 2024

The VMs in this shared network with a public address, can they reach their gateway? Or from the gateway router, can you reach (ping) these VMs?

@cmachango

No, they cannot ping the gateway, but they can ping each other. I am trying to figure out how the VMs attached to br-vxlan2 break out of that bridge to the internet.

@wido
Author

wido commented Nov 20, 2024

This is then an issue on your local network with the EVPN config on those gateways. It could be many things:

  • BGP to gateway routers not working properly
  • Incorrect route targets for the VNIs
  • Policy config issue
  • etc.

Again, this has nothing to do with this particular script.
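
For the first two items in that list, a few per-VNI checks can narrow things down when the gateways run FRR. The sketch below only prints the commands for a given VNI; the helper name evpn_vni_checks and the example VNI are assumptions, and policy problems would still need a manual look at the route-maps on each side.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print per-VNI checks for a host running FRR.
evpn_vni_checks() {
    local vni=$1
    # RD and import/export route targets FRR uses for this VNI
    echo "vtysh -c 'show bgp l2vpn evpn vni ${vni}'"
    # MAC addresses known for this VNI, local and from remote VTEPs
    echo "vtysh -c 'show evpn mac vni ${vni}'"
    # Kernel forwarding entries on the VXLAN device (hypervisor side)
    echo "bridge fdb show dev vxlan${vni}"
}
```

Running `evpn_vni_checks 2` on both a hypervisor and a gateway, then executing the printed commands, shows whether the route targets match and whether the gateway's MAC address appears on the hypervisor.
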
