@mandarjog
Created November 21, 2023 18:08
vpp crash
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: received signal SIGSEGV, PC 0x7fb729955590, faulting address 0x3c
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #0 0x00007fb728ea8ea2 0x7fb728ea8ea2
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #1 0x00007fb728a42520 0x7fb728a42520
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #2 0x00007fb729955590 virtio_show + 0x90
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #3 0x00007fb72995dec2 0x7fb72995dec2
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #4 0x00007fb728e19044 0x7fb728e19044
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #5 0x00007fb728e18d17 0x7fb728e18d17
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #6 0x00007fb728e183bd vlib_cli_input + 0x7d
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #7 0x00007fb728e96ac8 0x7fb728e96ac8
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #8 0x00007fb728e40557 0x7fb728e40557
Sep 21 15:24:02 ip-10-4-184-136 vpp[50360]: #9 0x00007fb72ab51914 0x7fb72ab51914
Sep 21 15:24:11 ip-10-4-184-136 systemd[1]: vpp.service: Main process exited, code=dumped, status=6/ABRT
Sep 21 15:24:11 ip-10-4-184-136 systemd[1]: vpp.service: Failed with result 'core-dump'
#1423 0x00007f77e624bec3 in format () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1424 0x00007f77e5213342 in format_ip_adjacency () from /lib/x86_64-linux-gnu/libvnet.so.23.06
#1425 0x00007f77e62486a9 in va_format () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1426 0x00007f77e624bec3 in format () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1427 0x00007f77e4bf6634 in ?? () from /lib/x86_64-linux-gnu/libvnet.so.23.06
#1428 0x00007f77e62486a9 in va_format () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1429 0x00007f77e624bec3 in format () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1430 0x00007f77e469167a in format_vlib_trace () from /lib/x86_64-linux-gnu/libvlib.so.23.06
#1431 0x00007f77e62486a9 in va_format () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1432 0x00007f77e624bec3 in format () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1433 0x00007f77e4692705 in ?? () from /lib/x86_64-linux-gnu/libvlib.so.23.06
#1434 0x00007f77e4619044 in ?? () from /lib/x86_64-linux-gnu/libvlib.so.23.06
#1435 0x00007f77e4618d17 in ?? () from /lib/x86_64-linux-gnu/libvlib.so.23.06
#1436 0x00007f77e46183bd in vlib_cli_input () from /lib/x86_64-linux-gnu/libvlib.so.23.06
#1437 0x00007f77e4696ac8 in ?? () from /lib/x86_64-linux-gnu/libvlib.so.23.06
#1438 0x00007f77e4640557 in ?? () from /lib/x86_64-linux-gnu/libvlib.so.23.06
#1439 0x00007f77e629a914 in clib_calljmp () from /lib/x86_64-linux-gnu/libvppinfra.so.23.06
#1440 0x00007f77a0961a80 in ?? ()
#1441 0x00007f77e463811a in ?? () from /lib/x86_64-linux-gnu/libvlib.so.23.06
@mandarjog
Author

First, give the vpp gateway a second NIC for the datapath. We need an interface for vpp to take over while still being able to ssh in over the other one. Check that you’re set up by running:

$ sudo lshw -class network -businfo
Bus info          Device           Class          Description
=============================================================
pci@0000:27:00.0  enp39s0          network        Elastic Network Adapter (ENA)
pci@0000:28:00.0  enp40s0          network        Elastic Network Adapter (ENA)
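
The PCI address of the dataplane NIC is needed later for the dpdk stanza. A minimal sketch for pulling it out of the listing, assuming the column layout shown above; `pci_addr_of` is a hypothetical helper name, not part of lshw or vpp:

```shell
# Print the PCI address for a given device name, reading
# `lshw -class network -businfo` output on stdin.
# Assumes column 1 is "pci@<address>" and column 2 is the device name.
pci_addr_of() {
  awk -v dev="$1" '$2 == dev && $1 ~ /^pci@/ { sub(/^pci@/, "", $1); print $1 }'
}

# Usage (on the gateway):
#   sudo lshw -class network -businfo | pci_addr_of enp40s0
```

With the listing above, `enp40s0` maps to `0000:28:00.0`.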

Install VPP:

$ curl -s https://packagecloud.io/install/repositories/fdio/release/script.deb.sh | sudo bash
$ sudo apt-get update
$ sudo apt-get install vpp vpp-plugin-core vpp-plugin-dpdk

Be careful to install at least these packages. You’ll get weird “config parsing” errors if, say, the dpdk plugin is not installed. You might also see errors in the logs related to vpp-plugin-core, like plugin/load: /usr/lib/x86_64-linux-gnu/vpp_plugins/nsh_plugin.so: undefined symbol: gre4_input_node. They seem to be harmless.

Installing the package triggers a bunch of setup activities, like configuring hugepages on the system and new group creation. At this point the vpp process is started via systemd but probably won’t run successfully. Check:

$ systemctl status vpp
× vpp.service - vector packet processing engine
     Loaded: loaded (/lib/systemd/system/vpp.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2023-09-15 13:57:17 UTC; 25min ago
    Process: 4451 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited, status=1/FAILURE)
    Process: 4452 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf (code=exited, status=1/FAILURE)
    Process: 4453 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
   Main PID: 4452 (code=exited, status=1/FAILURE)
        CPU: 8ms

Sep 15 13:57:17 ip-10-4-184-136 systemd[1]: vpp.service: Scheduled restart job, restart counter is at 5.
Sep 15 13:57:17 ip-10-4-184-136 systemd[1]: Stopped vector packet processing engine.
Sep 15 13:57:17 ip-10-4-184-136 systemd[1]: vpp.service: Start request repeated too quickly.
Sep 15 13:57:17 ip-10-4-184-136 systemd[1]: vpp.service: Failed with result 'exit-code'.
Sep 15 13:57:17 ip-10-4-184-136 systemd[1]: Failed to start vector packet processing engine.
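
In the status output above, two steps exit with a failure: the ExecStartPre modprobe of uio_pci_generic and vpp itself. A small sketch for listing just the failing steps; `failed_steps` is a hypothetical helper and it assumes the `Process:` line format shown above:

```shell
# Print the command of every unit step that exited with a non-zero
# FAILURE status, reading `systemctl status` output on stdin.
failed_steps() {
  sed -n 's|^ *Process: [0-9]* \(.*\) (code=exited, status=[1-9][0-9]*/FAILURE)$|\1|p'
}

# Usage:
#   systemctl status vpp | failed_steps
```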

There’s some funky configuration in /etc/vpp/startup.conf. I ended up minimizing the config to the following:

💡 Be careful to use the PCI address of your dataplane interface (0000:28:00.0 here, from the lshw output above)
unix {
  nodaemon
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid vpp
}

api-trace {
  on
}

dpdk {
  dev 0000:28:00.0 {
    name vpp-eth0
  }
}
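
Since a wrong `dev` value is an easy mistake here, the stanza can be generated from a checked PCI address instead of typed by hand. A sketch only; `dpdk_stanza` is a hypothetical helper, and the pattern assumes the usual domain:bus:device.function form seen in the lshw output:

```shell
# Emit a dpdk stanza for startup.conf after checking that the first
# argument looks like a PCI address (e.g. 0000:28:00.0).
dpdk_stanza() {
  pci=$1 name=$2
  if ! printf '%s\n' "$pci" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'; then
    echo "not a PCI address: $pci" >&2
    return 1
  fi
  printf 'dpdk {\n  dev %s {\n    name %s\n  }\n}\n' "$pci" "$name"
}

# Usage:
#   dpdk_stanza 0000:28:00.0 vpp-eth0
```

Passing a kernel device name such as `enp40s0` is rejected rather than written into the config.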

After that new config is in place, this should work:

$ sudo systemctl restart vpp
$ sudo vppctl show int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count
local0                            0     down          0/0/0/0
vpp-eth0                          1     down         9000/0/0/0

Configure the vpp interface with the private IP of the dataplane interface (the second NIC that vpp took over).
Note the /20, which comes from the subnet it’s in. Then bring it up.

$ sudo vppctl set int ip address vpp-eth0 10.4.185.175/20
$ sudo vppctl set int state vpp-eth0 up
$ sudo vppctl show int addr
local0 (dn):
vpp-eth0 (up):
  L3 10.4.185.175/20
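
The /20 means the top 20 bits of the address identify the subnet. As a worked check, the sketch below masks an address down to its network; `ip_to_int` and `network_of` are hypothetical helper names, and only plain shell arithmetic is used:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the network address for ADDR with the given prefix length.
network_of() {
  addr=$(ip_to_int "$1")
  mask=$(( (0xFFFFFFFF << (32 - $2)) & 0xFFFFFFFF ))
  net=$(( addr & mask ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

network_of 10.4.185.175 20   # gateway address used above -> 10.4.176.0
network_of 10.4.177.243 20   # host pinged in the next step -> 10.4.176.0
```

Both addresses land in 10.4.176.0/20, so they are on the same subnet.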

From here you should be able to ping between the vpp interface and another host (and vice versa):

$ sudo vppctl ping 10.4.177.243
116 bytes from 10.4.177.243: icmp_seq=1 ttl=64 time=.2111 ms
116 bytes from 10.4.177.243: icmp_seq=2 ttl=64 time=.2113 ms
116 bytes from 10.4.177.243: icmp_seq=3 ttl=64 time=.1989 ms

Aborted due to a keypress.

Statistics: 3 sent, 3 received, 0% packet loss

To set up tunnels, start with these constants:

ID=20
ID2=30
SPI=200
SPI2=300
KEY=5aafd8bd3f819c90a42f67e323b1e5b1
SALT=3d5ff08b
GW_PRIV=10.4.185.175
EAST_PRIV=10.4.180.2
WEST_PRIV=10.4.177.243
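
A quick sanity check on the key material can save some debugging. For AES-GCM-128 the key is 128 bits and the salt is 32 bits; vppctl takes them separately, while the `ip xfrm` commands further down pass them concatenated as `0x${KEY}${SALT}` (the trailing 128 there is the ICV length in bits). A minimal sketch, with `hex_bits` as a hypothetical helper:

```shell
KEY=5aafd8bd3f819c90a42f67e323b1e5b1
SALT=3d5ff08b

# Number of bits encoded by a hex string (4 bits per hex digit).
hex_bits() { echo $(( ${#1} * 4 )); }

echo "key:    $(hex_bits "$KEY") bits"           # 128, the AES-128 key
echo "salt:   $(hex_bits "$SALT") bits"          # 32, the GCM salt
echo "keymat: $(hex_bits "${KEY}${SALT}") bits"  # 160, key followed by salt
```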

Configure west traffic generator ↔ vpp gateway tunnel on gateway (protection mode):

💡 The ipip tunnel `src` and `dst` IPs are significant! `src` must be local. See https://www.mail-archive.com/[email protected]/msg15498.html
$ sudo vppctl create ipip tunnel src $GW_PRIV dst $WEST_PRIV 
$ sudo vppctl ipsec sa add $ID spi $SPI crypto-key $KEY salt 0x$SALT crypto-alg aes-gcm-128
$ sudo vppctl ipsec tunnel protect ipip0 sa-in $ID sa-out $ID
$ sudo vppctl set interface unnumbered ipip0 use vpp-eth0
$ sudo vppctl set int state ipip0 up

Now the east traffic generator ↔ vpp gateway on gateway:

$ sudo vppctl create ipip tunnel src $GW_PRIV dst $EAST_PRIV 
$ sudo vppctl ipsec sa add $ID2 spi $SPI2 crypto-key $KEY salt 0x$SALT crypto-alg aes-gcm-128
$ sudo vppctl ipsec tunnel protect ipip1 sa-in $ID2 sa-out $ID2
$ sudo vppctl set interface unnumbered ipip1 use vpp-eth0
$ sudo vppctl set int state ipip1 up
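
The two gateway tunnels differ only in peer address, SA id, and SPI, so the five commands can be produced by one function. This is a dry-run sketch that only prints the commands; `tunnel_cmds` is a hypothetical helper, and it assumes vpp names the tunnels ipip0, ipip1, ... in creation order, as in the steps above:

```shell
# Print the vppctl commands for one protected ipip tunnel.
# Args: tunnel name, gateway IP, peer IP, SA id, SPI, key, salt
tunnel_cmds() {
  tun=$1 src=$2 dst=$3 id=$4 spi=$5 key=$6 salt=$7
  cat <<EOF
vppctl create ipip tunnel src $src dst $dst
vppctl ipsec sa add $id spi $spi crypto-key $key salt 0x$salt crypto-alg aes-gcm-128
vppctl ipsec tunnel protect $tun sa-in $id sa-out $id
vppctl set interface unnumbered $tun use vpp-eth0
vppctl set int state $tun up
EOF
}

# Usage (review the output, then run each line with sudo):
#   tunnel_cmds ipip0 "$GW_PRIV" "$WEST_PRIV" "$ID"  "$SPI"  "$KEY" "$SALT"
#   tunnel_cmds ipip1 "$GW_PRIV" "$EAST_PRIV" "$ID2" "$SPI2" "$KEY" "$SALT"
```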

Now the tunnel on west traffic generator:

$ sudo ip xfrm state add src $WEST_PRIV dst $GW_PRIV \
    proto esp spi $SPI \
    reqid $ID \
    mode tunnel \
    aead 'rfc4106(gcm(aes))' 0x${KEY}${SALT} 128
$ sudo ip xfrm state add src $GW_PRIV dst $WEST_PRIV \
    proto esp spi $SPI \
    reqid $ID \
    mode tunnel \
    aead 'rfc4106(gcm(aes))' 0x${KEY}${SALT} 128

$ sudo ip xfrm policy add src $WEST_PRIV dst $EAST_PRIV dir out \
      tmpl src $WEST_PRIV dst $GW_PRIV \
      proto esp reqid $ID mode tunnel
$ sudo ip xfrm policy add src $EAST_PRIV dst $WEST_PRIV dir fwd \
      tmpl src $GW_PRIV dst $WEST_PRIV \
      proto esp reqid $ID mode tunnel
$ sudo ip xfrm policy add src $EAST_PRIV dst $WEST_PRIV dir in \
      tmpl src $GW_PRIV dst $WEST_PRIV \
      proto esp reqid $ID mode tunnel

And on east traffic generator:

$ sudo ip xfrm state add src $EAST_PRIV dst $GW_PRIV \
    proto esp spi $SPI2 \
    reqid $ID2 \
    mode tunnel \
    aead 'rfc4106(gcm(aes))' 0x${KEY}${SALT} 128
$ sudo ip xfrm state add src $GW_PRIV dst $EAST_PRIV \
    proto esp spi $SPI2 \
    reqid $ID2 \
    mode tunnel \
    aead 'rfc4106(gcm(aes))' 0x${KEY}${SALT} 128

$ sudo ip xfrm policy add src $EAST_PRIV dst $WEST_PRIV dir out \
      tmpl src $EAST_PRIV dst $GW_PRIV \
      proto esp reqid $ID2 mode tunnel
$ sudo ip xfrm policy add src $WEST_PRIV dst $EAST_PRIV dir fwd \
      tmpl src $GW_PRIV dst $EAST_PRIV \
      proto esp reqid $ID2 mode tunnel
$ sudo ip xfrm policy add src $WEST_PRIV dst $EAST_PRIV dir in \
      tmpl src $GW_PRIV dst $EAST_PRIV \
      proto esp reqid $ID2 mode tunnel
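
The west and east setups mirror each other: two states (one per direction, same SPI) and three policies (out, fwd, in). A dry-run sketch that prints the same commands for either side; `xfrm_cmds` is a hypothetical helper and does not touch the kernel:

```shell
# Print the ip xfrm commands for one traffic generator.
# Args: this host's IP, the other generator's IP, gateway IP, reqid, SPI, key+salt hex
xfrm_cmds() {
  me=$1 peer=$2 gw=$3 id=$4 spi=$5 keymat=$6
  algo="aead 'rfc4106(gcm(aes))' 0x$keymat 128"
  tmpl_out="tmpl src $me dst $gw proto esp reqid $id mode tunnel"
  tmpl_in="tmpl src $gw dst $me proto esp reqid $id mode tunnel"
  cat <<EOF
ip xfrm state add src $me dst $gw proto esp spi $spi reqid $id mode tunnel $algo
ip xfrm state add src $gw dst $me proto esp spi $spi reqid $id mode tunnel $algo
ip xfrm policy add src $me dst $peer dir out $tmpl_out
ip xfrm policy add src $peer dst $me dir fwd $tmpl_in
ip xfrm policy add src $peer dst $me dir in $tmpl_in
EOF
}

# Usage (review the output, then run each line with sudo):
#   west: xfrm_cmds "$WEST_PRIV" "$EAST_PRIV" "$GW_PRIV" "$ID"  "$SPI"  "${KEY}${SALT}"
#   east: xfrm_cmds "$EAST_PRIV" "$WEST_PRIV" "$GW_PRIV" "$ID2" "$SPI2" "${KEY}${SALT}"
```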


To validate the current config, run tcpdump on the west generator while pinging $EAST_PRIV:

$ sudo tcpdump -i any -n esp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
13:32:25.430208 enp39s0 Out IP 10.4.177.243 > 10.4.185.175: ESP(spi=0x000000c8,seq=0x3), length 120
13:32:26.432687 enp39s0 Out IP 10.4.177.243 > 10.4.185.175: ESP(spi=0x000000c8,seq=0x4), length 120
13:32:27.456689 enp39s0 Out IP 10.4.177.243 > 10.4.185.175: ESP(spi=0x000000c8,seq=0x5), length 120
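
tcpdump prints the SPI in hex, while the constants above use decimal. A small sketch for converting it, so the captured value can be compared against SPI=200; `spi_of` is a hypothetical helper that assumes the `ESP(spi=0x...,seq=...)` format shown above:

```shell
# Extract the SPI from one tcpdump ESP line and print it in decimal.
# Returns non-zero if the line contains no ESP(spi=0x...) field.
spi_of() {
  hex=$(printf '%s\n' "$1" | sed -n 's/.*ESP(spi=0x\([0-9a-fA-F]*\),.*/\1/p')
  [ -n "$hex" ] || return 1
  echo $(( 0x$hex ))
}

spi_of '13:32:25.430208 enp39s0 Out IP 10.4.177.243 > 10.4.185.175: ESP(spi=0x000000c8,seq=0x3), length 120'
# prints 200, matching SPI on the west tunnel
```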

And on the gateway we see:

$ sudo vppctl clear trace
$ sudo vppctl trace add dpdk-input 5
$ sudo vppctl show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:01:28:997053: dpdk-input
  vpp-eth0 rx queue 0
  buffer 0x9fb69: current data 0, length 154, buffer-pool 0, ref-count 1, trace handle 0x0
                  ext-hdr-valid
  PKT MBUF: port 0, nb_segs 1, pkt_len 154
    buf_len 2176, data_len 154, ol_flags 0x80, data_off 128, phys_addr 0x8d9edac0
    packet_type 0x10 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
    rss 0x0 fdir.hi 0x0 fdir.lo 0x0
    Packet Offload Flags
      PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
      PKT_RX_IP_CKSUM_NONE (0x0090) no IP cksum of RX pkt.
    Packet Types
      RTE_PTYPE_L3_IPV4 (0x0010) IPv4 packet without extension headers
  IP4: 16:9e:cc:3c:f1:a7 -> 16:bc:bb:68:27:ad
  IPSEC_ESP: 10.4.177.243 -> 10.4.185.175
    tos 0x00, ttl 64, length 140, checksum 0x0f33 dscp CS0 ecn NON_ECN
    fragment id 0xab62, flags DONT_FRAGMENT
00:01:28:997067: ethernet-input
  frame: flags 0x3, hw-if-index 1, sw-if-index 1
  IP4: 16:9e:cc:3c:f1:a7 -> 16:bc:bb:68:27:ad
00:01:28:997076: ip4-input-no-checksum
  IPSEC_ESP: 10.4.177.243 -> 10.4.185.175
    tos 0x00, ttl 64, length 140, checksum 0x0f33 dscp CS0 ecn NON_ECN
    fragment id 0xab62, flags DONT_FRAGMENT
00:01:28:997081: ip4-lookup
  fib 0 dpo-idx 7 flow hash: 0x00000000
  IPSEC_ESP: 10.4.177.243 -> 10.4.185.175
    tos 0x00, ttl 64, length 140, checksum 0x0f33 dscp CS0 ecn NON_ECN
    fragment id 0xab62, flags DONT_FRAGMENT
00:01:28:997085: ip4-receive
    IPSEC_ESP: 10.4.177.243 -> 10.4.185.175
      tos 0x00, ttl 64, length 140, checksum 0x0f33 dscp CS0 ecn NON_ECN
      fragment id 0xab62, flags DONT_FRAGMENT
00:01:28:997088: ipsec4-tun-input
  IPSec: remote:10.4.177.243 spi:200 (0x000000c8) sa:0 tun:0 seq 26 sa 1269849108
00:01:28:997090: esp4-decrypt-tun
  esp: crypto aes-gcm-128 integrity none pkt-seq 26 sa-seq 26 sa-seq-hi 0 pkt-seq-hi 0
00:01:28:997114: ip4-input-no-checksum
  ICMP: 10.4.177.243 -> 10.4.180.2
    tos 0x00, ttl 64, length 84, checksum 0xb3a7 dscp CS0 ecn NON_ECN
    fragment id 0x0d04, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x5d6e id 41
00:01:28:997116: ip4-not-enabled
    ICMP: 10.4.177.243 -> 10.4.180.2
      tos 0x00, ttl 64, length 84, checksum 0xb3a7 dscp CS0 ecn NON_ECN
      fragment id 0x0d04, flags DONT_FRAGMENT
    ICMP echo_request checksum 0x5d6e id 41
00:01:28:997118: error-drop
  rx:ipip0
00:01:28:997119: drop
  ip4-local: unknown ip protocol

$ sudo vppctl show node counters
   Count                  Node                              Reason               Severity
         3             arp-reply                       ARP replies sent            info
         1             arp-reply             ARP request IP4 source address lear   info
         2          esp4-decrypt-tun                  ESP pkts received            info
         2          ipsec4-tun-input                good packets received          info
         1             ip4-glean                      ARP requests sent            info
         2             ip4-local                     unknown ip protocol           error
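
When the counter table grows, it helps to show only the rows with error severity. A sketch that filters the table on its last column; `error_counters` is a hypothetical helper and assumes the column layout printed above, with fields separated by runs of two or more spaces:

```shell
# Print "count | node | reason" for every counter row whose severity
# is "error", reading `vppctl show node counters` output on stdin.
error_counters() {
  awk '$NF == "error" && $1 ~ /^[0-9]+$/ {
    line = $0
    sub(/^ +/, "", line)          # drop leading padding
    sub(/ +error *$/, "", line)   # drop the severity column
    gsub(/  +/, " | ", line)      # collapse column padding
    print line
  }'
}

# Usage:
#   sudo vppctl show node counters | error_counters
```

On the output above this leaves a single row: `2 | ip4-local | unknown ip protocol`.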

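You can see in the trace that the ESP packets are being decrypted correctly: esp4-decrypt-tun hands the inner ICMP echo request (10.4.177.243 -> 10.4.180.2) back to ip4-input-no-checksum. The node counters also show successful tunnel decryption. Note that in this trace the decrypted packet is then dropped (ip4-not-enabled, then error-drop on rx:ipip0).

The same can be done on the east generator (pinging the west generator).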

Finally, set up routes between east and west generators:

$ sudo vppctl ip route add $EAST_PRIV/32 via ipip1
$ sudo vppctl ip route add $WEST_PRIV/32 via ipip0
