UDP Server Performance Optimization: Comparing Perf and GCP

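An RTC server runs over UDP, which brings the following difficulties:

  1. There are a great many UDP packets, and most of them are small. For example, a single video keyframe may be split into dozens of UDP packets, and each Opus packet ranges from a few dozen to a little over a hundred bytes.
  2. Different protocols must share the same port (this is required to support cloud-native platforms such as K8S), so every packet has to be matched to the Session that handles it, and the client address may change.
  3. Real-time requirements are strict. Each Session must send and receive data immediately and cannot deliberately accumulate packets first. In any short interval a Session handles only one or two packets, so there is little to process in batches.
  4. The kernel optimizes UDP less thoroughly than TCP, and fewer optimization techniques are available for UDP than for TCP.
  5. Packets must be encrypted and decrypted, which costs CPU and also causes memory copies.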
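
To make difficulty 2 more concrete, here is a minimal sketch of per-packet dispatch on a shared port. It is an illustration, not SRS's actual code: the byte ranges follow the demultiplexing scheme described in RFC 7983 and RFC 5761, and `classify` and `SessionTable` are hypothetical names introduced only for this example.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]


def classify(packet: bytes) -> str:
    """Guess the protocol of a UDP payload from its leading bytes."""
    if not packet:
        return "unknown"
    first = packet[0]
    if first <= 3:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 128 <= first <= 191:
        # RTP and RTCP share this range; RTCP packet types occupy
        # 192..223 in the second byte.
        if len(packet) >= 2 and 192 <= packet[1] <= 223:
            return "rtcp"
        return "rtp"
    return "unknown"


class SessionTable:
    """Hypothetical address -> session-id map for one shared UDP port."""

    def __init__(self) -> None:
        self._by_addr: Dict[Addr, str] = {}

    def bind(self, addr: Addr, session_id: str) -> None:
        # Drop any older address of this session (linear scan, fine for a
        # sketch), then record the new one so a client that changed its
        # address is still found.
        for old in [a for a, s in self._by_addr.items() if s == session_id]:
            del self._by_addr[old]
        self._by_addr[addr] = session_id

    def find(self, addr: Addr) -> Optional[str]:
        return self._by_addr.get(addr)


if __name__ == "__main__":
    table = SessionTable()
    table.bind(("10.0.0.1", 5000), "session-1")
    print(classify(bytes([0x80, 0x6F])), table.find(("10.0.0.1", 5000)))
```

Both steps run once per packet, so with tens of thousands of small packets arriving even this cheap lookup becomes a visible cost.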

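Even so, there is still quite a lot that can be done. For details, see the link below:

The most important tools during optimization are the load-testing tool srs-bench, together with Perf and GCP (the gperftools CPU profiler).

The Perf and GCP data turn out to differ somewhat. For example, at about 67% CPU usage: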

top - 14:58:57 up 25 days,  1:58,  4 users,  load average: 0.66, 0.76, 0.73
Tasks:  92 total,   2 running,  90 sleeping,   0 stopped,   0 zombie
%Cpu(s): 30.1 us,  5.1 sy,  0.0 ni, 61.8 id,  0.0 wa,  0.0 hi,  3.1 si,  0.0 st
KiB Mem :  8008964 total,   460028 free,  1390824 used,  6158112 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  6311680 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8375 root       0 -20 1120556 992436   4192 R  68.1 12.4  24:14.17 srs
 8462 root      20   0  312104  36364   3800 S   1.0  0.5   0:25.25 perf
 6745 root      20   0  150332   6664   2380 S   0.7  0.1   0:15.11 dstat
    6 root      20   0       0      0      0 S   0.3  0.0  49:03.07 ksoftirqd/0

SRS statistics:

Hybrid cpu=70.00%,969MB, cid=47984,8, timer=24421,4394,19973, clock=0,45,4,0,0,0,0,0,0, 
objs=(pkt:0,raw:0,fua:0,msg:0,oth:401,buf:0,drop:0), 
cache=(pkt:20-31w,raw:109113-69w,fua:32227-41w,msg:1-41w,buf:19-34w)

RTC: Server conns=401, rpkts=(47734,rtp:47726,stun:1,rtcp:7), 
spkts=(1710,rtp:117,stun:1,rtcp:1592), rtcp=(pli:0,twcc:3982,rr:398), 
snk=(39826,a:19913,v:19913,h:0), rnk=(2,2,h:2,m:0), 
fid=(id:0,fid:5272,ffid:42461,addr:1,faddr:47734)

For comparison, Perf's top 37 functions:

Overhead  Shared Object       Symbol
  10.13%  srs.4.0.77          [.] sha1_block_data_order_avx2
   4.37%  srs.4.0.77          [.] bitvector_left_shift
   2.96%  libpthread-2.17.so  [.] __recvfrom_nocancel
   2.51%  libc-2.17.so        [.] __memcpy_ssse3
   2.51%  srs.4.0.77          [.] heap_delete
   2.49%  srs.4.0.77          [.] SrsHourGlass::cycle
   2.39%  srs.4.0.77          [.] SrsRtpPacket2::decode
   2.19%  srs.4.0.77          [.] SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle
   2.16%  srs.4.0.77          [.] SrsRtpPacket2::recycle_shared_buffer
   1.79%  [kernel]            [k] finish_task_switch
   1.71%  srs.4.0.77          [.] SrsRtcPublishStream::on_rtp
   1.56%  [kernel]            [k] system_call_after_swapgs
   1.56%  [kernel]            [k] free_hot_cold_page
   1.52%  srs.4.0.77          [.] srtp_get_stream
   1.47%  [kernel]            [k] copy_user_enhanced_fast_string
   1.39%  srs.4.0.77          [.] aesni_ctr32_encrypt_blocks
   1.33%  srs.4.0.77          [.] operator delete[]
   1.32%  [kernel]            [k] _raw_spin_unlock_irqrestore
   1.19%  srs.4.0.77          [.] SrsRtcRecvTrack::do_check_send_nacks
   0.99%  srs.4.0.77          [.] OPENSSL_cleanse
   0.94%  srs.4.0.77          [.] SrsRtpRingBuffer::set
   0.93%  srs.4.0.77          [.] std::less<unsigned int>::operator()
   0.89%  srs.4.0.77          [.] srtp_unprotect
   0.88%  srs.4.0.77          [.] heap_insert
   0.85%  srs.4.0.77          [.] SrsRtcPublishStream::check_send_nacks
   0.85%  srs.4.0.77          [.] SrsRtpNackForReceiver::get_nack_seqs
   0.83%  srs.4.0.77          [.] SrsRtcPublishStream::get_audio_track
   0.81%  srs.4.0.77          [.] SrsRtcTrackDescription::has_ssrc
   0.72%  srs.4.0.77          [.] SrsResourceManager::find_by_fast_id
   0.69%  srs.4.0.77          [.] SrsSharedPtrMessage::count
   0.68%  srs.4.0.77          [.] EVP_MD_CTX_cleanup
   0.67%  srs.4.0.77          [.] SrsRtcPublishStream::do_on_rtp_plaintext
   0.64%  srs.4.0.77          [.] SrsBuffer::require
   0.63%  libc-2.17.so        [.] epoll_ctl
   0.61%  [kernel]            [k] udp_recvmsg
   0.60%  srs.4.0.77          [.] operator new[]
   0.58%  srs.4.0.77          [.] SrsUdpMuxListener::cycle

And GCP's top 37 functions:

[root@iZbp12af7ajnkuducj2u8rZ ~]# ./objs/pprof objs/srs gperf.srs.gcp 
(pprof) top37
Total: 17795 samples
    2397  13.5%  13.5%     2397  13.5% __recvfrom_nocancel
    1894  10.6%  24.1%     1894  10.6% sha1_block_data_order_avx2
     746   4.2%  28.3%      746   4.2% bitvector_left_shift
     501   2.8%  31.1%      511   2.9% heap_delete
     485   2.7%  33.8%     2315  13.0% SrsHourGlass::cycle
     440   2.5%  36.3%      440   2.5% __GI_epoll_wait
     429   2.4%  38.7%     1136   6.4% SrsRtpObjectCacheManager::recycle
     424   2.4%  41.1%      424   2.4% __memcpy_ssse3
     417   2.3%  43.5%      516   2.9% SrsRtpPacket2::recycle_shared_buffer
     373   2.1%  45.6%     1146   6.4% SrsRtpPacket2::decode
     321   1.8%  47.4%      321   1.8% __GI_epoll_ctl
     287   1.6%  49.0%     4914  27.6% SrsRtcPublishStream::on_rtp
     270   1.5%  50.5%      270   1.5% aesni_ctr32_encrypt_blocks
     245   1.4%  51.9%      698   3.9% SrsRtcRecvTrack::do_check_send_nacks
     218   1.2%  53.1%      218   1.2% srtp_get_stream
     200   1.1%  54.2%     1338   7.5% SrsRtpRingBuffer::set
     199   1.1%  55.3%      199   1.1% std::less::operator
     185   1.0%  56.4%      923   5.2% SrsRtcPublishStream::check_send_nacks
     180   1.0%  57.4%      180   1.0% heap_insert
     179   1.0%  58.4%      206   1.2% SrsRtpNackForReceiver::get_nack_seqs
     175   1.0%  59.4%      175   1.0% __sendto_nocancel
     150   0.8%  60.2%      237   1.3% SrsResourceManager::find_by_fast_id
     149   0.8%  61.1%      149   0.8% OPENSSL_cleanse
     143   0.8%  61.9%      143   0.8% srtp_unprotect
     141   0.8%  62.6%      141   0.8% std::vector::size
     130   0.7%  63.4%      130   0.7% EVP_MD_CTX_cleanup
     127   0.7%  64.1%      264   1.5% SrsRtcPublishStream::get_audio_track
     118   0.7%  64.8%      118   0.7% SrsFastCoroutine::pull
     118   0.7%  65.4%      118   0.7% SrsRtcTrackDescription::has_ssrc
     114   0.6%  66.1%      114   0.6% SrsBuffer::require
     113   0.6%  66.7%     3272  18.4% SrsRtcPublishStream::do_on_rtp_plaintext
     110   0.6%  67.3%      377   2.1% SrsRtpObjectCacheManager::allocate
     106   0.6%  67.9%     8985  50.5% SrsUdpMuxListener::cycle
      96   0.5%  68.4%      634   3.6% _st_vp_check_clock
      94   0.5%  69.0%     1151   6.5% SrsRtcConnection::notify
      84   0.5%  69.4%       84   0.5% PackedCache::KeyMatch (inline)
      84   0.5%  69.9%       84   0.5% std::_Rb_tree::_M_begin
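
To line the two listings up symbol by symbol, a short script can parse both and sort by the gap. This is a sketch under stated assumptions, not a tool used by the author. The pprof columns are taken to be flat samples, flat %, running sum %, cumulative samples, cumulative % and name, and `normalize`, `parse_perf`, `parse_pprof` and `compare` are hypothetical helpers. One likely reason for the largest gap, `__recvfrom_nocancel` (13.5% in GCP versus 2.96% in Perf), is that GCP records only user-space stacks, so time spent in the kernel during a system call is charged to the user-space wrapper. Perf reports that time under `[kernel]` symbols instead.

```python
import re

PERF_LINE = re.compile(r"^\s*([\d.]+)%\s+\S+\s+\[.\]\s+(.+?)\s*$")
PPROF_LINE = re.compile(
    r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+([\d.]+)%\s+(.+?)\s*$"
)


def normalize(symbol: str) -> str:
    """Strip template arguments and a trailing "()" so names from both tools match."""
    prev = None
    while prev != symbol:
        prev = symbol
        symbol = re.sub(r"<[^<>]*>", "", symbol)
    return symbol[:-2] if symbol.endswith("()") else symbol


def parse_perf(text: str) -> dict:
    """Map symbol -> Overhead % from perf report rows."""
    return {
        normalize(m.group(2)): float(m.group(1))
        for m in map(PERF_LINE.match, text.splitlines())
        if m
    }


def parse_pprof(text: str) -> dict:
    """Map symbol -> flat % (second column) from pprof top rows."""
    return {
        normalize(m.group(6)): float(m.group(2))
        for m in map(PPROF_LINE.match, text.splitlines())
        if m
    }


def compare(perf_text: str, pprof_text: str) -> list:
    """Return (symbol, perf %, gcp %, gcp - perf), largest absolute gap first."""
    perf, gcp = parse_perf(perf_text), parse_pprof(pprof_text)
    rows = [
        (name, perf[name], gcp[name], gcp[name] - perf[name])
        for name in perf.keys() & gcp.keys()
    ]
    return sorted(rows, key=lambda r: abs(r[3]), reverse=True)


if __name__ == "__main__":
    perf_sample = "   2.96%  libpthread-2.17.so  [.] __recvfrom_nocancel\n"
    gcp_sample = "    2397  13.5%  13.5%     2397  13.5% __recvfrom_nocancel\n"
    for name, p, g, diff in compare(perf_sample, gcp_sample):
        print(f"{name:40s} perf={p:5.2f}% gcp={g:5.2f}% diff={diff:+.2f}")
```

Only symbols that appear in both listings are compared. Kernel symbols from Perf and functions such as `__GI_epoll_wait` that appear only in GCP are left out, so this view shows per-function shifts rather than a full accounting of CPU time.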