Linecount

From software-crisis

This page records the number of lines of source code in various kinds of programs. That is useful for spotting extremely lean/minimal/simple programs and for detecting insanity/bloat/excessive complexity.

Unless noted otherwise, the columns are lines of .c files, then lines of .h files.

If anything is wrong please feel free to update it or inform me.

Script

Here is a script to count lines of code:

   find "$1" -name "$2" -exec wc -l {} \; | awk '{ SUM += $1 } END { print SUM }'

Thanks to http://stackoverflow.com/questions/1358540/how-to-count-all-the-lines-of-code-in-a-directory-recursively/16212299#16212299

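It's important not to feed the file list to wc through xargs: when there are more files than fit on one command line, xargs splits them across several wc invocations, each of which prints its own "total" line. Nothing warns you about this, and you get wrong results.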
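As a variation on the one-liner above, here is a minimal sketch that avoids the batching problem altogether by concatenating the files and counting once. The function names count_lines and count_c_h are hypothetical and not part of the original script; the sketch assumes a POSIX sh and find.

```shell
#!/bin/sh
# count_lines DIR PATTERN: total number of lines in all files under DIR
# whose names match PATTERN (hypothetical helper, not from the original page).
count_lines() {
    # -exec cat {} + may run cat several times for a long file list, but all
    # of the output flows into a single wc, so there is only ever one total.
    find "$1" -type f -name "$2" -exec cat {} + | wc -l | tr -d ' '
}

# count_c_h DIR: print the ".c, .h" column pair used in the tables.
count_c_h() {
    printf '%s, %s\n' "$(count_lines "$1" '*.c')" "$(count_lines "$1" '*.h')"
}
```

For example, count_c_h path/to/source would print the two columns for that tree. Quote the pattern ('*.c') so the shell does not expand it before find sees it.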

Shells

For scsh the columns are .scm, .c, .h.

   gnu bash:     138227,  13746
   zsh:          135589,   5698
   shivers scsh: 118475,  27131, 1985
   templeos:     119115,      0
   mirbsd mksh:   29223,   2562
   obsd ksh:      23700,   1439
   debian dash:   16503,   2084
   freebsd sh:    15453,   1622
   es shell:       9017,   1436
   plan9 rc:       5989,    327
   execline:       3794,    117

bash has the highest line count here. I think that is part of why the Shellshock vulnerability happened.

Kernels

   linux-4.6-rc5: 15441922, 3878574
   openbsd:        1963369,  911364
   xnu-3247.1.106: 1001825,  276698
   plan9:           229635,   25469
   gnu hurd:        226383,   94836
   haiku:           170842,   10483, 4613
   templeos:        119115,       0
   mezzano:          59763
   msdos 2:          51686
   sortix:           28838,   11103
   seL4:             25916,   13017
   freeDOS:          23944,    4924, 13192
   minix3:           19689,    4206
   toaruos:           9845,    9789, 381
   xv6:               8173,    1181

C Compilers

  • The clang count also includes LLVM and CFE; clang is also written in C++.
  • gcc also contains some C++ code, which is in the third column (the first two are .c and .h).
  • These projects probably contain some assembly files, which were not counted.
  • 8cc is only 7000 lines without tests. Maybe we should delete tests before counting?

   gcc-6.1.0:    2854201, 379272,   598
   clang-3.8.0:  1523165, 169972
   pcc-20160429:  112891,   9855
   tcc-0.9.26:     36689,  39835
   lcc-4.2:        25504,   3637
   8cc:            10874,    718
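The question above about leaving tests out of the count could be sketched as follows. This is only an illustration: it assumes tests live in directories named test or tests, which will not hold for every project, and count_lines_no_tests is a hypothetical name.

```shell
#!/bin/sh
# count_lines_no_tests DIR PATTERN: count lines in files matching PATTERN
# under DIR, skipping any directory named "test" or "tests".
# (The directory names are an assumed convention, not a universal rule.)
count_lines_no_tests() {
    find "$1" -type d \( -name test -o -name tests \) -prune \
        -o -type f -name "$2" -exec cat {} + | wc -l | tr -d ' '
}
```

Running it next to the plain count for the same tree shows how much of a project's size is test code, without having to delete anything first.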

web engines

  • webkitgtk and dillo are C++.
  • servo is in Rust; the first column counts .rs lines and the second counts lines containing 'unsafe', found with: find . -name '*.rs' -exec grep 'unsafe' {} \; | wc

   webkitgtk-2.12.2:  1294484, 783733
   netsurf-all-3.5:    461755,  76886
   servo-...6254b401:  201351,   2682
   lynx2-8-8:          158075,  26888
   w3m-0.5.3:           57378,   5621
   dillo-3.0.5:         39804,  14784
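The two servo columns could be produced in one step with something like the sketch below. rust_counts is a hypothetical helper name. Like the grep in the note above, it matches the substring "unsafe" anywhere on a line, so comments and identifiers containing the word are counted too.

```shell
#!/bin/sh
# rust_counts DIR: print "<total .rs lines>, <lines containing 'unsafe'>".
rust_counts() {
    total=$(find "$1" -type f -name '*.rs' -exec cat {} + | wc -l | tr -d ' ')
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps that from looking like a failure.
    unsafe=$(find "$1" -type f -name '*.rs' -exec cat {} + | grep -c 'unsafe' || true)
    printf '%s, %s\n' "$total" "$unsafe"
}
```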

build system

   cmake:            259555 lines of c
                     215398 lines of c++
                      89601 lines of cmake
   apache-ant-1.9.7: 268172 .java
   tup:              249258 lines of c
   scons:             99570 .py
   cabal:             86932 .hs
                     (77950 with tests deleted)
   GNU make-4.2:      39643 .c
   waf:               39205 .py
   meson:             35569 .py
   autoconf:          25240 .m4
   ninja:             16921 .cc (c++)
   openbsd make:      18481 .c
   fac:                6113 .py
                       5893 .c
   rake:               4662 .rb
   plan 9 mk:          3673 .c

revision control systems

   svn:   1086689 lines of c
   bzr:    353643 lines of python, 111773 lines of c
   fossil: 373707 lines of c
   git:    210857 lines of c
   hg:     134862 lines of python
   cvs:    105141 lines of c
   darcs:   58111 lines of haskell
   gitfs:    3370 lines of c (for plan9)


Note from OriB, gitfs author, comparing it to libgit2:

In other words, depending on how you count, there’s between 1.2% and 2.4% the amount of C code to maintain, update, and understand in this implementation. And sure, this implementation will grow, but it’s likely to stay within the single digit percentages of libgit2’s size.

TLS implementations

   openssl       324264 c, 173849 perl
   gnutls        317542 c
   + nettle       66007 c, 14300 asm
   libressl      316954 c, 76132 perl
   boringssl     129458 c, 59317 c++, 78461 h
   Botan          93713 c++, 48248 h
   tlse           43486 c
   + libtomcrypt  77221 c
   forge          46175 js
   golang crypto/ 44365 go, 12406 s
   mitls F*       31774 fst, 8063 fs
   s2n            21106 c   (* depends on openssl/libcrypto)
   ocaml-tls      18406 ml  (including nocrypto and x509)
   hs-tls         10625 hs

See also: https://www.cryptologie.net/article/457/about-disco-again/

Misc. Case Studies

redis vs pydis

This is a good data point. A lot of our beliefs about software are based on "educated guesses" (translation: we just have no clue and make everything up).

redis is 100,000 lines of .c code plus 50,000 lines of deps (mostly jemalloc, plus lua and linenoise, which is tiny). It runs at (roughly) 2x the speed of pydis.

pydis is 250 lines of .py, which is very impressive, but it runs on top of Python, which is 400,000 lines of .c code and 777,460 lines of .py.

I would like to see the kind of performance a golang implementation in roughly 250 lines would get. How close to 1.0x performance might it achieve?
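To make the comparison concrete, the rounded figures quoted above can be added up per stack. This is only arithmetic on those numbers, not a new measurement, and the variable names are made up for the example.

```shell
#!/bin/sh
# Figures as quoted in the paragraphs above (rounded there, not re-measured).
redis_total=$((100000 + 50000))          # redis .c + bundled deps
pydis_total=$((250 + 400000 + 777460))   # pydis .py + Python .c + Python .py

echo "redis total: $redis_total"
echo "pydis total: $pydis_total"

# Size ratio of the two stacks, to one decimal place.
awk -v p="$pydis_total" -v r="$redis_total" 'BEGIN { printf "ratio: %.1f\n", p / r }'
```

With these inputs the totals are 150000 and 1177710 lines, so the pydis stack is nearly eight times the size of the redis one once the interpreter is counted.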