Linecount
This page measures the number of lines of source code in various kinds of programs. That is useful for spotting extremely lean/minimal/simple programs and for detecting insanity/bloat/excessive complexity.
Unless noted otherwise, the counts are given as .c lines first, then .h lines.
If anything is wrong please feel free to update it or inform me.
- This site has line counts for lots of software: https://www.openhub.net/
- A funny page where they boast about having loads of lines of code: https://www.f35.com/about/life-cycle/software
Script
Here is a script to count lines of code:
find "$1" -name "$2" -exec wc -l {} \; | awk '{ SUM += $1 } END { print SUM + 0 }'
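It's important not to pipe into xargs wc -l and read the "total" line: if there are more files than fit on one command line, xargs splits them across several wc invocations, each printing its own total. Nothing warns you about this, and you get wrong results.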
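A variant that avoids starting one wc process per file is sketched below. It assumes a POSIX find that supports `-exec ... {} +`, and the function name `count_lines` is made up for this example. Because every batch of files is concatenated into the same pipe, it does not matter how many times find has to run cat, so the batching problem described above does not arise.

```shell
# count_lines DIR PATTERN
# Print the total number of lines in all regular files under DIR whose
# name matches PATTERN (a find -name glob such as '*.c').
# count_lines is a hypothetical helper name, not part of any tool.
count_lines() {
    # cat may be run several times if there are many files, but all of
    # its output goes into one pipe, so wc -l sees a single stream.
    find "$1" -type f -name "$2" -exec cat {} + | wc -l
}
```

Usage would look like `count_lines path/to/source '*.c'`, followed by a second call with `'*.h'`.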
Shells
In the case of scsh we measure .scm, .c, .h
gnu bash: 138227, 13746
zsh: 135589, 5698
shivers scsh: 118475, 27131, 1985
templeos: 119115, 0
mirbsd mksh: 29223, 2562
obsd ksh: 23700, 1439
debian dash: 16503, 2084
freebsd sh: 15453, 1622
es shell: 9017, 1436
plan9 rc: 5989, 327
execline: 3794, 117
bash has the highest linecount here. I think that is part of why the shellshock vulnerability happened.
Kernels
- for plan9: Just the stuff inside `plan9/sys/src/9/`
- for minix3: `minix/kernel`, which according to http://wiki.minix3.org/doku.php?id=developersguide:overviewofminixuserland contains the minix kernel, in the minix-specific portion of the NetBSD source tree that minix uses
- for Mezzano by froggey: .lisp only; the count covers the whole OS (including a compiler and desktop programs).
- sortix is .cpp.
- toaruos, haiku and freeDOS are .c, .h, .S (assembly).
- msdos 2.0 is thanks to http://www.computerhistory.org/atchm/microsoft-research-license-agreement-msdos-v1-1-v2-0/ and the code is in .ASM
- xnu-3247.1.106 fetched from http://opensource.apple.com/release/os-x-1011/
linux-4.6-rc5: 15441922, 3878574
openbsd: 1963369, 911364
xnu-3247.1.106: 1001825, 276698
plan9: 229635, 25469
gnu hurd: 226383, 94836
haiku: 170842, 10483, 4613
templeos: 119115, 0
mezzano: 59763
msdos 2: 51686
sortix: 28838, 11103
seL4: 25916, 13017
freeDOS: 23944, 4924, 13192
minix3: 19689, 4206
toaruos: 9845, 9789, 381
xv6: 8173, 1181
C Compilers
- The clang count also includes LLVM and CFE; it is written in C++.
- gcc also contains some C++ code; that count is in the third column (the first two are .c and .h).
- These projects probably contain some assembly files, which were not included.
- 8cc is only 7000 lines without tests. Maybe we should delete tests before counting?
gcc-6.1.0: 2854201, 379272, 598
clang-3.8.0: 1523165, 169972
pcc-20160429: 112891, 9855
tcc-0.9.26: 36689, 39835
lcc-4.2: 25504, 3637
8cc: 10874, 718
- "GCC Soars Past 14.5 Million Lines Of Code" - http://www.phoronix.com/scan.php?page=news_item&px=MTg3OTQ
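On the question above of deleting tests before counting: instead of deleting anything, the count can skip test directories. The sketch below assumes tests live in directories named `test` or `tests`, which is only a guess at a common layout and will not fit every project (8cc's actual layout was not checked here). The function name `count_lines_no_tests` is hypothetical.

```shell
# count_lines_no_tests DIR PATTERN
# Print the total number of lines in files under DIR matching PATTERN,
# skipping every directory named "test" or "tests".
# Those directory names are an assumption, not a rule; adjust per project.
count_lines_no_tests() {
    # -prune stops find from descending into the matched directories;
    # everything else that is a regular file matching PATTERN is fed to cat.
    find "$1" -type d \( -name test -o -name tests \) -prune \
        -o -type f -name "$2" -exec cat {} + | wc -l
}
```

Running this next to the plain count shows how much of a project's size is tests, without touching the source tree.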
web engines
- webkitgtk and dillo are C++.
- servo is in Rust; we count .rs lines, then the lines containing 'unsafe', found with: find . -name '*.rs' -exec grep 'unsafe' {} \; | wc
webkitgtk-2.12.2: 1294484, 783733
netsurf-all-3.5: 461755, 76886
servo-...6254b401: 201351, 2682
lynx2-8-8: 158075, 26888
w3m-0.5.3: 57378, 5621
dillo-3.0.5: 39804, 14784
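The 'unsafe' count for servo can be wrapped up in the same style as the line counter. This is a minimal sketch: the function name `count_matching_lines` is hypothetical, and it matches the word as a plain substring, so comments and identifiers that merely contain "unsafe" are counted too. Treat the result as a rough indicator, not a count of unsafe blocks.

```shell
# count_matching_lines DIR PATTERN WORD
# Print how many lines in files under DIR matching PATTERN contain WORD.
# WORD is treated as a fixed string (grep -F), not a regular expression.
count_matching_lines() {
    # wc -l is last in the pipeline, so a count of 0 is still printed
    # and the function does not fail when nothing matches.
    find "$1" -type f -name "$2" -exec cat {} + | grep -F -- "$3" | wc -l
}
```

For the servo tree this would be called as `count_matching_lines . '*.rs' unsafe`.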
build system
cmake: 259555 lines of c, 215398 lines of c++, 89601 lines of cmake
apache-ant-1.9.7: 268172 .java
tup: 249258 lines of c
scons: 99570 .py
cabal: 86932 .hs (77950 with tests deleted)
GNU make-4.2: 39643 .c
waf: 39205 .py
meson: 35569 .py
autoconf: 25240 .m4
ninja: 16921 .cc (c++)
openbsd make: 18481 .c
fac: 6113 .python, 5893 .c
rake: 4662 .ruby
plan 9 mk: 3673 .c
revision control systems
svn: 1086689 lines of c
bzr: 353643 lines of python, 111773 lines of c
fossil: 373707 lines of c
git: 210857 lines of c
hg: 134862 lines of python
cvs: 105141 lines of c
darcs: 58111 lines of haskell
gitfs: 3370 lines of c (for plan9)
Note from OriB, gitfs author, comparing it to libgit2:
In other words, depending on how you count, there’s between 1.2% and 2.4% the amount of C code to maintain, update, and understand in this implementation. And sure, this implementation will grow, but it’s likely to stay within the single digit percentages of libgit2’s size.
TLS implementations
openssl 324264 c, 173849 perl
gnutls 317542 c + nettle 66007 c, 14300 asm
libressl 316954 c, 76132 perl
boringssl 129458 c, 59317 c++, 78461 h
Botan 93713 c++, 48248 h
tlse 43486 c + libtomcrypt 77221 c
forge 46175 js
golang crypto/ 44365 go, 12406 s
mitls F* 31774 fst, 8063 fs
s2n 21106 c (* depends on openssl/libcrypto)
ocaml-tls 18406 ml (including nocrypto and x509)
hs-tls 10625 hs
See also: https://www.cryptologie.net/article/457/about-disco-again/
Misc. Case Studies
redis vs pydis
This is a good data point: a lot of our beliefs about software are based on "educated guesses" (translation: we just have no clue and make everything up).
redis is 100,000 lines of .c code plus 50,000 lines of deps (jemalloc mostly, and lua, and then linenoise which is tiny). It runs (roughly) 2x the speed of pydis.
pydis is 250 lines of .py, which is very impressive, but it runs on top of python, which is 400,000 lines of .c code and 777,460 lines of .py.
I would like to see the kind of performance a golang implementation in roughly 250 lines would get. How close to 1.0x performance might it achieve?
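To put the redis and pydis figures side by side, the totals can be added up with plain shell arithmetic. The sketch below uses only the rounded numbers quoted in this section. Whether all of python's .py files should be charged to pydis is a judgment call, so the ratio is an upper-end estimate. The function names are made up for this example.

```shell
# All figures are the rounded ones quoted in this section.

# redis itself plus its bundled dependencies.
redis_total() {
    echo $((100000 + 50000))
}

# pydis plus the python .c and .py code it runs on.
pydis_total() {
    echo $((250 + 400000 + 777460))
}

# Size ratio pydis-stack / redis-stack, in tenths (integer, truncated),
# so that no floating point tool such as bc is needed.
ratio_tenths() {
    echo $(( $(pydis_total) * 10 / $(redis_total) ))
}
```

With these numbers the redis stack comes to 150,000 lines and the pydis stack to 1,177,710 lines, so the "250 line" implementation sits on a stack that is nearly 8 times larger.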