Stefan Boos


Applied Software Forensics - Hotspot Analysis for HospitalRun

21 Feb 2022


Results

Let me start by showing the results of the hotspot analysis. You can reproduce them yourself by following the steps in the next sections.

The analysis was run against the newest release tags of the frontend and server modules as of Feb. 21, 2022.

The analysis time period spans one year - from Nov. 7, 2019 up to and including Nov. 7, 2020.

What has been excluded from the analysis?

  • CHANGELOG.md - is touched by every release tag.
  • README.md - is touched frequently but updates are usually directly connected to code changes.
  • package.json - a huge number of commits changed this file to keep dependencies up to date; the file itself is rather trivial.
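To gauge whether an exclusion is justified, a hypothetical spot check counts how often a file was touched in the analysis period (run inside a module checkout; the dates match the analysis window):

```shell
# Count commits in the analysis period that touched CHANGELOG.md
# (hypothetical spot check; run inside e.g. the frontend checkout)
git log --oneline --after=2019-11-06 --before=2020-11-08 -- CHANGELOG.md | wc -l
```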

Hotspots Overview

This interactive diagram shows the hotspots of HospitalRun as large, dark red circles.

Top 10 Hotspots

The following table shows the top 10 hotspots in HospitalRun.

Hotspots
Revisions  LOC  Module
       46  165  frontend/src/patients/view/ViewPatient.tsx
       39  141  frontend/src/__tests__/HospitalRun.test.tsx
       36  228  frontend/src/patients/patient-slice.ts
       34   65  frontend/src/HospitalRun.tsx
       34  273  frontend/src/__tests__/patients/view/ViewPatient.test.tsx
       30  104  frontend/src/patients/related-persons/RelatedPersonTab.tsx
       28  367  frontend/src/__tests__/patients/patient-slice.test.ts
       27  129  frontend/src/user/user-slice.ts
       27  141  frontend/src/__tests__/patients/new/NewPatient.test.tsx
       27  103  frontend/src/patients/new/NewPatient.tsx
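A table like the one above can be recreated from the hotspots.csv file produced in the merge step below. A minimal sketch, assuming the merge script's column order is module,revisions,code:

```shell
# Print the ten files with the most revisions, formatted like the table above
# (assumes hotspots.csv columns: module,revisions,code)
tail -n +2 hotspots.csv | sort -t, -k2,2nr | head -n 10 |
  awk -F, '{printf "%-9s %-5s %s\n", $2, $3, $1}'
```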

Executing a Hotspot Analysis For Individual Repositories

Prerequisites

Before you can start you need the following tools in a folder:

  • git, plus checkouts of the HospitalRun frontend and server repositories
  • cloc
  • Docker with a Code Maat image (tagged code-maat-app below)
  • a checkout of the python3 branch of Adam Tornhill's maat-scripts
  • the extracted sample zip from adamtornhill.com (provides the D3 visualization files)

In the following I keep these tools in $HOME/source/learn/your-code-as-a-crime-scene.
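A quick sanity check that the required command line tools are on your PATH (names taken from the commands used below):

```shell
# Report which of the tools used in this walkthrough are installed
for tool in git cloc docker python; do
  command -v "$tool" >/dev/null 2>&1 && echo "found: $tool" || echo "missing: $tool"
done
```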

Prepare Complexity (LOC) and Effort (Change Frequencies)

# Prepare target folder
mkdir analysis
echo "package-lock.json" > analysis/cloc-exclude-files.txt
echo "package.json" >> analysis/cloc-exclude-files.txt
echo "CHANGELOG.md" >> analysis/cloc-exclude-files.txt
echo "README.md" >> analysis/cloc-exclude-files.txt

# Select which module you want to analyze:
export SUT=server

# Collect git history
cd "$SUT"
git log --pretty=format:'[%h] %an %ad %s' --date=short --numstat --after=2019-11-06 --before=2020-11-08 > "../analysis/${SUT}_evo.log"

# Count lines of code
cloc ./ --by-file --csv --exclude-dir=node_modules,bin,.nyc_output --exclude-list-file=../analysis/cloc-exclude-files.txt --quiet "--report-file=../analysis/${SUT}_lines.csv"
# Note: For me this command showed errors for files named acorn-loose.js, acorn-loose.mjs, acorn-loose.es.js

# Inspect the data
cd ../analysis
docker run -v "$PWD":/data -it code-maat-app -l "/data/${SUT}_evo.log" -c git -a summary

# Collect change frequencies
docker run -v "$PWD":/data -it code-maat-app -l "/data/${SUT}_evo.log" -c git -a revisions > "${SUT}_freqs.csv"

# Remove excluded files from change frequencies
# (sed -i '' is the BSD/macOS form; on GNU/Linux use sed -i without the '')
for FILE in $(cat cloc-exclude-files.txt); do sed -i '' "/$FILE,/d" "${SUT}_freqs.csv"; done
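Before merging complexity and effort, it can help to confirm that both CSVs start with the headers the merge script expects; a minimal check, with the headers cloc and code-maat emit shown as comments:

```shell
# Both files should start with these headers
head -n 1 "${SUT}_lines.csv"   # language,filename,blank,comment,code
head -n 1 "${SUT}_freqs.csv"   # entity,n-revs
```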

Merge complexity (LOC) and effort

# Set the folder containing your checkout of the python3 branch of https://github.com/adamtornhill/maat-scripts
export MAAT_SCRIPTS=$HOME/source/learn/your-code-as-a-crime-scene/maat-scripts
python "$MAAT_SCRIPTS/merge/merge_comp_freqs.py" "${SUT}_freqs.csv" "${SUT}_lines.csv" > hotspots.csv

Visualize hotspots

# Set the folder containing Adam Tornhill's sample zip (extracted) from https://adamtornhill.com/code/crimescenetools.htm
export MAAT_SAMPLE=$HOME/source/learn/your-code-as-a-crime-scene/sample
python "$MAAT_SCRIPTS/transform/csv_as_enclosure_json.py" --structure "${SUT}_lines.csv" --weights "${SUT}_freqs.csv" --weightcolumn 1 > "${SUT}_hotspot_proto.json"
mkdir d3
cp "$MAAT_SAMPLE/hibernate/d3/d3.min.js" "$MAAT_SAMPLE/hibernate/d3/LICENSE" ./d3/
cp "$MAAT_SAMPLE/hibernate/hibzoomable.html" "./$SUT.html"
sed -i '' "s/hib_hotspot_proto/${SUT}_hotspot_proto/g" "./${SUT}.html"
python -m http.server 8888

# Now open http://localhost:8888/

Executing a Combined Hotspot Analysis For Multiple Repositories

Note: In the following I am using the GNU version of head. On macOS you can install the GNU coreutils with brew install coreutils and call the GNU variant of a tool by prefixing it with g (for example ghead). On Linux, just replace ghead with head in the following.
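head with a negative line count prints everything except the last lines; that is what strips the trailing SUM row cloc appends to its by-file CSV. A tiny illustration (GNU head; ghead on macOS):

```shell
# GNU head: -n -1 drops the last line (here standing in for cloc's SUM row)
printf 'file-a\nfile-b\nSUM\n' | head -n -1
# prints file-a and file-b only
```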

In the previous sections we created per-module files such as *_evo.log, *_freqs.csv, and *_lines.csv. In this section we combine them to run a hotspot analysis across a set of repositories.

# Combine the analysis results into a single file
# and concatenate the lines of code into a single file
echo "entity,n-revs" > all_freqs.csv
echo "language,filename,blank,comment,code" > all_lines.csv

for SUT in frontend server components; do \
  awk "FNR > 1 {print \"$SUT/\" \$0}" ${SUT}_freqs.csv >> all_freqs.csv; \
  ghead -n-1 ${SUT}_lines.csv | awk -F "," "FNR > 1 {path=\$2; sub(/\\.\\//, \"$SUT/\", path); print \$1 \",\" path \",\" \$3 \",\" \$4 \",\" \$5}" >> all_lines.csv; \
done

Now export SUT=all and repeat the steps from the sections Merge complexity (LOC) and effort and Visualize hotspots; they will then pick up all_freqs.csv and all_lines.csv.
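To verify the combined change frequencies, every entity in all_freqs.csv should now carry a module prefix; a sketch:

```shell
# Print any rows whose entity lacks a module prefix;
# empty output means the combining loop worked
awk -F, 'FNR > 1 && $1 !~ /^(frontend|server|components)\//' all_freqs.csv
```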