Applied Software Forensics - Hotspot Analysis for HospitalRun
21 Feb 2022
Table of Contents
- Results
- Executing a Hotspot Analysis For Individual Repositories
- Executing a Combined Hotspot Analysis For Multiple Repositories
Results
Let me start by showing the results of the hotspot analysis. You can reproduce them yourself as described in the following sections.
The following module versions were used to create the hotspot analysis:
- HospitalRun Frontend v2.0.0-alpha.7 (Nov. 7, 2020)
- HospitalRun Server v2.0.0-alpha.5 (Nov. 7, 2020)
- HospitalRun Components v3.0.3 (Oct. 24, 2020) - used by frontend v2.0.0-alpha.7
These tags corresponded to the newest versions of frontend and server as of Feb. 21, 2022.
The analysis time period spans one year - from Nov. 7, 2019 up to and including Nov. 7, 2020.
What has been excluded from the analysis?
- `CHANGELOG.md` - it is touched by every release tag.
- `README.md` - it is touched frequently, but updates are usually directly connected to code changes.
- `package.json` - a huge number of commits changed this file in order to keep dependencies up to date. The files themselves are rather trivial.
Hotspots Overview
This interactive diagram shows the hotspots of HospitalRun as large, dark red circles.
Top 10 Hotspots
The following table shows the top 10 hotspots in HospitalRun.
| Revisions | LOC | Module |
| --- | --- | --- |
| 46 | 165 | frontend/src/patients/view/ViewPatient.tsx |
| 39 | 141 | frontend/src/__tests__/HospitalRun.test.tsx |
| 36 | 228 | frontend/src/patients/patient-slice.ts |
| 34 | 65 | frontend/src/HospitalRun.tsx |
| 34 | 273 | frontend/src/__tests__/patients/view/ViewPatient.test.tsx |
| 30 | 104 | frontend/src/patients/related-persons/RelatedPersonTab.tsx |
| 28 | 367 | frontend/src/__tests__/patients/patient-slice.test.ts |
| 27 | 129 | frontend/src/user/user-slice.ts |
| 27 | 141 | frontend/src/__tests__/patients/new/NewPatient.test.tsx |
| 27 | 103 | frontend/src/patients/new/NewPatient.tsx |
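If you just want this table on the command line: the merged `hotspots.csv` created in the merge step below is, as far as I can tell, already sorted by change frequency in descending order (`module,revisions,code`), so the top 10 are simply the first ten data rows - a minimal sketch:

# Print the ten most frequently changed modules from the merged CSV.
# Assumes merge_comp_freqs.py emits a header row plus rows sorted by
# revisions in descending order - verify against your checkout.
head -n 11 hotspots.csv | column -s, -t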
Executing a Hotspot Analysis For Individual Repositories
Prerequisites
Before you can start, you need the following tools in a folder:
- Clone [GitHub: adamtornhill/code-maat](https://github.com/adamtornhill/code-maat) and build the Docker image according to the instructions
- Clone [GitHub: adamtornhill/maat-scripts](https://github.com/adamtornhill/maat-scripts) and check out the `python3` branch
- Download and extract the samples (`sample0.2.zip`) from [Code as a Crime Scene: The Tools](https://adamtornhill.com/code/crimescenetools.htm)

In the following I have these tools in `$HOME/source/learn/your-code-as-a-crime-scene`.
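You also need local clones of the three HospitalRun repositories, checked out at the analyzed tags. A possible setup - the repository URLs and exact tag names below are assumptions, so verify them on GitHub before running:

cd "$HOME/source/learn/your-code-as-a-crime-scene"
# Repository URLs and tag names are assumptions - verify them on GitHub
git clone https://github.com/HospitalRun/hospitalrun-frontend.git frontend
git clone https://github.com/HospitalRun/hospitalrun-server.git server
git clone https://github.com/HospitalRun/components.git components
(cd frontend && git checkout v2.0.0-alpha.7)
(cd server && git checkout v2.0.0-alpha.5)
(cd components && git checkout v3.0.3)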
Prepare Complexity (LOC) and Effort (Change Frequencies)
# Prepare target folder
mkdir analysis
echo "package-lock.json" > analysis/cloc-exclude-files.txt
echo "package.json" >> analysis/cloc-exclude-files.txt
echo "CHANGELOG.md" >> analysis/cloc-exclude-files.txt
echo "README.md" >> analysis/cloc-exclude-files.txt
# Select which module you want to analyze (frontend, server, or components):
export SUT=server
# Collect git history
cd "$SUT"
git log --pretty=format:'[%h] %an %ad %s' --date=short --numstat --after=2019-11-06 --before=2020-11-08 > "../analysis/${SUT}_evo.log"
# Count lines of code
cloc ./ --by-file --csv --exclude-dir=node_modules,bin,.nyc_output --exclude-list-file=../analysis/cloc-exclude-files.txt --quiet "--report-file=../analysis/${SUT}_lines.csv"
# Note: For me this command showed errors for files named acorn-loose.js, acorn-loose.mjs, acorn-loose.es.js
# Inspect the data
cd ../analysis
docker run -v "$PWD":/data -it code-maat-app -l "/data/${SUT}_evo.log" -c git -a summary
# Collect change frequencies
docker run -v "$PWD":/data -it code-maat-app -l "/data/${SUT}_evo.log" -c git -a revisions > "${SUT}_freqs.csv"
# Remove excluded files from change frequencies
# (sed -i '' is the BSD/macOS in-place syntax; on GNU/Linux use sed -i without '')
for FILE in $(cat cloc-exclude-files.txt); do sed -i '' "/$FILE,/d" "${SUT}_freqs.csv"; done
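Before merging, it is worth a quick sanity check that the change-frequency file looks plausible. The `entity,n-revs` header comes from code-maat's revisions analysis; the rows below are illustrative, not actual output:

head -n 4 "${SUT}_freqs.csv"
# Expected shape (values are illustrative):
# entity,n-revs
# src/routes.ts,12
# src/index.ts,9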
Merge complexity (LOC) and effort
# Set the folder containing your https://github.com/adamtornhill/maat-scripts python3 branch checkout
export MAAT_SCRIPTS=$HOME/source/learn/your-code-as-a-crime-scene/maat-scripts
python "$MAAT_SCRIPTS/merge/merge_comp_freqs.py" "${SUT}_freqs.csv" "${SUT}_lines.csv" > hotspots.csv
Visualize hotspots
# Set the folder containing Adam Tornhill's sample zip (extracted) from https://adamtornhill.com/code/crimescenetools.htm
export MAAT_SAMPLE=$HOME/source/learn/your-code-as-a-crime-scene/sample
python "$MAAT_SCRIPTS/transform/csv_as_enclosure_json.py" --structure "${SUT}_lines.csv" --weights "${SUT}_freqs.csv" --weightcolumn 1 > "${SUT}_hotspot_proto.json"
mkdir d3
cp "$MAAT_SAMPLE/hibernate/d3/d3.min.js" "$MAAT_SAMPLE/hibernate/d3/LICENSE" ./d3/
cp "$MAAT_SAMPLE/hibernate/hibzoomable.html" "./$SUT.html"
sed -i '' "s/hib_hotspot_proto/${SUT}_hotspot_proto/g" "./${SUT}.html"
python -m http.server 8888
# Now open http://localhost:8888/<SUT>.html (e.g. http://localhost:8888/server.html)
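If the page stays blank, check that the generated enclosure JSON actually contains your modules. As far as I recall, `csv_as_enclosure_json.py` nests folders as children and attaches the LOC and change frequency to each leaf - the field names here are assumptions:

python -m json.tool "${SUT}_hotspot_proto.json" | head -n 12
# Folders should appear as nested "children"; leaves are assumed to carry
# the LOC ("size") and the change frequency ("weight")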
Executing a Combined Hotspot Analysis For Multiple Repositories
Note: In the following I am using the GNU version of `head`. On macOS you can install the GNU tools with `brew install coreutils` and then use the GNU version of the respective tool by adding the `g` prefix (hence `ghead`). On Linux just replace `ghead` with `head` in the following.
In the previous section we created analysis files for each repository (`*_evo.log`, `*_freqs.csv`, `*_lines.csv`); run that section once per module (frontend, server, components) if you have not already. In this section we will run a hotspot analysis across the whole set of repositories.
# Combine the per-repository change frequencies into a single file
# and concatenate the per-repository lines of code into another
echo "entity,n-revs" > all_freqs.csv
echo "language,filename,blank,comment,code" > all_lines.csv
for SUT in frontend server components; do \
awk "FNR > 1 {print \"$SUT/\" \$0}" ${SUT}_freqs.csv >> all_freqs.csv; \
ghead -n-1 ${SUT}_lines.csv | awk -F "," "FNR > 1 {path=\$2; sub(/\\.\\//, \"$SUT/\", path); print \$1 \",\" path \",\" \$3 \",\" \$4 \",\" \$5}" >> all_lines.csv; \
done
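A quick plausibility check on the combined files - the entities should now carry the repository prefix; the values shown are illustrative:

wc -l all_freqs.csv all_lines.csv
head -n 3 all_freqs.csv
# entity,n-revs
# frontend/src/patients/view/ViewPatient.tsx,46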
Now `export SUT=all` and apply the code from the sections *Merge complexity (LOC) and effort* and *Visualize hotspots* to `all_freqs.csv` and `all_lines.csv`, as spelled out below.
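Spelled out, these are the same commands as before, just with `SUT=all`:

export SUT=all
python "$MAAT_SCRIPTS/merge/merge_comp_freqs.py" "${SUT}_freqs.csv" "${SUT}_lines.csv" > hotspots.csv
python "$MAAT_SCRIPTS/transform/csv_as_enclosure_json.py" --structure "${SUT}_lines.csv" --weights "${SUT}_freqs.csv" --weightcolumn 1 > "${SUT}_hotspot_proto.json"
cp "$MAAT_SAMPLE/hibernate/hibzoomable.html" "./${SUT}.html"
sed -i '' "s/hib_hotspot_proto/${SUT}_hotspot_proto/g" "./${SUT}.html"
python -m http.server 8888
# Now open http://localhost:8888/all.html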