Deadlocks are really irritating bugs, and it looks like the latest update is causing a number of them, so I figured this script might come in handy for other server operators.
The script is fairly basic: it searches a jstack report for entries indicating a deadlock, then tarballs the server logs along with the jstack and jmap dumps. It also reports its actions to syslog and can optionally send an email.
We've been running this in production since April 2016, and so far we've seen zero false alarms. It has proven quite handy both for gathering reports for Schine to debug further and for quickly recovering from a deadlock without an admin/operator needing to touch anything.
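For anyone who wants to see the detection step on its own before reading the whole script, here is a minimal sketch. The helper name `has_deadlock` and the two sample reports are made up for illustration; only the grep pattern comes from the script itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script): exits 0 if the given
# jstack report contains the deadlock banner the script searches for.
has_deadlock() {
    grep -q "Found.*Java-level deadlock" "$1"
}

# Fabricated sample reports, for illustration only.
sample=$(mktemp)
printf 'Found one Java-level deadlock:\n=============================\n' > "$sample"
clean=$(mktemp)
printf '"main" #1 prio=5 os_prio=0 runnable\n' > "$clean"

# Report which of the two samples trips the check.
if has_deadlock "$sample"; then echo "sample: deadlock"; else echo "sample: ok"; fi
if has_deadlock "$clean"; then echo "clean: deadlock"; else echo "clean: ok"; fi

rm -f "$sample" "$clean"
```

Running this against a fake report is a cheap way to confirm the check fires without waiting for a real deadlock.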
Code:
#!/bin/bash
#
# Server deadlocked, grab jstack, logs, and package them for sending to Schine.
#
# License: CC0
#
# Warranty: no warranties, no promise of support.
#
# Dependencies:
# CentOS7
# Functional SSMTP install.
#
# To install: save this entire file as /usr/local/bin/restartCrashReportStarmade.sh,
# make it executable, and then add the following to crontab:
# */5 * * * * /usr/local/bin/restartCrashReportStarmade.sh
#
# CHANGELOG:
# 18APR2016 Created initial script
# 20JUL2016 Removed attempts at clean shutdown of deadlocked instance
# 21SEP2016 Added PAFKA syslog hooks
#
# TODO:
# Does not support spaces in file/folder names. Maybe look into this someday.
#
# Put your preferred values in the following fields.
emailAddressRecipient=""
emailAddressSender="[email protected]"
pathSaveReports="/path/to/save/reports/"
pathTemp="/path/to/temp/folder/"
pathSMLogs="/path/to/starmade/logs/"
timestamp=$(date +%s)
jmap `pgrep -f "StarMade.jar"` > ${pathTemp}jmap-${timestamp}.txt
sleep 2
jstack `pgrep -f "StarMade.jar"` > ${pathTemp}jstack-${timestamp}.txt
# Test for and auto-mitigate deadlocks
# There's no point in attempting clean shutdown, server is non-responsive w/ deadlock
if grep -q "Found.*Java-level deadlock" ${pathTemp}jstack-${timestamp}.txt; then
    # SIGKILL, since a deadlocked JVM will not respond to a graceful stop
    kill -9 `pgrep -f "StarMade.jar"`
    sleep 10
    tar -czf ${pathSaveReports}deadlockDump-${timestamp}.tgz ${pathTemp} ${pathSMLogs}
    # Rename the tarball after the MD5 checksum of its contents
    fileReportTarball=`md5sum ${pathSaveReports}deadlockDump-${timestamp}.tgz |cut -d " " -f 1`
    mv ${pathSaveReports}deadlockDump-${timestamp}.tgz ${pathSaveReports}${fileReportTarball}.tgz
    printf '%s\n' "[PAFKA] Deadlock detected, forced unclean restart. Tarball at ${pathSaveReports}${fileReportTarball}.tgz" |& logger -t SMPrd
    # Only send mail if a recipient has been configured
    if [ -n "$emailAddressRecipient" ]; then
        printf 'From: <%s>\nSubject: StarMade Deadlock forced restart\n\nStarMade Deadlock forced restart\n' "${emailAddressSender}" | /usr/sbin/ssmtp -4 ${emailAddressRecipient}
        printf '%s\n' "[PAFKA] Deadlock alert sent to ${emailAddressRecipient}" |& logger -t SMPrd
    fi
fi
# We're done, cleanup
rm -f ${pathTemp}jstack-${timestamp}.txt ${pathTemp}jmap-${timestamp}.txt
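The tarball naming step (md5sum, then mv) can be tried in isolation with something like the sketch below. The function name `rename_to_checksum` and the throwaway demo file are assumptions for illustration, not part of the script above; the md5sum/cut/mv pipeline mirrors what the script does.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script): rename a file to
# <md5-of-contents>.tgz in the same directory and print the new path.
rename_to_checksum() {
    local src="$1" dir sum
    dir=$(dirname "$src")
    # md5sum prints "<hash>  <filename>"; keep only the hash
    sum=$(md5sum "$src" | cut -d " " -f 1)
    mv "$src" "${dir}/${sum}.tgz"
    printf '%s\n' "${dir}/${sum}.tgz"
}

# Demo with a throwaway file standing in for a real tarball.
workdir=$(mktemp -d)
printf 'hello\n' > "${workdir}/deadlockDump-demo.tgz"
rename_to_checksum "${workdir}/deadlockDump-demo.tgz"
rm -rf "$workdir"
```

Unlike the script, this version quotes its paths, so it should also cope with spaces in file and folder names.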