“Every K8s node is a Linux host. Every database that won’t start has a reason in the logs. Every ‘no route to host’ is a firewall issue until proven otherwise.”

Context: This series is a deliberate return to Linux fundamentals before going deeper into kernel-level work — CentOS, MariaDB, SELinux, Ansible. The stuff that runs silently under every K8s cluster and eBPF probe. Owning it isn’t optional.

Four problems this week: a MariaDB instance that wouldn’t start due to a corrupted data directory, a backup automation task requiring passwordless SSH, a Tomcat WAR deployment with a custom port, and a two-layer port conflict that required both evicting a rogue Sendmail process and adding an iptables rule before Apache was reachable. All of it is documented here as a diagnostic reference.


Day 9: MariaDB Data Directory Recovery#

The Problem#

MariaDB was down. Systemd logs:

Aug 27 17:50:40 stdb01 systemd[1]: mariadb.service: Got notification message from PID 1457 (STATUS=MariaDB server is down)
Aug 27 17:50:40 stdb01 systemd[1]: mariadb.service: Main process exited, code=exited, status=0/SUCCESS
Aug 27 17:50:40 stdb01 systemd[1]: mariadb.service: Service restart not allowed.
Aug 27 17:50:40 stdb01 systemd[1]: Stopped MariaDB 10.5 database server.

“Service restart not allowed” is the key line. MariaDB exited cleanly from systemd’s perspective (status=0/SUCCESS), and the unit isn’t configured to restart on a clean exit, so it stayed down. First step: check why it exited.

Diagnosis#

sudo systemctl status mariadb
sudo journalctl -u mariadb -n 50
ls -ld /var/lib/mysql

/var/lib/mysql was missing entirely on first check, then on a second attempt the directory existed but was in a partially initialized state — not empty, not a valid MariaDB database. This is the failure mode mariadb-prepare-db-dir explicitly guards against.

mariadb-prepare-db-dir runs as ExecStartPre before the main process. Its logic:

  • Empty directory → initialize fresh system tables
  • Valid MariaDB directory → proceed
  • Partially initialized directory → refuse to start (intentional safety mechanism)

Half-initialized directories are worse than no directory — you can’t start, and you can’t safely auto-initialize because there might be partial data worth recovering. MariaDB does the right thing by refusing.
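The three cases above can be sketched as a small shell check. This is not the real mariadb-prepare-db-dir script: classify_datadir is a hypothetical helper, and treating the presence of a mysql/ subdirectory as "valid" is an assumption made for illustration.

```shell
# Hypothetical helper: classify a data directory into the cases described above.
# Assumption: a directory counts as initialized if it contains a mysql/ subdirectory.
# Run it as a user that can read the directory, otherwise it may look empty.
classify_datadir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "empty"
    elif [ -d "$dir/mysql" ]; then
        echo "initialized"
    else
        echo "partial"
    fi
}

# Usage:
#   classify_datadir /var/lib/mysql
```

A result of "partial" is the state that blocks startup: files exist, but not a usable database.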

Resolution#

In this case there was nothing to recover — the directory was in a broken state from a failed previous setup. Clean slate:

sudo systemctl stop mariadb
sudo rm -rf /var/lib/mysql
sudo mkdir -p /var/lib/mysql
sudo chown -R mysql:mysql /var/lib/mysql
sudo chmod 750 /var/lib/mysql
sudo systemctl start mariadb
sudo systemctl status mariadb

The chown is not optional. MariaDB runs as the mysql system user and will not start if it can’t write to its data directory. Set ownership before starting, not after.
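A quick way to confirm ownership before starting the service, as a minimal sketch. owner_matches is a hypothetical helper, and it assumes GNU stat (the -c flag), which is what CentOS ships.

```shell
# Hypothetical pre-start check: does the path belong to the expected user:group?
# Returns non-zero if the owner differs or the path does not exist.
owner_matches() {
    local path="$1" expected="$2"
    [ "$(stat -c '%U:%G' "$path" 2>/dev/null)" = "$expected" ]
}

# Usage:
#   owner_matches /var/lib/mysql mysql:mysql \
#       || echo "fix ownership before starting mariadb" >&2
```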

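If the directory might hold data worth keeping, copy it somewhere safe before touching it. mysql_install_db (named mariadb-install-db in recent MariaDB releases) can then recreate missing system tables in place, but it is not a repair tool and will not fix damaged data files: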

sudo mysql_install_db --user=mysql --datadir=/var/lib/mysql

Diagnostic Reference#

# Full status with log lines
sudo systemctl status mariadb -l

# Last 50 journal entries
sudo journalctl -u mariadb -n 50

# Check data directory state
ls -ld /var/lib/mysql
namei -l /var/lib/mysql

# Manual reinitialization (creates missing system tables; back up the directory first)
sudo mysql_install_db --user=mysql --datadir=/var/lib/mysql

Day 10: Backup Automation with Passwordless SSH#

The Task#

Automate backup of /var/www/html/ecommerce on App Server 2 — compress to zip, store locally at /backup/, copy to clint@stbkp01:/backup/ without password prompts.

Passwordless SSH Setup#

Passwordless SSH is the prerequisite. Without it the script blocks on a password prompt and automation breaks.

# Generate key pair on App Server 2 as the backup user
ssh-keygen -t rsa -b 4096

# Copy public key to backup server
ssh-copy-id clint@stbkp01

# Verify — this must return without a password prompt
ssh clint@stbkp01 "echo connected"

If ssh-copy-id isn’t available:

cat ~/.ssh/id_rsa.pub | ssh clint@stbkp01 "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"
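For use inside a script, the verification step can be made non-interactive. A minimal sketch: BatchMode=yes makes ssh fail instead of prompting for a password, so a missing key shows up as a non-zero exit rather than a hung job. check_passwordless_ssh and the SSH_BIN override are hypothetical names; SSH_BIN exists only so the function can be exercised without a real server.

```shell
# Hypothetical helper: succeed only if key-based login works without any prompt.
# SSH_BIN is an illustrative override; it defaults to the real ssh client.
check_passwordless_ssh() {
    local target="$1"
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$target" true
}

# Usage at the top of a backup script:
#   check_passwordless_ssh clint@stbkp01 \
#       || { echo "[ERROR] key-based SSH not working" >&2; exit 1; }
```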

The Backup Script#

#!/bin/bash
# /scripts/ecommerce_backup.sh

SOURCE_DIR="/var/www/html/ecommerce"
BACKUP_DIR="/backup"
ARCHIVE_NAME="xfusioncorp_ecommerce.zip"
BACKUP_SERVER="clint@stbkp01"
REMOTE_BACKUP_DIR="/backup"

# Dependency check
if ! command -v zip &> /dev/null; then
    sudo yum install -y zip
fi

# Idempotent directory creation
mkdir -p "$BACKUP_DIR"

# Archive (cd first so paths inside the zip are relative; abort if either step fails)
cd "$(dirname "$SOURCE_DIR")" || exit 1
zip -r "$BACKUP_DIR/$ARCHIVE_NAME" "$(basename "$SOURCE_DIR")/" || exit 1

# Transfer
scp "$BACKUP_DIR/$ARCHIVE_NAME" "$BACKUP_SERVER:$REMOTE_BACKUP_DIR/"

if [ $? -eq 0 ]; then
    echo "[SUCCESS] $(date): Backup copied to $BACKUP_SERVER:$REMOTE_BACKUP_DIR/"
else
    echo "[ERROR] $(date): SCP failed" >&2
    exit 1
fi

Make the script executable:

sudo chmod +x /scripts/ecommerce_backup.sh

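The $? check after scp matters. scp returns a non-zero exit status on authentication or network failures, and without the check the script leaves no explicit success or failure record. A backup that never left the host would look the same in the logs as one that did.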

What a production version of this needs:

  • Timestamped archives: xfusioncorp_ecommerce_$(date +%Y%m%d).zip
  • Rotation: find $BACKUP_DIR -name "*.zip" -mtime +30 -delete
  • Checksum verification after transfer: md5sum on both ends
  • Alert on failure: pipe errors to a monitoring endpoint or email

The script above works. The production version doesn’t trust the transfer silently.
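A minimal sketch of the first three items on that list. All function names are hypothetical, and sha256sum is swapped in for md5sum here; the comparison logic is the same either way.

```shell
# Hypothetical helpers for the hardening steps listed above.

# Timestamped archive name in the form xfusioncorp_ecommerce_YYYYMMDD.zip
archive_name() {
    echo "xfusioncorp_ecommerce_$(date +%Y%m%d).zip"
}

# Delete .zip archives older than N days (default 30) directly inside a directory
rotate_backups() {
    local dir="$1" days="${2:-30}"
    find "$dir" -maxdepth 1 -type f -name '*.zip' -mtime +"$days" -delete
}

# Compare a local file's SHA-256 against a checksum obtained from the remote side
checksum_matches() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Usage after scp, with the variables from the backup script:
#   remote_sum="$(ssh "$BACKUP_SERVER" "sha256sum '$REMOTE_BACKUP_DIR/$ARCHIVE_NAME'" | awk '{print $1}')"
#   checksum_matches "$BACKUP_DIR/$ARCHIVE_NAME" "$remote_sum" \
#       || { echo "[ERROR] checksum mismatch" >&2; exit 1; }
```

Alerting is left out on purpose: it depends entirely on what monitoring is already in place.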


Day 11: Tomcat WAR Deployment on a Custom Port#

The Task#

Install Tomcat on App Server 3, configure it on port 3004, deploy ROOT.war from the jump host.

Setup#

# SSH to the correct host first, then verify where you are before running anything
ssh banner@stapp03
hostname

sudo dnf install tomcat -y
sudo systemctl stop tomcat

Port Configuration#

sudo vi /etc/tomcat/server.xml

Find the HTTP connector and change the port:

<!-- Before -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

<!-- After -->
<Connector port="3004" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
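The same edit can be scripted instead of done by hand in vi. A minimal sketch, assuming the connector line starts with <Connector port="8080" exactly as shown above; set_connector_port is a hypothetical helper and it keeps a .bak copy of the file.

```shell
# Hypothetical helper: change the HTTP connector port in a server.xml-style file.
# Only lines of the form <Connector port="OLD" are rewritten; a .bak copy is kept.
set_connector_port() {
    local file="$1" old="$2" new="$3"
    sed -i.bak "s/<Connector port=\"$old\"/<Connector port=\"$new\"/" "$file"
}

# Usage (run with sudo for files under /etc/tomcat):
#   set_connector_port /etc/tomcat/server.xml 8080 3004
#   grep -n '<Connector port=' /etc/tomcat/server.xml
```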

WAR Deployment#

# Find the webapps directory
sudo find / -type d -name webapps 2>/dev/null
# /var/lib/tomcat/webapps

# From jump host — copy WAR to the target server
scp /tmp/ROOT.war banner@stapp03:/tmp/

# On App Server 3 — move to webapps and set ownership
sudo mv /tmp/ROOT.war /var/lib/tomcat/webapps/
sudo chown tomcat:tomcat /var/lib/tomcat/webapps/ROOT.war

sudo systemctl start tomcat
sudo systemctl enable tomcat

WAR Naming Convention#

ROOT.war deploys to the root context — http://server:port/. Any other name deploys to a subpath — myapp.war becomes http://server:port/myapp/. This is Tomcat’s automatic deployment convention, not configuration.
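The naming rule is simple enough to express as a function. A minimal sketch covering only the two cases described above; war_context_path is a hypothetical helper, not part of Tomcat.

```shell
# Hypothetical helper: map a WAR file name to the context path Tomcat serves it under.
# Covers only the two cases above: ROOT.war and a plain <name>.war.
war_context_path() {
    local name
    name="$(basename "$1" .war)"
    if [ "$name" = "ROOT" ]; then
        echo "/"
    else
        echo "/$name"
    fi
}

# Usage:
#   war_context_path /var/lib/tomcat/webapps/ROOT.war
#   war_context_path myapp.war
```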

Verification#

# Check Tomcat is listening on the right port
sudo netstat -tulnp | grep java

# Test from jump host
curl http://stapp03:3004

# If nothing comes back, check logs
sudo tail -f /var/log/tomcat/catalina.out

Common failure modes: wrong permissions on the WAR file, Tomcat not restarted after config change, port conflict with another service. Check in that order.


Day 12: Port Conflict Diagnosis — Sendmail vs Apache#

The Problem#

Apache unreachable on port 8085 from the jump host.

Layer 1 — Service Diagnosis#

sudo systemctl status httpd -l
Active: failed (Result: exit-code)
httpd[503]: (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:8085
httpd[503]: no listening sockets available, shutting down

Port 8085 is occupied. Apache can’t bind.

sudo netstat -tulnp | grep 8085
tcp  0  0  127.0.0.1:8085  0.0.0.0:*  LISTEN  442/sendmail: accep

Sendmail is holding the port. Nothing exotic — service misconfiguration where Sendmail was configured to listen on 8085.
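When this check goes into a script, the PID and process name can be pulled out of the netstat output. A minimal sketch, assuming the column layout shown above (TCP lines with LISTEN in the sixth column); listener_for_port is a hypothetical helper that reads netstat output on stdin.

```shell
# Hypothetical helper: print "PID NAME" for whatever is listening on a TCP port.
# Reads `netstat -tulnp` output on stdin and assumes the column layout shown above.
listener_for_port() {
    local port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, p, "/")      # "442/sendmail:" -> p[1]="442", p[2]="sendmail:"
            sub(/:$/, "", p[2])    # drop the trailing colon netstat leaves on some names
            print p[1], p[2]
        }'
}

# Usage:
#   sudo netstat -tulnp | listener_for_port 8085
```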

sudo systemctl stop sendmail
sudo systemctl disable sendmail

# Verify port is clear
sudo netstat -tulnp | grep 8085

sudo systemctl start httpd
sudo netstat -tulnp | grep 8085
# tcp  0  0  0.0.0.0:8085  0.0.0.0:*  LISTEN  678/httpd

Apache is running. Test from jump host:

curl http://stapp01:8085
# curl: (7) Failed to connect to stapp01 port 8085: No route to host

Layer 2 — Firewall Diagnosis#

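“No route to host” with a running service usually means a firewall on the target is rejecting the packet before it reaches the port. The stock CentOS iptables ruleset ends with a REJECT rule using icmp-host-prohibited, which clients report as “No route to host”; a plain DROP would show up as a timeout instead. Service running locally does not mean service accessible remotely.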

telnet stapp01 8085
# Trying 172.16.238.10...
# telnet: connect to address 172.16.238.10: No route to host

Confirmed at the network layer.
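The client-side error text already narrows down which layer is failing. A minimal sketch of that mapping; classify_connect_error and its labels are hypothetical, and the interpretations are rules of thumb rather than guarantees.

```shell
# Hypothetical helper: map a curl/telnet error message to the most likely failing layer.
classify_connect_error() {
    case "$1" in
        *"Connection refused"*) echo "no-listener" ;;               # host answered, nothing bound to the port
        *"No route to host"*)   echo "firewall-reject-or-routing" ;; # ICMP unreachable/prohibited came back
        *"timed out"*)          echo "filtered-or-host-down" ;;      # packets silently dropped, or host offline
        *)                      echo "unknown" ;;
    esac
}

# Usage:
#   err="$(curl -sS http://stapp01:8085 2>&1 >/dev/null)"
#   classify_connect_error "$err"
```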

sudo iptables -L -n

Port 8085 had no ACCEPT rule in the INPUT chain.

sudo iptables -I INPUT 4 -p tcp --dport 8085 -j ACCEPT
sudo iptables -L -n | grep 8085
# ACCEPT  tcp  --  0.0.0.0/0  0.0.0.0/0  tcp dpt:8085
curl http://stapp01:8085
# HTML response

The Full Diagnostic Chain#

This is the pattern worth internalizing — two separate failure layers, each invisible until the previous one is resolved:

Service won't start
    → check logs (journalctl / systemctl status -l)
    → "address already in use" → find what's on the port (netstat)
    → stop conflicting service → start target service
    → local test passes (curl localhost:8085)

Remote test fails ("no route to host")
    → firewall is rejecting the packet
    → check iptables rules (iptables -L -n)
    → add ACCEPT rule for the port
    → remote test passes

Every step in this chain is a distinct failure mode. Fixing the port conflict doesn’t fix the firewall. Fixing the firewall doesn’t fix a crashed service. Work through them in order.

iptables Note#

iptables -I INPUT 4 inserts the rule at position 4, ahead of any REJECT or DROP rule that comes later in the chain. Rules are evaluated top to bottom and the first match wins, so appending with -A would place the ACCEPT after a catch-all REJECT or DROP, where it is never reached. Check the chain order (iptables -L INPUT -n --line-numbers) before deciding where to insert.
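Finding the right position can be scripted rather than eyeballed. A minimal sketch: first_blocking_rule is a hypothetical helper that reads the output of iptables -L INPUT -n --line-numbers on stdin and prints the line number of the first REJECT or DROP rule. It does not account for a chain whose default policy is DROP.

```shell
# Hypothetical helper: print the number of the first REJECT/DROP rule in a chain listing.
# Expects `iptables -L INPUT -n --line-numbers` output on stdin; prints nothing if none found.
first_blocking_rule() {
    awk '$1 ~ /^[0-9]+$/ && ($2 == "REJECT" || $2 == "DROP") { print $1; exit }'
}

# Usage: insert the ACCEPT at that position so it lands just ahead of the blocking rule.
#   pos="$(sudo iptables -L INPUT -n --line-numbers | first_blocking_rule)"
#   sudo iptables -I INPUT "${pos:-1}" -p tcp --dport 8085 -j ACCEPT
```

Inserting at the blocking rule's own number pushes that rule down by one, which is the desired result.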


Diagnostic Reference#

# Service state and recent logs
sudo systemctl status service_name -l
sudo journalctl -u service_name -n 50

# What's listening on a port
sudo netstat -tulnp | grep PORT
sudo lsof -i :PORT
sudo ss -tulnp | grep PORT

# Full path permission check (for database directories)
namei -l /var/lib/mysql

# iptables — view rules with line numbers
sudo iptables -L INPUT -n --line-numbers

# Insert rule at specific position
sudo iptables -I INPUT POSITION -p tcp --dport PORT -j ACCEPT

# Tomcat — check deployed applications
ls -l /var/lib/tomcat/webapps/
sudo tail -f /var/log/tomcat/catalina.out

What These Problems Have in Common#

MariaDB, Apache, Tomcat — three different services, same underlying discipline: read the logs before touching anything, understand what the error is actually telling you, fix the root cause not the symptom.

The Sendmail/iptables problem is the clearest example. The symptom was “Apache unreachable.” There were two root causes — a port conflict and a missing firewall rule — and they were stacked. Fixing one revealed the other. You have to work through the full chain.

This is what runs under Kubernetes. Every pod that fails to start, every service that can’t reach another service, every node that drops traffic — the diagnosis path is the same one. systemd, netstat, iptables, log files. The abstractions change, the primitives don’t.


Tags#

#Linux #Infrastructure #Networking #SysAdmin #Automation


About the Author#

Elijah Udom (elijahu) is an Infrastructure & Cloud Engineer based in Lagos, Nigeria. AWS, Kubernetes, eBPF security, AI/ML infrastructure. Building in the open.


