Proxmox 8
These are my notes for building a new Proxmox 8 server.
Proxmox 8 Server Build (PVE2)
I’ve got new hardware and want to bring this server up at the same as as the original so need new IP addresses (which will become static)
Use balena etcher to create a USB bootable disk
Boot disks RAID1 - Mirror
RAID1 (Mirror) on m.2 drives
https://www.youtube.com/watch?v=lhRPhBDysVI
Going to separate out boot drive from data. So that if data drives fail I can boot.
# can see any errors
# zfs pool is called: rpool
zpool status
Networking
- Hostname FQDN - pve2.dave.home
- IP - 192.168.1.88 (was assigned by router DHCP, then made static). It may be easier to assign manually an IP next time outside the DHCP range.
- Gateway - 192.168.1.254 (the router)
- DNS - 192.168.1.254 (the router)
https://192.168.1.88:8006 - PM admin interface
Updates
Disable Enterprise repos
Add no subscription repos
PVE 8.2.4 running and updated on new hardware.
Data Disks - RAIDZ-1 - ZFS Datapool
I’ve got 4 data disks which I want to be running for resilience.
RAIDZ-1 (tolerates 1 drive loss) which needs at least 3 disks
I don’t need anything special (just want redundancy but not overly) so lets use RAIDZ-1
pve2, disks, zfs, create ZFS
- datapool - (tank is common name)
- compression on (saves on writes, and cpu is cheap)
- ashift - 12 (default)
zpool status
zfs list
Datadirectory on ZFS Datapool
DataDirectory
Now I can upload an ISO
Ubuntu 22.04.5 - my current standard build.
Create a VM
https://www.youtube.com/watch?v=sZcOlW-DwrU craft computing pm 8 - very good explanations as to what the different settings are.
BIOS
disks - give 80GB - on ZFS created with thin provisioning, so it only uses whats needed
max performance of the VM. Min of 4 threads/cores. I have 28 cores in 1 socket.
memory max - give 4GB at least only uses what needed (ballooning)
and
Start up the VM. It got an IP from the DHCP server of 192.168.1.251, and as I installed SSH I can just use my terminal to connect
# so pm can see the IP address
sudo apt install qemu-guest-agent
Firewall and DHCP Networking
I want to run multiple websites inside PM. I used to run pfSense for firewalls and internal DHCP.
Is it possible to use just the proxmox firewall and router DHCP?
Update - have decided to keep running pfSense because my DHCP server (router) is behaving strangely, and pfsense has worked for many years.
video - for running internal pm firewall this is good.
pfSense 2.7.2
I’m going to reinstall from scratch pfSense to understand it more
https://www.dlford.io/pfsense-nat-how-to-home-lab-part-3/ - great tutorial
Download pfSense - 2.7.2. Need to checkout (free) and unzip download into iso then upload to proxmox2. It was a BETA version but that was all I could get.
networking
proxmox2, system, network, create, linux bridge
- vmbr0 is there already. connected to en01 (wired ethernet). 192.168.1.88/24
- vmbr1 - comment: internal server network
reboot server
Create pfsense vm
create vm
Follow tutorial. TODO - see what seetting worked and tweak
Disk use datadirectory
WAN: vtnet0 LAN: vtnet1
Install CE
Disk - ZFS and GPT. Stripe. No redudancy.
Reboot
WAN: vtnet0 LAN: vtnet1 (don’t do this)
192.168.1.179 is allocated from my DHCP server to WAN side ie vtnet0
As pfSense uses 192.168.1.x by default on the LAN we can’t use this to start with.
Option 3? to reassign network ie disable LAN interface
Option 8 to go into shell
# turn off firewall (so can access admin from WAN)
# http://192.168.1.179
pfctl -d
# u: admin
# p: pfsense
- hostname: pfsense2
- domain: dave.home
- dns: 8.8.8.8, 1.1.1.1 (I’m was using 192.168.1.254 on pm host - overridden now)
Static IP - forcing just to use the DHCP allocated 192.168.1.179
Can’t Conntect
- 1) Assign interface - disable LAN
- 8) Shell - pfctld -d
- try another browser (Chrome does bad caching - Firefox is useful especially as we’re on http by default)
Add firewall rule to WAN to allow admin interface
Firewall > Aliases > Ports > Add
pfSense_Admin_Ports
80, 443, 22, 8080
Add a rule.
Couldn’t get the rule to reorder to go above teh
Interface, WAN
Uncheck block bogon networks
8080 for Admin Interface
So that 80 and 443 can be freed up for regular web traffic.
System, Advanced, Admin Access, TCP Port 8080, HTTPS
Add LAN Interface
Interfaces, Interface Assignments
vtnet1 (network port)
Interfaces, LAN, Enable
Static IPv4
172.16.44.1/24
System, Advanced, Networking, Kea DHCP (As ISC DHCP is Deprecated)
Services, DHCP Server, LAN
Range: 172.16.44.10 - 172.16.44.100 (I had to add the range then enable)
Enable
Make sure the proxmox side has the LAN range, so it can get access to the internet.
Template Server VM
Create a normal VM on the vmbr1 internal networking
I’ve got an IP address on 172.16.44.10
but can’t see the internet - was a problem with proxmox LAN range not enabled.
n: blue
u: dave
p: letmein
sudo apt update
sudo apt upgrade
sudo apt install -y qemu-guest-agent
sudo visudo
# add this under sudo ALL=(ALL:ALL_ ALL)
dave ALL=(ALL) NOPASSWD:ALL
Tick guest-trim
Shutdown VM, right click, Convert to Template
Create a new VM based off Template
Right click on template and create new VM
setup port forward so can SSH to it
Get MAC of new VM from UI
eg: BC:24:11:64:57:A8
Firewall, NAT, Port Forward
ssh pfsense2 -p 103
# run boilerplate
# https://www.dlford.io/secure-ssh-access-how-to-home-lab-part-5/#boilerplate
sudo bash -c "bash <(wget -qO- https://raw.githubusercontent.com/dlford/ubuntu-vm-boilerplate/master/run.sh)"
DHCP Reserve (so IP doesn’t change)
Services, DHCP Server, LAN, Edit Static Mapping (at bottom)
MAC and IP eg 172.16.44.103
Change the Port Forward
Firewall, NAT, Port Forward
Port forward for 80 and 443 from WAN (websites)
sudo apt install nginx
Lets open up 172.16.44.103 ports 80 and 443
Firewall, NAT, Port Forward (this adds associated filter rule)
Okay this did work by testing my hitting my public IP from http.
Mounts and Shares
# unmount from old PM
umount /dev/sdf
# or
umount /mnt/wd-2tb-usb
umount /mnt/windows-share
# mount back on old PM
mount /dev/sdd /mnt/black-256-stick
mount /dev/sdf /mnt/wd-2tb-usb
# new PM
# list usb devices - Western Digital Elements 2621 is my disk
lsusb
# look for sdb, sdc etc.. with correct size of disk - mine was sdc
# sde is 250GB stick
# sdf is 1.8TB drive
# this shows the mount if it is there
lsblk
# shows drives and mounted on
df -h
mkdir /mnt/black-256-stick
mkdir /mnt/wd-2tb-usb
mount /dev/sde /mnt/black-256-stick
mount /dev/sdf /mnt/wd-2tb-usb
8th Oct - having some strange issues with slowing down on backups (taking an hour) on my black-256-stick. The 2tb drive is okay.
Adding the mounted disk into PM
Survive a reboot
I’m not sure if windows share does survive a reboot. I had to do mount -h to get it to wake up again.
# find device eg /dev/sdf - this is my 2tb wd drive
lsblk
# find uuid of device rather than /dev/sdd (as this may change)
# eg ab6e004f-6c99-4d33-b145-4208201c404e
blkid
vi /etc/fstab
# UUID=<UUID> <mount_point> <filesystem_type> <options> <dump> <pass>
# dump 0.. used for backup utility
# pass 2... root filesystems use 1
# wd-2tb-usb
UUID=ab6e004f-6c99-4d33-b145-4208201c404e /mnt/wd-2tb-usb ext4 defaults,nofail 0 2
# to make sure system picks up new version of fstab
systemctl daemon-reload
# test the changes without rebooting
mount -a
# see if mount point is now active
df -h
Spin down USB External disk when not in use
Lets save energy (and less noise) by spinning down the drive when I don’t need it.
# check which drive in the usb to spin down.. /dev/sde
df -h
# put drive into standby mode
hdparm -y /dev/sde
# I go this error but it did go into standby mode
# issuing standby command
# SG_IO: bad/missing sense data, sb[]: f0 00 01 00 50 40 00 0a 00 00 00 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
It does seem to spin up again when I ask it to show backups
Lets see if on spin down state, it wakes up and does a backup at 0100 - yes it does.
# crontab -e
# add this line to crontab
0 3,15 * * * hdparm -y /dev/sde
New VM DHCP and NAT
It gave it a DCHP IP, so lets make it static.
1) Static IP - Services, DHCP Server, LAN, Edit Static Mapping (at bottom)
96:30:EC:A6:0C:93
172.16.44.104
2) Open so can SSH - Firewall, NAT, Port Forward
3) Open up port 80 to this new reverse proxy.
Change name of restored machine
# https://www.dlford.io/secure-ssh-access-how-to-home-lab-part-5/#boilerplate
sudo bash -c "bash <(wget -qO- https://raw.githubusercontent.com/dlford/ubuntu-vm-boilerplate/master/run.sh)"
# on dev - shortcut
sudo vim ~/.bash_aliases
alias 4='ssh pfsense2 -p 114'
Open share on Windows 10
right click on folder, properties, sharing, Advanced Sharing, Share this folder
create local user on win: bob password …. not an email user.. just local.
Windows features
Windows firewall - Allow an app through firewall. File Sharing both.
mkdir /mnt/windows-share
# I am inside proxmox in a 172.16 address range.. need a way to get out
mount -t cifs -o username=bob,password='xxxxxxx' //192.168.1.152/proxmox /mnt/windows-share
# automate mounting
vi /etc/fstab
//192.168.1.152/proxmox /mnt/windows-share cifs username=bob,password='secret',iocharset=utf8 0 0
# nofail - allow boot if mount not available
# automaout - systemd will try to mount the share on access rather than boot
//192.168.1.152/proxmox /mnt/windows-share cifs username=bob,password='secret',iocharset=utf8,nofail,x-systemd.automount 0 0
# **using this**
# notice no quotes on password
//192.168.1.152/proxmox /mnt/windows-share cifs username=bob,password=secret,iocharset=utf8,vers=3.0,nofail 0 0
# tests fstab
# good for reestablishing a windows share.
mount -a
# got some weird reboot issues...
# and could only see share when I manually when to it inside bash.. then proxmox could see it.
ZFS
- nvme0n1 - 465G
-
nvme1n1 - 465G
- sda 1tb
- sdb 1tb
- sdc 1tb
-
sdd 1td
- sde - 233G
- sdf 1.8TB
zpool status
- datapool - data
- rpool - mirror
# my rpool is full up (mirror)
df -k
du -h --max-depth=1
umount /dev/sde
# show mounts
mount
lsblk with black-256-stick and wd-2tb-usb mounted
SSD Wearout
2% wearout since 1st Sept (3 months).. went from 1 to 2% on 3rd Dec
3% now on 19th of Dec.
https://www.reddit.com/r/Proxmox/comments/ss07l4/guide_to_minimizing_ssdnvme_wearout_with_proxmox/
trim
/tmp/zfs_daily_report.sh
#!/bin/bash
# Variables
POOL_NAME="datapool"
LOG_FILE="/var/log/zfs_daily_iostat_$(date +%Y-%m-%d).log"
# Capture zpool I/O stats
zpool iostat -v $POOL_NAME > $LOG_FILE
echo "Daily ZFS I/O stats logged to $LOG_FILE"
cron.d
0 0 * * * /tmp/zfs_daily_report.sh
Lets monitor and see what read/writes are and get a baseline
other tips
https://www.reddit.com/r/Proxmox/comments/i3mupd/proxmox_standalone_or_cluster/
Disk space running out
No idea why I’m using 31GB of storage (out of 44GB assigned) - general ubuntu. With 12GB available. ie sitting at 74% used. My app fails when it is over 90% used (browsertrix webcrawler docker image)
Okay 70% now..
docker system prune
# 6.5GB in home dave23-6.5
# 4GB in the .cache
# pipenv - 1.4GB
du -h /home/dave/ | sort -rh | head -n 20
# where is the 16.5GB gone?
# lots in var
9.4G /var
6.0G /var/lib
5.1G /var/lib/docker/overlay2
5.1G /var/lib/docker
3.3G /var/log/journal
3.3G /var/log
2.4G /var/log/journal/dd03133706587e6d2ee267a56561951d
1.5G /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff
1.5G /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c
1.4G /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff/usr
1.3G /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22/diff
1.3G /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22
1.2G /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22/diff/usr
833M /var/log/journal/7e59cee94454465bcc02272f66f6c80e
762M /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff/usr/lib
748M /var/lib/docker/overlay2/2bc258eac131501c668ce3d972d374437086dd5206259949dbb1332df119fd22/diff/usr/lib
618M /var/lib/snapd
569M /var/lib/docker/overlay2/9002c508d1f0079ecd77e58ce1da246ddf97bbcb36a424a85a9054260344dc54/diff
569M /var/lib/docker/overlay2/9002c508d1f0079ecd77e58ce1da246ddf97bbcb36a424a85a9054260344dc54
402M /var/lib/docker/overlay2/4e642950a0ef1adac39237a3b4adf1758a1c360961dce9cb050eddc0b9af8d3c/diff/usr/share
VM Websites / APIs
http://mateer.hopto.org/ is the name I got from https://www.noip.com/ which my router talks to to provide dynamic DNS.
I’ve got an Nginx reverse proxy in front of all websites which forwards traffic to a specific internal IP via http. It also handles certificates via certbot
- hoverflylagoons.co.uk - Apache / PHP . MySQL - 2GB
- theskatefarmcic.co.uk - Apache / PHP / MySQL - 2GB
- API that runs a hatespeech AI called from https://osr4rightstools.org/
- website that runs firemap called from
Maybe move over
- https://osr4rightstools.org/ - .NET 1GB RAM. todo - certbot for certs
- https://auto-archiver.com/ - .NET 1GB RAM todo certbot for certs
- https://arlawesi.org.uk/ - .NET 1GB RAM todo certbot for certs
Test sites
- https://hmsoftware.org/ - test site #1. Have used for Rails
- https://hmsoftware.uk/
Sites hosted elsewhere
- https://hmsoftware.co.uk - static site hosted on netlify.com main company website which is being developed soon into a portfolio site
- homebrewbeer.netlify.app - Hugo based static site on Netlify
- https://davemateer.com/ - Jekyll blog on pages.github.com/
VM Workers
- General archiver eg auto-archiver.com/ - Python which runs fine in 4GB RAM
- Tweet service that polls an Azure database every minute to see if any Tweets to send out. Python. 1GB RAM
- YouTube screenshotter archiver - I’m using playwright.dev via a Proxy brightdata.com/ and using headful Firefox
Networking
Plusnet Two Router / DHCP is on 192.168.1.254
Proxmox is on a static IP allocation of 192.168.1.139 - port 8006 gives us the Proxmox web UI
pfSense VM has a DHCP (!fix this) alloction of 192.168.1.144 - port 8080 gives us the pfSense web UI
All port 80 and 443 traffic is routed to pfSense from the router.
Power Costs
Big server may draw 300W at idle. eg dual xeon
My UK provider good energy charges me 24.70 pence per kWh on 3rd Sept 2024. And 59.18 pence per day
So the xeon server would cost 0.3 kW x 24 hours = 7.2 kWh/day = £1.78 / day £53 / month
My 13th Gen Intel draws about 40W at idle running wifi, disks, etc..
RGB Machine
40W at rest it draws = 0.04 kW * 24 hours = 0.96 kWh/day = £0.237 per day = £7 per month
Storage
Drive Writes Per Day (DWPD) - how many times the entire capacity can be written per day over warrenty period. eg 1TB SSD DWPD rating of 1 - can write 1TB every day for 3 to 5 years.
SATA DOM (Disk on Module) - often used for boot drives. SATA SSD - 2.5” nvme Mechanical - for boot not recommended.. as SSD is cheaper, lower power and more reliable.
For boot drives approx 10 TB/year is a good number.. so shouldn’t wear out the SSD at all.
So after my drive failure(s) of consumer ssd’s, I want to make my server more resilient.
- m.2 2 * 1TB just for extra relaibility (check DWPD). He uses 2 x 256 AirDisk SSD
- SSD
Mirror the m.2 drives for the OS
## Logs
Look for problems eg
Sep 02 06:54:11 pve smartd[718]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 68 to 69
Sep 02 06:54:12 pve smartd[718]: Device: /dev/sdb [SAT], 8 Currently unreadable (pending) sectors
Sep 02 06:54:12 pve smartd[718]: Device: /dev/sdb [SAT], 8 Offline uncorrectable sectors
My sdb is a mechanical disk I’m not using.
## Disks
Have got 26% wearout on this Samsung SSD 840 Pro. This is my backup server which I’m rebuilding soon.
This is nothing to worry about and drive shoudl still have plenty of life left.
Servers Setup
BIOS settings for my silver server (which had SSD and potential hardware failture). Notice auto power on.
Router Setup
- http://192.168.1.254/ Router Admin
Router thinks that .139 is not connected (but it is as the PM Admin interface answers). 1 port having 2 IP addresses is confusing the GUI for the router.
There are 2 .139 addresses as the MAC address changed when I got the backup PM server.
(Static) means the local IP address is assigned by the device. It was initially assigned DHCP when Proxmox was installing, now it is fixed. Interestinly I can’t change the router to ‘always use this IP address’ as it isn’t assigned by DHCP anymore. I hope DHCP wont assign .88 anymore.
I did deleted some entries which were not used anymore, and the router did some strange things like being extremely slow (and management interface loading slowly) and needed a boot.
All 80 and 443 traffic is forwarded to pfsense on 192.168.1.144 (whc)
- https://192.168.1.139:8006/ PM Admin interface
- https://192.168.1.144:8080/ Pfsense Admin interface
This router can assign 64 IP addresses on DHCP
Lessons Learned
- SSD’s fail.
- Hardware fails.
- Standard routers are fickle (DHCP leases sometimes get confused / UI gets confused)
- Electricity cuts do happen (PM did recover fine)
- I want a backup PM server ready to go (and a testing server as well)
- You always need more disk space than you think on a VM
- Home connection with dynamic DNS has been great. Was 48Mbps down and 12MMbps up. Now on fibre at 500/110.
- Backups are super important
- Template VM’s useful for spinning up test VM’s quickly
- pfSense is hard
- System has been very reliable
- Been invaluable for archiving as many services now (eg YouTube) will block cloud based IP’s