NokiMo
SteamID

patreon


Updates & Reliability and performance

Hi All,

Not sure where to start this post. For those who follow on Discord, you will have seen what an absolute nightmare the last few weeks and months have been for SteamID, along with my extremely bad luck. I’ll start with the problems, then I’ll jot down how they have been overcome. Remember, none of this is possible without YOU: the support for SteamID is what keeps this running and pays for new hardware.

Issues:

It started with the primary database server having an NVMe drive fail. This was swapped out for a brand new unit, but as the RAID was being rebuilt the other drive failed. This resulted in 2 brand new NVMe drives and the primary database server being recovered and rebuilt from scratch. SteamID was back with limited functionality within 24 hours. Performance seemed worse and the server seemed to be struck with random database crashes.

While the above was going on, the text searching servers also seemed to want to crash out. I suspect this is due to the new summary searching feature and the growing amount of data, but it was nevertheless a pain in the backside.

On top of this, while carrying out maintenance I had a server knock out 2 drives in the same array in a RAID 10, which I had temporarily moved a few VMs to, losing my backups. They are ones I’m annoyed to lose, but it’s not service stopping and I can live without them. The next day my wife’s computer had a drive fail that was in a RAID (talk about bad luck, eh!), unrelated to SteamID.

A 900GB SAS drive failed within days of the above too, but that was easily replaced. I keep spares on site, so this didn’t affect anything. I also run a hot spare in the main server.

Somewhere along the line the avatar history data tables started showing incorrect avatars. I was still unable to pinpoint the cause. I have a good suspicion, but this was also one table that kept needing to be recovered when the database crashed, so I can’t 100% confirm.

What’s changed:

Fast forward to this last week: I have upgraded my connection, added battery backup to the router, and drilled and run new Ethernet cables into the now proper dedicated firewall, which is well over spec’d. The old firewall is ready to swap back into place and is more than capable should this one fail.

We have a new server with another lot of 16 drives running, and I’ve rebuilt one of my Frankenstein servers, which has 8 disks. I bought and built a new backup server with 8 x 3TB (although this is not enough). The old backup server is still there but I’m not quite sure what to do with it. I know it was the disks and not the server, so even though they are working now I will throw the disks out. Can’t risk it.

I managed to get another Dell switch with a 10GB card fitted, which I moved into my main switch, and I replicated the config over to the spare switch. The switch was a single point of failure, so this means I can get back online quickly.

The battery backup (rack one) for the servers has new batteries fitted, which is real nice to have as I’ve been running the rack like a glass cannon. At least now I can power down gracefully in the event of a power loss.

The servers in the rack have had various hardware changes, namely CPU / RAM.

Today the main rented database server was moved over to a newer server because the current one was not stable. It just was not right. Same OS, same config copied over, and the new server is running sweet. It has a better CPU and more RAM than before, which is a huge bonus. I did see disk writes as high as 4.4GB/s, which is much better than the old server.

Now we are running on the new database server I need to see how things go, but I’m planning to restore old copies of the avatar history data. It will need to be re-formatted as I changed the way it all works drastically (after the issue was found) so that it consumes less storage and is overall “better”, so that feature will be back at some point.

SteamArchive has another 2 servers in the farm. I’ve seen an increase in requests to remove screenshots from old “clean” recovered accounts, and when looking into them I end up finding webs of trading accounts, so I feel this is an important feature. I’ve had real good feedback about SteamArchive recently. (Rare, but it’s so nice when someone takes the time to compliment it!)

Going forwards:

I will monitor the new SQL box and see how we go. I don’t want to change too much at the moment so I can monitor any issues.

Avatar History will make a comeback at some point. I need to restore the data and sort it out. It won’t be quick due to the scripts required to re-jig the data and match it up with what is there now.

I need to re-cable the entire rack here as it’s a total mess. I will migrate the text searching server over to enterprise NVMe drives as it’s currently on consumer grade 1TB SSDs. That won’t take long. It’s running spot on at the moment with 128GB RAM, but I’ve noticed it has been creeping up already.

Hardware wise, I feel like I need another backup server, another 10GB card and a whole bunch of drives, although there may not be so many hardware changes in the coming months as I cleaned my bank accounts out in October / November. I did get some absolute bargains, which I could resell at a good profit, but I use the gear.

I may have missed a lot from here, but without the patron support none of this would have been possible. I would have just jacked it all in, but just the thought that there are people here providing support, which will help backfill the cost of everything, has given me the motivation to sort this.

Once everything is stable, avatar history is back and I’ve had a few days’ rest, I want to really start developing SteamID more. A good friend of mine looked into machine learning with a cut of some SteamID data and the results were pretty interesting. Maybe more on that to come, although it sounds like with the amount of data I have I’d need some RTX cards to help with that, so it won’t be any time soon!

I did explore moving everything to a proper cloud hosting provider, but with the resources I have and the storage requirements, both fast storage and large archive storage, it’s just not feasible. That would just end SteamID, so I learn and adapt as best I can to be cost effective while trying to keep things running quickly with the amount of traffic SteamID receives.

As always, if you have any ideas for SteamID please let me know, and if anyone happens to work with a hardware supplier or IT recycler with any Dell R630 / R640 or R730 / R740s, please let me know.


Comments

I also forgot to mention the load on the database server: 12 cores were getting pinned, whereas as of now I am seeing around 30% - 40% overall usage, and that's with more servers powered on using the database. All seems positive so far.

SteamID

