Cas: mornin
Dan: I'm stuck in a sleep/eat/schoolwork cycle...
Cas: feck 😦
we going to have something to show on Friday?
Dan: I should have some time to work on the SSAO tonight.
I'm not sure what you'd call showable
Most of the stuff we have are technical improvements.
Cas: that's ok so long as we can talk about it
besides if you get the SSAO working with the 2-stage stuff tonight maybe that's something to show. Also MSAA
Dan: Performance improvements, I should be able to get the correct normals working, reduced memory usage, MSAA.
Cas: doesnt matter how small progress is so long as it's progress 😃
Dan: I think I can cut the memory usage quite a lot by not needing an alpha channel.
Yeah I can summarize it.
Cas: yep
Dan: I managed to get some cool work done.
Cas: oh-ho how cool is cool?
Dan: Well not insanely uber cool
but pretty cool.
Slope-based depth-aware blur!
Cas: 😎
Dan: Normally you do a depth-aware bilateral blur on the SSAO
basically you make sure that the neighboring pixels have the same depth as the one you're blurring with
to prevent bleeding over edges.
Cas: nice
Dan: I had two major problems:
- We need a really sensitive depth test to make sure we don't blur over voxel edges
- Having a sensitive depth test means that as soon as we're not looking at something head-on, the slope of the surface makes the depth test fail even for neighboring pixels belonging to the same triangle.
To solve this I store the X and Y slope of depth for each pixel together with the linear depth value.
So when blurring I modify the depth I compare with based on this slope.
Basically this means that for a perfectly flat surface it will blur 100% no matter the view angle, because it compensates for it.
I could set the depth error threshold to 0.1% and still get accurate results.
had to do quite a bit of crazy stuff to fit all the data though.
SSAO is 8 bits, depth is 24 bits and slope is 2 half floats (32 bits) = 64 bits.
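A rough sketch of how that 64-bit budget could add up, written in Python rather than shader code. The field order, the function names and the [0, 1] value ranges are assumptions for illustration; the chat only gives the bit widths, not the actual texture layout.

```python
import struct


def pack_pixel(ao, depth, slope_x, slope_y):
    """Pack AO (8 bits), linear depth (24 bits) and two half-float slopes
    into 8 bytes. `ao` and `depth` are expected in [0, 1] (an assumption)."""
    ao_bits = round(ao * 0xFF)
    depth_bits = round(depth * 0xFFFFFF)
    word = (ao_bits << 24) | depth_bits          # 8 + 24 = 32 bits
    # "<" = little-endian without padding, "I" = 32-bit uint, "e" = half float
    return struct.pack("<Iee", word, slope_x, slope_y)


def unpack_pixel(data):
    """Inverse of pack_pixel; AO and depth come back quantized."""
    word, slope_x, slope_y = struct.unpack("<Iee", data)
    ao = (word >> 24) / 0xFF
    depth = (word & 0xFFFFFF) / 0xFFFFFF
    return ao, depth, slope_x, slope_y


packed = pack_pixel(1.0, 0.5, 0.5, -0.25)
print(len(packed) * 8, "bits per pixel")
print(unpack_pixel(packed))
```

Depth loses a little precision to the 24-bit quantization, while slopes that are exactly representable as half floats round-trip unchanged.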
The addition of the slope translates to just one more add per sample in the blur shader too so minor cost there.
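A minimal sketch of the slope-compensated depth test described above, in Python instead of a blur shader. The 1D layout, the function names and the relative form of the threshold are assumptions for illustration; only the idea (shift the depth you compare against by the stored slope before testing) comes from the conversation.

```python
def plain_weight(center_depth, sample_depth, threshold):
    """Classic depth-aware test: accept the sample only if depths nearly match."""
    return 1.0 if abs(sample_depth - center_depth) <= threshold * center_depth else 0.0


def slope_weight(center_depth, slope, offset, sample_depth, threshold):
    """Slope-aware test: predict the depth `offset` pixels away from the stored
    per-pixel slope (a single extra multiply-add here), then compare."""
    expected = center_depth + slope * offset
    return 1.0 if abs(sample_depth - expected) <= threshold * center_depth else 0.0


def blur_1d(ao, depth, slope, radius, threshold, use_slope):
    """Box blur over one row of pixels, skipping samples that fail the depth test."""
    out = []
    for i in range(len(ao)):
        total, weight_sum = 0.0, 0.0
        for off in range(-radius, radius + 1):
            j = i + off
            if j < 0 or j >= len(ao):
                continue
            if use_slope:
                w = slope_weight(depth[i], slope[i], off, depth[j], threshold)
            else:
                w = plain_weight(depth[i], depth[j], threshold)
            total += w * ao[j]
            weight_sum += w
        out.append(total / weight_sum)  # the centre sample always passes
    return out


# A flat surface seen at an angle: depth rises by 0.5 per pixel.
depth = [10.0 + 0.5 * x for x in range(8)]
slope = [0.5] * 8
ao = [0.0, 1.0] * 4        # noisy SSAO that should average out
threshold = 0.001          # 0.1% depth error threshold

plain = blur_1d(ao, depth, slope, 2, threshold, use_slope=False)
aware = blur_1d(ao, depth, slope, 2, threshold, use_slope=True)
print("plain:", plain)     # every neighbour rejected, nothing gets blurred
print("aware:", aware)     # neighbours accepted, noise is averaged
```

With the plain test the sloped surface rejects every neighbour, so the noise survives; with the slope term the same tight threshold still accepts the whole flat surface.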
Cas: lovely
this means nice easy 2-level SSAO then?
Dan: Well it solves some problems for it.
For the large scale SSAO we REALLY want a blur
and this blur is precise enough that it doesn't ruin the small radius SSAO.
Cas: sounds like... all of the problems?
Dan: It opens it up a bit yeah.
I still need to add the normal support
Cas: that bits easy surely
Dan: or rather now I need to output the slope properly
yeah but possibly expensive...
Cas: thought normals would be cheap as chips to write?
Dan: I managed to win back quite a lot of performance by reducing the main MSAA buffer from GL_RGBA16F to GL_R11F_G11F_B10F, halving memory usage there.
The slopes need half precision, so another GL_RG16F multisampled buffer brings us back to the same amount of memory usage again.
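The buffer arithmetic can be sketched roughly like this. The bytes per sample follow from the bit widths in the format names; the 2560x1440 resolution at 8 samples is an assumption based on the "8xMSAA 1440p" setting quoted with the timings, and driver overhead and the depth buffer are ignored.

```python
# Bytes per sample, derived from the bit widths in each format name.
BYTES_PER_SAMPLE = {
    "GL_RGBA16F": 8,          # 4 channels x 16 bits
    "GL_R11F_G11F_B10F": 4,   # 11 + 11 + 10 bits
    "GL_RG16F": 4,            # 2 channels x 16 bits
}


def msaa_buffer_mib(width, height, samples, fmt):
    """Size of one multisampled colour buffer in MiB."""
    return width * height * samples * BYTES_PER_SAMPLE[fmt] / (1024 * 1024)


W, H, SAMPLES = 2560, 1440, 8   # assumed: 1440p at 8xMSAA

old_main = msaa_buffer_mib(W, H, SAMPLES, "GL_RGBA16F")
new_main = msaa_buffer_mib(W, H, SAMPLES, "GL_R11F_G11F_B10F")
slopes = msaa_buffer_mib(W, H, SAMPLES, "GL_RG16F")

print(f"old main buffer:   {old_main:.1f} MiB")
print(f"new main buffer:   {new_main:.1f} MiB")
print(f"new main + slopes: {new_main + slopes:.1f} MiB")
```

Under those assumptions the smaller main buffer plus the slope buffer lands at the same total as the old GL_RGBA16F buffer alone.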
My biggest worry is that the depth aware MSAA upscale of SSAO isn't good enough and we need to take the slope into account
which will slow down the merge even more...
Hopefully it all works out well.
The reconstructed normal looks like shit in certain cases where there's just not enough info to reconstruct it.
Cas: it can't fail 😄
Dan: *nervous laughter*
Cas: heh
hopefully soon have all this fancy stuff behind us and on to particles or something soon then eh
Dan: Yeah hopefully.
Current timings:
Frame 3650 : 4.573ms (100.0%)
  Camera 1 : 4.535ms (99.1%)
    Shadow map rendering : 0.397ms (8.6%)
      Shadow map 1 : 0.395ms (8.6%)
    Clear main buffer : 0.062ms (1.3%)
    Terrain rendering : 0.923ms (20.1%)
    Skybox : 0.001ms (0.0%)
    Post processing : 3.149ms (68.8%)
      SSAO rendering : 1.145ms (25.0%)
        Linearize depth : 0.226ms (4.9%)
        Generate depth mipmaps : 0.096ms (2.1%)
        Compute SSAO : 0.462ms (10.1%)
        Blur : 0.358ms (7.8%)
      Merge : 0.906ms (19.8%)
      Bloom : 0.652ms (14.2%)
      Tone mapping : 0.442ms (9.6%)
8xMSAA 1440p.
Linearize depth is technically not part of SSAO; we'll need it for particle blending anyway.
Cas: way over target spec then, and still doing good
1080p 4xMSAA is the high end i'm targeting
Dan: Hopefully I can successfully revert to the old branch to compare performance and see if there's anything I've missed.
1080p 4xMSAA:
Frame 3640 : 2.52ms (100.0%)
  Camera 1 : 2.475ms (98.2%)
    Shadow map rendering : 0.391ms (15.5%)
      Shadow map 1 : 0.389ms (15.4%)
    Clear main buffer : 0.019ms (0.7%)
    Terrain rendering : 0.632ms (25.1%)
    Skybox : 0.001ms (0.0%)
    Post processing : 1.43ms (56.7%)
      SSAO rendering : 0.695ms (27.5%)
        Linearize depth : 0.102ms (4.0%)
        Generate depth mipmaps : 0.07ms (2.8%)
        Compute SSAO : 0.294ms (11.7%)
        Blur : 0.224ms (8.8%)
      Merge : 0.289ms (11.4%)
      Bloom : 0.295ms (11.7%)
      Tone mapping : 0.147ms (5.8%)
Cas: bearing in mind your card is pretty much top of the range
Dan: I mean there's the GTX 1080 TI but yeah lol
2.5ms on my card for the highest settings is pretty nice though.
Cas: 960 is sorta the smart gamer's choice
Dan: Hey!
lol
Cas: sensible money
Dan: HEY!!! >=((((
Cas: well if you've got money to burn ... 😄
Dan: HEYYYYYYYYYYYYYYYYYYY!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
lol
Trying out the lowest postprocessing/MSAA settings:
660FPS 75% GPU load
Frame 9505 : 1.07ms (100.0%)
  Camera 1 : 1.033ms (96.5%)
    Shadow map rendering : 0.368ms (34.4%)
      Shadow map 1 : 0.367ms (34.3%)
    Clear main buffer : 0.021ms (2.0%)
    Terrain rendering : 0.388ms (36.2%)
    Skybox : 0.0ms (0.0%)
    Post processing : 0.248ms (23.2%)
      Merge : 0.031ms (2.9%)
      Bloom : 0.137ms (12.8%)
      Tone mapping : 0.077ms (7.2%)
1080p
Cas: no MSAA?
Dan: Nope.
Could maybe cut down that time by a bit more by reducing shadow map resolution
or disabling shadow maps completely.
Cas: plenty of options on the table.
Dan: Then we're talking ~0.7ms on my card.
Cas: bear in mind we've got to add 10000 particles and a few thousand models too 😃
well few hundred realistically
Dan: 10k particles shouldn't be a major issue. Overdraw is the bigger issue to worry about here.
1k models shouldn't be a major issue either; we're talking vertex limitations at that point.
Oh those above minimum numbers were still with bloom so that's another 0.15ms or so gone.
Prolly 0.5ms at absolute lowest.
Even lower if you drop the resolution even more.
Cas: pretty remarkable
so normals tonight...?
Dan: We'll see I gotta take care of some more school crap
Cas: doh
I'll do my little tasks anyway
Dan: but one of my 3 projects ended up being pushed back until saturday and it's pretty much 75% done already
so I've got a bit more time than I was expecting which is nice.
Note that the ultimate performance will depend on the vertex count a lot.
You may want to have simplified models for the bots
not as LOD but rather as complete low-end alternatives.
Get what I mean?
If we determine that they are the major performance hog it may be worth it even though it'd require more resources.
Cas: probably not much need - they're going to be rather simple really
Dan: Could you maybe make an example robot so we can start with the robots at some point?
Cas: your typical bot has to fit into 16x16x32 ish voxels
Dan: I'd like to see how many tris it ends up being and so.
Cas: Ill make a simple one
maybe see if @Chaz will draw one for us
Dan: Cool.
Hehe
Just tried out the game at 1440p, 8x SUPERsampling, 16k x 16k shadow maps and SSAO
80+ FPS
Cas: bah
Dan: Crazy settings. xD
---
Dan: Hey sorry been in school all day as we had a project deadline and presentation
and another one coming up tomorrow...
It's super stressful and I'm not sure what I can finish until tomorrow
Cas: No worries. Don't panic.
Dan: When exactly is the deadline for the blog post tomorrow?
I'm busy until 19:00 with projects AT LEAST
Cas: No particular time. Could leave it till Sunday
Dan: and probably going out for dinner afterwards to not die x___x
Cas: What we do have to do though is dial back the sprint contents to be more realistic taking into account school etc
I did a crap little robot btw
G101.vox
Not textured
And only one vox
Dan: Textured meaning...?
Cas: Well coloured
It's just grey
Dan: There would be small discrepancies in performance in that case as it doesn't need to sample a texture
but it should be minimal.
Cas: Well I will colour it a bit then
Really I need to separate the legs into a separate model
That'll be more representative
Dan: You may want to use some kind of special marker palette index to act as a face hider
So the top of the robot's leg section doesn't get faces that are never visible.
Get what I mean?
Cas: Well... Yes. But hard with voxels
More trouble than worth.....
Also we blow the tops off sometimes and just leave smoking boots
Or have wreckages lying around
Dan: If we can save a decent chunk of triangles by doing it, it could be the difference between the game running well and not.
Cas: For the small number of models on screen it's not worth it
Dan: Anyway one test model is good enough for future purposes.
Cas: Yup
I'll separate the legs
Dan: Gonna fire it up and check out the model real quick
It's not committed?
Cas: It's in the trunk branch
Dan: gah
Cas: Silly me
Dan: ... can you send it to me? x___x
I gotta get back to sadder and less fun stuff soon..
Cas: Yeah in a bit you go do school stuff first fun stuff later
Dan: OK see you sunday evening lol
Cas: Heh
Dan: I'm waiting for a response from someone so I got a moment to spare
Cas: Ok sec
https://cdn.discordapp.com/attachments/380372019251773444/388420012559302657/g101.vox
chop that in 2
as in separate the legs into a separate model
that's how 90% of the robots will be in the real game
some will be bigger
some made of more bits some less
Dan: It's much simpler than I expected tbh
Like smaller
Cas: told you 16x16 base for rank-and-file robots. Small.
if we were using 32x32 voxel tiles they'd be bigger
but the map would be 1/4 the area
Dan: If you export that as .obj what's the triangle count?
Cas: having to download and install Blender just to find out...
Dan: oh blender can do it?
Cas: probably
Dan: testing myself...
240 tris
but the triangulation they're doing is better than mine
Cas: even with 2000 of them on screen then thats only 480k
Dan: It'd probably be 300-350 for mine.
that's almost a million tris
well 600-700k
Cas: aye but thats a) the biggest battle the system allows and b) more robots than can fit on the screen even at max zoom out
Dan: Remember we got shadow maps to render too.
Cas: realistically we'll be dealing with a few hundred robots tops
and in most cases only a couple of hundred
however there will also be a few hundred small models scattered about....
trees rocks etc
Dan: we're at 2250k tris right now with 3 lights.
Cas: 3 lights?
Dan: 2k robots x (1 cam + 2 shadow maps) = 2000*700*3 = 4 200 000
2000 x 700 x 3
discord formatted that due to *
Cas: `use backticks`
Dan: err?
Cas: `200*300*400`
Dan: 'err
''' err
Cas: single backtick `
then end with another `
for oneline snippets
Dan: yah don't got a key for that so it's annoying as fuck to write.
Cas: triple for multiline
oh yeah
i remember
doh
Dan: Need like 5 presses to write it
one
ANYWAY
Cas: ANYWAY
Dan: it's potentially a lot of tris is what I'm saying
Cas: don't worry
nah
you worry about the pathologically worst case
there will not be more than a couple hundred robots onscreen
plus another er say 100 smallish models
another 2.5m tris tops
Dan: 2.5 mill???
not that much of an improvement over 4.2 mill...
Cas: 500 models 500 tris each
Dan: = 250k?
Cas: yeah sorry 😃
x 3
Dan: *3 = 750k at worst?
Cas: for camera & lights
yeah
so that doesn't look too bad
Dan: That gives us a 750k high end 250k low end.
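The triangle estimates thrown around above can be checked with a few lines. The function name is hypothetical; the model counts, per-model triangle counts and the "1 camera + 2 shadow maps" pass count are the figures from the conversation.

```python
def triangles_per_frame(models, tris_per_model, shadow_maps):
    """Every model is drawn once for the camera and once per shadow map."""
    passes = 1 + shadow_maps
    return models * tris_per_model * passes


# Pathological worst case: 2k robots at ~700 tris, camera + 2 shadow maps.
worst_case = triangles_per_frame(2000, 700, 2)

# More realistic case: 500 models at ~500 tris each.
high_end = triangles_per_frame(500, 500, 2)
low_end = triangles_per_frame(500, 500, 0)   # shadows disabled

# 2000 robots at the 240 tris from the exported .obj, camera pass only.
obj_export = triangles_per_frame(2000, 240, 0)

print(f"worst case: {worst_case:,}")
print(f"high end:   {high_end:,}")
print(f"low end:    {low_end:,}")
print(f"obj export: {obj_export:,}")
```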
I think we can aim for and achieve <1m tris without shadows.
Cas: but but bu... we want shadows
Dan: I meant FOR mega low-end
Cas: oh right
Dan: The worst case scenario doesn't matter much if you can reduce the graphics settings
Cas: yeah no worries
Dan: It's good that the settings max out a modern GPU
means that you can throw more juice at it and get something out of it
Cas: bear in mind by the time we get to release this damned game everything will be 1.5x faster...
Dan: A very small fraction of people will have the absolute newest cards though
the biggest gain is the old hardware being phased out
Cas: yeah the AVERAGE computer will be 1.5x faster
Dan: hmmm...
Cas: we're already seeing dualcores disappearing
Dan: Well we'll see
Cas: http://store.steampowered.com/hwsurvey
Dan: I gotta get back to school stuff
Cas: look at the ends of the graphs
kk
Dan: It sounds like it can work out decently.
Cas: it will look amazing
Dan: I don't think we can get any better triangle generation than what I've already managed to get btw
(The .obj is better technically)
well at least not without wasting a lot of memory.
The texturing is based on quads
Cas: yeah
Dan: Hmm.
It could be doable I guess...
Is there any way to turn a model into a single voxel type in MagicaVoxel?
AKA turn a voxel model into a 1-bit model where each voxel is either on or off?
Cas: erm
no but its a trivial operation
Dan: You can use the Paint tool to do it it seems.
If we do that we can use MagicaVoxel's obj exporting to get a seemingly perfect mesh.
Then generate textures for it using the colored .vox model + the .obj model as input.
Cas: if you want to look at how it does it just go look at the source and port it to java
Dan: It might waste a tiny bit of memory but it's a drop in the ocean compared to the terrain.
If we can save 25% tris that way that'd be huge.
Going down to 240 tris instead of 300+ would be great.
Cas: we can preprocess vox models using some optimiser to our hearts content if we want
Dan: The vox models?
Oh you mea
yeah I get it
Yeah we got infinite time here.
Did a quick search couldn't find magica voxel source code
is it really open source+
?
Cas: it used to be it seems now its vanished hmm
still the algorithms i think are well understood
I bet your meshifier isn't far off optimal already
despite what you think
Dan: It is though
for some cases
A voxel meshifier may be good enough to fix those problems though
err
a mesh optimizer*
---
Dan: I woke up with a good idea today.
Cas: you home now?
Dan: The problem with MSAA is that postprocessing needs to be supersampled.
Nope on my way to school
Cas: doh
Dan: Won't be home until late evening
Anyway
Postprocessing per sample is expensive
Extra memory usage for intermediate buffers is very bad too.
Remember my initial idea?
Cas: er no
Dan: Do a non-MSAA prepass, then read the SSAO/transparency while drawing the terrain, at about the cost of a shadow map read!
A non-MSAA prepass would cost around 0.3-0.4ms and ELIMINATE the merge pass.
Cas: well do that then 😃
Dan: The terrain is rasterizer limited so the cost there is almost 0.
Cas: I was trying to get streaming working properly last night using the old sort mechanism but after hours I just couldnt get the fucking thing to work. It kept leaving holes in the terrain and I don't know why.
---
Cas: Hola
Been out all weekend
You been up to much?
Dan: Haven't had time last project deadline tonight...
Sorry just got finished with the last project...
Been one of the most hectic weeks of my life. x___x
Pretty sure "got finished" is an incorrect literal Swedish translation. xd
Cas: Heh
Well I'm off to bed
Maybe we can do a patreon post tomorrow evening?
Dan: What do you want finished for it?
If I know what I need to do exactly I can try to plan for that.
Cas: I dunno... Just the fancier ssao and msaa I suppose
That's all we were planning to show
But if it takes a bit more effort there's no point in rusbing
Rushing
Dan: So in other words just:
>two level SSAO
>adding accurate normals output from the rendering pass
>MSAA support for all of this
<____<
Cas: Er yeah 😬
Don't rush
Can't be helped
Dan: Do you know of a game called Factorio?
Cas: Yes
Dan: You seen their Friday Facts posts?
Cas: Nope
Dan: They just post work in progress stuff optimization results and comparison images and stuff
I could probably at least get in accurate normals tomorrow. We could do a comparison of the two if you think it'd be interesting. >___>
I dunno how much the readers would appreciate it though...
Cas: I think it would be interesting
---
Cas: hiya
you home?
Dan: I am finishing up some school work...
Will need a bit more time...
Do you need anything urgently?
Cas: not urgently no just wondered if there was anything we could do this evening together on it
might take another swing at the streaming chunks problem
Dan: Sure I should have time in around 60-90 min and from then on.
Cas: ooh nice
Dan: Almost done here... q.q
Cas: yay
Dan: Alright I'm good to go.
Cas: so... normals writing then?
Dan: Well slope writing but yeah.
I'll probably do it in a prepass.
argh the hacks keep piling up
gonna take a step back and clean it up a bit
Cas: ohoh
yes
don't rush it with hacks
Dan: Gonna do no-MSAA first.
Cas: best to get it right
Dan: It's the cleanest one no need for a prepass.
Hmm having trouble with MRT again...
Probably my fault
How do the fragment output annotations work with inheritance?
Cas: it should inherit
Dan: Found another problem that was the cause
Slope is being output correctly finally.
Cas: nice
Dan: zero measurable cost
Cas: annotations working properly then?
Dan: This is without MSAA.
Checking
I moved them to the last class so there was no inheritance
checking
Yes seems to be working.
Cas: I suspect I didn't account for inheritance but it should be a simple fix
Dan: It did work so no problem.
Looks like SSAO normals are fixed for no MSAA.
Cas: cool
I suppose MSAA is vastly more fiddly?
Dan: Yeah doing a prepass messed up the frustum culling.
I don't want to do it twice
so I'll need to store data for it to work...
Need to restructure the code a bit.
Cas: how does it mess the culling?
Dan: Well render() currently does both the culling and rendering.
I want to do culling once and render twice.
Cas: yeah see... update() is supposed to do stuff like culling
and then render() should do the rendering based on what was computed in update()
or at least thats how I've generally structured it everywhere else
Dan: which won't work with multiple terrains/cameras/shadow maps.
I'll ultimately want to do all culling (multithreaded) and store the result for each job
then pick the right result when doing the rendering later.
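A minimal sketch of the "cull once, render several times" structure discussed above. Everything here is a toy stand-in: the class and function names are hypothetical, chunks are just x positions, a view is a visible x range instead of a real frustum, and the multithreading Dan mentions is left out.

```python
TERRAINS = {"main": [0, 4, 8, 12, 16]}            # chunk x positions
VIEWS = {"camera": (3, 13), "shadow0": (0, 9)}    # visible x range per view


class CullCache:
    """Stores one culling result per (terrain, view) job."""

    def __init__(self):
        self.results = {}
        self.cull_calls = 0

    def update(self, jobs):
        # Culling happens here, exactly once per job per frame.
        self.results = {}
        for terrain, view in jobs:
            lo, hi = VIEWS[view]
            self.results[(terrain, view)] = [
                c for c in TERRAINS[terrain] if lo <= c <= hi
            ]
            self.cull_calls += 1

    def visible(self, terrain, view):
        # Render passes only look results up; they never cull.
        return self.results[(terrain, view)]


def render(cache, terrain, view, pass_name):
    """Pretend to draw: return one (pass, chunk) entry per visible chunk."""
    return [(pass_name, chunk) for chunk in cache.visible(terrain, view)]


cache = CullCache()
cache.update([("main", "camera"), ("main", "shadow0")])

prepass = render(cache, "main", "camera", "prepass")
main_pass = render(cache, "main", "camera", "main")
shadow = render(cache, "main", "shadow0", "shadow")
print(cache.cull_calls, "cull jobs for 3 render passes")
```

The prepass and the main pass pick up the same stored result, so adding a second pass over the terrain does not add a second culling run.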
Cas: yup
Dan: I know how to do it it's just extra work which is why I'm whining lol
Cas: hehe
Dan: hmm need to make the blur check the slope too... =___=
getting ugly lines.

At the edge of each staircase step you get some bleeding.
You see it?
Cas: eww
Dan: (the checkerboard pattern is intentional it's not visible.
the problem is the bleeding there.
It will be 100x worse with actual MSAA as well since it'll stand out more.
Right now everything just looks like an aliased mess so......
You see the bleeding at least?
Right?
Cas: yes
can even see it when pic is shrunk down
Dan: Fixed it no more bleeding like that.
No significant cost increase to the blur.
Will have to do the same thing when upscaling MSAA...
hmm but I have no slope info then fuck.
not per sample
Damn the (better) slope aware blur looks awesome with high radius SSAO.


open original and switch between the two.
Cas: its rather hard to see a difference except the 2nd one is like a shade darker...
Dan: Look at the in shadow mountain walls.
Cas: ah yes!
Dan:


Cas: fantastic
Dan: Do you still want the small SSAO as well?
Cas: well....
hard to say till I see it for reals at home
Dan: I'll see if I can add it.
Cas: that last screenie seems to be the small radius SSAO though doesnt it?
Dan: No it's not.
It's still the same radius as the previous one.
Cas: oh
Dan: You do get some shaded edges of the bumps/holes in the ground as those surfaces are sloped.
The SSAO filters sample in a half-sphere going out from the surface.
For a vertical surface half the sphere is underground.
Hence the holes/walls get a lot of occlusion and therefore pop out.
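A toy illustration of that half-sphere argument, assuming a flat ground plane at z = 0 and a unit sample radius. This is plain geometry on a regular grid, not the screen-space depth comparison a real SSAO pass does, and the function name is made up for the example.

```python
def hemisphere_occlusion(normal, height_above_ground, n=8):
    """Fraction of hemisphere sample points (radius 1, oriented along `normal`)
    that end up below the ground plane z = 0, for a surface point sitting
    `height_above_ground` above that plane."""
    inside = occluded = 0
    for ix in range(-n, n + 1):
        for iy in range(-n, n + 1):
            for iz in range(-n, n + 1):
                x, y, z = ix / n, iy / n, iz / n
                if x * x + y * y + z * z > 1.0:
                    continue                  # outside the sample sphere
                if x * normal[0] + y * normal[1] + z * normal[2] <= 0.0:
                    continue                  # behind the surface
                inside += 1
                if height_above_ground + z < 0.0:
                    occluded += 1             # sample is underground
    return occluded / inside


floor = hemisphere_occlusion((0.0, 0.0, 1.0), 0.0)   # flat ground, normal up
wall = hemisphere_occlusion((1.0, 0.0, 0.0), 0.0)    # vertical wall at ground level
print(f"floor: {floor:.2f}  wall: {wall:.2f}")
```

The upward-facing floor point sees no occlusion at all, while the wall point at ground level has close to half of its samples underground, which is why sloped and vertical surfaces pick up extra shading.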
That's kinda why I asked about it.
But sec almost got it working with two radii
It's unoptimized but...
When are you gonna be home?
Could use some discussion on this
Cas: leaving work in 20 mins so back home in an hour or so from now... but I'm making dinner tonight so I'll be a bit busy for a few hours
Dan: Able to test stuff right now?
Cas: you in the test branch still?
Dan: err yeah
Cas: ill check it out here at work ...
Dan: Pushed Voxoid.
It shows small scale SSAO on the left side of the screen, large scale on the right side, and both combined in the middle right now.
So you can see which algorithm does what.
Cas: woo neat
sec
Dan:

preview
Cas: yeah combined does look good
Dan: Not much different to large scale only though
the only real difference at all is the small holes in the ground
which we probably won't have in the end.
Cas: well there might be a few
here and there
Dan: We pretty much need twice as many samples for it to look good...
Cas: what level MSAA is that?
Dan: none still not implemented with MSAA
Cas: uh
looks quite good already then which is promising
Dan: but it's rendered at 1440p so if you don't show it at full res it's pretty much supersampling
This is with 32 samples, 16 on each of the two levels
Hmm wait.
There's a bug in how I divide the samples...
sec
Cas: well im gonna get ready and go home now
ill be back online a bit later
Dan: Alright
fuuuuuck.
School just pulled another quick one on me and it seems like there's a risk I can't start on my master's thesis in January
I'm gonna need to focus more on my school work during christmas...
Cas: gahhh
i'm going to go and cook dinner
Dan: Pushed a small update fixed some SSAO issues.
Gotten a chance to test it yet?
Cas: Sec
Wrangling
Dan: WHY YOU LITTLE----
Cas: _syncs_
shimmers a lot without msaa!
Dan: sure does
Cas: hm or is it the MSAA
zoomed in it still shimmers as the mouse is moved
needs more samples?
Dan: Mouse is moved?
Cas: yeah the mouse light - turn it back on and see
Dan: That's the mouse light's shadows that shimmer.
Needs higher res shadow map and a bit more filtering.
I turned it off for that reason.
Cas: are you sure?
Dan: Hold G to turn off the SSAO.
Cas: coz its shimmering on a flat surface that's not got shadow
hm yes still does it with SSAO off
but looks oddly like its not the shadow
Dan: It's probably at an edge.
You're looking at the side of a mountain right?
The shadow map res is fairly low right now.
Cas: ah maybe it is
Dan: 2k x 2k
4k x 4k + filtering should be fine.
I think we can both agree that the large scale SSAO looks really good
but I'm not convinced the small scale SSAO is worth it.
Compute SSAO : 0.716ms (31.4%)
Half of that is the small-scale SSAO so if we only use the big-scale SSAO that'd be 0.35ms.
Entire frame:
Frame 73423 : 2.278ms (100.0%)
  Camera 1 : 2.24ms (98.3%)
    Shadow map rendering : 0.193ms (8.4%)
      Shadow map 1 : 0.192ms (8.4%)
    Clear main buffer : 0.018ms (0.8%)
    Terrain rendering : 0.428ms (18.7%)
    Skybox : 0.0ms (0.0%)
    Post processing : 1.597ms (70.1%)
      SSAO rendering : 1.281ms (56.2%)
        Linearize depth : 0.062ms (2.7%)
        Generate depth mipmaps : 0.106ms (4.6%)
        Compute SSAO : 0.716ms (31.4%)
        Blur : 0.392ms (17.2%)
      Merge : 0.184ms (8.0%)
      Tone mapping : 0.129ms (5.6%)
Cas: the one on the left is small scale?
Dan: Left is small scale right is big scale.
(middle is both)
Cas: i have to say the middle one is the best
has the most detail
Dan: I don't think it adds too much but OK
We could add quality setting for it if you want.
Low quality SSAO = large scale only.
or something like that.
Cas: could even crank the effect up a bit - might play with the figures
Dan: It's hard to tweak the two individually now though.
They're all accumulated together.
I can separate them I think.
I'll disable the test mode too and let you tweak it
Cas: k
and i suppose that the MSAA version is next and is gonna be a bugger?
Dan: It'll take a bit of time yes.
I'm sorry shit has really hit the fan here.
Been throwing emails out all evening...
Cas: dont worry
im gonna hit the hay I think anyway... knackered, up at 6:30am every day to go to work
Dan: Oh
=<
You got like 10 more mins or so...? .__.
Cas: sure
Dan: Pushed
Re-enabled bloom added separate strength variables to ssao.
See ssao.frag.
Cas: k
Dan: Too much SSAO makes the whole thing look dirty I guess.
Especially look at the mountains in direct sunlight.
Cas: jacking up the wide radius SSAO strength makes it look nice
still... we can fiddle with that when theres real graphics
---
Cas: dont worry if you think school sucks just imagine being at work
Dan: work only sucks at work
school sucks no matter where you happen to be at the moment
it never leaves you
I saw someone had written "Drop out while you still can!" on a whiteboard at uni today
Cas: lol