Hey Boris,
Would it be feasible to implement optimizations for GPUs that support FP16 at double throughput (when half precision is enough)?
Even on GPUs where double throughput is disabled, FP16 saves power. So it could be a worthwhile optimization for everyone.
Thanks in advance!
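To show what "when half precision is enough" means in practice, here is a minimal sketch in Python that uses only the standard `struct` module, whose `'e'` format is IEEE 754 binary16 (FP16). The `to_half` and `survives_8bit_round_trip` helpers are hypothetical. The 8-bit colour check is an illustrative assumption about where FP16 would suffice, not something tested in the mod.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (FP16) value and back.

    struct's 'e' format rounds to nearest, ties to even, and raises
    OverflowError for magnitudes beyond the FP16 range (max finite 65504).
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def survives_8bit_round_trip() -> bool:
    """True if every 8-bit channel value k/255 is recovered after FP16 storage.

    FP16 spacing in [0, 1] is at most 2**-11, much finer than 1/255,
    so the rounding error never moves a value to a different 8-bit level.
    """
    return all(round(to_half(k / 255) * 255) == k for k in range(256))


if __name__ == "__main__":
    print(to_half(0.1))      # 0.0999755859375: about 3 significant digits
    print(to_half(2049.0))   # 2048.0: above 2048, FP16 integers are 2 apart
    print(to_half(65504.0))  # 65504.0: largest finite FP16 value
    print(survives_8bit_round_trip())  # True
```

Display-range colours are a plausible fit for FP16. Large world-space positions and long accumulations are not.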
TES Skyrim SE 0.394
Forum rules
New topics are not allowed in this subsection, only replies.
- *blah-blah-blah maniac*
- Posts: 17559
- Joined: 27 Dec 2011, 08:53
- Location: Rather not to say
Re: TES Skyrim SE 0.394
I've never heard of 16-bit variables actually working on PC, and I never saw any performance increase, even when I was sure from the disassembly that they compiled correctly. If I remember right, they were abandoned 10-15 years ago; maybe they are still used in things like CUDA, not sure.
_________________
i9-9900k, 64Gb RAM, RTX 3060 12Gb, Win7
- Posts: 28
- Joined: 28 Oct 2013, 02:55
Re: TES Skyrim SE 0.394
I know more about GCN than NVIDIA architecture, because a friend of mine wrote Vulkan libraries optimized for it. He said FP16 power savings were implemented in very early GCN, while double throughput was added with Vega (called Rapid Packed Math). NVIDIA cards can do it too, but the double throughput is locked behind a price premium (it used to be Titan and Quadro), as it is mainly used in deep learning afaik. It may also speed up ray tracing depending on the method, so maybe things have changed with the 2000 series. They should at least be able to save a lot of power when using FP16.
[edit] After a quick read-up: tensor cores can be utilized for FP16 performance. Dunno anything about the DX11 implementation though.
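Here is what "rapid packed math" means at the data level. Two FP16 values share one 32-bit register, and one instruction updates both lanes, which is where the doubled rate comes from. The Python sketch below only emulates that layout with `struct`. The `pack_half2`, `unpack_half2`, `packed_add` and `packed_mul` names are hypothetical. The sketch says nothing about speed, which depends on the GPU and on whether the compiler actually emits packed instructions.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word: lo in bits 0-15, hi in 16-31."""
    return struct.unpack("<I", struct.pack("<ee", lo, hi))[0]


def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 lanes (lo, hi)."""
    return struct.unpack("<ee", struct.pack("<I", word))


def packed_add(a: int, b: int) -> int:
    """Emulate one packed FP16 add: both lanes handled by a single operation.

    The exact sum of two FP16 values fits in a Python float, so packing it
    back rounds once to FP16, as a real FP16 adder would (overflow past
    65504 raises OverflowError here instead of producing infinity).
    """
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


def packed_mul(a: int, b: int) -> int:
    """Emulate one packed FP16 multiply, lane by lane."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo * b_lo, a_hi * b_hi)


if __name__ == "__main__":
    a = pack_half2(1.5, 2.0)
    b = pack_half2(0.25, 4.0)
    print(hex(pack_half2(1.0, 0.0)))       # 0x3c00: FP16 1.0 in the low lane
    print(unpack_half2(packed_add(a, b)))  # (1.75, 6.0)
    print(unpack_half2(packed_mul(a, b)))  # (0.375, 8.0)
```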
- *blah-blah-blah maniac*
- Posts: 17559
- Joined: 27 Dec 2011, 08:53
- Location: Rather not to say
Re: TES Skyrim SE 0.394
Okay, I'll measure performance, but I seriously doubt this feature works in DirectX. Anyway, I have no idea where to apply it inside the mod; nothing there is that arithmetic-hungry.
_________________
i9-9900k, 64Gb RAM, RTX 3060 12Gb, Win7
- *blah-blah-blah maniac*
- Posts: 17559
- Joined: 27 Dec 2011, 08:53
- Location: Rather not to say
Re: TES Skyrim SE 0.394
Tested: no performance difference between half3 and float3 in a huge loop with +, * and pow operations (around 30000 simple arithmetic operations and 5000 pow calls).
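Speed can only be measured on the GPU, but the precision side of a half-versus-float loop can be sketched on the CPU. One caveat may matter for this test. Microsoft's HLSL scalar-type documentation states that Direct3D 10 and later shader targets map `half` to `float`, so in DX11 `half3` and `float3` may compile to identical code. From Windows 8 on, 16-bit math is requested through minimum-precision types such as `min16float` instead. The loop below is hypothetical: a step of 0.001 is added 3000 times, with rounding after every add. It uses `struct`'s `'e'` and `'f'` formats to emulate FP16 and FP32 storage.

```python
import struct

FP16 = "<e"  # IEEE 754 binary16
FP32 = "<f"  # IEEE 754 binary32


def round_to(fmt: str, x: float) -> float:
    """Round x to the storage format fmt by packing and unpacking it."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def accumulate(fmt: str, step: float = 0.001, iterations: int = 3000) -> float:
    """Add `step` to an accumulator `iterations` times, rounding every
    intermediate result to `fmt`, as a shader loop working in that type would."""
    step = round_to(fmt, step)
    acc = 0.0
    for _ in range(iterations):
        acc = round_to(fmt, acc + step)
    return acc


if __name__ == "__main__":
    exact = 3.0  # 0.001 * 3000
    for name, fmt in (("FP32", FP32), ("FP16", FP16)):
        result = accumulate(fmt)
        print(f"{name}: {result:.6f} (error {abs(result - exact):.6f})")
```

The FP16 run drifts far from 3.0 because once the accumulator grows, the 0.001 step falls below or near FP16 spacing and gets rounded away or doubled. The FP32 run stays close.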
_________________
i9-9900k, 64Gb RAM, RTX 3060 12Gb, Win7
- Posts: 28
- Joined: 28 Oct 2013, 02:55
Re: TES Skyrim SE 0.394
Hmm... your card may not fully support the feature. I asked a friend of mine how this works, but he only knows Vulkan. In Vulkan it is as easy as using FP16 and setting a flag. Dunno how DX11 handles it. According to an AnandTech article, DX11 should support it with Shader Model 5.x.
Using it with pow probably requires custom libraries; it is usually only supported for + and *.
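This sketch shows why pow is harder than + and *. The HLSL compiler typically expands `pow(x, y)` for positive `x` into `exp2(y * log2(x))`, so running pow in FP16 also needs FP16 log2 and exp2, not just packed add and multiply. The Python below is a hypothetical emulation. The optional `rnd` argument rounds each intermediate step to FP16 through `struct`'s `'e'` format, and the function only handles `x > 0`.

```python
import math
import struct
from typing import Callable


def to_half(x: float) -> float:
    """Round x to the nearest FP16 value (struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def pow_exp2_log2(x: float, y: float,
                  rnd: Callable[[float], float] = lambda v: v) -> float:
    """pow(x, y) for x > 0, computed as exp2(y * log2(x)).

    `rnd` is applied to the inputs and after every step to emulate the
    working precision; the default (identity) keeps full double precision.
    """
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    t = rnd(math.log2(rnd(x)))  # log2 step
    t = rnd(t * rnd(y))         # multiply step
    return rnd(2.0 ** t)        # exp2 step


if __name__ == "__main__":
    x, y = 0.5, 2.2  # e.g. a mid-grey value raised to a 2.2 gamma exponent
    print(pow_exp2_log2(x, y))           # full precision
    print(pow_exp2_log2(x, y, to_half))  # every step rounded to FP16
    print(x ** y)                        # Python's built-in pow, for reference
```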
- *blah-blah-blah maniac*
- Posts: 17559
- Joined: 27 Dec 2011, 08:53
- Location: Rather not to say
Re: TES Skyrim SE 0.394
It doesn't work for simple arithmetic either; I tested on operations that compile to add, mad and mul. It's not a big deal for anybody to do this test by writing a loop in enbeffect.fx, but I can't optimize what I don't see in the results. Also, Vulkan, CUDA and DX12 are totally different things. I have never in my life seen this feature working on AMD or NVIDIA cards; I just know that some very outdated ones like the GF5200-5950 and ATI cards of the same age had it in DX9.
_________________
i9-9900k, 64Gb RAM, RTX 3060 12Gb, Win7
- Posts: 74
- Joined: 05 May 2017, 17:31
Re: TES Skyrim SE 0.394
xirangeix wrote: I'm not sure, but it doesn't seem the distant shadow is working on the tree LODs. https://imgur.com/a/fXhNpNm. Either way, on or off, they aren't there at all and nothing changes. I tried saving and then applying changes both times. Not sure why I can't seem to get it to work. It hasn't seemed to work for a while, but it used to work when you originally released it.
Could you have been using DynDoLOD's 3D tree LODs at the time? I think those should not only work with distant shadows (DynDoLOD creates them as regular static objects) but also have better lighting applied. I don't think the engine can cast dynamic lighting properly onto billboards; I doubt they are drawn to the depth buffer.
- *blah-blah-blah maniac*
- Posts: 838
- Joined: 10 Dec 2017, 17:10
Re: TES Skyrim SE 0.394
Boris, I'm not sure if this was already cleared up, but what about enabling underwater caustics by simply copying the caustics you see from the surface?
Logically, caustics are data, and data can be copied and applied elsewhere, right?
Three things should be needed: data that already exists (the caustics), a trigger to copy this data (getting wet, needing oxygen, or something else that activates when going under water), and coordinates to apply the data to (the bottom of a body of water).
Anyway, is my logic right, or am I completely wrong?
- *blah-blah-blah maniac*
- Posts: 17559
- Joined: 27 Dec 2011, 08:53
- Location: Rather not to say
Re: TES Skyrim SE 0.394
There is no data when you are underwater and can't see the water surface itself; at many camera angles you simply don't know what the water looks like, so caustics turn off in that case. In old Skyrim I tried to save this information, but something went wrong, and as you can see there are no underwater caustics.
EDIT: I remember now. The problem with cached caustics was the position of the water above the camera. Many water surfaces are drawn for the entire world at different height levels, so any of them can be caught by the camera and give invalid data for applying caustics. I didn't find a way to get exactly the water above the camera (and even that would still not work right if a pool of water has different levels, like a waterfall), so caustics would sometimes be applied to the whole world.
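Here is a sketch of the selection problem from the EDIT. It assumes, hypothetically, that each drawn water surface is available as a flat plane with a known horizontal extent, which is exactly the information the post says is hard to get reliably from the engine. `WaterPlane` and `water_above_camera` are illustrative names, not ENB or game APIs. Picking the nearest surface above the camera filters out other water levels drawn across the world. It still cannot handle a single body of water with several levels, such as a waterfall.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WaterPlane:
    """Hypothetical flat water surface with an axis-aligned horizontal extent."""
    name: str
    height: float  # world-space Z of the surface
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def covers(self, x: float, y: float) -> bool:
        """True if the point (x, y) lies inside this surface's horizontal extent."""
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


def water_above_camera(planes: Iterable[WaterPlane],
                       cam_x: float, cam_y: float,
                       cam_z: float) -> Optional[WaterPlane]:
    """Return the closest water surface directly above the camera, or None.

    Only surfaces that cover the camera's (x, y) and lie above its height
    are candidates; among them, the lowest one is the surface the camera
    is actually under.
    """
    candidates = [p for p in planes
                  if p.covers(cam_x, cam_y) and p.height > cam_z]
    return min(candidates, key=lambda p: p.height, default=None)


if __name__ == "__main__":
    planes = [
        WaterPlane("lake", 0.0, -1000.0, -1000.0, 1000.0, 1000.0),
        WaterPlane("upper pool", 50.0, 100.0, 100.0, 300.0, 300.0),
        WaterPlane("sea", -100.0, -5000.0, -5000.0, 5000.0, 5000.0),
    ]
    print(water_above_camera(planes, 10.0, 10.0, -5.0).name)    # lake
    print(water_above_camera(planes, 200.0, 200.0, -5.0).name)  # lake, not the pool above it
    print(water_above_camera(planes, 200.0, 200.0, 45.0).name)  # upper pool
    print(water_above_camera(planes, 10.0, 10.0, 20.0))         # None
```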
_________________
i9-9900k, 64Gb RAM, RTX 3060 12Gb, Win7