Resolved Question

graphics crash while idle leading to system lock

Well, I'm out of ideas.

For the last few months I have been dealing with random black-screen crashes that force me to manually reboot my PC. They happen most often when the PC is idling, but they have also happened while browsing the internet (Chrome) or within 10 minutes of startup.
Often the indicator lights continue to function after the crash and sound still plays, which made me think the issue was the graphics drivers, but after a few minutes the sound buzzes and the system hard locks as well. Usually I can find nothing in Event Viewer to clue me in to what is happening, and log readers like WhoCrashed come up empty.

I've gone as far as completely reinstalling Windows and it is still an issue.
It's probably something stupid and obvious, but I'm not seeing it.

http://speccy.piriform.com/results/eElEXR4tQw1NzBsNmaMjapx

Discussion is locked

Edwebb1 has chosen the best answer to their question.
Comments

Best Answer

- Collapse -
Updated Speccy reading.

1. CPU core is still under 1V. It was 0.896 V and is now 0.952 V. Be sure to double check with CPU-Z.
https://valid.x86.fr/dh61aa shows about 1.199V or 1.2V. Yours is under 1.0V, so it may occasionally crash, lock up, or reboot. It's a common issue that I look out for.

2. The GPU is the R9 290 and once in a while these give the owner grief.
I can't see the GPU driver version but look at other complaints or use the version you think is best.
I use DDU (see google) to remove the GPU drivers so I can get a clean GPU driver install.
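
For anyone who wants to log this over time, here is a rough sketch of the check in item 1. It only compares readings against the 1.0V rule of thumb from this post. That floor is my working assumption, not a published spec for any CPU, and the function name is made up for illustration.

```python
# Sketch only: the 1.0 V floor is the rule of thumb from this thread,
# not a manufacturer limit. Adjust it for your own CPU and board.
LOW_VCORE_FLOOR = 1.0  # volts (assumed cutoff)


def flag_low_vcore(readings, floor=LOW_VCORE_FLOOR):
    """Return the (label, volts) readings that fall below the floor."""
    return [(label, volts) for label, volts in readings if volts < floor]


# Core voltages quoted in this thread.
readings = [
    ("Speccy, first report", 0.896),
    ("Speccy, updated report", 0.952),
    ("Reference CPU-Z validation", 1.199),
]

low = flag_low_vcore(readings)
for label, volts in low:
    print(f"{label}: {volts:.3f} V is below {LOW_VCORE_FLOOR:.1f} V")
```

Both Speccy readings get flagged and the reference validation does not, which matches what I saw by eye.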

- Collapse -
16 hours in..

Optimistically, 16 hours of uptime, under load or idling, seems to indicate that my situation may be resolved.

Increased voltage offset to +.25V

Moved memory to a preset XMP Profile.

A few things still bug me.

1. What caused a PC that had previously spent years able to run in excess of 30 days without any major stability issues to suddenly develop this issue? The only thing I can think of is some update to the UEFI during Q1 of this year.


2. Why was I only able to effectively change the voltage via offset? Manually setting the voltage at 1.1/1.2 resulted in the DDebug readout giving a code 19, or in booting with only .6V going to the processor. I have to believe that may be due to my very limited knowledge about manipulating component voltages in PCs, but part of me wonders if this is a symptom of a larger issue down the line. Regardless, with some 20 years of fiddling with hardware and software, I probably should have known more about the effects of manipulating component voltage. Thanks for teaching me something new :)

One last speccy for closure, and in case someone else can learn from this thread.

http://speccy.piriform.com/results/x7EHi2JmyKT2otVi0KWQEfY

- Collapse -
Here's a why.

Capacitors. These age and the power starts to get noisy. So it may have run fine at sub 1V when it was new and had a nice clean power signal, but as time passes the noise level rises and at some point it's too much. Pushing up the voltage can help us eke out more years.

Post was last edited on July 17, 2018 10:47 AM PDT

- Collapse -
Answer
A Speccy reading.

1. Driver Booster. Sorry folks, but I am not a fan. I continue to have to remove this and try the driver install the old fashioned way. If you feel this should work (I don't) then I defer you to Driver Booster support.
In short, when I set up a PC, the BIOS is updated, then the OS is installed, then the drivers in the general order of motherboard, audio, video, LAN, WIFI and such. Then if the machine calls for machine specific apps for control we install those.

2. (Querying... ) (Windows shares) and Team Viewer. Disable and uninstall till the machine is stable.

3. CAN NOT DECODE WHAT VIDEO CARD IS INSTALLED.
Your story is in line with GPU issues but I can't decode what the GPU is yet. What is it?

4. CPU voltage looks low. Could be Speccy, BIOS or something else. Go get CPUz and double check but the BIOS SHALL BE CURRENT to avoid issues like this. Low CPU voltages can lead to very tough to figure out hangs like you describe (unless you know to check this and BIOS.)

5. BIOS is current.

6. The RAM should be tried in XMP mode.

7. COULD THE GPU BE THE R9 290?
https://www.techpowerup.com/vgabios/149273/ati-r9290-4096-131003
If so it's a well known cause of lockups. Fix its cooling. Pick over https://www.google.com/search?q=ATI+R9+290+tomshardware+heatsink but at the very least replace the compound.
Read https://www.tomshardware.com/reviews/radeon-r9-290x-thermal-paste-efficiency,3678.html

8. Hitachi HDS72101 is starting to show signs that I've found to cause trouble. See Smart Value 01.
I remove drives like this all the time.

9. Killer networking. It has had a bum driver that leaks handles. Check for excess handles. Read:
https://www.daniweb.com/hardware-and-software/microsoft-windows/threads/516015/how-many-handles-did-you-find-i-found-over-906-000-on-some-app-more

Out of all that the GPU looks to be trouble but let's check the power supply. Is it a nice not too old SINGLE RAIL model of say 600W?

- Collapse -
hmm

1) I agree. It was step 9 in my attempt to identify whether I had a bad driver prior to a full Windows reinstall.

3 & 7) Yes, it is an R9 290. I replaced the factory heatsink with something recommended by TOM a few months after I first bought it, when I realized a fan on the factory cooler was gummy and not getting up to speed. Multiple monitors have never shown a spike in heat in the past, with multiple readouts claiming it runs cool for its model. Can't hurt to take it out again and give it another inspection though.

4) HWM shows CPU input voltage on the MB between .896 and .904, if I'm looking at the right thing. I've never bothered to overclock this processor.

8) The drive is storage. Nothing is accessing it that I know of that would cause a crash. Would that make a difference?

9) I don't even use the Killer networking port. Checking handles doesn't show anything with Killer, but it does show AMD ReLive: Host App running over 9k handles. I'm sure I specifically chose not to install that.

Power supply is a Corsair CX750 750W that just turned three last month. I would hope I haven't killed one this quickly.

- Collapse -
By the numbers.

4. Try adding 0.1V to the CPU. It can take a lot, lot more, but when the voltage comes in low you get very odd lockups that are hard to trace. This is NOT overclocking. Just a search for stability. If we don't do this we swap motherboards since, as they age, they get noisy. Then we change them but sometimes I can avoid that.

THE DRIVE IS STORAGE. Doesn't matter, as Windows may defrag or scan it and cause failures that are very hard to diagnose (unless you know about the values to check).

9. That's a lot of handles. Try using DDU (see google) and install the driver of your choice. If this is some optional AMD app, kill it. If it has an update, update it.

PSU is ample. At single rail and more than ample, let's forget this for now.
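
If you want to script the handle check from item 9, here is a rough sketch. It assumes you have exported the process list to CSV first, for example with the PowerShell line shown in the comment. The 3000 cutoff, the function name and the sample rows are all made up for illustration, so tune them for your own machine.

```python
import csv
import io

HANDLE_LIMIT = 3000  # assumed cutoff, not an official figure


def excess_handles(csv_text, limit=HANDLE_LIMIT):
    """Return (name, handles) pairs above the limit, largest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    found = []
    for row in rows:
        handles = int(row["Handles"])
        if handles > limit:
            found.append((row["Name"], handles))
    return sorted(found, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample in the shape produced by:
#   Get-Process | Select-Object Name,Handles | ConvertTo-Csv -NoTypeInformation
sample = '''"Name","Handles"
"chrome","1450"
"HostAppService","9120"
"explorer","2875"
'''

for name, handles in excess_handles(sample):
    print(f"{name}: {handles} handles")
```

A process that keeps climbing between two exports taken hours apart is the one to look at, more than any single high number.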

- Collapse -
CPU voltage

Heh. What I meant by the comment about overclocking is that I have very little experience manipulating the voltage of a CPU. I THINK I can do it through BIOS, but the BIOS in this board is a bit squirrelly.

- Collapse -
The goal here

Is to see if we can avoid new boards and such. Good luck.

- Collapse -
update

1) removed
2) still installed. I use it a lot for work
4) voltage offset by + .1V
8) replaced and removed
9) no excess handles (beyond 3k)

GPU heat sink was cleaned and re-seated with new thermal compound.

PC was put under load for about 5 hours and everything was fine. While I was asleep, around midnight from the looks of the gap in event viewer, the crash occurred again.
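
In case it helps someone else, this is roughly how I picked out the crash time from the gap in the log. It is only a sketch: the timestamps below are made up rather than copied from my Event Viewer, and the function name is my own. It takes a list of event times and returns the longest silence between two consecutive events.

```python
from datetime import datetime


def largest_gap(timestamps):
    """Return (start, end) of the longest silence between consecutive events."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    pairs = zip(ordered, ordered[1:])
    return max(pairs, key=lambda pair: pair[1] - pair[0])


# Hypothetical event times, not taken from the real log.
events = [
    datetime(2018, 1, 1, 23, 40),
    datetime(2018, 1, 1, 23, 52),
    datetime(2018, 1, 1, 23, 58),
    datetime(2018, 1, 2, 7, 5),   # first event after the manual reboot
    datetime(2018, 1, 2, 7, 6),
]

start, end = largest_gap(events)
print(f"Log went quiet after {start} and resumed at {end}")
```

The start of the gap is only the last thing that got logged, so the actual hang could be anywhere after it.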

I still suspect this is an issue with the drivers and not the hardware as the system can be left for long periods under safe mode and not experience problems. Could it be an issue with timeout recovery?

Updated SPECCY:

http://speccy.piriform.com/results/6Gz3Pb2CPjmk22l3LxjjbwU

- Collapse -
5 hours for such an old PC? Could be its best.

I didn't see anything beyond what I noted so far so if you feel it's drivers, work that.

I don't have a tool like Speccy for drivers but will note DDU (see google) to dust off all video drivers so I can install the one I want.

5 hours can be normal for some desktops in some homes, as the power isn't that stable. Long term use may demand the usual UPS as well as working out old parts. The next usual is a new motherboard.

- Collapse -
Old PC.

The build is only 3 years old. I admit I only dig into hardware every few years when I start thinking about incremental upgrades, but I didn't think a 3 year old PC now qualified as THAT old.

- Collapse -
My mistake.

I'm working another discussion and their PSU was about a decade old. So I crossed the streams and ended up confusing you. I'm sorry about that.

We can't test for long term stability without the UPS, since with any small outage we can get a reboot, BSOD or hang.

THAT SAID YOU HAVE A BAD DRIVE IN THE MACHINE. This can lead to nearly impossible to track down issues. I've seen this far too many times. The owner says "it's a data drive" and the troubles continue. It could be that, it could be time for new parts elsewhere or drivers.

But I will never give a pass to a machine with high values in 01 and 07 of the SMART values.

- Collapse -
Bad drive was replaced last night

Ahh, ok. You had me worried that I was out of touch :)

The Hitachi, the bad drive, has been removed and replaced with a new Toshiba. The SPECCY results look very similar to me, but I'm not familiar with interpreting SMART data.

Hitachi (OLD)

01
Attribute name: Read Error Rate
Real value: 0
Current: 94
Worst: 94
Threshold: 16
Raw Value: 00000F000B
Status: Good

Toshiba (New)

01
Attribute name: Read Error Rate
Real value: 0
Current: 100
Worst: 100
Threshold: 16
Raw Value: 0000000000
Status: Good
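
To compare the two readings side by side, here is a rough sketch that works only from the fields Speccy printed above. It treats an attribute as failing when the normalized Current value is at or below the Threshold, which is the usual convention, and it converts the hex Raw Value to a number. What the raw number means is vendor specific, so treat it as something to compare between drives, not as an error count. The function name is made up for illustration.

```python
def summarize_smart(drive, current, worst, threshold, raw_hex):
    """Summarize one SMART attribute from its Speccy-style fields."""
    return {
        "drive": drive,
        "margin": current - threshold,    # headroom before the attribute trips
        "worst_margin": worst - threshold,
        "failing": current <= threshold,  # usual convention, assumed here
        "raw": int(raw_hex, 16),          # vendor-specific meaning
    }


# Attribute 01 (Read Error Rate) as posted above.
hitachi = summarize_smart("Hitachi (old)", 94, 94, 16, "00000F000B")
toshiba = summarize_smart("Toshiba (new)", 100, 100, 16, "0000000000")

for entry in (hitachi, toshiba):
    print(entry)
```

Both drives pass on the normalized value, which is why Speccy says Good for each. The difference is in the raw value, which is non-zero on the Hitachi and zero on the Toshiba.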

- Collapse -
BRB with a new reading.

Let's hope something shows up.

- Collapse -
Answer
PS. If you have the time

Please consider marking this solved.
