31 total posts
Update on the issue
I already tested the RAM with MemTest earlier; it took 5 hours and no errors were found. Should I test each stick separately by removing the other?
All the voltage monitoring software I have shows correct values on the lines. The one with the highest deviation reads +3.41 volts on the +3.3V Standby line. Do I need a PSU tester to get better than millisecond resolution?
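For anyone wanting to sanity-check a reading like the +3.41V above, here is a minimal sketch. It assumes a ±5% tolerance band, which is the figure commonly quoted for the positive ATX rails; whether that band applies to a motherboard-derived 3.3V standby sensor is an assumption, and the function names are made up for illustration.

```python
# ASSUMPTION: +/-5 % is used as the acceptable band; adjust if your rail differs.
TOLERANCE = 0.05


def deviation_percent(nominal, measured):
    """Return the signed deviation of a reading from nominal, in percent."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(nominal, measured, tolerance=TOLERANCE):
    """True if the reading lies inside nominal +/- tolerance."""
    return abs(measured - nominal) <= nominal * tolerance


if __name__ == "__main__":
    nominal, measured = 3.3, 3.41  # values taken from the post above
    print(f"deviation: {deviation_percent(nominal, measured):+.2f}%")
    print("within assumed band:", within_tolerance(nominal, measured))
```

Software sensors only sample slowly, so a reading inside the band does not rule out short dips under load.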
I underclocked my GPU and the OCCT PSU test doesn't seem to cause any reboots.
Stock --> GPU VCore: 1.012V, Core Clock: 880MHz, Memory Clock: 2100MHz
Underclocked --> GPU VCore: 0.95V, Core Clock: 500MHz, Memory Clock: 1500MHz
Btw GPU temperature with either setup doesn't go above 55C if you are worried about overheating.
What does it mean? The software already saves plots of a wide variety of voltages/temperatures/clocks, as well as taking a screenshot showing all, so I can post them here if anyone is willing to take a look. Here is the screenshot for underclocked settings: http://imgur.com/PS6GmdD
I am currently running the OCCT PSU test for 3 minutes at a time, since I was originally getting an instant reboot with it, whereas in games my PC can reboot at any random time (though sooner is more likely than later).
With GPU underclocked, I overclocked the CPU and still didn't get a reboot during test: http://imgur.com/oXMxTbA
Stock CPU --> Max Vcore: around 1.2V, Core Clock: 3.3GHz
Overclocked --> Max Vcore: 1.4V, Core Clock: 4.5GHz
I also tried keeping GPU VCore at the stock value and only decreasing the clocks, and again there were no reboots. I thought only the voltage requirement of the GPU would affect the PSU; am I wrong? Here is the screenshot with the GPU at stock voltage but underclocked (VIN0 is GPU VCore): http://imgur.com/wChY1IT
Can I assume the issue is with my PSU or GPU most likely?
It could be the old issue of a motherboard and GPU that don't play well together. The temps and voltages looked OK to me, taking into account that you are not using a voltmeter. I use a voltmeter when I need to measure. Onboard voltage readings are good to have but not to be trusted.
Is the GPU from 2011? Sounds like it may have had its day.
Yes, the GPU is from May 2011. The issues persisted before and after I did the following: I recently reapplied thermal paste to the GPU and changed the fan, removed the PSU from the PC case to clean the huge amount of dust collected on its fan, and readjusted the plastic spacer under it to give the fan more breathing room. I also changed the sockets that the 2 PCIe power connectors from the GPU were plugged into (the PSU has 4 PCIe sockets; I connected to the 2 that weren't used previously).
However, I didn't change the PCIe connectors. I have 2 previously unused PCIe connectors that came with the PSU, and I will try them soon and post again.
Thanks for telling this.
Sometimes a member will not share that it's an older machine.
Now that we are dealing with 4 year old parts I'd do the inspection for BAD CAPS (google bad caps for pictures and what to look for.) My criteria is PERFECTION. No bulges, leaks, blown bottoms, tilters or discolored electrolytic capacitors.
-> ALSO, I know this is DANGEROUS but I've brought back a few dead PCs by simply (?) unplugging and plugging everything back in. It scrubs the contacts.
Nothing really points me to any particular part but given the age my inclination without seeing the machine is the motherboard and/or GPU.
Motherboards have onboard power supplies for the CPU and GPU that use electrolytics, and those do age and later show symptoms like what you wrote.
Sadly the only way to tell is to pop in a new board unless we get lucky and find bad caps.
Motherboard and Case
Which caps should I be inspecting? I think opening the PSU case may be dangerous as well as void the warranty. Looking through the back panel of the PSU I see a white "paste" holding a capacitor and another component together, but I read that it is put there intentionally to decrease vibration.
I can detach the heatsink from the GPU and check the capacitors on the board, but doing so would require me to reapply thermal paste, and the last time I did that I didn't notice any damage on the capacitors. (I wasn't really inspecting, though.)
I have been working close to the motherboard for a while as well without noticing any damage, again without thorough inspection. To inspect it completely I would need to remove CPU cooler which requires me to rebuild the PC from scratch, and it will require some reading as I didn't build it myself the first time. Pictures of the case and motherboard:
I can barely see the caps in the first picture.
But my thoughts are it's the aging that has caught up with this machine. The symptoms to me sound like motherboard. The reduced clock rates whether it be GPU or such only lighten the load and loosen the timing to let it run without error.
Sorry I wish I had a clear answer but to me this reads like motherboard and/or GPU has aged and needs to be replaced.
I'll remove my comment about motherboard and GPU playing together since it's been fine for years.
Will keep you posted
I will try taking higher quality pictures as well as checking the capacitors myself. I am also trying a few different settings during PSU testing, as well as trying different software/hardware configurations so apologies for late responses.
20 min test unusual behaviour
I tried running a test for 20 minutes with stock GPU Vcore (1.012V) but underclocked core (500MHz) and memory (1500MHz) clocks. I noticed some interesting things happening in the plots. Most of the time CPU usage was 100% while GPU Vcore was 0.95V, core clock 405MHz and memory clock 325MHz, but every 5 minutes, for 10-20 seconds, CPU usage would drop to 30%, GPU Vcore would rise to 1.012V, and the core and memory clocks would reach the set values (500MHz and 1500MHz).
Then I inspected the plots and noticed many unusual things happening in those intervals. I will provide a link to all these plots at the end of the post, but here is a summary: CPU Vcore seems to have risen from 1.35V to 1.4V when CPU usage dropped to 30%. (Note: I later tried with an overclocked CPU, with all throttling and monitor features turned off, to make sure it wasn't done intentionally by the mo-bo, and it seems it wasn't.) Also, AVCC (+3.3V) seems to fluctuate between 3.33 and 3.31, but when CPU usage drops to 30%, AVCC stays perfectly stable.
In the next test I did with the overclocked CPU (from 3.3GHz to 4.5GHz, with throttling and monitors turned off) the plots were very much alike. This behaviour might be intended by the test, but I don't think they would want to load only the CPU or only the GPU at its max at any one time when testing PSU stability. Anyway, here are all the plots:
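If the monitoring data can be exported as a list of samples, the periodic dips can be located automatically instead of by eye. This is only a sketch under assumptions: the sample values below are invented for illustration, and `find_dips` is a hypothetical helper, not something the test software provides.

```python
def find_dips(samples, threshold):
    """Return (start_index, end_index) pairs for each run of samples below threshold."""
    dips = []
    start = None
    for i, value in enumerate(samples):
        if value < threshold:
            if start is None:
                start = i  # a new dip begins here
        elif start is not None:
            dips.append((start, i - 1))  # the dip ended on the previous sample
            start = None
    if start is not None:
        dips.append((start, len(samples) - 1))  # dip runs to the end of the log
    return dips


if __name__ == "__main__":
    # Invented CPU usage samples (percent), one per logging interval.
    cpu_usage = [100, 100, 100, 30, 30, 100, 100, 30, 100]
    print(find_dips(cpu_usage, threshold=50))
```

The same index ranges can then be used to look up what the Vcore and AVCC readings were doing during each dip.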
CPU at stock: http://postimg.org/gallery/2mjdqj31m/b3a80cbb/
CPU overclocked: http://postimg.org/gallery/sfxikng4/fc06245d/
A screenshot of the behaviour: http://postimg.org/image/xvifwrjcj/
All that tells me
Is that there are suspect parts. You're there, so what's your call?
Need Confirmation of the PSU Test Results
I suspect the GPU, as I had other (driver/fan) problems with it, and underclocking it increases stability. That's why I already ordered a new one, but I will be testing it with a potentially unstable system when it arrives. So if there are issues, I won't know if the cause is the new GPU or my old system.
I tried finding out what ordinary results of the OCCT PSU test should look like, or whether anyone got results similar to mine, but I can't find anything. If someone can run the same test and check whether they get the same behaviour, it would confirm whether my results are symptoms of an issue. The test didn't cause any harm to my PC with default settings, but if you want to be safe you can set it to stop when certain conditions are met in the settings menu.
My PSU testing is all about the PSU under load and watching the rails with an oscilloscope. In your case the PSU looks to be a good choice where you are not tapping over 75% of its rating so that usually gives it a 5 to 10 year life span. The symptoms point me to motherboard and/or GPU. But PSUs are pretty cheap so why not?
Solved For Now
Seems the issue is resolved. I reassembled the PC and replaced the sockets and connectors with spare ones. I also changed the extension cord and the wall outlet. I don't know which was causing the reboots, but it seems fixed now. If I ever find the cause I will update this thread so people with the same issue can see. Thanks for your help.
Graphics Card Spec Correction
Correction: My current GPU is an MSI gtx560-Ti Twin Frozr II/OC. My specs link shows the card I ordered. The correct specs link is: http://pcpartpicker.com/p/Xhn9D3
It is back
Apparently the issue wasn't solved completely. I got my new GTX 970 Gaming 4G and ran some benchmarks (Heaven Unigine etc.) to make sure it works. But when I tried playing Star Wars Battlefront today, my PC rebooted occasionally during game menus but never during actual gameplay. I monitor CPU and GPU temps and they are always below 60C. I read that my graphics card may cause crashes while transitioning from load to idle due to sudden voltage drops, but that shouldn't result in a reboot as far as I know.
I just started Witcher 3 to see what fps I would get; now I will try actually playing the game to see if I get reboots in it as well, and update this thread.
Btw Corsair accepted my RMA request for the PSU, but sending it will still take some time so I want to avoid it if possible. What do you think may cause the PC to reboot at game menus particularly?
Re-reading all the above.
It will be painful but try this with a single stick of RAM.
The RAM doesn't look to be on the QVL (which is often the case) but given not much seems to glare here it's starting to sound like the old incompatible RAM issue.
Some motherboards require changes from defaults to work with some memory sticks.
-> Just because the RAM fits doesn't mean it's compatible.
As there are so many boards today I just change boards rather than tinker with BIOS memory settings.
Doubt it is RAM
There was an XMP profile saved on the memory module that I use. I ran MemTest for 4 passes (5 hours) with no errors. The mo-bo has a memory LED that lights up if the sticks aren't seated properly, so I doubt it is the RAM.
Memtest has passed for me and still
I've swapped out RAM to cure such ills. The go-nogo light, still same problem.
I've only built a few thousand PCs over the years (ran a PC shop for a long, long time) and learned to distrust memory tests. It's good that they pass, but that doesn't entirely clear the RAM of suspicion.
The single stick test is next in my book. Also, NO OVERCLOCKING. We must go stock until it's stable.
Need help with Kernel-Power Events
There are lots of kernel power entries in the event viewer regarding the unexpected reboots. Where can I get help with these event logs?
There are priors on that error.
But the causes run from hardware to drivers. I think we are close to calling it aged hardware and need those 2 hardware tests to get a clue as to what it might be.
At the shop I would have pulled the GPU already, done a BAD cap check and tried the single stick test.
Updates and Kernel Power details
Some updates first: Reverting the overclock on the graphics card seems to fix the issue in my limited testing. But no stress tests or benchmarks caused any crashes, artifacts or throttling with the overclock settings, which I copied from a guide.
I used to get only 1 kernel power entry when a reboot occurred earlier. Now there are several, as in this screenshot. (The language is Turkish, which makes it harder to copy-paste the general descriptions into Google):
I learnt that the Kernel Processor Power entries are due to disabled C states and are not an issue. There are also 2 entries with "thermalzone" in their description, which worries me, as I removed the thermal pad of the Intel P67 chipset not knowing what it was and replaced it with thermal paste. Though all temperatures in the hardware monitors seem fine while idle, continuously keeping an eye on all of them during gaming is hard. There is also the event log that details the shutdown, which may be useful.
Sorry but here you lost me.
Overclocking and failures go hand in hand. No one I know will dive into overclocked machines other than to put them back to stock. This can really set some folk on fire, but it's unsupportable, and no testing other than playing the game seems to tell you it's OK.
Re: unexpected reboots
https://support.microsoft.com/kb/2028504 tells what it means: unexpected shutdown. Now it's up to you to find the cause of that. Most likely, as Bob says, some hardware issue.
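To pull those entries out of the System log as text rather than reading them from a screenshot, something like the sketch below may help. It assumes the entries in question are Kernel-Power Event ID 41 (the event that KB article covers) and uses the built-in Windows `wevtutil` tool; `kernel_power_query` is a made-up helper name, and the output will be in whatever language Windows is set to.

```python
import shutil
import subprocess


def kernel_power_query(event_id=41, count=10):
    """Build a wevtutil command listing the newest Kernel-Power events with the given ID."""
    # ASSUMPTION: Event ID 41 is the unexpected-shutdown entry being discussed.
    xpath = (
        "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] "
        f"and (EventID={event_id})]]"
    )
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # filter by provider and event ID
        f"/c:{count}",               # limit the number of events returned
        "/rd:true",                  # newest first
        "/f:text",                   # plain-text output
    ]


if __name__ == "__main__":
    command = kernel_power_query()
    if shutil.which("wevtutil"):  # only present on Windows
        result = subprocess.run(command, capture_output=True, text=True)
        print(result.stdout)
    else:
        print("wevtutil not found; command would be:", command)
```

Comparing the timestamps of these entries against the "thermalzone" ones can show whether the two always occur together.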
Or it's the overclocking.
I re-read the thread and only later is the overclocking exposed. Overclocked machines may be fun. Until they crash.
Yeah, but this is a reboot and I turned the option "restart after crash" in Windows. Anyway, I will try running the card at factory settings and hope for the best, but I fear sooner or later this issue will revisit me.
That's a given. Machines don't last that long w/o repair.
As I feared
As I feared, it started happening at stock settings too. Once it happens, it also starts happening more quickly after I open a game (20 seconds instead of 20 minutes).
I will try reassembling everything one more time and then just RMA the PSU.
Could it be temperatures?
Weirdly, I now suspect my CPU may be causing the reboots. I was reverting every change I had made since the PC last seemed to work, and one of them was reducing the CPU cooler (Thermaltake Frio) fans from 100% to 70%-ish. I thought this was okay as CPU core and GPU temps were always below 65 in games with the new 70% setting. But after the recent reboots there were some kernel power events with "thermalzone" in their description in the event viewer. Anyway, I increased the CPU fans to 100% again and haven't had any reboots within an hour. Of course, if I try OCing the GPU again, I get reboots within minutes. (The CPU overclock is fine, though.)
Could it be that increasing GPU power usage causes fluctuations in the CPU voltage and increases the CPU package temperature, which is causing the reboots? I forgot to put the washers between the cooler and the motherboard while installing the CPU cooler. It had been without the washers ever since I got the PC, and the temperatures seemed fine, so I thought they were not really necessary. The next time I get a reboot, I am reinstalling the cooler with the washers in place.
It probably was the C-states. I enabled c1e and disabled c3e and c6e. No reboots so far. I guess I shouldn't have trusted the guides, some even official, saying I should enable c1e and leave the others at auto. The latest BIOS had one of c3e and c6e disabled. It makes sense, as fully loading both the GPU and CPU at the same time didn't cause issues, but since games put varying load on the CPU and GPU at different times, that was probably causing the problems.