How do I troubleshoot when I have no clue where to start?
Get a better idea.
You ain't going to win a battle without sufficient field information.
Describe your problem in detail so that you have a clear picture of it; for all you know, it only happens once.
Trace back what happened before and alongside the problem, both what you did and what your computer did.
Think of the possible causes because sometimes it might be something that's not obvious.
Get more information whenever you have no idea what's happening. This could range from the event logs, to Sysinternals tools, to performance analysis, to debugging, to any other tool within your expertise.
Test your assumptions to be sure that your thoughts don't filter the cause away.
Divide and conquer.
Because that's how an army defeats its opponent even when outnumbered.
Eliminate the possible causes one by one, or you'll have trouble keeping track of the problem. This way you get closer and closer to the root cause, which makes the problem a lot easier to solve.
For example, with hardware, disconnect and remove anything that you don't need for fixing your problem. This way, you might disconnect the component causing the problem. If the problem goes away, reconnect half of the components, check whether it recurs, and keep splitting until you have found the bad component.
Testing on another computer, if one is available, also helps narrow down the problem.
With software, rebooting into safe mode or disabling start-up entries helps in the same way. The same applies to enabling or disabling settings, trying the default configuration, and so on.
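The halving procedure described above can be sketched in Python. This is only an illustration: `find_culprit` and `problem_occurs` are hypothetical names, and the sketch assumes exactly one component is at fault and that the problem shows up reliably whenever that component is connected. In practice `problem_occurs` would be you, connecting the listed parts and watching for the freeze.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(
    components: Sequence[T],
    problem_occurs: Callable[[List[T]], bool],
) -> T:
    """Narrow a list of suspects down to one by testing half at a time.

    Assumption: exactly one component causes the problem, and the problem
    always appears when that component is part of the tested set.
    """
    candidates = list(components)
    if not candidates:
        raise ValueError("need at least one component to test")

    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if problem_occurs(first_half):
            # The culprit is among the parts we just tested.
            candidates = first_half
        else:
            # The tested parts are clean, so the culprit is in the rest.
            candidates = candidates[middle:]
    return candidates[0]


if __name__ == "__main__":
    parts = ["usb hub", "webcam", "printer", "second monitor", "external disk"]
    # Stand-in for a manual check: here the printer is the bad part.
    print(find_culprit(parts, lambda connected: "printer" in connected))
```

Each round halves the number of suspects, so even a long list of devices or start-up entries needs only a handful of tests.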
Let's put it to the test.
I am currently encountering a problem with my new machine. On a few occasions the machine has just frozen; not accepting keystrokes, mouseclicks, or anything except the power on/off switch. Invariably I have been merely browsing the web; I have had a few (<= 6 other applications) running. None of these applications are major; and represent a mix of commercial programs and open source programs, typically migrated from Unix of some variety.
That's a proper description by itself, and it doesn't just happen once either.
You know what happened together with the problem, but you haven't thought about what you or your computer did before the problem. I can't tell you this, but you, your event log, and recently modified files and folders could.
The cause is most likely CPU related, because the CPU is the component that processes things.
More specifically, this could be a process, a driver, or failing hardware (perhaps temperature problems?).
I know it's the CPU, but I don't know what exactly. The event logs don't show this, and Process Explorer would hang on DPC.
So, as a next step, I let a trace analysis run and stop it after the hang has occurred.
I look into the trace, and I see that driver X is causing the problem!
No real assumptions are made. The CPU assumption is handled by our Divide & Conquer approach...
So, this is where I start dividing to conquer the problem, stopping once it is solved:
Problem with current version of the driver?
Update the driver to the latest version.

Problem with newest versions of the driver?
Get a new trace. Update the driver to an older version different from the initial.

Problem with the device? Configuration problem in the registry?
Get a new trace. Reinstall and/or disable the device if possible.

Problem is random, is it the processor heating up?
Check the processor temperature, replace fan if needed.

Problem is not the processor, are there other hardware and software influences?
Remove hardware and disable software from running, to nail down third-party influence.

Problem is not in a removable part, it should be replaced.
In the worst case, if all else fails, you need to go for a replacement.
Getting new traces and removing hardware gives us more information, so we know where to look next.
Good logs and intuition - really.
- From day 1, keep track of everything you do to the system: app & OS updates, new installs, new or removed hardware or connections, the thunderstorm that "didn't cause a problem".
- When you first noticed the issue:
- What had you been doing?
- What else unusual happened recently?
- What have you done differently recently?
- From then on, keep aware of what you're doing so the next time it happens, you have a better handle on what had just preceded it.
- Snapshot the system logs.
- See if you can reproduce it. Until you can reproduce it, you can't find it.
- Start partitioning the system: safe mode vs. running live, new account vs. your regular account, different keyboard and mouse than your regular ones (esp. Bluetooth vs. wired), does it happen within a few minutes of starting or waking vs. only after an hour or more of running (think thermal).
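One way to keep the partitioning step organized is to write down a test plan that changes a single factor at a time from your normal setup. The Python sketch below is just an illustration; the function name `one_change_at_a_time` and the factor names are made up for the example, not taken from any tool.

```python
from typing import Dict, List


def one_change_at_a_time(
    baseline: Dict[str, str],
    alternatives: Dict[str, str],
) -> List[Dict[str, str]]:
    """Build test setups that each differ from the baseline in one factor.

    Assumption: every key in `alternatives` also exists in `baseline`.
    """
    plans = []
    for factor, alternative in alternatives.items():
        plan = dict(baseline)       # start from the normal setup
        plan[factor] = alternative  # change only this one factor
        plans.append(plan)
    return plans


if __name__ == "__main__":
    usual = {"boot": "normal", "account": "regular", "input": "bluetooth"}
    swaps = {"boot": "safe mode", "account": "new", "input": "wired"}
    for setup in one_change_at_a_time(usual, swaps):
        print(setup)
```

If the problem disappears in exactly one of these setups, the factor that was changed in it is the place to look next.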
I usually start with the event logs and any logs that a program may create on its own. Programs will sometimes create a log in the program folder.
Once you can identify the time of the problem, search the logs for events around it. Windows logs may of course contain Stop errors, which are easy to identify.
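For a program's own text log, searching around a known time can be as simple as the Python sketch below. It assumes a hypothetical log format in which each line starts with an ISO 8601 timestamp followed by a space; real logs vary, so the parsing would need adjusting. The function name `lines_near` and the window sizes are only illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def lines_near(
    log_lines: Iterable[str],
    incident: datetime,
    minutes_before: int = 10,
    minutes_after: int = 1,
) -> List[str]:
    """Return the log lines stamped within a window around the incident.

    Assumption: each relevant line begins with a timestamp such as
    2024-01-31T14:05:00 followed by a space. Other lines are skipped.
    """
    start = incident - timedelta(minutes=minutes_before)
    end = incident + timedelta(minutes=minutes_after)
    hits = []
    for line in log_lines:
        stamp, _, _message = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no parseable timestamp at the start of this line
        if start <= when <= end:
            hits.append(line)
    return hits


if __name__ == "__main__":
    sample = [
        "2024-01-31T13:40:00 service started",
        "2024-01-31T14:02:10 driver reset",
        "2024-01-31T14:05:30 display timeout",
    ]
    for line in lines_near(sample, datetime(2024, 1, 31, 14, 5, 0)):
        print(line)
```

The window is deliberately wider before the incident than after it, since what preceded the freeze is usually what you are looking for.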
Check all drivers and make sure they are current.
Patience will likely be required in large doses.