Why should I learn a microcontroller architecture?
Are 8051 and other low-bit microcontrollers still in use today?
Yes, nearly everywhere. They're small and simple, there are plenty of cores floating around that you can put into your custom silicon at low or no cost, and there are mature compilers. All of this makes the 8051 still one of the most popular core architectures among silicon manufacturers. ARM cores might be available in more different products, but when you talk to someone who's building a lot of devices under a very strict pricing constraint, chances are they'll prefer a cheaper/free 8051 core if it gets the job done. Just to oppose @Nitro2k01's claim of niche-only usage: Mouser has nearly 800 models of 8051 microcontrollers in stock¹. And the fact that these start, even at Mouser, at prices below 40 ct might be an indication of what they're used for:
mainstream, low-performance, high-volume MCUs
thus:
...obviously, it won't be used by any industry to develop a product because of its simplicity...
is high-quality utter nonsense, especially since you're delivering a counter-example yourself:
My boss, who is in his mid 50's, said that he was using 8051 derivatives, and they were doing the job.
Exactly! They're used everywhere, they're well-proven and cheap, and they are sufficient; never underestimate the advantage of having a solution to a common problem in a drawer somewhere!
Of course, it's often the case that you need a solution with, let's say, two typical automotive buses, a high-speed interface to an ADC, some reliable watchdog timers, three PWM units... and then you start piecing together something consisting of four 8051 and 8080 derivatives... ugh. That's a bad situation, and it could very likely be solved much faster and more reliably using a single, more versatile, more powerful MCU (e.g. an ARM). But that "we have company knowledge of how something works with old technology" vs. "we are future-proof by being able to run on modern hardware" is a classic investment-security tradeoff. If you encounter a project of that kind, I'd try to talk to the boss in that context. For easy small jobs, yeah, 8051.
Should I bother to learn about MCU architectures in general?
Yes! I think @jfkowes explains that very well. But honestly: this is a bit like asking "should I learn how the internal combustion engine works if I want to be a car mechanic?"; the answer is "you might live just fine if you can execute repair manuals well enough, but you will probably be a much better technician (let alone engineer) if you understand what your hardware does."
As soon as you face a problem that can't be googled, you'd pretty much be a turtle on your back if you didn't roughly understand how your processor works.
Should I bother to learn the 8051 architecture?
Probably not. In the sense that, yes, as long as cost is not your primary focus, you can most likely just use much mightier and more versatile MCUs based on ARM cores or other, more modern architectures.
Then again, the 8051 core is so simple that I'd actually recommend understanding what its units are before trying to tackle a more modern, complex MCU core. It's a nice example.
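For a taste of how approachable it is: on a classic 8051, the ports are just special function registers you poke directly. A minimal sketch in SDCC-flavored C (P1 is the standard 8051 port register declared in SDCC's own 8051.h; the delay count is made up):

```c
/* Minimal 8051 sketch (SDCC dialect): toggle a port pin.
 * P1 is a standard special function register, no driver layers involved. */
#include <8051.h>

void main(void)
{
    volatile unsigned int i;

    while (1) {
        P1 ^= 0x01;                     /* toggle P1.0 directly via the SFR */
        for (i = 0; i < 10000; i++)     /* crude busy-wait; real code would
                                           use a timer instead */
            ;
    }
}
```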
So if 8051 isn't the core I'm looking for in a low-volume application, what am I looking for?
So, personally: go for an ARM Cortex-M0, -M3, or -M4F; these are abundant in all kinds of affordable microcontrollers, easy to program (yay, mature GCC support, CMSIS standard libs, lots of embedded OSes running on these), and commonly come with standard debug interfaces (which is a great plus).
ARMs are, from the outside, usually relatively easy to understand, as you'd typically map every peripheral into memory space, and that's it. Internally, they have varying degrees of sophistication, and speed/robustness/size optimizations, making them not perfectly easy to understand in detail, but I guess that might be a bit much to ask for unless you're into CPU design.
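To illustrate what "map every peripheral into memory space" means in practice, here's a sketch in C. The base address and register layout are invented for illustration; on a real Cortex-M part, the vendor's CMSIS header provides structs and addresses exactly like these:

```c
#include <stdint.h>

/* Hypothetical GPIO peripheral: its registers are just memory locations. */
typedef struct {
    volatile uint32_t MODE;   /* pin mode configuration */
    volatile uint32_t OUT;    /* output data register   */
    volatile uint32_t IN;     /* input data register    */
} gpio_t;

#define GPIOA ((gpio_t *)0x40010000u)   /* made-up base address */

int main(void)
{
    GPIOA->MODE |= 1u;        /* configure pin 0 as output (illustrative) */
    for (;;)
        GPIOA->OUT ^= 1u;     /* toggling the pin is a plain volatile
                                 memory write -- that's all "memory-mapped"
                                 means */
}
```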
If you're into CPU design, I think (this is really a personal belief based on my observation of research activities and "promised" industry investments) we're currently observing the rise of a new important ISA – the RISC-V. There are various implementations of this architecture for FPGAs or silicon, and players like Nvidia seem to be toying with the thought of replacing their stream multiprocessors with these kinds of cores.
¹: It's very likely I'm missing more than half of the actual 8051s that Mouser has (because, hey, I just selected all MCUs whose core name was *80*5*). If you pick a random 8-bit microcontroller, chances are its core is at least partially derived from the 8051. I mean, just look at Wikipedia's list of 8051 derivative vendors.
In general, here are some good reasons to learn (or at least have a working knowledge of) the architecture of the microcontroller you are using.
Caveat: in the context of your job, the company, the application, the associated hardware etc, there may be reasons why you should not learn the particular architecture you are using right now.
Debugging
When high-level libraries are working, you might not need to know the architecture. When you start having problems, knowing the internals of your microcontroller can help a lot to isolate and fix those problems quickly.
Code Efficiency and Simplicity
If you know the architecture, you may be able to move functionality from software to hardware. This has the potential to reduce software load and remove sources of bugs.
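To make that concrete, here's a hedged 8051 sketch (SDCC dialect) of moving timekeeping from software into hardware: a Timer 0 interrupt replaces a busy-wait delay loop. The reload value assumes a classic 12 MHz 8051 with 12 clocks per machine cycle; your derivative may differ:

```c
#include <8051.h>

volatile unsigned int ticks;    /* advanced by hardware, not by a loop */

/* Timer 0 overflow ISR: interrupt number 1 on a classic 8051 */
void timer0_isr(void) __interrupt (1)
{
    TH0 = 0x3C;                 /* reload 0x3CB0 -> ~50 ms at 12 MHz */
    TL0 = 0xB0;
    ticks++;
}

void main(void)
{
    TMOD = 0x01;                /* Timer 0 in 16-bit mode 1 */
    TH0  = 0x3C;                /* initial load */
    TL0  = 0xB0;
    ET0  = 1;                   /* enable Timer 0 interrupt */
    EA   = 1;                   /* global interrupt enable */
    TR0  = 1;                   /* start the timer */

    while (1) {
        /* CPU is free for real work; no busy-wait delay loops */
    }
}
```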
Cost Reduction
Knowledge of the architecture may reduce program and data memory usage and processor load. This may mean you can select a microcontroller with fewer resources, potentially reducing cost.
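One 8051-flavored example of that kind of saving: the core has separate internal and external data spaces, and knowing this lets you place hot variables where access is cheapest. A sketch using SDCC's memory-space qualifiers (Keil C51 spells them data/xdata without the underscores):

```c
/* SDCC memory-space qualifiers on the 8051:
 * __data  -> internal RAM, direct addressing, fastest and smallest code
 * __xdata -> external RAM, accessed via DPTR and MOVX, slower/bigger code */
__data  unsigned char counter;          /* hot variable: keep it internal */
__xdata unsigned char log_buffer[256];  /* bulk buffer: too big for the
                                           128/256 bytes of internal RAM */

void record(unsigned char value)
{
    log_buffer[counter++] = value;      /* compiler emits MOVX for the buffer,
                                           a direct MOV for the counter */
}
```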
Increasing your usefulness
Even if you don't use the knowledge right now, discussions with colleagues/vendors/support engineers etc. might call on it. For example, something you know might help someone else out with a problem they're having. Saving the day is something people remember.
Knowledge is Power
Even if you don't need the knowledge in your current job, when you see an advert for a job that looks amazing that says "Knowledge of the <microcontroller family> architecture is required/preferred", you'll be in a better position to go for it.
Are 8051 and other low-bit microcontrollers still in use today?
Yes, although mostly for niche use cases. They are mostly used for simple tasks, in mass-produced, cost-driven products, or where a proven track record is desirable. They are often licensed and integrated into a single-chip solution. Because of their simple architecture, it's easy to integrate them with custom peripherals on the same chip. Another advantage is that they can be produced on a small die area with older (and cheaper) semiconductor manufacturing processes.
One such example is the control chip in smart cards, which often uses an 8051 or similar core with cryptographic hardware extensions. You would likely find 8-bit microcontroller cores in things like the controller of a smart electric toothbrush. A vehicle ECU will often use an 8-bit microcontroller as a watchdog alongside a 32-bit one, because of the 8-bit part's higher reliability and lower complexity.
Should you learn it?
Apart from the chance that you might actually end up in a situation where knowledge of that particular architecture is needed, I'd argue that it is a useful skill in general. Even if you program C in your day-to-day work, having a general understanding of what goes on "one level below" is useful. When troubleshooting weird bugs or performance problems, it may be a lot easier to pinpoint the problem if you have a general understanding of the underlying hardware. You could also more easily analyze the assembly language output from the C compiler. Learning one architecture will also make it easier to learn different ones in the future. These skills might also help you write better code even for more modern CPU cores.