Obviously, the camera entity is special - there can only be one 'active' at any time, and that one has the special privilege of having its view rendered to the screen. And in the case of the player (especially in networked games!) the issue of making players special and tied to a particular user is puzzling. Of course, you could always just keep some pointer to the camera entity, but then more questions arise - how do you choose the initial camera entity if there are multiple in your scene, how should I handle switching cameras, what should my engine do if there's no camera in the scene, blah blah blah.
My homegrown engine doesn't do "there is only one true camera". It supports multiple cameras, and I actually need them: I have multiple scenes, and the UI/HUD, for example, lives in a different scene with a different camera. I can also set up multiple viewports.
So I create one or more viewports, each taking up the whole screen or a portion of it, and attach one or more cameras to each viewport. That decides what gets rendered, from where, and to which part of the screen. Any camera that isn't attached to a viewport just gets ignored.
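To make the viewport/camera relationship concrete, here is a minimal sketch in Python. It is not my engine's actual code: the names Camera, Viewport, attach and render_plan are made up for illustration, and the viewport rectangle is assumed to be in normalized 0..1 screen coordinates.

```python
from dataclasses import dataclass, field


@dataclass
class Camera:
    name: str
    scene: str  # name of the scene this camera looks at (hypothetical field)


@dataclass
class Viewport:
    # Normalized screen rectangle (assumption): x, y, width, height in 0..1.
    x: float
    y: float
    width: float
    height: float
    cameras: list = field(default_factory=list)

    def attach(self, camera):
        # Cameras are drawn in the order they were attached.
        self.cameras.append(camera)


def render_plan(viewports):
    """Return the draw passes as (camera name, scene, rect) tuples."""
    passes = []
    for vp in viewports:
        rect = (vp.x, vp.y, vp.width, vp.height)
        for cam in vp.cameras:
            passes.append((cam.name, cam.scene, rect))
    return passes


world_cam = Camera("world", "game")
hud_cam = Camera("hud", "ui")
debug_cam = Camera("debug", "game")  # attached to nothing, so never rendered

full_screen = Viewport(0.0, 0.0, 1.0, 1.0)
full_screen.attach(world_cam)  # game scene first
full_screen.attach(hud_cam)    # HUD scene drawn on top

plan = render_plan([full_screen])
```

Only cameras that were attached show up in the plan. A split-screen setup would be two viewports, each covering half the screen, with one camera attached to each.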
The details are a bit more complex, but in general it works really well. Things I've done with this include local hotseat split screen and a spaceship game where you have multiple view angles on your ship at the same time.
Automatically picking the first camera found when none has been set up sounds useful at first, but will cause annoying problems later on. So I decided to require a bit more initial setup in exchange for more flexibility and predictability.
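As a rough illustration of "explicit setup instead of auto-search", here is one way it could look in Python. This is only a sketch under my own assumptions: the names NoCameraError and cameras_for_viewport are hypothetical, and raising an error for a viewport without cameras is just one option (silently rendering nothing would be another).

```python
class NoCameraError(RuntimeError):
    """Raised when a viewport has no camera attached (hypothetical)."""


def cameras_for_viewport(viewport_name, attachments):
    # attachments maps a viewport name to the cameras explicitly attached to it.
    # There is deliberately no fallback that searches the scene for "some" camera.
    cameras = attachments.get(viewport_name, [])
    if not cameras:
        raise NoCameraError(f"viewport {viewport_name!r} has no camera attached")
    return list(cameras)


attachments = {"main": ["world_cam", "hud_cam"]}

main_cameras = cameras_for_viewport("main", attachments)

try:
    cameras_for_viewport("minimap", attachments)
    missing_reported = False
except NoCameraError:
    missing_reported = True
```

The point of the design is that a forgotten camera shows up immediately as a clear setup problem, rather than the engine quietly rendering from whichever camera it happened to find first.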