Just because the N64 could do it doesn't mean it's the same: the method, the limitations and the density are clearly not the same. I have been studying and trying to implement these techniques since I've been on Tigs, and my earliest post was about grabbable edges in Mario 64. I'm well aware of the various ways to do it, their shortcomings, and how they have evolved over time. That said, I'm not a super expert who does it all the time.
When you only have 6000 polygons on screen (and not all of them for the scene), it's not the same as having millions. Even with the collision mesh separated from the visual mesh, the complexity did go up (the polygon count of one character in one generation is the budget for the entire scene in the previous generation), and naive collision scales quadratically (every object potentially has to be tested against every other, so n objects means roughly n²/2 pair tests). Most optimization comes from partitioning the data as much as possible (to bend that curve toward a logarithmic one), using tags and proxy objects, and also from deciding how the gameplay works (that's why RPGs have simpler collision than platformers, for example: you can make different assumptions to fake things). For example, in GTA 5, to account for more stuff on screen, they removed the grabbable edges present in GTA 4 (see the video provided above).
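To make the quadratic vs partitioned point concrete, here's a minimal Python sketch of one common partitioning scheme, a uniform grid broad phase. It's not how any of the games mentioned actually do it; the 2D boxes, `cell_size`, and the function names are all made up for illustration. Brute force tests every pair, while the grid only tests boxes that share a cell, then runs the exact overlap test on those candidates.

```python
import math
from collections import defaultdict
from itertools import combinations

# A box is a 2D axis-aligned bounding box: (min_x, min_y, max_x, max_y).

def overlaps(a, b):
    """True if two boxes overlap (touching edges count as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def naive_pairs(boxes):
    """Brute force: test every box against every other, n*(n-1)/2 tests."""
    hits, tests = set(), 0
    for i, j in combinations(range(len(boxes)), 2):
        tests += 1
        if overlaps(boxes[i], boxes[j]):
            hits.add((i, j))
    return hits, tests

def grid_pairs(boxes, cell_size):
    """Uniform grid broad phase: only boxes sharing a cell become candidates."""
    cells = defaultdict(list)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Insert the box into every cell its bounds touch.
        for cx in range(math.floor(x0 / cell_size), math.floor(x1 / cell_size) + 1):
            for cy in range(math.floor(y0 / cell_size), math.floor(y1 / cell_size) + 1):
                cells[(cx, cy)].append(i)
    candidates = set()
    for members in cells.values():
        # Indices were appended in increasing order, so pairs come out as (i, j) with i < j.
        candidates.update(combinations(members, 2))
    # Narrow phase: the exact test only runs on the deduplicated candidates.
    hits = {(i, j) for i, j in candidates if overlaps(boxes[i], boxes[j])}
    return hits, len(candidates)

if __name__ == "__main__":
    # 100 boxes in a row, each overlapping only its right-hand neighbour.
    row = [(3.0 * i, 0.0, 3.0 * i + 4.0, 1.0) for i in range(100)]
    naive_hits, naive_tests = naive_pairs(row)
    grid_hits, grid_tests = grid_pairs(row, cell_size=4.0)
    print(len(naive_hits), naive_tests)  # 99 4950
    print(grid_hits == naive_hits, grid_tests)
```

Both find the same 99 contacts, but the grid only runs the exact test on boxes that are already near each other. The catch is picking a cell size that fits your object sizes, which is part of why real engines often use trees or other structures instead.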
So that brings me to those scenes in the Mario game on Switch and in Zelda. They have characters with complex behavior, which means you will have a lot of line-of-sight raycasts to account for changes. The simple fact that enemies can lose their sword or scan the environment to find stuff consumes a bunch of raycasts. In fact, if enemies couldn't lose their weapon, couldn't pick one up from the environment, or stayed in flat areas where the complexity could be controlled (see all the previous Zeldas), you could save a bunch of raycasts for other stuff. In Wind Waker you can see such an optimization with the big pig enemies with a lance: the physics of their flavor trail doesn't actually collide with the ground, they raycast down and use an assumed plane. On the rare occasions you see them next to an incline, you can see the trail just lie flat instead of following the ground. Also, it depends on the nature of what you're raycasting against: the assumptions you can make for heightmap terrain are different from the ones you can make for meshes.
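Here's a rough Python sketch of the "raycast down once and assume a plane" trick, next to the accurate one-raycast-per-point version. Everything here is hypothetical: `ground_height` stands in for real level geometry (flat, then a 45° incline), `Raycaster` stands in for an engine's raycast so we can count casts, and I'm assuming the plane is horizontal at the hit height (a fancier version could tilt it using the hit normal).

```python
def ground_height(x, z):
    """Hypothetical terrain: flat for x < 10, then a 45-degree incline."""
    return 0.0 if x < 10.0 else x - 10.0

class Raycaster:
    """Stand-in for an engine's downward raycast; counts how many casts were spent."""

    def __init__(self):
        self.casts = 0

    def cast_down(self, x, z):
        self.casts += 1
        return ground_height(x, z)

def trail_follow_ground(ray, points):
    """Accurate version: one downward raycast per trail point."""
    return [(x, ray.cast_down(x, z), z) for x, z in points]

def trail_assumed_plane(ray, owner_xz, points):
    """Cheap version: one raycast under the owner, then every trail point
    is snapped onto a horizontal plane at that height (assumed flat ground)."""
    plane_y = ray.cast_down(*owner_xz)
    return [(x, plane_y, z) for x, z in points]

if __name__ == "__main__":
    trail = [(8.0, 0.0), (9.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
    accurate, cheap = Raycaster(), Raycaster()
    # Heights 0, 0, 0, 1, 2 using 5 casts.
    print(trail_follow_ground(accurate, trail), accurate.casts)
    # Heights all 0 using 1 cast: the trail lies flat over the incline.
    print(trail_assumed_plane(cheap, (9.0, 0.0), trail), cheap.casts)
```

The cheap version costs one cast no matter how long the trail is, and the failure case is exactly the visual glitch described above: near the incline the trail stays flat instead of climbing.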
That Mario game isn't the standard Mario game with simple behavior right off the bat, at least in the city scene: you have characters walking and circulating around and reacting to changes in the scene, and you have more complex interactable geometry with much more potential for chaos and occlusion. That's key, because it's not GTA or an RPG, it's a Mario: it must respond to commands very consistently and not have explosions like in Skyrim that send you flying away. That generally means more safety checks. The obvious trade-off is that it's a small scene, so not having to load stuff asynchronously like in Zelda helps massively with stability. BTW, you can see hints of partitioning in Zelda with the fire on the ground: it follows a grid, and I bet the interaction checks are aligned to that grid. I also admit I was massively impressed by enemies being scared by a falling boulder pushed down a hill; you just have no idea how big a deal such a simple interaction is. That game is also a bit sparser in interaction density than that Mario game. I bet enemies aren't placed too close to streaming seams, and they also seem to live in specially designed areas with wiggle room around them. I wonder if they can actually leave those areas (I bet not).
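If the fire really is grid-aligned, the checks could look something like this minimal Python sketch. To be clear, this is my guess at the general idea, not Zelda's actual code: `FireGrid`, `cell_size`, and the 4-neighbour spread rule are all hypothetical. The point is that "am I standing in fire?" becomes a single set lookup on the actor's cell instead of a test against every flame.

```python
import math

class FireGrid:
    """Fire stored per grid cell (hypothetical cell size), so a point query
    is one set lookup instead of a test against every flame."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.burning = set()

    def cell_of(self, x, z):
        return (math.floor(x / self.cell_size), math.floor(z / self.cell_size))

    def ignite(self, x, z):
        self.burning.add(self.cell_of(x, z))

    def is_on_fire(self, x, z):
        return self.cell_of(x, z) in self.burning

    def spread(self, flammable):
        """One spread step: ignite 4-neighbour cells that appear in `flammable` (a set of cells)."""
        new_cells = set()
        for cx, cz in self.burning:
            for dx, dz in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                neighbour = (cx + dx, cz + dz)
                if neighbour in flammable and neighbour not in self.burning:
                    new_cells.add(neighbour)
        self.burning |= new_cells
        return new_cells

if __name__ == "__main__":
    grass = {(1, 0), (2, 0), (0, 5)}  # hypothetical flammable cells
    fire = FireGrid()
    fire.ignite(0.5, 0.5)
    print(fire.spread(grass))          # {(1, 0)}
    print(fire.spread(grass))          # {(2, 0)}
    print(fire.is_on_fire(2.5, 0.2))   # True
```

The same cells can double as the partition for other interactions (an enemy only cares about fire in its own cell and its neighbours), which is the kind of alignment I'm betting on.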
I had to deal with these problems, and it's one of the reasons my Sonic clone is actual vaporware: I use a complex environment, i.e. the assumption that "down is the ground" cannot be made, since it's a 360° platformer like Mario Galaxy. That makes things more complex, and I had a lot of slowdown on a machine much more powerful than the 64 because I did it naively (also because PhysX is a generic engine that does too much for my purposes). And speaking of Mario 64, I have been studying the physics of that game for a long time and I'm still baffled by all the exceptions it has.
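To show why "down is the ground" matters, here's a small Python sketch of a ground probe on a spherical planet, Galaxy-style. It's a toy under stated assumptions: the world is a single sphere, `local_down` and `grounded_along` are hypothetical helpers, and the probe is an analytic ray/sphere test rather than a query into a physics engine. With a fixed world -Y, a player standing on the underside of the planet probes straight out into space; with gravity pointing at the planet's center, the same probe finds the ground.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def ray_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the sphere surface, or None if it misses."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):  # nearest intersection first
        if t >= 0.0:
            return t
    return None

def grounded_along(pos, down, center, radius, probe_length):
    """Short ground probe from pos along the given 'down' direction."""
    t = ray_sphere(pos, down, center, radius)
    return t is not None and t <= probe_length

def local_down(pos, center):
    """Galaxy-style gravity: down points toward the planet's center, not world -Y."""
    return normalize(sub(center, pos))

if __name__ == "__main__":
    planet, radius = (0.0, 0.0, 0.0), 10.0
    player = (0.0, -10.5, 0.0)  # standing on the underside of the planet
    world_down = (0.0, -1.0, 0.0)
    print(grounded_along(player, world_down, planet, radius, 1.0))  # False: probe points into space
    print(grounded_along(player, local_down(player, planet), planet, radius, 1.0))  # True
```

On a sphere the local down is trivial to compute; on arbitrary 360° geometry it usually has to come from surface normals or gravity volumes, which is where the extra queries (and the slowdown) pile up.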