If your rtspsrc stream is already encoded in H.264, just write it to the MP4 container directly instead of decoding and re-encoding it.
Here is my gst-launch-1.0 command for recording RTSP to MP4:
$ gst-launch-1.0 -e rtspsrc location=rtsp://admin:[email protected]/rtsph2641080p protocols=tcp ! rtph264depay ! h264parse ! mp4mux ! filesink location=~/camera.mp4
If you want to change things like width and height (using videoscale), colorspace (using videoconvert), or framerate (using videorate with a capsfilter), those elements work on video/x-raw caps, so you first have to decode from video/x-h264 to video/x-raw.
After modifying the raw video, you have to encode it again before linking to a mux element (like mp4mux, mpegtsmux, matroskamux, ...).
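As a rough sketch of that decode, modify, re-encode chain: the script below builds a pipeline that scales the stream to 1280x720 before writing it to MP4. The URL, output path, and target size are placeholders I made up, and I am assuming avdec_h264 (gst-libav) and x264enc (gst-plugins-ugly) are installed; your system may offer different decoder and encoder elements. It only prints the command unless you set RUN=1.

```shell
#!/bin/sh
# Hypothetical values: replace with your own camera URL and output file.
RTSP_URL="rtsp://user:password@192.0.2.10/stream"
OUT_FILE="$HOME/camera_720p.mp4"

# Depay and parse H.264, decode to video/x-raw, scale to 1280x720,
# then encode again and mux into MP4.
# Assumption: avdec_h264 and x264enc are available on this machine.
PIPELINE="rtspsrc location=$RTSP_URL protocols=tcp \
 ! rtph264depay ! h264parse ! avdec_h264 \
 ! videoconvert ! videoscale ! video/x-raw,width=1280,height=720 \
 ! x264enc ! h264parse ! mp4mux ! filesink location=$OUT_FILE"

# Show the command that would be run.
echo "gst-launch-1.0 -e $PIPELINE"

# Only start recording when explicitly asked to (RUN=1) and the tool exists.
if [ "${RUN:-0}" = "1" ] && command -v gst-launch-1.0 >/dev/null 2>&1; then
    # Unquoted on purpose so the shell splits the pipeline into arguments.
    gst-launch-1.0 -e $PIPELINE
fi
```

Stop the recording with Ctrl+C; the -e flag sends EOS so that mp4mux can finalize the file.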
It seems you are not sure when to use a video decoder. Here is some of my experience with video codecs:
- If the source is already encoded and I want to write it to the container with the same encoding, the pipeline looks like:
    src ! ... ! mux ! filesink
 
- If the source is already encoded and I want to write it to the container with a different encoding, or play it with a videosink, the pipeline looks like:
    src ! decode ! ... ! encode ! mux ! filesink
    src ! decode ! ... ! videosink
 
- If the source is not encoded (like videotestsrc) and I want to write it to the container, the pipeline looks like:
    src ! encode ! mux ! filesink
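The last case is easy to try without a camera. Here is a minimal sketch using videotestsrc; the output path and the buffer count are arbitrary choices of mine, and x264enc is assumed to be installed. As written it only prints the command; set RUN=1 to actually run it.

```shell
#!/bin/sh
# Hypothetical output location for the test recording.
OUT_FILE="${TMPDIR:-/tmp}/videotestsrc.mp4"

# Raw test pattern -> encode -> mux -> file.
# num-buffers=150 makes the source stop by itself after 150 frames.
PIPELINE="videotestsrc num-buffers=150 ! videoconvert ! x264enc ! h264parse ! mp4mux ! filesink location=$OUT_FILE"

# Show the command that would be run.
echo "gst-launch-1.0 -e $PIPELINE"

# Only run when explicitly asked to (RUN=1) and the tool exists.
if [ "${RUN:-0}" = "1" ] && command -v gst-launch-1.0 >/dev/null 2>&1; then
    gst-launch-1.0 -e $PIPELINE
fi
```

There is no decoder anywhere in this pipeline, because the source already produces video/x-raw.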
 
Note: Decoding and encoding cost a lot of CPU! So if you don't need to transcode, don't.
You can look up src, sink, mux, demux, enc, dec, convert, etc. elements with the handy gst-inspect-1.0 tool. For example:
$ gst-inspect-1.0 | grep mux
to show all available mux elements.
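The same idea works for the other element classes, and passing an element name to gst-inspect-1.0 prints its pads, caps, and properties. A small sketch (the list of patterns is just my pick, and mp4mux is used only as an example element):

```shell
#!/bin/sh
# Name fragments to search for; adjust to whatever you are looking for.
PATTERNS="mux enc dec"

if command -v gst-inspect-1.0 >/dev/null 2>&1; then
    for p in $PATTERNS; do
        echo "== elements matching '$p' =="
        # Without arguments, gst-inspect-1.0 lists all installed elements.
        gst-inspect-1.0 | grep -i "$p"
    done
    # With an element name, it shows that element's pads and properties.
    gst-inspect-1.0 mp4mux
else
    echo "gst-inspect-1.0 not found; install the GStreamer tools first"
fi
```

The pad caps in the per-element output tell you whether an element expects video/x-raw or an encoded format such as video/x-h264, which is how you can tell where a decoder or encoder is needed.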
     
    