My videos get dragged because I talk for 45 minutes, but I have a niche group that seems to appreciate the attention to detail, and honestly I just enjoy talking about schematics, circuits, history, that kind of thing.
Personally I don't think recording to a phone is a good idea. If you do, you need a really bright room with a window, a flood light, or something similar. The audio quality will be awful because for the phone to be close enough to pick up your voice, your amp will be far too loud for it, and the audio will end up compressed by the software on your phone or on YouTube (or both). The result is that you're talking, then the first note you play is super loud and quickly squished, and then every tone example after that sounds like crap because of the compression, until you start talking again.
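To show what I mean by "super loud and quickly squished", here is a toy sketch of an automatic gain control. This is not what any particular phone or YouTube actually runs. The function name `auto_gain` and the `threshold`, `attack`, and `release` values are all made up for illustration.

```python
def auto_gain(levels, threshold=0.3, attack=0.5, release=0.02):
    """Toy automatic gain control over a list of per-frame levels (0..1).

    All parameter values are illustrative assumptions, not real phone settings.
    """
    gain = 1.0
    out = []
    for level in levels:
        # The current frame goes out with whatever gain we had before hearing it,
        # which is why the first loud note slips through at full volume.
        out.append(level * gain)
        # Gain that would pull this frame down to the threshold (never boosts).
        wanted = min(1.0, threshold / level) if level > 0 else 1.0
        # Clamp down fast (attack), recover slowly (release).
        rate = attack if wanted < gain else release
        gain += (wanted - gain) * rate
    return out


# Quiet talking for a few frames, then a loud amp.
levels = [0.1] * 5 + [1.0] * 10
for frame, (before, after) in enumerate(zip(levels, auto_gain(levels))):
    print(frame, before, round(after, 3))
```

The talking frames pass through untouched, the first guitar frame comes out at full level, and every guitar frame after that gets pulled down toward the threshold.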
I'm a little stuck on trying to do everything analog, but I'm not too fussed with comparisons so I try a few different mic arrangements here and there. I would like to do more comparisons in the future but I just don't have the kind of time to dedicate to this hobby.
If I were you though, I would buy a loadbox, a lapel microphone, and a used camera like a Panasonic G7 or similar. Record your voice through the lapel mic into the recording app on your phone (e.g. Voice Memos on iPhone), and record your guitar through the loadbox. If you are demoing an amp, use the amp and show the settings. If you are showing a guitar, take a few b-roll shots of the guitar, or talk while you move the camera around to what you want to show. If you are showing your playing, set the camera up on a table or tripod and either talk to the camera or point the camera at the fretboard. Then combine the audio and mute the parts you don't want. This is what I do, at least: I use the audio from the actual camera to line up the waveforms of my talking mic and my guitar mics, then I mute the camera audio. Then I just delete the sections of audio that I don't want doubled (e.g. I delete the talking mic parts where I'm playing the guitar, so you only get the guitar audio straight from the IR/cab).
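I line things up by eye in the editor, but the idea behind it can be sketched in code. This is a rough, brute-force example, assuming each track is just a list of samples at the same sample rate. `best_offset` and `mute` are hypothetical helper names, not functions from Premiere or any real tool.

```python
def best_offset(reference, other, max_shift):
    """Return the shift s (in samples) where other[i] best matches reference[i + s].

    Brute-force cross-correlation: try every shift and keep the highest score.
    """
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i, sample in enumerate(other):
            j = i + shift
            if 0 <= j < len(reference):
                score += sample * reference[j]
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def mute(track, start, end):
    """Return a copy of track with the samples in [start, end) silenced."""
    return [0.0 if start <= i < end else s for i, s in enumerate(track)]


# The camera audio is the reference. The lapel recording started 4 samples later.
camera = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0]
lapel = [1.0, -1.0, 0.5, 0.0, 0.0, 0.0]

shift = best_offset(camera, lapel, max_shift=5)
print("lapel starts at camera sample", shift)

# Once everything is lined up, silence the camera track entirely
# and drop the talking mic wherever the guitar is playing (here, samples 3 to 5).
camera_muted = mute(camera, 0, len(camera))
lapel_trimmed = mute(lapel, 3, 6)
```

Real recordings are far too long for a pure-Python loop like this, so treat it as an illustration of the matching idea rather than something to run on actual footage.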
I know that's a lot of work, but if you want it to look good, separating things, taking quality recordings, and then combining the best parts is what you need to do. A lot of these TikTok/Instagram/Facebook videos people upload are "pretending" to just be a phone recording, but they actually have a whole process behind them. Even stuff that looks like little gimmicks or stupid cuts/zooms/effects is all done in post. My videos are much longer, but for example if I upload a 45 minute video, it took me an hour of prep work setting up mics, cameras, etc., and probably 8 hours or so of editing/layering/tracking in Premiere. And my videos don't even really look that good. I think people seriously underestimate the amount of time it takes to do it right.
The other option is to buy some kind of interface, combine all of your channels, and use something to input that audio to your phone. For example, I have a Focusrite 18i20. I could have a couple of mics plugged in, send all of that out through the line out to a headphone jack adapter on my iPhone, and control what you hear (my voice, or guitar after effects) by flipping switches on the Focusrite. That's probably what I'd do to streamline things if all I cared about was 30 second TikTok clips.
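If it helps to picture the routing, here is a tiny sketch of "several channels in, one mix out, switches decide what gets through". It only models the idea and says nothing about how the Focusrite works internally. The `mix` function and the channel names are made up.

```python
def mix(channels, switches):
    """Sum every channel whose switch is on, clipping the result to -1..1.

    channels: dict of name -> list of samples
    switches: dict of name -> bool (missing names count as off)
    """
    length = max((len(samples) for samples in channels.values()), default=0)
    out = [0.0] * length
    for name, samples in channels.items():
        if switches.get(name, False):
            for i, sample in enumerate(samples):
                out[i] += sample
    # A single output can only carry so much level, so clip the sum.
    return [max(-1.0, min(1.0, sample)) for sample in out]


channels = {
    "voice": [0.25, 0.25, 0.25],
    "guitar": [0.5, -0.5, 1.0],
}

# Talking segment: only the voice mic reaches the phone.
print(mix(channels, {"voice": True, "guitar": False}))
# Playing segment: flip the switches so only the guitar goes out.
print(mix(channels, {"voice": False, "guitar": True}))
```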