[ad_1]
Over the previous couple of weeks I’ve been experimenting with chaining collectively massive language fashions.
I dictate emails & weblog posts usually. Just lately, I began utilizing Whisper for drafting emails and paperwork. (Initially there have been some points with reminiscence administration, however I’ve since discovered a compiled model that works properly on my Mac referred to as whisper.cpp)
After tying Google’s Duet I questioned if I might replicate one thing comparable. I’ve been chaining the Whisper dictation mannequin along with LLaMA 2 mannequin from Fb. When drafting an e-mail, I can dictate a response to LLaMA 2, which can then generate a reply utilizing the context from my authentic e-mail.
To this point it really works generally, however there are some clear limitations:
First, the default tone of the generated emails is way too formal.
Second, if I immediate LLaMA 2 to make use of a extra informal tone, it usually goes too far within the different course. The issue is a scarcity of nuanced context – the suitable stage of familiarity varies significantly between emails to shut colleagues versus board communications or potential buyers. With out that nuance labeled and included into the coaching knowledge, it’s onerous for the mannequin to strike the precise tone.
Third, in multi-party e-mail threads issues can get complicated. If Lauren introduces Rafa to me, then Rafa bccs Lauren on the e-mail, LlaMA 2 usually replies as Lauren.
Fourth, determining precisely the precise settings for the mannequin could be robust. Typically I dictate lengthy emails, by which case the context home windows (how a lot the pc listens to earlier than transcribing) ought to be very lengthy so the system can keep in mind what I’ve stated beforehand.
Different instances I’m simply returning a really quick e-mail. A fast see you quickly or thanks very a lot. Through which case a protracted context window doesn’t make sense and I’m left ready for the system to course of.
I’m questioning whether or not small errors within the first mannequin compound within the second mannequin. Unhealthy knowledge from the transcription -> inaccurate immediate to the LLM -> incorrect output.
General the potential is thrilling, however there are nonetheless challenges round tone, context, and multi-party interactions that must be addressed earlier than this may change into a seamless productiveness instrument. In machine studying techniques, attaining an 80% answer is fairly speedy. The marginal 15% – the magic behind ML – takes an enormous quantity of effort, knowledge, & tuning.
[ad_2]