raspi alexa

November 1, 2023, 10:21

k9t33n

this isn't entirely to do with the Pi because I don't really need help with that. I'm just wanting to make an Alexa replacement (like everyone else), but it needs all the features and more

_krazy.

Make the features + you can now chat with it

k9t33n

so I hate the normal Alexa because it's just listening for keywords, so it can't really understand English. I want mine to be built with LLMs so it understands natural language

k9t33n

so instead of volume commands being hard-coded, the LLM should understand us even if we don't use proper grammar. For example, I could say "volume high" and it would just turn the volume up, instead of the exact phrase "turn the volume up" being hard-coded

_krazy.

Rent a VPS, run any 300B+ model on it, and link it to the Pi. When you speak to the Pi and don't begin the query with "Raspberry", it'll send the query to the model; if you do, it'll try to do whatever you programmed it to do
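A minimal sketch of the wake-word routing described here, in Python (the wake words come from the conversation, but `route` and its return strings are invented placeholders, not a real implementation):

```python
# Sketch: queries starting with a wake word go to the hard-coded command
# path; everything else is forwarded to the LLM as free-form chat.
# (Function name and return format are illustrative only.)

WAKE_WORDS = ("raspberry", "raspi")

def route(transcript: str) -> str:
    text = transcript.strip().lower()
    if text.startswith(WAKE_WORDS):
        # strip the wake word, then hand off to the command handler
        for word in WAKE_WORDS:
            if text.startswith(word):
                text = text[len(word):].strip()
                break
        return f"command: {text}"
    # no wake word: treat it as a chat message for the model
    return f"chat: {text}"
```

In a real assistant, the two branches would call the command dispatcher and the model API respectively, instead of returning tagged strings.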

k9t33n

yes "raspberry", "raspi" and stuff will be hardcoded, but that's about it

k9t33n

the thing is, I would need Bard to tell my program to turn the volume up once it's understood that the user wants that, but I'm afraid it won't be consistent. So would we need a classification model as well or something?

k9t33n

oh, I didn't realise you'd mentioned that

_krazy.

pseudocode:

```lua
-- hypothetical function that extracts the meaning/intent of a query
result = someModelFunctionThatGetsTheMeaningOfWhateverIsQueriedToIt("hey so uhh i want you to turn off, no no turn on the volume higher")
print(result)
-- >> "volume+" or smth
```

_krazy.

This is just an example of me showing you that they already exist

_krazy.

so you don't have to repeat yourself

k9t33n

oh

k9t33n

ok well I'm gonna do some research and tinkering

_krazy.

👍

_krazy.

And also you can give it commands in a dictionary like this:

_krazy.

```python
# command phrase -> handler function
commands = {
    "turn volume up": turnVolumeUp,
    "turn volume down": turnVolumeDown,
    "turn on the lights": turnOnLights,
}
```

_krazy.

And then the model finds the command with the highest relevance percentage

_krazy.

If it's 50%+, run the command; and if a command with a higher relevance percentage is found, run that one instead
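A minimal sketch of this best-match-over-a-threshold idea, using the stdlib `difflib` ratio as a crude stand-in for a real relevance score (a proper setup would use a model or embeddings; the handler names are invented):

```python
from difflib import SequenceMatcher

# toy handlers standing in for real device actions
def turn_volume_up():   return "volume up"
def turn_volume_down(): return "volume down"
def turn_on_lights():   return "lights on"

# command phrase -> handler, as in the dictionary above
COMMANDS = {
    "turn volume up": turn_volume_up,
    "turn volume down": turn_volume_down,
    "turn on the lights": turn_on_lights,
}

def dispatch(query: str):
    # score every known phrase against the query, keep the best one
    best_phrase, best_score = max(
        ((p, SequenceMatcher(None, query.lower(), p).ratio()) for p in COMMANDS),
        key=lambda t: t[1],
    )
    # run the command only if relevance is 50%+, as suggested above
    if best_score >= 0.5:
        return COMMANDS[best_phrase]()
    return None  # below threshold: fall through to the chat model
```

String-similarity ratios only catch near-verbatim phrasings; swapping the scorer for embedding cosine similarity (or an LLM call) is what makes loose phrasings like "volume high" land on the right command.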

k9t33n

that's a much better way than what I was gonna do

k9t33n

very complicated tho

_krazy.

Not really, it just requires some computing power, which is a real constraint when working with Pis

k9t33n

yeah

ampueromalo

Ok, Googled enough, I got everything I need, thanks!

_krazy.

I apologize for not responding to your question fast enough, and thanks for looking up what you wanted. I hope you learned more than you would have if I'd answered

ampueromalo

ok, I think it might not be needed, but some "test data" might be: https://mycroft-ai.gitbook.io/docs/mycroft-technologies/adapt — function calling is also promising (https://platform.openai.com/docs/guides/gpt/function-calling), but I want to avoid paid solutions
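For context on that second link: function calling works by describing each command to the model as a JSON-Schema "function" and letting the model pick one and fill in the arguments. A sketch of what such a description looks like (the `set_volume` name and its fields are made up for illustration; the structure follows the JSON-Schema style the linked guide uses, but treat it as illustrative, not a drop-in):

```python
# Illustrative function/tool description; "set_volume" and its
# parameters are hypothetical, not part of any real API contract.
set_volume_tool = {
    "name": "set_volume",
    "description": "Set or adjust the speaker volume",
    "parameters": {
        "type": "object",
        "properties": {
            "direction": {
                "type": "string",
                "enum": ["up", "down"],
                "description": "Which way to move the volume",
            },
        },
        "required": ["direction"],
    },
}
```

The model then returns a structured call like `set_volume(direction="up")` for loose phrasings such as "volume high", which is exactly the intent-parsing step being discussed.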

k9t33n

yeah me too

k9t33n

tell me if you've found a solution, I'm only just getting on with speech-to-text models

ampueromalo

these look promising: https://mycroft-ai.gitbook.io/docs/mycroft-technologies/padatious and https://mycroft-ai.gitbook.io/docs/mycroft-technologies/adapt. You can also use the Mycroft solution and add your own skills. I don't want to use Mycroft since I already made a "dummy" Alexa that parses phrases with regex; that's why I want something that parses intents intelligently
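For contrast, the regex "dummy" approach mentioned here looks roughly like this (patterns and intent names are invented). It only matches rigid phrasings, which is exactly the limitation intent parsers like Adapt and Padatious try to fix:

```python
import re

# rigid phrase -> intent patterns; any rewording ("volume high",
# "make it louder") falls straight through unrecognized
PATTERNS = [
    (re.compile(r"\bturn (the )?volume up\b"), "volume_up"),
    (re.compile(r"\bturn (the )?volume down\b"), "volume_down"),
    (re.compile(r"\bturn on (the )?lights?\b"), "lights_on"),
]

def parse_intent(utterance: str):
    text = utterance.lower()
    for pattern, intent in PATTERNS:
        if pattern.search(text):
            return intent
    return None  # unrecognized phrasing
```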

k9t33n

exactly

ampueromalo

and the most interesting option would be to build your own models using TensorFlow or scikit-learn, but you need much more training data
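To make the training-data point concrete, here is a toy bag-of-words classifier in pure Python — a crude stand-in for what a scikit-learn pipeline (e.g. a count vectorizer feeding a Naive Bayes classifier) would do with far more data. All the example utterances and labels are invented:

```python
from collections import Counter

# tiny labelled "training data"; a real model needs far more of this
TRAIN = [
    ("turn the volume up", "volume"),
    ("volume high please", "volume"),
    ("make it quieter", "volume"),
    ("play some jazz", "music"),
    ("put on my playlist", "music"),
    ("what's the capital of france", "question"),
    ("who wrote hamlet", "question"),
]

# build one word-count profile per label
PROFILES = {}
for text, label in TRAIN:
    PROFILES.setdefault(label, Counter()).update(text.split())

def classify(utterance: str) -> str:
    words = utterance.lower().split()
    # score each label by how often its profile has seen the query words
    scores = {
        label: sum(profile[w] for w in words)
        for label, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)
```

With only a handful of examples per class this misclassifies anything off-script, which is the "you need much more training data" point in practice.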

k9t33n

and I need much more knowledge and power

ampueromalo

yup, but as the other guy said, you can offload that onto a VPS

k9t33n

but I don't want to pay

ampueromalo

there are free-tier VPS providers

k9t33n

oh I will check them out

k9t33n

bye bye now

_krazy.

Just shape the model into the personal assistant you want and it will be one. The good thing about having your own AI is changing it to whatever you want, even if that may sound bad

k9t33n

I'm still a little confused on how to classify my stuff. I want to classify my text into groups (question answering, music, volume, etc.) so I can separate it into several smaller models. I know you told me a bit about how to do it, but I'm still really confused. Could you just explain it again, simpler and with more detail, <@266512529746952192>?

_krazy.

First, it depends on how you're running the model. Second, text-generation models should have a system context that they follow, which is usually something like "You're a helpful assistant, you block this bad stuff, you always stay on topic, be respectful and think outside the box", and many other things. This is a way to tailor the model to your liking, but it's not a deal-breaker. Third, you can train the model on certain inputs and outputs (adjusting its parameters); it takes time, but its results are incredible. Many have trained models on even image boards and they started conversing like them, it's really incredible.
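The "context" described here is usually passed as a system message in the chat format most LLM runtimes accept. A minimal sketch (the wording and field values are generic examples, not tied to any one API):

```python
# typical chat-format message list; the system entry steers the model,
# then user turns follow. Content strings here are illustrative only.
messages = [
    {
        "role": "system",
        "content": (
            "You are a voice assistant running on a Raspberry Pi. "
            "Stay on topic, be respectful, and answer briefly."
        ),
    },
    {"role": "user", "content": "volume high"},
]
```

Changing only that first system string is the cheap way to "edit the model to your liking" before reaching for any training.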

_krazy.

I hope this clarifies the topic a bit, I'm really bad at it so don't expect a pro answer.

k9t33n

ah ok

k9t33n

I've already got a pretty bad LLM running and I can tune that with the system input. Got it working as a Discord bot, and I made it so I can talk to it and it talks to me

k9t33n

spent the entire week on that stuff

k9t33n

TinyLlama

ampueromalo

I'm actually considering the GPT-3.5 function calling, it's cheap af, 100k tokens per month is less than 60 cents

k9t33n

yeah, I still don't want to use it, maybe I can try the same thing myself with another model

ampueromalo

yup, what krazy said above is the most accurate I can think of, but it will depend on how good the model is

k9t33n

yeah

k9t33n

I'm gonna use different models for different things, so hopefully it won't be bad at any single thing