
58 Commits

Author SHA1 Message Date
Brian S. Stephan 033631e5c2 no longer encode/decode UTF8 stuff when going to/from database
seems safe so far (famous last words)
2012-07-27 16:34:57 -05:00
Brian S. Stephan e1356496eb Markov: don't encode('utf8') the stuff out of the database
it seems unnecessary now? i guess i have to change this in all
the modules now, including this one because i probably missed something
2012-07-27 15:24:56 -05:00
Brian S. Stephan 7bd5558f05 ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_bin for case-sensitivity 2012-07-27 14:57:41 -05:00
Brian S. Stephan 1a36becead convert to a MySQL backend
WARNING!
there's no going back now. this change is *huge* but it was overdue.
WARNING!

the database backend is now mysql. modules that should use a database
but don't yet were left untouched, they'll come later. scripts haven't
been converted yet, though i'm pretty sure i'll need to soon.

while i was going through everything, connection/cursor idioms were
cleaned up, as were a bunch of log messages and exception handling. this
change is so gross that i'm happy things appear to be working at all,
and they do --- all modules are lightly tested.
2012-07-27 02:18:01 -05:00
Brian S. Stephan 9654f4de98 switch to use python's logging, with config file i'm not entirely happy about 2012-07-15 21:32:12 -05:00
Brian S. Stephan 2b0b7abd58 Markov: unicode fixes and improvements 2012-07-15 01:11:21 -05:00
Brian S. Stephan 2650824dbd Markov: correct the documentation on min_size/max_size in _generate_line 2012-07-14 09:22:37 -05:00
Brian S. Stephan d94d7f0c88 Markov: register ._generate_line as markov_generate_line 2012-04-05 21:24:41 -05:00
Brian S. Stephan 07744a0f66 indicate recursion better by adding _recursing to Event
for simplicity's sake, this was added to the extlib/irclib rather
than subclassing. because i'm lazy. anyway, check that flag instead
of doing the event._target = None hack, since that hack was breaking
Markov.

for an unrelated reason (what to learn and not learn), update Markov

also remove an unused method that was getting in my way while coding this
2012-03-29 20:07:32 -05:00
Brian S. Stephan 7d41564d02 Markov: allow for auto-context insertion
this should result in no chains having a null context --- if no pre-existing
context exists, one is created for the channel/nick and used. this makes,
for example, arbitrary queries "private" to that nick (again unless that has
been overridden). shouldn't affect much of anything, but adding this made
the context-less learning code obsolete, which is fine since it was never used
anyway
2012-03-19 00:12:29 -05:00
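A minimal sketch of the get-or-create behavior described above, assuming an in-memory mapping stands in for the module's context tables. ContextRegistry, context_for, and assign are hypothetical names, not the module's actual API:

```python
class ContextRegistry:
    """Hypothetical in-memory stand-in for the target -> context mapping."""

    def __init__(self):
        self._next_id = 1
        self._by_target = {}  # channel or nick -> context id

    def assign(self, target, context_id):
        """Explicitly point a target at a context (an override)."""
        self._by_target[target] = context_id
        # keep auto-created ids from colliding with manually assigned ones
        self._next_id = max(self._next_id, context_id + 1)

    def context_for(self, target):
        """Return the target's context, creating a private one if none exists."""
        if target not in self._by_target:
            self._by_target[target] = self._next_id
            self._next_id += 1
        return self._by_target[target]
```

Because every learned line would go through context_for, no chain ends up with a null context, which is why a context-less learning path stops being needed.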
Brian S. Stephan 26bc8bec34 Markov: rebuild the tables, use the context stuff in a better fashion this time
the module will drop your old tables if you have them, so if there's data there,
be sure to back them up and figure out some migration strategy (probably annoying
and probably having to script it).

the big change is that each line is associated with a context now, and channels
are also associated with contexts. this should allow for a better partitioning
of multiple brains, and changing which channels point to which brain.

also caught in the wake is some additional logging verbosity, and a change to
no longer lower() everything learned.

the script to dump a file into the database has also been updated with the above
changes
2012-02-28 23:23:14 -06:00
Brian S. Stephan 8c1ffc54ba Markov: drop the max id stuff, get a bunch of chains and pick one randomly. cooler this way. 2011-10-21 17:01:09 -05:00
Brian S. Stephan e3ef3f48dc Markov: add support for temporarily disabling chatter by supplying a negative chance 2011-10-21 16:59:57 -05:00
Brian S. Stephan cda1d43606 Markov: index on (v, context) and other enhancements for the last commit
reduce some infinite loop possibilities, and add an index with the old <= id trick
to speed up the searching for backwards chains
2011-10-16 21:13:27 -05:00
Brian S. Stephan 42962bc48d Markov: add support for starting in the middle of a chain and working backwards
this only makes sense if we have a target word set, which we usually do.
start with the target word and go backwards, finding k2s that lead to it
(and that lead to that k2, and so on) until we get to the start-of-chain
value, when we know we're done working backwards. then resume the normal
appending logic

probably needs some work, probably a bit slow on huge databases. analysis
pending, but this appears to work
2011-10-16 20:19:51 -05:00
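A rough sketch of the backwards-then-forwards walk, assuming chains are (k1, k2, v) triples held in a plain list rather than the database, and that a line starts with the keys (__start1, __start2). build_backwards and extend_forwards are hypothetical names, and the max_steps/max_size caps are assumptions added so a cycle can't loop forever:

```python
import random

START1, START2, END = "__start1", "__start2", "__end"


def build_backwards(chains, target, rng=random, max_steps=50):
    """Walk from target back toward the start marker; return words in order."""
    words = [target]
    for _ in range(max_steps):  # cap the walk so a cycle can't loop forever
        # every k2 that has led to the current first word
        candidates = [k2 for (k1, k2, v) in chains if v == words[0]]
        if not candidates:
            break
        prev = rng.choice(candidates)
        if prev == START2:  # reached the start of a line, done going back
            break
        words.insert(0, prev)
    return words


def extend_forwards(chains, words, rng=random, max_size=50):
    """Resume the normal appending logic from the last two words."""
    k1, k2 = (words[-2], words[-1]) if len(words) > 1 else (START2, words[-1])
    while len(words) < max_size:
        candidates = [v for (a, b, v) in chains if (a, b) == (k1, k2)]
        if not candidates:
            break
        nxt = rng.choice(candidates)
        if nxt == END:
            break
        words.append(nxt)
        k1, k2 = k2, nxt
    return words
```

Scanning every chain on each step is what makes this slow on big brains; an index on v (see cda1d43606) is the obvious database-side fix.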
Brian S. Stephan 50fbbbfedd Markov.py: tweaking the shut up check, this has been pretty good for a while 2011-09-20 01:20:27 -05:00
Brian S. Stephan 4566d1734e change the default sqlite timeout to 30 seconds
this should make the bot wait longer for table locks, assuming i
read the docs right
2011-07-01 18:42:49 -05:00
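In Python's sqlite3 module this is the timeout argument to sqlite3.connect(), in seconds (default 5.0): how long a connection waits for a lock to clear before raising "database is locked". A minimal sketch, where get_db() only borrows the name mentioned in 1e87fe59d8 and its signature here is assumed:

```python
import sqlite3
from contextlib import closing


def get_db(path=":memory:"):
    """Hypothetical helper: open a connection that waits up to 30s for locks."""
    return sqlite3.connect(path, timeout=30.0)


# closing() guarantees the connection is closed when the block exits
with closing(get_db()) as db:
    print(db.execute("SELECT 1").fetchone())  # prints (1,)
```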
Brian S. Stephan a51f0cb54c Markov: refer to the actual target from a chatter target when shutting up 2011-07-01 18:42:04 -05:00
Brian S. Stephan 678350fe5d Markov: trivial change to allow for more advanced randomness later 2011-06-22 19:00:01 -05:00
Brian S. Stephan 7220025f0a Markov: randomly say something to a list of approved channels
check interval is every 10 minutes; each row in markov_chatter_target
has a 1 in [chance] chance of leading to a line being generated
on each check. (so a chance of 144 = 6 checks/hour * 24 hours = one line
per day, on average)
2011-06-20 22:49:25 -05:00
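The arithmetic above as a minimal sketch, assuming each chatter target carries an integer chance and one random roll per check. should_chatter and checks_per_day are hypothetical names; the negative-chance case comes from e3ef3f48dc, while treating 0 the same way is an assumption:

```python
import random

CHECK_INTERVAL_SECONDS = 10 * 60  # one check every 10 minutes


def checks_per_day(interval_seconds=CHECK_INTERVAL_SECONDS):
    """Number of chatter checks in 24 hours (144 for a 10 minute interval)."""
    return (24 * 60 * 60) // interval_seconds


def should_chatter(chance, rng=random):
    """Roll a 1 in `chance` die for one chatter target on one check.

    A negative chance disables chatter; 0 is treated the same here
    (an assumption), since randint(1, 0) would raise ValueError.
    """
    if chance <= 0:
        return False
    return rng.randint(1, chance) == 1
```

Expected lines per day is checks_per_day() / chance, so a chance of 144 works out to about one line per day.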
Brian S. Stephan 1e87fe59d8 even more close connections from get_db() 2011-06-20 22:34:27 -05:00
Brian S. Stephan 152ef2a1ad Module: remove the timer stuff, since individual modules can do this better themselves
Markov, Twitter: switch to forking a thread ourselves, and check every
second whether or not to quit. this is the "better" part above, as
now we can instantly quit the thread rather than waiting for all
the timers to fire and expire
2011-06-20 21:18:55 -05:00
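A minimal sketch of the self-managed thread pattern, assuming Python 3's threading module. ChatterThread and on_tick are hypothetical names, and using threading.Event.wait(1) in place of a sleep-and-check-a-flag loop is a substitution: it still wakes once a second, but returns immediately when stop() is called:

```python
import threading


class ChatterThread(threading.Thread):
    """Hypothetical worker: runs on_tick every `interval` seconds,
    but notices a quit request within about a second."""

    def __init__(self, interval=600, on_tick=None):
        super().__init__(daemon=True)
        self.interval = interval
        self.on_tick = on_tick or (lambda: None)
        self._quit = threading.Event()

    def run(self):
        elapsed = 0
        # wait(1) returns True as soon as stop() sets the flag,
        # so the loop exits without waiting out the full interval
        while not self._quit.wait(1):
            elapsed += 1
            if elapsed >= self.interval:
                elapsed = 0
                self.on_tick()

    def stop(self):
        self._quit.set()
```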
Brian S. Stephan df3de56c4c Markov: don't add chains if the context is null
that should only be possible on non-pub/privmsgs, or if there
is a [subcommand] being analyzed. in any event, don't learn it.
2011-06-16 21:25:22 -05:00
Brian S. Stephan a8031909b4 Markov: bite the bullet and automatically assign each markov chain a context (channel/query)
still kind of testing this, but i think it's easiest
2011-06-15 12:29:18 -05:00
Brian S. Stephan a0588869f3 Markov: add selecting by context, in order to segregate chains by channel
adding chains by context has existed for a while, this should allow for
querying for chains with null context or the current context. lightly
tested
2011-06-14 22:10:57 -05:00
Brian S. Stephan 57be7f8026 Markov: remove some cruft that is now obsolete 2011-06-14 21:08:01 -05:00
Brian S. Stephan 90be2d1855 Markov: trying a simpler form of shut up check 2011-05-03 22:13:49 -05:00
Brian S. Stephan 5e8e93beba Markov: clean up the whole "need to create our own db object" thing 2011-05-01 10:41:59 -05:00
Brian S. Stephan 03d0d6bc2d Markov: shut up if we've been too chatty in too short a period of time.
track all lines seen and all lines said by Markov. every 30 seconds,
if there have been more than 20 such lines, and Markov is responsible
for roughly half of them, then shut up for 30 seconds, because the
bot probably got stuck talking to another bot.

this should mean that such a reply infinite loop can't happen for
more than a minute.

i'm not entirely sure on the 30 sec/20 lines ratio. this may need
tuning.
2011-05-01 10:38:46 -05:00
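A minimal sketch of the check, assuming timestamps are kept in lists and pruned to a sliding 30 second window at check time (the commit describes a fixed every-30-seconds check instead), that "all lines seen" includes the bot's own lines, and that "roughly half" means at least 50%. ShutUpCheck and its parameters are hypothetical:

```python
import time


class ShutUpCheck:
    """Hypothetical rate check: go quiet if we make up too much of recent traffic."""

    def __init__(self, window=30, min_lines=20, ratio=0.5, quiet_for=30,
                 clock=time.monotonic):
        self.window = window        # seconds of history to consider
        self.min_lines = min_lines  # only act on busy windows
        self.ratio = ratio          # share of lines that are ours
        self.quiet_for = quiet_for  # seconds to stay quiet once tripped
        self.clock = clock
        self.seen = []              # timestamps of every line, ours included
        self.said = []              # timestamps of lines the bot said
        self.quiet_until = 0.0

    def record(self, ours=False):
        now = self.clock()
        self.seen.append(now)
        if ours:
            self.said.append(now)

    def should_shut_up(self):
        now = self.clock()
        if now < self.quiet_until:
            return True
        cutoff = now - self.window
        self.seen = [t for t in self.seen if t >= cutoff]
        self.said = [t for t in self.said if t >= cutoff]
        if (len(self.seen) > self.min_lines
                and len(self.said) >= self.ratio * len(self.seen)):
            self.quiet_until = now + self.quiet_for
            return True
        return False
```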
Brian S. Stephan 7692d295f6 Markov: don't clobber existing database objects in the forked thread 2011-05-01 10:26:06 -05:00
Brian S. Stephan a73aec8ff0 Markov: remove debugging noise that snuck in via 42d414a0a4 2011-05-01 10:11:04 -05:00
Brian S. Stephan 1945637752 Markov: add support for chatter targets, channels we log messages to or randomly speak in 2011-05-01 10:05:37 -05:00
Brian S. Stephan 14f2a027fe Markov: preliminary support for the bot to conditionally shut itself up (and recover from that) 2011-04-30 15:43:59 -05:00
Brian S. Stephan 42d414a0a4 Markov: consolidate _reply_to_line and _reply into _generate_line 2011-04-30 15:37:16 -05:00
Brian S. Stephan 9ec73c4aa6 Markov: this is kind of embarrassing. remove a duplicate index. 2011-04-27 21:38:52 -05:00
Brian S. Stephan 6070ddc950 Markov: when looking up the start-of-sentence chain, get one random one
when finding a key for (__start1,__start2), instead of fetching all
(which can be a lot, in chatty channels and/or over time), get the
max ID in the table, pick a random ID between 1 and max, and pick the
first id >= it, and use that. just as random, nowhere near as
intensive.
2011-04-23 21:24:23 -05:00
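A sketch of the max-id trick against sqlite3, assuming a markov_chain(id, k1, k2, v) table with an integer primary key (the exact column layout is an assumption). The wrap-around fallback, for when the random id lands past the last start row, is also an addition. Note that "first id >= a random id" is not perfectly uniform when start rows are unevenly spaced: rows that follow large gaps in the id sequence get picked more often.

```python
import random
import sqlite3

START1, START2 = "__start1", "__start2"

FIRST_START_AT_OR_AFTER = (
    "SELECT v FROM markov_chain WHERE k1 = ? AND k2 = ? AND id >= ? "
    "ORDER BY id LIMIT 1"
)
FIRST_START = (
    "SELECT v FROM markov_chain WHERE k1 = ? AND k2 = ? ORDER BY id LIMIT 1"
)


def random_start_word(conn, rng=random):
    """Pick a start-of-line word without fetching every start row."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM markov_chain").fetchone()
    if max_id is None:  # empty brain
        return None
    pick = rng.randint(1, max_id)
    row = conn.execute(FIRST_START_AT_OR_AFTER, (START1, START2, pick)).fetchone()
    if row is None:  # landed past the last start row: wrap around
        row = conn.execute(FIRST_START, (START1, START2)).fetchone()
    return row[0] if row else None
```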
Brian S. Stephan 6ef7865dba Markov: remove unused _get_chain_beginnings 2011-04-23 20:59:26 -05:00
Brian S. Stephan 7f922dd2c9 Markov: remove the 'starts' dictionary 2011-04-23 16:27:07 -05:00
Brian S. Stephan 116251398e Markov: index on markov_chain(k1,k2) 2011-04-23 16:25:01 -05:00
Brian S. Stephan 305625044a Markov: track the context of said lines
a context is a meta-classification ('banter', 'secrets', whatever)
based on targets (channels or nicknames). when a line is being
learned from a known target, the chains are placed in that context.

this is for allowing one brain to have multiple personalities, in
a sense, for large networks or cases where there may be a more
sanitized set of channels and a couple channels where everyone lets
it rip. a later enhancement would have sentence creation choose from
context-less chains (and contexts matching the current target), but
i need to go back to the drawing board on that one a bit.

ramble ramble ramble
2011-04-23 16:07:32 -05:00
Brian S. Stephan 5885983afd Markov: when learning lines, don't include the direct-addressing part
e.g. if i say 'dr_botzo: hello dude', he only learns 'hello dude'.
this is mainly being done because the bot's name being in the brain
so many times was getting kind of silly, especially in channels that
have lots of conversations with the bot
2011-04-22 19:40:36 -05:00
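A minimal sketch of the stripping step, assuming direct addressing means the bot's nick at the very start of the line followed by ':' or ',' and that nick matching is case-insensitive (IRC nicks usually are). strip_direct_address is a hypothetical name:

```python
import re


def strip_direct_address(line, nick):
    """Drop a leading 'nick:' or 'nick,' so only the message itself is learned."""
    pattern = r"^\s*" + re.escape(nick) + r"[:,]\s*"
    return re.sub(pattern, "", line, count=1, flags=re.IGNORECASE)
```

Mentions of the nick later in the line are left alone; only the addressing prefix is removed.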
Brian S. Stephan 5913a95165 Markov: append a stop if we have nothing to append from a chain
somehow a chain led us down a path where there are no values for
the keys in the chain. if that happens, just abort.

i'm not quite sure how this could happen
2011-03-17 17:24:11 -05:00
Brian S. Stephan 2b8f0d2843 Markov: don't crash when learning a sentence that's only whitespace 2011-03-14 13:14:56 -05:00
Brian S. Stephan 7a53aaa9a1 Markov: properly output unicode chains 2011-02-25 20:59:57 -06:00
Brian S. Stephan 87073d7fd3 Markov: cache the first word in markov chains
this eliminates the expensive database hit on every request for a line.
the cache is loaded when the module loads and learning new lines should
add the appropriate word to the list. seemed like a pretty good compromise
2011-02-24 21:06:29 -06:00
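A minimal sketch of such a cache, assuming it is a plain list of first words loaded once at startup (the query that loads it is left out). Keeping duplicates preserves how often each word opens a line, so random.choice stays frequency-weighted. StartWordCache is a hypothetical name:

```python
import random


class StartWordCache:
    """Hypothetical in-memory list of words that have started a line."""

    def __init__(self, start_words):
        self.words = list(start_words)  # loaded once when the module loads

    def learn(self, line):
        """Add the first word of a newly learned line, if it has one."""
        words = line.split()
        if words:  # whitespace-only lines contribute nothing
            self.words.append(words[0])

    def pick(self, rng=random):
        return rng.choice(self.words) if self.words else None
```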
Brian S. Stephan 1712a7db53 Markov: use sqlite backend for brain
this keeps us from having the entire markov chain in memory and
having to do the pickling and so on. in many ways, this is a good
thing.

in one way, this is a bad thing. each line on irc will create a
__start1,__start2 item in the database, which means starting a
chain will be an expensive process. (approx 3 seconds, from irc
logs of 600,000 K lines). following selects run much faster, but
the first one is dog slow. a later commit should hopefully fix this.
2011-02-24 20:39:32 -06:00
Brian S. Stephan 2aa369add7 rewrite recursion/alias code for the 500th time.
more of a moving of the code, actually, it now exists in (an overridden)
_handle_event, so that recursions happen against irc events directly,
rather than an already partially interpreted object.

with this change, modules don't need to implement do() nor do we have a
need for the internal_bus, which was doing an additional walk of the
modules after the irc event was already handled and turned into text. now
the core event handler does the recursion scans.

to support this, we bring back the old replypath trick and use it again,
so we know when to send a privmsg reply and when to return text so that
it may be chained in recursion. this feels old hat by now, but if you
haven't been following along, you should really look at the diff.

that's the meat of the change. the rest is updating modules to use
self.reply() and reimplementing (un)register_handlers where appropriate
2011-02-17 01:08:45 -06:00
Brian S. Stephan 28f450ab5d Markov: improve min_size by implementing min_search_tries
if the end of a chain has been reached via __end, but min_size
has not been satisfied, discard the last couple elements in the
chain and try again. use min_search_tries so we don't do this
forever.
2011-01-25 20:42:52 -06:00
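A minimal sketch of the retry loop, assuming chains live in a dict mapping (k1, k2) to a list of next words and that "the last couple elements" means two words. generate and backtrack are hypothetical names, and treating a dead end like __end follows 5913a95165 rather than this commit:

```python
import random

START1, START2, END = "__start1", "__start2", "__end"


def _keys(words):
    """The (k1, k2) key for the next word, padding with start markers."""
    padded = [START1, START2] + words
    return padded[-2], padded[-1]


def generate(chains, min_size=1, max_size=30, min_search_tries=10,
             backtrack=2, rng=random):
    words = []
    tries = 0
    while len(words) < max_size:
        candidates = chains.get(_keys(words))
        nxt = rng.choice(candidates) if candidates else END  # dead end acts like __end
        if nxt != END:
            words.append(nxt)
            continue
        if len(words) >= min_size or tries >= min_search_tries:
            break  # long enough, or out of retries: accept what we have
        tries += 1
        # drop the last couple of words and try a different path
        del words[max(0, len(words) - backtrack):]
    return words
```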
Brian S. Stephan 7b4b86dc0d Markov: add support for requesting desired min/max size of a reply
note that since the min_size support is kind of crude at the moment,
this only partially works
2011-01-25 20:25:15 -06:00
Brian S. Stephan c732466129 Merge branch 'master' of git.incorporeal.org:dr.botzo 2011-01-24 16:51:52 -06:00