1. Gmail: the problem of deleting a huge number of old messages

    You have 3 million emails in your Gmail account and cleaning them up doesn't work? I had the same situation :)

    Some time ago, while testing one of our products, I added a hook to find out when a user performed a certain action. The server sent me an email notification each time that action occurred.
    I added a Gmail filter to match those messages by subject, skip the inbox, and apply a label.
    Everything was debugged and went to production, but the emailing was never removed from the code :). After a while we got a wave of users… I received a flood of emails but did not notice, because the label was buried somewhere in the label list. By the time I spotted it, there were 3 million emails under that label.

    image

    OK, what next? Just delete them, lol.
    Go to that folder and click the "select all" checkbox on the page:

    image

    And then:

    image

    Well... I was not happy with what I got :). Gmail said “Loading”, “Still working…”, “Oops… the system encountered a problem (#944) - Retrying in 1s…”, then the same for problem #793, and so on.

    image

    Then my Gtalk got disconnected, Gmail in the browser asked me to reconnect, and so on. The reconnect attempts led nowhere.

    image

    I closed the tab and tried to sign in to Gmail from another tab. It told me there was a problem with my account (error 500): a technical problem that should be resolved soon, please try again later. It looked like this, though I am not sure about the numeric code, as it differed between attempts: 93, 52, etc.

    image

    Well... this “later” took a few hours. I tried again several times on other days, because I was sure that Google is a power that cannot be broken and that this really was a temporary problem :)

    My hope for Google's support died as soon as I reached the support site. I will not describe it in detail; I think you know what I mean. Anyway, there is a help page on Google's support site (https://support.google.com/mail/answer/116775?p=oops793&rd=1), and it links to a contact form (https://support.google.com/mail/contact/gtag_server).

    image

    I wrote there, describing my problem and asking them to remove all that junk, but the only thing that happened within a few hours was that I could log in again. Maybe even that was automatic, with the deletion operation being stopped by some timeout (all the messages were still there, exactly as before the deletion attempt, so the request to remove them was ignored :) ).

    So I understood that the problem could not be solved easily, and decided to install a real mail client and delete the messages over IMAP. The task was becoming more interesting.
    I tried a few of the most popular applications; all of them crashed or ran into problems some time after I added the account and synchronization started. For example, my hope for Thundermail died within 10 minutes, when it hung completely and started eating a lot of memory and CPU with no way to stop the process, so only Task Manager helped me “solve the prob” with it.
    What next? If this cannot be solved the usual way, let's go directly to IMAP.
    The idea was to connect to the IMAP server, fetch only the message IDs, and try to delete the messages one by one... yeah, it'll take long, but why not? :)
    I used Python as the programming language and its imaplib library to talk to the Gmail server.

    import imaplib
    import time
    
    chunks_num_delete = 250
    
    def mail_connect(): #gmail credentials
        mail = imaplib.IMAP4_SSL('imap.gmail.com')
        mail.login('xxxxxxxxx@gmail.com', 'xxxxxxxxxxxxxxx')
        mail.select("callback")
        return mail
    
    
    def chunks(arr, num): #used to make a bulk request
        for i in xrange(0, len(arr), num):
            yield arr[i:i + num]
    

    To get the list of labels from the server, try this:

    print mail.list()

    This step is needed because the name of your label on the server may differ from what you see in Gmail. I ran into this with another label I was playing with initially (capital letters and spaces in the label name).
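    As a sketch of how the label names could be pulled out of that output: the helper below parses one line of a LIST response in the usual `(flags) "delimiter" name` shape. It is written for Python 3 (the listings in this post are Python 2), and parse_list_line and quote_mailbox are hypothetical helper names, not part of imaplib. It assumes ASCII label names; non-ASCII names come back in IMAP's modified UTF-7 encoding, which is not handled here.

```python
import re

# One line of an IMAP LIST response looks like:
#   (\HasNoChildren) "/" "My Label"
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line):
    """Return (flags, delimiter, mailbox_name) for one LIST response line."""
    if isinstance(line, bytes):
        line = line.decode("ascii", errors="replace")
    match = LIST_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised LIST line: %r" % line)
    flags = match.group("flags").split()
    delim = match.group("delim").strip('"')
    name = match.group("name")
    if name.startswith('"') and name.endswith('"'):
        # Drop the surrounding quotes and undo backslash escapes.
        name = re.sub(r'\\(.)', r'\1', name[1:-1])
    return flags, delim, name


def quote_mailbox(name):
    """Wrap a mailbox name in double quotes so that spaces survive."""
    return '"' + name.replace('\\', '\\\\').replace('"', '\\"') + '"'
```

    With a live connection, each element of `mail.list()[1]` could be passed to parse_list_line, and a label with spaces could then be selected with `mail.select(quote_mailbox(name))`, on the assumption that imaplib does not add the quotes for you.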
    Next, select the folder (label) you need. By default it is the inbox, so please be careful :)
    My task was to do this for a label called “callback”, so I selected that folder and applied a filter to find the messages that were not yet marked as deleted:

    mail = mail_connect()
    
    mail.select("callback") # connect to folder.
    
    result, data = mail.search(None, "UNDELETED")
    ids = data[0]
    id_list = ids.split()
    
    counter = 1
    length = len(id_list)
    print "Messages to delete: " + str(length)

    In real life, though, you may need to delete only messages from before 2011, or only those from some abusive address or domain (Hotmail? :)), for instance. For the full list of IMAP search criteria, see this link: http://www.example-code.com/csharp/imap-search-critera.asp
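    To make those two cases concrete, here is a small sketch that builds the search string for "before a date" and "from a sender". It is Python 3, and imap_date and build_criteria are made-up helper names for illustration; the UNDELETED, BEFORE and FROM keys themselves are standard IMAP search keys.

```python
import datetime

# IMAP dates use English month abbreviations regardless of the local
# locale, so they are spelled out instead of relying on strftime("%b").
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(day):
    """Format a datetime.date as an IMAP date, e.g. 01-Jan-2011."""
    return "%02d-%s-%04d" % (day.day, MONTHS[day.month - 1], day.year)


def build_criteria(before=None, sender=None, undeleted=True):
    """Combine the requested filters into one IMAP SEARCH criteria string."""
    parts = []
    if undeleted:
        parts.append("UNDELETED")
    if before is not None:
        parts.append("BEFORE " + imap_date(before))
    if sender is not None:
        escaped = sender.replace('\\', '\\\\').replace('"', '\\"')
        parts.append('FROM "%s"' % escaped)
    if not parts:
        return "ALL"
    return "(" + " ".join(parts) + ")"
```

    The result would be passed where the plain "UNDELETED" string goes, for example `mail.search(None, build_criteria(before=datetime.date(2011, 1, 1)))`.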
    The script then walks through the list of message IDs in a loop, using the chunks function to get the batch of IDs to delete on each iteration:

    for i in list(chunks(id_list, chunks_num_delete)):

    and deletes each batch by setting a flag on all of its messages. To delete a message, you apply the \Deleted flag to it:

    mail.store(",".join(i), '+FLAGS', '\\Deleted')

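    If you only need to move messages to the Trash instead, note that \Trash is not a regular IMAP flag: with Gmail this is typically done by applying the \Trash label through Gmail's X-GM-LABELS IMAP extension, or by copying the messages to the Trash folder.
    After marking the messages, the IMAP specification says you need to run: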

    mail.expunge()

    I do this once every 10 operations to remove the messages completely. However, as I read somewhere, Gmail has its own behaviour on the server side and may ignore this, because Gmail's own cleaner does it automatically after some time. I added it to the code anyway, but did not see any effect in:

    image

    Someone on Stack Overflow wrote about enabling an option in Gmail Labs for the IMAP protocol on your account so that this works, but I didn't check that.
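    One thing worth keeping in mind when mixing STORE and EXPUNGE: in standard IMAP, the message numbers returned by a plain search are sequence numbers, and they are renumbered after every expunge, while UIDs stay the same. The sketch below is a variant that addresses messages by UID instead. It is Python 3, delete_by_uid is a hypothetical name, and whether the difference actually matters with Gmail's own expunge behaviour is an assumption, not something verified here.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_by_uid(conn, criteria="UNDELETED", chunk_size=250, expunge_every=10):
    """Flag every message matching `criteria` as deleted, addressed by UID.

    `conn` is expected to behave like an imaplib.IMAP4 object with a
    mailbox already selected. Returns the number of UIDs found.
    """
    typ, data = conn.uid("SEARCH", None, criteria)
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed: %r" % (data,))
    uids = data[0].split() if data and data[0] else []
    for count, chunk in enumerate(chunks(uids, chunk_size), start=1):
        # imaplib returns bytes in Python 3, so join and decode them.
        uid_set = b",".join(chunk).decode("ascii")
        conn.uid("STORE", uid_set, "+FLAGS", "(\\Deleted)")
        if count % expunge_every == 0:
            conn.expunge()
    # Remove whatever was flagged after the last periodic expunge.
    conn.expunge()
    return len(uids)
```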

    To keep the process from stopping on a disconnect (Google drops the connection on operations as long as mine), this code was wrapped in a try block with the ability to reconnect when a problem occurs (you'll see it in the full code block at the end of the post).
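    The reconnect idea can also be pulled out into a small reusable helper. This is only a sketch in Python 3, not the code used in this post: run_with_reconnect is a hypothetical name, `connect` stands for something like the mail_connect function, and the defaults of 100 attempts and a 5 second pause simply mirror the numbers in the full listing.

```python
import imaplib
import time


def run_with_reconnect(operation, connect, conn, max_attempts=100,
                       delay=5.0, sleep=time.sleep):
    """Run operation(conn); on a connection error, reconnect and retry.

    Returns (result, conn) so the caller can keep using the connection
    that finally worked. Re-raises the last error after max_attempts.
    """
    attempt = 0
    while True:
        try:
            return operation(conn), conn
        except (imaplib.IMAP4.error, OSError):
            attempt += 1
            if attempt >= max_attempts:
                raise
            sleep(delay)
            try:
                conn = connect()
            except (imaplib.IMAP4.error, OSError):
                # Reconnect failed; keep the old handle. The next loop
                # iteration will fail again and count as another attempt.
                pass
```

    A chunk could then be flagged with something like `_, mail = run_with_reconnect(lambda c: c.store(ids, '+FLAGS', '\\Deleted'), mail_connect, mail)`.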
    In my case it took ~8-10 hours to delete everything. As for the number of messages to delete at once, I tried 500, 1K and even 5K; Gmail's IMAP accepted that, but on each run I found that not all of the requested messages had been marked as \Deleted, so experimentally I went down to 250 messages per flag operation. And now I have: “There are no conversations with this label.”
    But I am still waiting for Gmail's cleaner to free about 3GB:

    image
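    Since some messages were silently left unmarked with larger batches, one way to cope, instead of only shrinking the batch, would be to repeat the search-and-flag pass until the search comes back empty. The following is a minimal sketch of that idea in Python 3; mark_all_deleted and its max_passes limit are hypothetical, and `conn` is assumed to behave like an imaplib connection with the label already selected.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def mark_all_deleted(conn, chunk_size=250, max_passes=20):
    """Repeat search-and-flag passes until no UNDELETED message is left.

    Returns the number of passes that actually flagged something.
    """
    for pass_number in range(1, max_passes + 1):
        typ, data = conn.search(None, "UNDELETED")
        ids = data[0].split() if data and data[0] else []
        if not ids:
            return pass_number - 1
        for chunk in chunks(ids, chunk_size):
            message_set = b",".join(chunk).decode("ascii")
            conn.store(message_set, "+FLAGS", "\\Deleted")
    raise RuntimeError("messages still unmarked after %d passes" % max_passes)
```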

    The full program to solve the prob:
    #!/usr/bin/python
    
    import imaplib
    import time
    
    chunks_num_delete = 250
    
    def mail_connect(): #gmail credentials
        mail = imaplib.IMAP4_SSL('imap.gmail.com')
        mail.login('xxxxxxxxx@gmail.com', 'xxxxxxxxxxxxxxx')
        mail.select("callback")
        return mail
    
    
    def chunks(arr, num): #used to make a bulk request
        for i in xrange(0, len(arr), num):
            yield arr[i:i + num]
    
    
    
    mail = mail_connect()
    
    mail.select("callback") # connect to folder.
    
    result, data = mail.search(None, "UNDELETED")
    ids = data[0]
    id_list = ids.split()
    
    counter = 1
    length = len(id_list)
    print "Messages to delete: " + str(length)
    for i in list(chunks(id_list, chunks_num_delete)):
        try:
            # mark the whole chunk as deleted with a single STORE command
            mail.store(",".join(i), '+FLAGS', '\\Deleted')
            print "another {} done".format(len(i))
            # time.sleep(1)
        except Exception:
            try:
                time.sleep(5)
                flag_var = 1
                while flag_var:
                    if flag_var > 100:
                        exit()  # give up after 100 failed reconnect attempts
                    print "trying to reconnect...\n"
                    try:
                        mail = mail_connect()
                        flag_var = 0
                    except Exception:
                        flag_var += 1
                print "reconnected, trying to continue\n"
                # retry the chunk that failed before the reconnect
                mail.store(",".join(i), '+FLAGS', '\\Deleted')
            except Exception as e:
                print e
                print "processed " + str(counter * chunks_num_delete) + "/" + str(length) + "\n"
                print "Cant continue, died\n"
                exit()
        if counter % 10 == 0:
            # expunge once every 10 chunks
            mail.expunge()
            print "processed " + str(counter * chunks_num_delete) + "/" + str(length) + "\n"
    
        counter = counter + 1

    # expunge whatever was marked after the last multiple of 10 chunks
    mail.expunge()