2019-04-08 20:18:48 (edited by keithwipf1 2019-04-08 20:19:50)

Hi,
I figured someone should create a thread where we can discuss useful Python modules, for making games or anything else.
So I made a pack file module, like in BGT.
So far it works nicely. It can add and extract files. It also encrypts the pack data with a key, so that people don't get a hold of it. It uses zlib to compress the data, because it was easy to add and it might help with the size a bit.
The full documentation and the code are at:
http://www.noplacelikehere.net/pythonstuff
The documentation is right below the download link, so don't miss it.
The download link is really, really easy to find; it's the only link on the page besides the navigation bar. :D
Please tell me how useful or otherwise this pack file is and if it should do something differently.

2019-04-09 00:18:48

See, that is something I don’t understand. If I encrypt my sounds, how can I play them without extracting them to a file directory? If I can just access the folder, why can’t the normal user do that too?

2019-04-09 05:47:18 (edited by Ethin 2019-04-09 06:23:48)

@2, if you're not going through an intermediary like this library, pretty much every crypto library offers a way to decrypt encrypted data streams into RAM without dumping anything to disk. It won't stop someone from freezing the execution of the Python interpreter and extracting the key, but it will stop the casual FS snooper who just wants your decrypted sounds and knows where the game puts them. That is, of course, depending on whether the person has the know-how to figure out where the game extracts its sounds to.
Looking through this code, I've noticed several issues with the code already:
* You use a hardcoded nonce! Don't do this! Ever! Pull from os.urandom()!
* Not really sure why you use a hardcoded zlib compression level... definitely not major, but it takes control away from the user. (Don't let the user give you a nonce; assume that all your users are stupid idiots, even if they aren't.)
* You call print() in get_closest_offset(). What exactly are you trying to accomplish with that? You do know that the value of num and valueffaf will be printed side-by-side to sys.stdout, right?
* You have an add() method. Why? This is entirely unnecessary here, since add_file() does what you're intending, so you're imposing an unnecessary amount of overhead. Furthermore, you check file.__class__.__name__. Keep in mind that when doing type checks, you have type(), which I'd say is (probably) far more reliable than checking file.__class__.__name__.
* Once again, you use a hardcoded key and attempt to mask it in a while loop. This is not going to work and is inherently dangerous. (As some might say, this is suicidal.) Given the fact that the key and nonce are constant, you are completely and utterly nullifying the security that AES would normally give you. You are not giving any form of security whatsoever doing this because you're giving the attacker the keys to the kingdom.
* Why do you hash the key? What is that going to do? It does not provide any form of advantage, see my earlier point.
* You, once again, hardcode the nonce in the AES.new() call directly. So now you have 2 (or more) copies of the nonce floating around. That's a treasure trove for an attacker.
* You use EAX mode. Why not GCM or ChaCha20, both of which are TLS cipher suites? I hesitate to throw an accusation out at the developers of cryptodome, but is the EAX algorithm they use EAX or EAXprime? EAXprime has been broken (quite nastily, too).
* You use pickle more than once, and I'm confused about exactly what you're attempting to do. Mind explaining that?
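To illustrate the first bullet, here is a minimal sketch of pulling a fresh nonce from os.urandom() for every encryption. The 16-byte size and the helper name new_nonce are assumptions for illustration; only os.urandom() itself is a real call.

```python
import os

NONCE_SIZE = 16  # assumed size; use whatever your cipher mode expects


def new_nonce(size: int = NONCE_SIZE) -> bytes:
    """Return a fresh random nonce; call this once per encryption."""
    return os.urandom(size)


# Two calls give two different nonces, so nothing needs to be hardcoded.
first = new_nonce()
second = new_nonce()
```

The nonce is then stored next to the encrypted data so that it can be read back for decryption.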
Overall, most of the decisions that you made during the development of this library were *very bad* ones. Some suggestions:
1. Stop using hardcoded data in cryptography. Avoid that at all costs. Never hardcode a nonce or cryptographic key, and never try hiding it -- an attacker (even a novice) will figure you out very quickly. The way you've used these algorithms is worse than even ECB (and that's saying something)!
2. (Perhaps) offer the user control over the compression level.
3. Remove all of these debug print statements. Perhaps, even, offer an interface that compresses the files into an archive and then encrypt the archive itself, rather than the files within it, and let the user extract the files into RAM instead of dumping them to disk?
4. Remove add(). It's entirely useless.
5. Remove the key hashing. This offers no form of security whatsoever (and given that you've hardcoded the key, the hash will *always* be the same, which would make it easy for an attacker to fool the system while giving it malicious data).
6. Switch to CBC, GCM, or ChaCha20/XChaCha20.
7. You don't have to do this, but try and avoid storing the key in unencrypted form. One way (if you ever use AWS SDKs) is to generate a data key per each encrypt, store the encrypted key alongside the nonce and encrypted data, and retrieve the key from AWS KMS when you need it. This will stop an attacker from retrieving the key from disassembly or hex dumps since it won't be hardcoded into your app.
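Suggestion 2 could look roughly like the sketch below, which lets the caller pick the zlib level instead of hardcoding it. The function name pack_bytes and the default of 6 are assumptions, not taken from the module being reviewed.

```python
import zlib

data = b"example sound data " * 1000

fast = zlib.compress(data, 1)   # level 1: fastest, usually larger output
small = zlib.compress(data, 9)  # level 9: slowest, usually smaller output


def pack_bytes(raw: bytes, level: int = 6) -> bytes:
    """Compress raw bytes at a caller-chosen zlib level (0-9)."""
    if not 0 <= level <= 9:
        raise ValueError("level must be between 0 and 9")
    return zlib.compress(raw, level)
```

Whatever level was used for compression, zlib.decompress() restores the original bytes without needing to know it.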
Oh, and one last thing... before you go writing crypto libs, or interfaces to crypto libraries, please, please do your research on the crypto algorithms you're using and best practices for securely storing keys both application-wise and file-wise. And do as much research on general cryptography as you can. Don't try and throw together wrappers like this without knowing what you're doing; you can very easily give a user a false sense of security, and once a user learns the critical vulnerabilities that make their security null and void, the likelihood of them trusting you ever again with anything they consider sensitive is pretty much zero. I'd even go as far as to avoid using low-level crypto libraries like pycryptodome until you understand cryptography at at least an intermediate level. Use higher-level interfaces that are known to be secure until then. There is a reason the cryptography library has a warning telling users not to use the low-level APIs until they know what they're doing, after all.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2019-04-09 18:22:18 (edited by keithwipf1 2019-04-09 18:26:42)

Hi,
Thanks for that :( and sorry it's a bit of a mess.
First, the reason I hashed the key was so that you can use a key of any length.
Pickle is used to store data that tells the pack file how to retrieve files that are stored in the pack.
The key is to my knowledge *not* hardcoded. It takes a key from the user, turns it into bytes, hashes it so you can use a key of any length, and then it uses that hash as the key.
I did just make some changes, redownload from post 1:
Add_file() removed, use add() instead
Nonce now random, not really stored in the code itself.
You can pass an optional parameter, level, to the constructor. It's used when compressing the data: the higher the level, the smaller the data and the longer it takes.
Removed all print statements that were there for testing, whoops!
Removed delete, it's broken and wasn't listed in the docs anyway. It's supposed to remove a file, but it removes your ability to read data from the pack file instead.
By the way, you can index a pack file instance like a dictionary to get a raw bytes object that holds the data of a file, if you don't want to extract it.
I'm not sure what you could do with this to play a sound. You'd need to find a sound lib that takes stuff straight from memory or add that yourself.
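One possible approach, for libraries that accept a file-like object, is to wrap the raw bytes in io.BytesIO. This is only a sketch: it uses the standard wave module as a stand-in for a sound library, and sound_bytes stands in for whatever indexing the pack returns.

```python
import io
import wave

# Build a tiny WAV file in memory so the example is self-contained.
# In practice these bytes would come from the pack instead.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)                  # 16-bit samples
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 8000)  # one second of silence
sound_bytes = buffer.getvalue()

# Wrap the raw bytes in a file-like object; nothing touches the disk.
with wave.open(io.BytesIO(sound_bytes), "rb") as reader:
    frame_count = reader.getnframes()
    sample_rate = reader.getframerate()
```

Whether a given sound library takes a file-like object, a raw buffer, or only a path depends on the library, so check its documentation.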


I hope the pack now works better!

2019-04-09 19:43:19 (edited by Ethin 2019-04-09 19:49:42)

Your nonce is still hardcoded. Line 9.
Line 11: change this to self.level = level, so that changing the level actually takes effect.
I'm not sure what you're doing in generate(), with the following lines of code:

  requiredlength=range(len(data), len(data)+16, 16)[-1]
  while len(data)<requiredlength:
   data+=b'a'

Also, hashing the key does not ensure that the user can pass in arbitrary key lengths. The hash may be too large for the cipher you're using, for example. SHA-256 is good but still...
AES accepts keys of 128, 192, or 256 bits, which is 16, 24, or 32 bytes. I'm not sure how hashing the key is any more secure than just passing the key as plaintext.
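For reference, a SHA-256 digest is always 32 bytes, which happens to match the AES-256 key size. Below is a minimal sketch of the hashing step keithwipf1 describes, plus a salted, iterated alternative (PBKDF2 from hashlib) that neither poster proposed. Both function names and the iteration count are assumptions.

```python
import hashlib


def key_from_passphrase(passphrase: str) -> bytes:
    """Hash a passphrase of any length down to 32 bytes."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


def stretched_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive 32 bytes with PBKDF2-HMAC-SHA256, which slows down guessing."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


short_key = key_from_passphrase("a")
long_key = key_from_passphrase("x" * 500)
```

A plain hash fixes the length but adds no strength to a weak passphrase; the iterated version only makes each guess more expensive.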
Also, don't delete the nonce. The nonce is a number used once. It's supposed to be public. People are supposed to know about it. The key is the one you want deleted. Even so, Python offers no safe way to zero memory, so this library still leaks data, and would even if you had patched every security vulnerability it had purely because it's not written in C/C++.
Second, lines 91-98: searching for the nonce like this isn't the wisest idea since you don't know if pycryptodome has already added the nonce or not. It could, though this level most likely doesn't. You also risk getting odd data in there. Generally how I do it is I write the nonce in the file first, then append the encrypted data to the end of the nonce. So this is how it works for decryption:
* read() call: read the first 24 bytes into memory; this is the nonce. (The encryption library I use uses a 24-byte nonce.)
* read() call: read the next 32 bytes into memory; this is the encrypted key. (This is optional but recommended if you can figure out a solution for secure storage.)
* Send the encrypted key (base-64 encoded, possibly) to a remote service for decryption and get back the plaintext.
* Initialize the cryptographic context with the retrieved nonce and key. Immediately zero-out the key.
* Begin reading the file in 'blocks', with each 'block' being 8,192 bytes in length. Decrypt each chunk and do with it what you like.
* Repeatedly read until no more data is available. Stop just before the last 16 bytes of the file.
* Read these 16 bytes. This is your message authentication code (MAC). Verify it. Generally you do this by reading that 16-byte stream into a buffer and passing that buffer to the context finalization function.
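The file layout in the steps above can be sketched as follows. This only splits a container into its parts and performs no decryption or MAC verification; the sizes come from the list above, while the names Container and split_container are hypothetical.

```python
import io
from typing import NamedTuple

NONCE_SIZE = 24
WRAPPED_KEY_SIZE = 32  # size given in the steps above; an assumption here
MAC_SIZE = 16
CHUNK_SIZE = 8192


class Container(NamedTuple):
    nonce: bytes
    wrapped_key: bytes
    chunks: list
    mac: bytes


def split_container(stream, total_size: int) -> Container:
    """Read nonce, encrypted key, ciphertext chunks, and trailing MAC."""
    nonce = stream.read(NONCE_SIZE)
    wrapped_key = stream.read(WRAPPED_KEY_SIZE)
    remaining = total_size - NONCE_SIZE - WRAPPED_KEY_SIZE - MAC_SIZE
    if remaining < 0:
        raise ValueError("container is too short")
    chunks = []
    while remaining > 0:
        chunk = stream.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            raise ValueError("unexpected end of data")
        chunks.append(chunk)  # a real reader would decrypt each chunk here
        remaining -= len(chunk)
    mac = stream.read(MAC_SIZE)
    if len(mac) != MAC_SIZE:
        raise ValueError("missing or truncated MAC")
    return Container(nonce, wrapped_key, chunks, mac)


blob = b"N" * 24 + b"K" * 32 + b"C" * 10000 + b"M" * 16
parts = split_container(io.BytesIO(blob), len(blob))
```

Stopping the chunk loop 16 bytes before the end is what keeps the MAC from being fed to the cipher as if it were ciphertext.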
And this is how it works for encryption:
* Set up an AWS KMS client and provide AWS credentials. (Replace this with a service you trust; I'd go with AWS since they're well-known for their integrity and security.)
* Send two remote requests to AWS. Verify that each one completes. The first is called a generate data key request. The request contains the following elements:
- The Key ARN, which is in the form arn:aws:kms:region:account_id:key/key_id, where region is a region like us-east-1 or us-west-2, account_id is the ID for your account, and key_id is a GUID identifying your key.
- The length of the data key, which is usually 32 bytes for AES-256.
You will get back:
- The encrypted key. Keep this around for now in secure memory.
- The plaintext copy of the key. Again, keep this around in secure memory; you'll need it soon.
The second request is called a generate random request. This will get you your nonce. This only has one required field, the number of bytes. Set this to 24. Get back your nonce, and get the plaintext from the buffer. Now is probably a good time to add run-time assertions that your plaintext key is, in fact, 32 bytes, and that your nonce is, in fact, 24 bytes.
* Set up your buffers. One is your input, one is your output, and one is your MAC.
* Set up a cryptographic context and initialize it with your nonce and key. Just as with decryption, immediately wipe the key from RAM. (The context will manage its own safe deallocation, but you need to manage yours too.)
* Open your input file and attempt to read from it. Verify that this is indeed possible.
* Write your nonce, then your encrypted key, to the output file. Don't use any delimiter.
* Then, begin reading and encrypting the file in chunks. If writing to disk, immediately write to the output buffer (which should be a file output stream) the encrypted data -- you won't be able to after the next steps.
* Verify that you're not at the end of the input stream. If the stream is bad, finalize the crypto stream, clean up and terminate; this means an error occurred. Do not write the MAC!
* No matter whether you're done or not, immediately zero out the input buffer so you don't keep using up RAM.
* Once you're done reading the input, finalize the context, write the MAC, and close the input and output streams. Then clean up AWS and all that.
You don't have to do it this way. But this is a very safe way of storing data that may be potentially sensitive.
That's all I can recommend; I think Cartertemm might have more advice. Perhaps I've gotten some details wrong, perhaps my method is incorrect. Feel free to educate me too -- just telling you how I generally do things!


2019-04-09 20:08:37 (edited by Ethin 2019-04-09 20:16:15)

Update: the above advice works. Avoid storing your keys in your app and all that. But it has one major drawback: you need to be on the internet!
So, what if you want your app to be used in an offline setting? Well, that's generally what most apps let you do. So how do you encrypt and decrypt data in that way? There is no "right" way to answer this issue. One source (https://medium.com/poka-techblog/the-be … 8a6807d3ed) suggests storing secrets in environment variables. That works... but isn't very safe.
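For illustration only, reading a key from an environment variable might look like the sketch below. The variable name PACK_KEY_HEX, the hex encoding, and the 32-byte length check are all assumptions.

```python
import os

ENV_NAME = "PACK_KEY_HEX"  # hypothetical variable name


def load_key_from_env(name: str = ENV_NAME) -> bytes:
    """Read a hex-encoded 32-byte key from the environment."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"environment variable {name} is not set")
    key = bytes.fromhex(value)
    if len(key) != 32:
        raise ValueError("expected a 32-byte key")
    return key
```

Anything else running under the same user account can usually read that environment too, which is part of why this isn't very safe.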
But there is one problem of storing your key in your code, in environment variables, in a database, or in an encrypted form in your code: they all rely on the same variable -- that a server must decrypt the data. But a server can leak information, especially if it has backups! (Logic would suggest not... right? After all, backups are supposed to be good things. Right? Well, that depends on *who* is doing the backups, and whether *they* are trustworthy.)
Sadly, if your app is used offline, you don't have very many options open to you. One way to alleviate the headache with crypto is to use cryptography's Fernet interface. This forces you to go against what I've already told you and to store your key in your code. But it will encrypt your data, and you can write it, and you won't need to do any manual reads. It'll save you a lot of time. It won't be safe, since the key is in your code and Python is generally bad at secure memory management. But it does work. (If someone has some kind of solution to this problem -- making a crypto system that works both online and offline -- please let us all know!) Sometimes you just have to suck it up and deal with the fact that the key is in your code. Sadly with offline apps, this is inevitable. You can do some messing with it and encode it in various ways and make that your key, but people will still get at it. Or, even better, you can cythonize your encryption/decryption work so it's not in Python, which should make things a bit more safe for you.
Edit: This article provides some very interesting solutions for generating keys -- generate them on the fly, don't hardcode them. I wonder if PUFs can be implemented in Python...


2019-04-09 21:19:51

Hi,
Thanks for once again helping.
Line 9. That's not actually used anywhere; I figured it would maybe fool a newbie who is just looking for easy sounds into thinking they've got the nonce. :D
Line 11. Oops, easy fix.
In generate, that code is trying to make sure that the resulting data is a multiple of 16 bytes long so that I can pass it to encrypt.
Though, if it's a stream I probably don't need that code but the extra characters are ignored when loading the pack so it doesn't matter.
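As quoted earlier in the thread, range(len(data), len(data)+16, 16)[-1] evaluates to len(data) itself, so that loop never adds anything. A minimal sketch of padding up to the next multiple of 16, keeping b'a' as the fill byte, could look like this (the function name pad_to_block is hypothetical):

```python
BLOCK = 16


def pad_to_block(data: bytes, fill: bytes = b"a") -> bytes:
    """Append fill bytes until len(data) is a multiple of BLOCK."""
    missing = -len(data) % BLOCK  # 0 when the data is already aligned
    return data + fill * missing


padded = pad_to_block(b"hello")
```

Padding like this cannot be stripped reliably unless the original length is stored somewhere, since real data may also end in the fill byte.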
Using a hash is probably 5% more secure than just passing the key, at least someone will have to hash the key before decrypting. Anyway, it's better than just adding a character to the end of the key until it's the right length, because that might make brute forcing more effective if someone can see the source for this script.


When looking for the Nonce, I read backward 16 characters from the end of a file.
Since I called os.urandom() or whatever it is with 16, as long as it was put at the end of the file, finding the nonce is reliable as long as no one puts data in there and ruins their pack file in the process.
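Reading a fixed-size trailer like that can be sketched with seek() relative to the end of the file. The function name read_trailing_nonce is hypothetical, and an in-memory stream stands in for the real pack file.

```python
import io
import os

NONCE_SIZE = 16


def read_trailing_nonce(stream) -> bytes:
    """Return the last NONCE_SIZE bytes of a seekable binary stream."""
    stream.seek(0, os.SEEK_END)
    if stream.tell() < NONCE_SIZE:
        raise ValueError("file is too short to contain a nonce")
    stream.seek(-NONCE_SIZE, os.SEEK_END)
    return stream.read(NONCE_SIZE)


nonce = os.urandom(NONCE_SIZE)
pack = io.BytesIO(b"encrypted pack data" + nonce)
recovered = read_trailing_nonce(pack)
```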
If I do add code to auto generate a key, as long as it's not anything more than user convenience, how would I make sure the right person is decrypting the key?
Internet access, just, isn't reliable. You may have none, or you may be so filtered that you can't even access a service (unlikely) or it could take forever and everyone might think that your program is really slow and then you must deal with HTTP libs, and all that fun stuff.
Maybe, just a silly idea maybe, you should do things multithreaded? And throw the key around? Or do something so that if one process pauses the others will wipe the key and die?
Just an idea.

2019-04-09 23:01:41

keithwipf1 wrote:

If I do add code to auto generate a key, as long as it's not anything more than user convenience, how would I make sure the right person is decrypting the key?
Internet access, just, isn't reliable. You may have none, or you may be so filtered that you can't even access a service (unlikely) or it could take forever and everyone might think that your program is really slow and then you must deal with HTTP libs, and all that fun stuff.
Maybe, just a silly idea maybe, you should do things multithreaded? And throw the key around? Or do something so that if one process pauses the others will wipe the key and die?

First: "If I do add code to auto generate a key, as long as it's not anything more than user convenience, how would I make sure the right person is decrypting the key?" You don't. That's the answer. You cannot determine who is decrypting your file and who is not without adding in internal tracking and all of that, and you'd need a remote service, and even that wouldn't be reliable. If you want that kind of control, you'll need something far more complex than simply encryption, something like EdDSA signing.
Second: "Internet access, just, isn't reliable. You may have none, or you may be so filtered that you can't even access a service (unlikely) or it could take forever and everyone might think that your program is really slow and then you must deal with HTTP libs, and all that fun stuff." This is not true. AWS, for example, is well known for being very fast and responsive (does the fact that governments rely on AWS tell you something?). Furthermore, AWS has a Python SDK to interact with their APIs. They've got SDKs in many languages, in fact, that take care of the complex plumbing for you. They're called botocore and boto3, and you'll need both. I'll provide an example after I'm done with this part. Plus, HTTP requests that involve small things such as retrieving crypto keys usually only take a few seconds, if not less than that.
Third: "Maybe, just a silly idea maybe, you should do things multithreaded? And throw the key around? Or do something so that if one process pauses the others will wipe the key and die?" No. Just no. Not unless you want to master IPC and (somehow) securely pass the key between threads without causing data races, deadlocks, and so on. GCM is already parallelized -- there is absolutely no need, whatsoever, for you to attempt to parallelize it any more than it already is. Keep in mind that the more threads and/or processes that access sensitive data like crypto keys, the less secure the overall process environment is, purely because you are passing that sensitive data all over the place. That leaves traces everywhere, and for a language like Python... yeah, not a good idea.
Now, for a sample of AWS SDKs...

import boto3
import botocore  # installed alongside boto3; provides the exception classes

# The credentials and region below are placeholders; substitute your own.
kms_client = boto3.Session(
    aws_access_key_id="key_id",
    aws_secret_access_key="access_key",
    region_name="region",
).client("kms")
# Get a data key. Requires you to create a customer master key (CMK)
# either through the AWS console or through the API.
data_key = kms_client.generate_data_key(KeyId="key_id", NumberOfBytes=32)
assert len(data_key["Plaintext"]) == 32
# Get us a random nonce.
nonce = kms_client.generate_random(NumberOfBytes=24)["Plaintext"]
assert len(nonce) == 24
# "output.bin" is a placeholder; replace this with your actual file IO routines.
with open("output.bin", "wb") as out:
    out.write(nonce)
    out.write(data_key["CiphertextBlob"])
    # read data, encrypt it, and write it here...

2019-04-10 17:36:27

Hi and thanks for that example.
I don't think it is really worth making this online compatible for two reasons.
1 Even if I do all that work, it all comes back to nothing if the user has no internet access. Then it might just throw an exception, or quit, or decrypt the file insecurely.
2, They just *might* be able to capture packets if they were determined enough. It might be possible to decrypt them, and then they have a key.
Also, I'm curious. What's to stop someone from freezing the Python program and pulling out the key, AWS or not, right as you're passing it to the decrypt function?
Also, do you think it's a good idea to scatter bytes of the key or other required data in certain locations?

2019-04-10 17:47:34

Your points are very reasonable ones.
"2, They just *might* be able to capture packets if they were determined enough. It might be possible to decrypt them, and then they have a key." Not necessarily true. Given that AWS requires that you use HTTPS and TLS, the chance of someone extracting anything useful out of the data stream between the client and server is about the same as the chance of someone breaking AES-256 in GCM mode (or ChaCha20+Poly1305) within the next hour of this post being written, with all of the cryptographic parameters set to their most secure settings.
"Also, I'm curious. What's to stop someone from freezing the Python program and pulling out the key, AWS or not, right as you're passing it to the decrypt function?" Nothing. This is one of those instances where timing is the key though. Someone would need to know:
* how long it takes the key to be retrieved; and
* the exact time -- down to the nanosecond -- when the key is in memory but has not been wiped and just before it is passed into the initialization function.
You could also debug the interpreter or the code using PDB, but that's assuming that you call breakpoint() somewhere. There are other factors I'm forgetting though.
"Also, do you think it's a good idea to scatter bytes of the key or other required data in certain locations?" In what way? Usually I would say no, purely because it increases the attack surface, which in turn means you have more regions to protect.


2019-04-10 20:12:08

I was thinking, scatter bytes of a key through a file, and return a string or bytes that contains information on how to retrieve the key to the user.
Then, the user passes back this string, which is used to retrieve the key from the data.
Another idea, if you have the key in memory, it might be possible to protect it a little bit by turning it into an alternating string of valid key bytes and random bytes.
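A rough sketch of that alternating-bytes idea is below; the function names are made up. It is obfuscation only: anyone who can read the code can undo it with a single slice, which fits the earlier warnings about trying to hide keys.

```python
import os


def interleave(key: bytes) -> bytes:
    """Alternate each key byte with one random byte."""
    noise = os.urandom(len(key))
    mixed = bytearray()
    for real, fake in zip(key, noise):
        mixed.append(real)
        mixed.append(fake)
    return bytes(mixed)


def recover(mixed: bytes) -> bytes:
    """Take every second byte, starting with the first, to get the key back."""
    return mixed[::2]


key = os.urandom(32)
masked = interleave(key)
```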
I have never gotten anywhere at all with testing debuggers on anything, so I don't know much at all about how it works.

2019-04-11 16:32:31

Oh, by the way, I was thinking of coming up with a wrapper or something for networking, and I found stockings on GitHub.
It says it can send and receive complete messages, which sounds super nice, given what the sockets HOWTO says.
Am I on the right track?
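For context, one common way to get complete messages over a stream socket is to prefix each message with its length. The sketch below shows that general technique only; it is not a description of how stockings works internally, and all function names are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def send_message(sock, payload: bytes) -> None:
    """Send the payload preceded by its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, count: int) -> bytes:
    """Keep calling recv() until exactly count bytes have arrived."""
    buf = bytearray()
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock) -> bytes:
    """Read one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


# A connected pair of local sockets stands in for a real client and server.
left, right = socket.socketpair()
try:
    send_message(left, b"hello")
    send_message(left, b"world!")
    first = recv_message(right)
    second = recv_message(right)
finally:
    left.close()
    right.close()
```

The loop in recv_exact() is the important part, because a single recv() call may return fewer bytes than were asked for.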

2019-04-12 16:57:02

I believe so

best regards
never give up on what ever you are doing.

2019-04-19 10:02:00

I don't want to bring up old posts, but I can't download the file. Chrome tells me there is a server problem.


2019-04-25 17:02:09

Hi
That was a handler error.
It looks fixed now.
I plan to cythonize the extension, that could take a day or two though because I don't have a computer with the compiler here now.

2019-05-09 22:14:11

Hi guys,
I updated the pack file with an experimental function to load .pyc files from inside the pack itself, and another function, 'get_bytes()', which returns the decrypted, decompressed bytes of a given file inside the pack.
I also updated the docs here.
Enjoy!
There might be an error though and if so please send me the info.
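For anyone curious how loading bytecode from memory can work in general, here is a minimal sketch. It is not the pack module's implementation; the helper name is hypothetical and the 16-byte header size assumes Python 3.7 or later.

```python
import importlib.util
import marshal
import types

PYC_HEADER_SIZE = 16  # Python 3.7+; older versions used a shorter header


def module_from_pyc_bytes(name: str, pyc: bytes) -> types.ModuleType:
    """Build a module object from the raw bytes of a .pyc file."""
    if pyc[:4] != importlib.util.MAGIC_NUMBER:
        raise ImportError("bytecode was compiled by a different Python version")
    code = marshal.loads(pyc[PYC_HEADER_SIZE:])
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module


# Build .pyc-style bytes in memory so the example is self-contained.
source = "def greet():\n    return 'hi from the pack'\n"
compiled = compile(source, "<pack>", "exec")
pyc_bytes = importlib.util.MAGIC_NUMBER + bytes(12) + marshal.dumps(compiled)
greeting = module_from_pyc_bytes("packed_mod", pyc_bytes).greet()
```

Only do this with bytecode you produced yourself, since unmarshalling and executing untrusted data is unsafe.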

2019-05-11 13:39:21

Are there examples?


2019-05-13 16:58:06 (edited by keithwipf1 2019-05-13 17:10:40)

No, but it's not very hard.
I'll put some up on the web page now.
Edit: I put up 2 commented examples here.