AWS Lambda not importing LXML
I faced the same issue.
The link posted by Raphaël Braud was helpful and so was this one: https://nervous.io/python/aws/lambda/2016/02/17/scipy-pandas-lambda/
Using the two links I was able to successfully import lxml and other required packages. Here are the steps I followed:
- Launch an EC2 instance with the Amazon Linux AMI.
- Run the following script to accumulate dependencies:
set -e -o pipefail
sudo yum -y upgrade
sudo yum -y install gcc python-devel libxml2-devel libxslt-devel

virtualenv ~/env && cd ~/env && source bin/activate
pip install lxml

# Add the installed Python packages to the zip
for dir in lib64/python2.7/site-packages \
    lib/python2.7/site-packages
do
    if [ -d "$dir" ] ; then
        pushd "$dir"; zip -r ~/deps.zip .; popd
    fi
done

# Copy the shared libraries lxml needs (the specific .so files are not listed here)
mkdir -p local/lib
cp /usr/lib64/<list of required .so files> local/lib/
zip -r ~/deps.zip local/lib
- Create handler and worker files as specified in the link. Sample file contents:
handler.py:
import os
import subprocess

libdir = os.path.join(os.getcwd(), 'local', 'lib')

def handler(event, context):
    # Run the worker in a subprocess so LD_LIBRARY_PATH is set before lxml is loaded
    command = 'LD_LIBRARY_PATH={} python worker.py '.format(libdir)
    output = subprocess.check_output(command, shell=True)
    print(output)
    return
worker.py:
import lxml

def sample_function(input_string=None):
    return "lxml import successful!"

if __name__ == "__main__":
    result = sample_function()
    print(result)
- Add handler.py and worker.py to the zip file.
Here is how the structure of the zip file looks after the above steps:
deps
├── handler.py
├── worker.py
├── local
│   └── lib
│       ├── libanl.so
│       ├── libBrokenLocale.so
│       ....
├── lxml
│   ├── builder.py
│   ├── builder.pyc
│   ....
├── <other python packages>
- Make sure you specify the correct handler name when creating the Lambda function. In the above example, it would be "handler.handler".
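For what it's worth, the same handler idea can be written without building a shell string. This is only a minimal sketch, assuming Python 3 and assuming sys.executable points at the runtime's interpreter; build_worker_env and run_worker are names made up here for illustration, they are not from the linked posts.

```python
import os
import subprocess
import sys

# Same layout as above: shared libraries live in local/lib inside the package.
LIBDIR = os.path.join(os.getcwd(), "local", "lib")


def build_worker_env(libdir, base_env=None):
    """Return a copy of the environment with libdir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = libdir if not existing else libdir + os.pathsep + existing
    return env


def run_worker(script="worker.py", libdir=LIBDIR):
    """Run the worker script in a child process and return its stdout as text."""
    output = subprocess.check_output(
        [sys.executable, script],  # argument list, so no shell quoting is involved
        env=build_worker_env(libdir),
    )
    return output.decode("utf-8")


def handler(event, context):
    # Hypothetical Lambda entry point; it prints whatever the worker printed.
    print(run_worker())
    return
```

Passing env= sets LD_LIBRARY_PATH for the child process only, which is the same effect the command string above achieves, and using an argument list avoids shell=True.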
Hope this helps!
I have solved this using the serverless framework and its built-in Docker feature.
Requirement: You have an AWS profile in your .aws folder that can be accessed.
First, install the serverless framework as described here. You can then create a configuration file using the command serverless create --template aws-python3 --name my-lambda. It will create a serverless.yml file and a handler.py with a simple "hello" function. You can check that it works with sls deploy. If that succeeds, serverless is ready to use.
Next, we'll need an additional plugin named "serverless-python-requirements" for bundling Python requirements. You can install it via sls plugin install --name serverless-python-requirements.
This plugin is where all the magic happens that we need to solve the missing lxml package. In the custom->pythonRequirements section you simply have to add the dockerizePip: non-linux property. Your serverless.yml file could look something like this:
service: producthunt-crawler

provider:
  name: aws
  runtime: python3.8

functions:
  hello:
    # some handler that imports lxml
    handler: handler.hello

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: non-linux
    # Omits tests, __pycache__, *.pyc etc from dependencies
    slim: true
This runs the bundling of Python requirements inside a pre-configured Docker container. After this, you can run sls deploy to see the magic happen and then sls invoke -f hello (the function name from the serverless.yml above) to check that it works.
If you have already deployed with serverless and add the dockerizePip: non-linux option later, make sure to clean up your already built requirements with sls requirements clean. Otherwise, it just reuses the previously built packages.
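To make the sls invoke check more telling, the "hello" function can report whether lxml actually loads. The following is only a sketch of what handler.py might contain; check_import is a helper name invented here, and the response shape (statusCode plus a JSON body) is an assumption, not something the plugin requires.

```python
import importlib
import json


def check_import(module_name):
    """Try to import a module and describe the outcome as a plain dict."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers a missing package as well as a compiled extension that fails to load.
        return {"module": module_name, "ok": False, "error": str(exc)}
    return {
        "module": module_name,
        "ok": True,
        "version": getattr(module, "__version__", None),
    }


def hello(event, context):
    # lxml.etree is the compiled part of lxml, so importing it exercises the
    # binaries that were built inside the Docker container.
    report = check_import("lxml.etree")
    return {"statusCode": 200 if report["ok"] else 500, "body": json.dumps(report)}
```

If the bundled lxml was built for the wrong platform, the invoke output should then show the import error text instead of a bare failure.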
Extending on these answers, I found the following to work well.
The punchline here is having Python compile lxml with static libs, and installing into the current directory rather than site-packages.
It also means you can write your Python code as usual, without the need for a separate worker.py or fiddling with LD_LIBRARY_PATH:
sudo yum groupinstall 'Development Tools'
sudo yum -y install python36-devel python36-pip
sudo ln -s /usr/bin/pip-3.6 /usr/bin/pip3
mkdir lambda && cd lambda
STATIC_DEPS=true pip3 install -t . lxml
zip -r ~/deps.zip *
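Before uploading, it can be worth checking that the compiled lxml modules ended up at the top level of the zip rather than under a site-packages folder. A small sketch using only the standard library; compiled_modules is a name made up for this example, and the ~/deps.zip path in the usage line simply mirrors the command above.

```python
import zipfile


def compiled_modules(zip_path, package="lxml"):
    """List the .so files stored directly under <package>/ at the zip root."""
    prefix = package + "/"
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith(prefix) and name.endswith(".so")
        )
```

For example, compiled_modules(os.path.expanduser("~/deps.zip")) (with os imported) should return a non-empty list if the install into the current directory worked; an empty list suggests the package was zipped from the wrong directory.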
To take it to the next level, use serverless and Docker to handle everything. Here is a blog post demonstrating this: https://serverless.com/blog/serverless-python-packaging/