Multi-agent LLM workflows with CrewAI
Introduction
Machine learning is used extensively throughout the world, from identifying potential diseases and the risk factors behind them to counting endangered species via satellite images. But by far the most widespread use, at least in recent months, has been Large Language Models, or LLMs, with OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude to name only a few.
These models are in constant competition to produce the best results more efficiently, and the leaderboards shift all the time. Take a look at the LLM Leaderboard comparison from Vellum: https://www.vellum.ai/llm-leaderboard.
As a casual subscriber to any of the available LLMs, you’ll find it impossible to pick “the one” and will most likely find yourself gravitating to the platform that provides the features you like best. Be it GPT’s voice discussions, Anthropic’s integrations with Google and Asana, or Grok’s “no guard rails” approach to image generation.
The platform providers are not ignorant of this fact and are actively trying to keep your business by providing something they hope you can’t go without.
I’ve grown quite bored with the integrations and have instead looked to building my own, personalised platform. My hope is that, given my fluctuating use of ChatGPT’s platform, I can save myself the $20 monthly fee and instead invest that in API access. Before I kicked off building my own solution from scratch, I thought I’d check what’s already available out there, and in my search I came across CrewAI (https://www.crewai.com/).
I have to admit, I’ve become quite excited about CrewAI’s library, so much so that I feel like a bit of an evangelist. I should mention that I am in no way affiliated with them, nor have I been paid or asked to write this post.
What is CrewAI?
Visiting their site, you’ll be immediately confronted with the enterprise solution thrown at you. Ignore that! CrewAI provides an open-source solution for us individuals.
CrewAI provides a multi-agent platform that allows you to build and streamline workflows using any LLM. It’s that simple!
This means, using API integrations, we can ask GPT to do some research for us, then take the results of that research and pass them to Claude to further refine and extract technical documentation for us. We can automate this task and have it run on a schedule, or simply run it once off.
But it’s not just GPT. CrewAI supports many different LLMs: OpenAI, Google, Mistral, Groq, you name it. That’s what makes it so powerful.
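To make that concrete, here’s a minimal sketch of pointing two agents at different providers and chaining their output. This is my own illustration rather than the project we’ll build below; the model names are examples, and it assumes a recent CrewAI version where an agent’s llm can be given as a provider-prefixed model string:

from crewai import Agent, Crew, Process, Task

# Two agents, each backed by a different provider (model names are examples).
researcher = Agent(
    role="Researcher",
    goal="Gather background material on a topic",
    backstory="A thorough researcher.",
    llm="gpt-4o",  # assumed OpenAI model name
)
writer = Agent(
    role="Technical writer",
    goal="Turn research notes into concise technical documentation",
    backstory="A concise technical writer.",
    llm="anthropic/claude-3-5-sonnet-20240620",  # assumed Anthropic model name
)

research = Task(
    description="Research {topic} and list the key findings.",
    expected_output="A bullet-point summary",
    agent=researcher,
)
write_up = Task(
    description="Turn the research into short technical documentation.",
    expected_output="A markdown document",
    agent=writer,
    context=[research],  # hands the researcher's output to the writer
)

crew = Crew(agents=[researcher, writer], tasks=[research, write_up], process=Process.sequential)
# crew.kickoff(inputs={"topic": "..."})  # needs OPENAI_API_KEY and ANTHROPIC_API_KEY set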
Let’s jump into a simple example “Crew” to get us started.
What you’ll need
- uv is recommended for dependency management. (https://docs.astral.sh/uv/getting-started/installation/).
- Python version >=3.10 and <3.14 (this can be taken care of using uv).
- Access to an LLM API (I’m going the OpenAI route) as well as an API key.
OpenAI API key
If you’re following along with me, head over to https://platform.openai.com/ and sign in (you can use your regular ChatGPT login details; if you don’t have those, you’ll have to sign up for new credentials).
Once you’re logged in, head over to API keys on the left-hand navbar and create a new secret key.
- Owned by: Don’t fret about service accounts right now; you’ll be the owner of this.
- Name: This is optional, name it whatever you want, it’s simply so you can recognise it in the list of your API keys.
- Project: The default project is fine, unless you want more fine-grained control.
- Permissions: Keep this to All to allow unfettered access.
On the next screen you will be shown the API key. Copy it and keep it somewhere safe for now, as you’ll need it later; once you close this dialogue you will not be able to see the key again and will need to create a new one if you lose it.
It’s also good advice to never share your API key or accidentally leave it in code that you then push to a public repository.
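If you want a belt-and-braces check in your own scripts, read the key from an environment variable instead of ever writing it into source; a minimal sketch:

import os

# Read the key from the environment rather than hard-coding it in source control.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; export it (or put it in .env) before running.")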
Billing
Usage of these API keys is not free, but with that being said, it’s also not bank-breaking. What I appreciate most is that you can simply add a small amount of credit to your balance, and if something goes horribly wrong, that’s the most you’re out of pocket. If your balance runs out, your requests simply stop working. This helps prevent those unexpected $5000 bills you get from accidentally leaving an Azure service running overnight.
To get set up, follow these steps:
- Click on your profile icon in the top right corner.
- Click on “Billing” on the left-hand navbar.
- Finally, click on “Add to credit balance”. $5 will do for now and will go a long way.
- Follow the prompts to get your balance added.
Using GPT-4.1, you get roughly 1 million tokens for $2. That’s hard to translate into an exact request size, but the rule of thumb is that 100 tokens come to roughly 75 words. The real cost depends on the length of the input, the output, and even the model’s specific context window.
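If you want a rough feel for what a prompt will cost, you can do the arithmetic yourself. The sketch below assumes the ~$2 per million tokens figure above and the 100-tokens-to-75-words rule of thumb; treat both as estimates, not billing data:

# Back-of-the-envelope cost estimate; both constants are assumptions, not billing data.
PRICE_PER_MILLION_TOKENS = 2.00  # USD, roughly GPT-4.1 input pricing

def estimate_cost(num_tokens: int) -> float:
    """Estimate the cost in USD for a given number of tokens."""
    return num_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

words = 3_000                     # e.g. a longish prompt plus its context
tokens = round(words * 100 / 75)  # rule of thumb: 100 tokens ≈ 75 words
print(f"~{tokens} tokens ≈ ${estimate_cost(tokens):.4f}")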
We’re all set up and can carry on our merry way to creating our first Crew.
Installing CrewAI
The installation docs recommend going the uv route, and I do too. In short, the following steps should get you up and running.
Install CrewAI as follows:
uv tool install crewai
It will ask you some basic questions; don’t be too nervous answering, as it can all be changed later on. If you’re going the OpenAI route, you’ve already got everything it needs from the API key section above.
To confirm the installation went well, run:
uv tool list
Create your first Crew
Creating your first crew is incredibly simple. The command below creates the project structure with boilerplate and a (mostly) working crew.
crewai create crew test_project
Now navigate to the project directory with cd test_project, or open it in your favourite editor, and you’ll be presented with the following structure:
knowledge/
  user_preference.txt - In here you can put information about yourself.
src/
  test_project/
    config/
      agents.yaml - In here we define our agents (https://docs.crewai.com/concepts/agents#yaml-configuration-recommended).
      tasks.yaml - In here we define the tasks we'll give to our agents (https://docs.crewai.com/concepts/tasks#yaml-configuration-recommended).
    tools/ - In here you can create your own custom tools for your agents to use (https://docs.crewai.com/concepts/agents#agent-tools).
      __init__.py
      custom_tool.py
    __init__.py
    crew.py
    main.py
.env
.gitignore
pyproject.toml
README.md
Let’s dive into the key files next.
Agents.yaml
In agents.yaml you’ll see two agents defined: researcher and reporting_analyst. The default implementation is fairly straightforward, researching the latest LLM news. Let’s change it up a bit and create our own security research and advisory team.
researcher:
  role: >
    {operating_system} Senior Vulnerability Researcher
  goal: >
    Uncover the latest software vulnerabilities on {operating_system}
  backstory: >
    You're a seasoned vulnerability researcher with a knack for uncovering the latest
    vulnerabilities on {operating_system}. Known for your ability to find the most relevant
    information and present it in a clear and concise manner.

reporting_analyst:
  role: >
    {operating_system} Vulnerability Reporting Analyst
  goal: >
    Create summarised reports on steps to take to cover vulnerabilities based
    on {operating_system} vulnerability analysis and research findings
  backstory: >
    You're a meticulous analyst with a keen eye for detail. You're known for
    your ability to turn complex data into clear and concise reports, making
    it easy for others to understand and act on the information you provide.
A small change, but it shows how much the platform can do with only the smallest adjustments.
What we’ve defined for each agent is a role, a goal, and a backstory. Right now this is still basic stuff that can be achieved through your regular LLM platforms with a bit of prompt engineering. In prompt engineering this would be where you describe to the agent what sort of hat they should be wearing.
I’ve changed the topic variable to operating_system to better align with our goals.
Tasks.yaml
In tasks.yaml there will be two tasks, namely research_task and reporting_task.
Let’s change it up a bit to better address our need for a vulnerability report:
research_task:
  description: >
    Conduct thorough research about vulnerabilities on {operating_system}.
    Make sure you find any interesting and relevant information given
    the current year is {current_year}.
  expected_output: >
    A list with 10 bullet points of the most relevant vulnerabilities on {operating_system}
  agent: researcher

reporting_task:
  description: >
    Review the context you got and supply solutions to protect against the vulnerabilities,
    expanded into a full section for a report.
    Make sure the report is summarised and contains succinct and relevant information.
  expected_output: >
    A fully fledged report with the main vulnerabilities, each with steps to protect against it.
    Formatted as markdown without '```'
  agent: reporting_analyst
As in agents.yaml, I’ve kept the naming of the tasks the same so we don’t have to do too much replacement later down the line, apart from again replacing the topic variable with operating_system.
Each Task has a description, an expected_output, and an agent field. The description is there to give the agent some context. The expected output helps filter and define your output, a powerful aspect of prompt engineering. Finally, the agent field simply ensures that the wrong agent doesn’t pick up the task.
Crew.py
In crew.py we get to a very simple coding example. You’ll notice quite a few decorators, which tell CrewAI what you’re defining.
@agent:
@agent
def researcher(self) -> Agent:
    return Agent(
        config=self.agents_config['researcher'],  # type: ignore[index]
        verbose=True
    )
The @agent decorator tells CrewAI to add this to our agents list. We then define our researcher from the agents_config, which refers to agents.yaml. verbose does what it says on the box.
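The generated crew.py defines the second agent the same way; it should look roughly like this (lifted from the boilerplate, give or take formatting):

@agent
def reporting_analyst(self) -> Agent:
    return Agent(
        config=self.agents_config['reporting_analyst'],  # type: ignore[index]
        verbose=True
    )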
@task:
@task
def research_task(self) -> Task:
    return Task(
        config=self.tasks_config['research_task'],  # type: ignore[index]
    )

@task
def reporting_task(self) -> Task:
    return Task(
        config=self.tasks_config['reporting_task'],  # type: ignore[index]
        output_file='report.md'
    )
The @task decorator tells CrewAI to add this to our tasks list. Our tasks are defined by our config in tasks.yaml, and for our reporting_task we define an output_file. The output will be given in the terminal as well, but it’s nice to have it handy in a Markdown file for easier perusal.
Finally, @crew:
@crew
def crew(self) -> Crew:
    """Creates the TestProject crew"""
    # To learn how to add knowledge sources to your crew, check out the documentation:
    # https://docs.crewai.com/concepts/knowledge#what-is-knowledge
    return Crew(
        agents=self.agents,  # Automatically created by the @agent decorator
        tasks=self.tasks,    # Automatically created by the @task decorator
        process=Process.sequential,
        verbose=True,
        # process=Process.hierarchical, # In case you wanna use that instead https://docs.crewai.com/how-to/Hierarchical/
    )
The @crew decorator lets CrewAI know that this is where we define our entire crew, along with their tasks. Importantly, process=Process.sequential tells CrewAI to execute the tasks in succession. CrewAI is task-driven, after all.
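If you’d rather experiment with the hierarchical process hinted at in that commented-out line, note that (as far as I understand the docs) the crew also needs a manager model or agent to delegate with; a hedged sketch, where the model name is just an example:

return Crew(
    agents=self.agents,
    tasks=self.tasks,
    process=Process.hierarchical,
    manager_llm="gpt-4o",  # a manager LLM (or manager_agent) is needed for hierarchical crews; name is an example
    verbose=True,
)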
Great, our Crew is set up, let’s put it to work.
Main.py
In main.py we see more advanced stuff. The thing is, you can use CrewAI to train your own models if you’re so inclined. We’re not concerned with that right now, so we’re just looking at the bit of code below:
def run():
    """
    Run the crew.
    """
    inputs = {
        'operating_system': 'Windows 11',
        'current_year': str(datetime.now().year)
    }

    try:
        TestProject().crew().kickoff(inputs=inputs)
    except Exception as e:
        raise Exception(f"An error occurred while running the crew: {e}")
This is the method that runs when we fire off CrewAI. We define our inputs, operating_system and current_year, and then tell our (already defined) crew to kickoff. Simple as that.
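If you’d like to use the result in code rather than only reading the terminal, kickoff returns an output object you can hold on to; a small sketch that would slot into run() (attribute names as per CrewAI’s docs, so double-check against your installed version):

result = TestProject().crew().kickoff(inputs=inputs)
print(result.raw)            # the final task's output as plain text
# print(result.token_usage)  # handy for keeping an eye on API spend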
But before we can kick things off, you might have wondered why you haven’t entered your API key anywhere yet. Let’s do that first.
.env
Before we kick off our Crew, let’s define our environment variables. Open up .env and make sure it looks like the following:
MODEL=gpt-4
OPENAI_API_KEY=sk-proj-yourkeygoeshere
Replace sk-proj-yourkeygoeshere with the API key you created earlier.
Once that’s done and saved, we’re ready to see some results.
Kicking things off
First, we just want to make sure our dependencies are installed and locked:
crewai install
Then, we’re ready to rock:
crewai run
Depending on varying factors (number of tasks, complexity, models being used), this could take a while, but hang in there and you’ll start seeing results posted to your terminal in no time.
Eventually you’ll see something like this:
🚀 Crew: crew
└── 📋 Task: 2c7510f3-a384-4ea5-91a6-1106c6957bba
    Assigned to: Windows 11 Senior Vulnerability Researcher
    Status: ✅ Completed
╭─────────────────── Task Completion ────────────────────╮
│                                                        │
│  Task Completed                                        │
│  Name: 2c7510f3-a384-4ea5-91a6-1106c6957bba            │
│  Agent: Windows 11 Senior Vulnerability Researcher     │
│                                                        │
╰────────────────────────────────────────────────────────╯
That means an agent has completed their task, and we’re moving on to the next task. Then eventually, you’ll see:
╭─────────────────── Crew Completion ────────────────────╮
│                                                        │
│  Crew Execution Completed                              │
│  Name: crew                                            │
│  ID: 80f322e7-b772-423f-a90c-08947d9e9ad2              │
│                                                        │
╰────────────────────────────────────────────────────────╯
Done! You can scroll through your terminal to see the actual output, or you can simply open report.md and you’ll see the output:
---
# Windows 11 Vulnerability Assessment and Protection Measures
---
1. **Buffer Overflow Vulnerability in Windows 11 Kernel**
*Protection Measure*: Be proactive with patches and system updates conferred by Microsoft to fix these kernel issues. Also, use Firewall and other security tools to identify and block suspicious activities.
2. **Insecure Permissions in Windows Registry Keys**
*Protection Measure*: Regularly audit and modify registry keys' permissions and adhere to the principle of least privilege (PoLP). Limit user access to prevent unauthorized alteration.
3. **Remote Code Execution in Microsoft Edge**
*Protection Measure*: Keep your Internet browser updated with the latest security fixes. Also, educate users about safe browsing and the risks of visiting untrusted sites.
4. **SMB Ghost Vulnerability**
*Protection Measure*: Deactivate SMBv1 and apply patches as released by Microsoft. Utilize VPN or dedicated lines for sharing as an additional layer of security.
5. **Windows 11 Hyper-V Escape Vulnerability**
*Protection Measure*: Apply host machine security updates and patches as soon as they're released. Isolate guest machines and limit their access to the host machine wherever possible.
6. **NTLM Relay Attack**
*Protection Measure*: Implement Server Signing and EPA (Extended Protection for Authentication) and deactivate the NTLM wherever possible to migrate to a more secure protocol like Kerberos.
7. **DLL Hijacking Vulnerability in Win32k**
*Protection Measure*: Monitor and control the DLLs that can be loaded by high-privilege processes. Regularly update software to mitigate the risk of these types of attacks.
8. **Denial of Service in Windows 11 TCP/IP Stack**
*Protection Measure*: Implement rate limiting and install intrusion detection systems (IDSs) to help prevent DoS attacks. Use network segmentation to protect other parts of your network if one area falls victim.
9. **Information Disclosure through Microsoft Cortana**
*Protection Measure*: Configure privacy settings to limit Cortana's access to personal data. Regularly check Microsoft's privacy policy updates and modify settings as needed.
10. **Privilege Escalation through Windows 11 Task Scheduler**
*Protection Measure*: Limit user ability to schedule tasks with elevated privileges or entirely deactivate this functionality if it's unnecessary.
---
To conclude, system administrators should enact a wide-ranging and continuous approach to security, including regular system and software updates, user access control, secure configurations and use of security tools. In addition, foster a culture of cyber security awareness among users as it's equally integral to mitigating these risks.
Conclusion
Given the power of CrewAI, as an individual I can think of quite a few workflows to make my day-to-day easier.
On an enterprise level it becomes a no-brainer. I’ve seen Copilot agents being created to do tasks like risk assessments, business analysis, and data aggregation. The problem is being locked into Copilot if you want to carry on using those agents (not necessarily a bad thing, given the coverage they provide). You could write your own integrations, but let’s be honest, when does an enterprise ever have the capacity to do that?
In steps CrewAI to make things easier. You might say that it still requires quite a bit of code to set up these tasks and crews, but there is an enterprise offering that slaps a nifty UI over it all allowing a more “no-code” style approach.
Ultimately, if you’re dabbling in AI workflows, you’ve probably come across CrewAI already, but if you haven’t, there’s nothing to lose in trying it out.
I’ll be writing a few more articles like this to show some more advanced workflows I can come up with.