Sneaking into the Mistral hackathon in SF
Building something quick and fun to enable automated lead generation using AI agents.
I've always enjoyed hackathons. Hackathons force me to develop and execute an idea rapidly. My favorite part is meeting all the smart engineers and seeing what they are working on and what I can learn from them. If you are ever feeling uninspired, go to a hackathon. Being around that many passionate engineers will charge your batteries and accelerate your passion. So when Mistral announced they were going to be holding an SF hackathon, I was extra excited. Mistral, for those who don’t know, is the leader in open-source AI with some of the strongest 7b models available, which can easily run on consumer hardware. I knew I had to be at their hackathon because I’ve been using their products religiously despite my beginner knowledge of AI. I was early to apply to join the hackathon, and I waited in anticipation of the event.
And I waited….
Then the announcement came that the event was full…
San Francisco is the hottest market for AI right now, and the greatest number of brilliant engineers are working to advance the field here. Whenever we have AI events multiple times a week, they almost always fill up, and people get rejected. It is a testament to how excited the local engineering community is about this technology. As I frequent these events, I know how competitive they are to get into, so I’ve had this happen before. However, this time, I was not about to let a lack of an invitation prevent me from participating. I knew exactly where the event was taking place, so I decided to show up with my old business partner, Nik Pash. Nik has been building some of the most advanced retrieval APIs in the industry for over a year. His company, Vault, has been quietly serving large enterprise customers, and his open-source libraries are used by many. Of course, he has no issue getting an invitation. We arrived at the event, and I strolled right up and wrote a name tag. Of course, the check-in person was quick to say, “Hey, you have to wait in line.” To which I politely said ‘oh, of course, just grabbing a name tag," and proceeded to walk right in, leaving Nik behind to wait in line.
The event space was filled with engineers frantically working on every project imaginable using Mistral's API. The hackathon had two tracks: fine-tuning Mistral models and using their API. Since I wasn’t technically a part of the hackathon, I couldn’t join just any team. Nik and I walked the event, talking to some of the engineers and founders about their ideas and to see if they were using retrieval technology in their projects. I was especially excited to see a robot powered by a local mistral 7b model allowing it to understand basic commands.
We ended up meeting the founder of an SEO startup, Backlinker.ai, which uses AI to provide a substantially cheaper and more effective backlink service. Bennett Heyn, CEO of Backlinker.ai, is a super smart guy with a ton of experience in the SEO industry. There was synergy because a big topic I’ve been researching lately is using AI agents for lead generation and content writing. He outlined how his business needs to send more personalized emails to get more clients. It's not a revolutionary idea, but one I wanted to run with to see if we can do it in the timeframe we had. I also know that if I can crack the formula for agent-powered lead generation, that I can sell this service and make good money.
Building an AI-powered Lead generation Tool
Lead generation is typically handled in a few steps, which lead a potential customer down a sales funnel. It is called a funnel because early in the process, you cast a wide net trying to catch as many possible leads as possible, and then as you proceed through the funnel, only the “Hot” leads that are actually interested in your product make it to the end. In order to build an effective lead generation tool, we wanted to focus on two parts of the funnel:
Generating possible leads
Outreach emails to qualify if those leads are hot
Extracting leads from web pages
Since Backlinker is a tool designed to give people backlinks(crazy, I know) to their sites, we have a built-in method for acquiring these leads. We can simply find a site with many backlinks and then extract all the information about each of the people backlinked on the site. Any of those people could pay money to improve the SEO and get additional backlinks. Bennett wrote a script that gets all the potential backlinks by scraping every email off the page. Although rudimentary, his script is a highly effective strategy for acquiring leads. In order to improve his lead generation code, we decided to write a new strategy using LLMs.
An obvious choice would be to use an LLM to write scrapper code. However, we decided that such a system would be hard to apply to any site consistently. Instead, we opted to use an LLM to directly parse the site’s HTML and return a JSON-structured response with all the leads. Mistral 7b is actually effective at doing this exact sort of operation. Unfortunately, the 8k context window is a bit small, so we decided to use Mistral-Large through the Mistral API.
Our initial results were promising, with Mistral being able to effectively strip the relevant contact information and structure it as we desired. We wanted to improve the latency and reduce the costs to run it. Since each webpage was over 15k context, processing hundreds or thousands of pages this way would be expensive. Realistically the content we actually wanted to parse represented a fraction of the actual HTML code. We added a preprocessing step to remove image tags or HTML attributes and other HTML boilerplate that did not contain our data. This effectively reduced the token count by 25% and our costs accordingly. Now that it was cheap enough to operate, we could move to the next step of the process, qualifying the leads we generated.
Agent-powered email outreach
I was most excited to work on the second stage of the process, qualifying the leads. As someone with a great deal of experience doing the outreach process manually by emailing or calling people, the idea of automating it seems like magic. Ultimately, in order to do effective outreach, we need to know a few things about the customer and the product we are selling. We need to understand who the customer is and why they might want to get backlinks in the first place. Then, we want to know how Backlinker can improve their SEO and site authority, all while saving them money. Finally, we can take that information and write a personalized email explaining how Backlinker can save time and pay for itself with increased website traffic.
To build an AI agent system, I opted to use my favorite Python orchestration framework, CrewAI. It abstracts away most of the complexity of building agents into an easy-to-use experience, allowing us to focus on prompting the agents effectively. CrewAI breaks down into three primary components, Agents, Tasks, and Crews. We needed to build 3 agents who would specialize in each task they were assigned.
Customer Research Agent
The customer research agent’s goal is to google the customer and create a summary outlining the company they work for, their personal history, and anything we know about their current SEO strategy. It has direct access to Google, enabling it to query information about the customer or their company until it has the information it needs to make a full report.
Backlinker.AI Product Expert
The Backlinker expert is an agent with as much knowledge as I could find about the Backlink service. To build this, I found as much information from the Backlink.ai website as possible and included all of it in the prompt. The expert’s task is to read the information gathered during the customer research task and offer bullet-point advice on how to sell the Backlink.ai service to this specific customer.
The Email Sales Agent
This agent crew's most important and final component is the Sales agent. Its job is to take what we learned from the previous two agents and formulate a personalized email from the CEO of Backlink.ai, Bennett Heyn. We need to be careful only to write emails with information we know is true. So it needs to use the research from the customer agent and be careful to validate the email its’s writing. Ensuring that this Agent consistently writes accurate information is a challenge. We found that using bigger models was helpful here as they better understand the context and hallucinate much less than the smaller 7b models.
Watch the agents slowly research potential leads and write emails!
Big surprise we didn’t win the Mistral hackathon
I greatly enjoyed participating in the Mistral hackathon in SF. We didn’t come up with a very original idea, but I’m excited about the execution of this idea. I’ve got customers who want this exact type of lead generation for their business. Most hackathon projects are some cool moonshot that must be a full business. Not me. I snuck into the Mistral hackathon to build something that I can use today to build my business and aid my clients with their businesses. I realize now that I can build lead generation software for my customers that can drive business growth and leverage cutting-edge AI technology, and you can too!
You can find the GitHub for our Hackathon entry here
I wrote a nice readme with instructions to set it up. I will probably fork this and build a more generic version I can use with clients, but that’s for a different article.
Big thanks to my team Nik Pash and Bennett Heyn