
AISC9 has ended and there will be an AISC10

Published on April 29, 2024 10:53 AM GMT

The 9th AI Safety Camp (AISC9) just ended, and as usual, it was a success! 

Follow this link to find project summaries, links to their outputs, recordings of the end-of-camp presentations, and contact info for all our teams, in case you want to engage more.

AISC9 had both the largest number of participants (159) and the smallest number of staff (2) of all the camps we’ve done so far. Remmelt and I have proven that, if necessary, we can do this with just the two of us, and luckily our fundraising campaign raised just enough money to pay the two of us to do one more AISC. After that, the future is more uncertain, but that’s almost always the case for small non-profit projects.

 

Get involved in AISC10

AISC10 will follow the same format and timeline (shifted by one year) as AISC9.

Approximate timeline

August: Planning and preparation for the organisers
September: Research lead applications are open
October: We help the research lead applicants improve their project proposals
November: Team member applications are open
December: Research leads interview applicants and select their teams

Mid-January to mid-April: The camp itself, i.e. each team works on their project.

 

Help us give feedback on the next round of AISC projects

An important part of AISC is that we give individual feedback on all project proposals we receive. This is very staff-intensive and is the biggest bottleneck in our system. If you’re interested in helping give feedback on proposals in certain research areas, please email me at [email protected].

 

Apply as a research lead – Applications will open in September

As an AISC research lead, you will both plan and lead your project. The AISC staff and any volunteer helpers (see the previous section) will provide you with feedback on your project proposal.

If your project proposal is accepted, we’ll help you recruit a team to help you realise your plans. We’ll broadcast your project, together with all the other accepted project proposals. We’ll provide structure and guidance for the team recruitment. You will choose which applications you think are promising, do the interviews and the final selection for your project team.

If you are unsure if this is for you, you’re welcome to contact Remmelt ([email protected]) specifically for stop/pause AI projects, or me ([email protected]) for anything else.
 

Apply as a team member – Applications will open in November

Approximately at the start of November, we’ll share all the accepted project proposals on our website. If you have some time to spare in January-April 2025, you should read them and apply to the projects you like.


 

We have room for more funding

Our stipend pot is empty or very close to empty (the accounting for AISC9 is not finalised). If you want to help rectify this by adding some more money, please contact [email protected].

If we get some funding for this, but not enough for everyone, we will prioritise giving stipends to people from low- and middle-income countries, because we believe that a little money goes a long way for these participants.
 




Some costs of superposition

Published on March 3, 2024 4:08 PM GMT

I don't expect this post to contain anything novel. But from talking to others it seems like some of what I have to say in this post is not widely known, so it seemed worth writing. 

In this post I'm defining superposition as: A representation with more features than neurons, achieved by encoding the features as almost orthogonal vectors in neuron space.

One reason to expect superposition in neural nets (NNs) is that for large $d$, the space $\mathbb{R}^d$ has many more than $d$ almost orthogonal directions. On the surface, this seems obviously useful for the NN to exploit. However, superposition is not magic. You don't actually get to put in more information; the gain you get from having more feature directions has to be paid for some other way.

All the math in this post is very hand-wavey. I expect it to be approximately correct, to one order of magnitude, but not precisely correct.

 

Sparsity 

One cost of superposition is feature activation sparsity. I.e., even though you get to have many possible features, you only get to have a few of those features simultaneously active.

(I think the sparsity restriction is widely known; I mainly include this section because I'll need the sparsity math for the next section.)

In this section we'll assume that each feature of interest is a boolean, i.e. it's either turned on or off. We'll investigate how much we can weaken this assumption in the next section.

If you have $N$ features represented by $d$ neurons, with $N > d$, then you can't have all the features represented by orthogonal vectors. This means that an activation of one feature will cause some noise in the activation of other features.

The typical noise on feature $i$ caused by 1 unit of activation from feature $j$, for any pair of features $i \neq j$, is (derived from the Johnson–Lindenstrauss lemma)

$\epsilon \approx \sqrt{\frac{\ln N}{d}}$ [1]

If $S$ features are active, then the typical noise level on any other feature will be approximately $\epsilon\sqrt{S}$ units. This is because the individual noise terms add up like a random walk. Or see here for an alternative explanation of where the square root comes from.

For the signal to be stronger than the noise we need $\epsilon\sqrt{S} < 1$, and preferably $\epsilon\sqrt{S} \ll 1$.

This means that we can have at most $S \lesssim \frac{1}{\epsilon^2} \approx \frac{d}{\ln N}$ simultaneously active features, and at most $N \lesssim e^{\epsilon^2 d}$ possible features.
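To make these orders of magnitude concrete, here is a small NumPy sketch (my own illustration, not from the original post; the values of $d$, $N$ and $S$ are arbitrary) that draws random near-orthogonal feature directions and checks both the size of the pairwise interference and the $\sqrt{S}$ growth of the noise:

```python
import numpy as np

# Illustrative sketch only; d, N, S are arbitrary example values, not from the post.
rng = np.random.default_rng(0)
d, N, S = 1_000, 10_000, 50   # neurons, possible features, simultaneously active features

# Random unit vectors are a cheap way to get almost-orthogonal feature directions.
features = rng.normal(size=(N, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Typical interference: one unit of activation of feature j shows up on feature i
# as roughly 1/sqrt(d), the same order of magnitude as the JL-style bound sqrt(ln N / d).
sample = features[:200] @ features[200:400].T
print(f"measured interference : {sample.std():.3f}")
print(f"1/sqrt(d)             : {1 / np.sqrt(d):.3f}")
print(f"sqrt(ln N / d)        : {np.sqrt(np.log(N) / d):.3f}")

# Noise on an inactive feature when S other features are active with 1 unit each:
# the S interference terms have random signs, so they add like a random walk,
# giving a total of roughly sqrt(S) * (1/sqrt(d)) rather than S * (1/sqrt(d)).
active = rng.choice(N, size=S, replace=False)
activation = features[active].sum(axis=0)              # activation vector in neuron space
inactive = np.setdiff1d(np.arange(N), active)[:500]    # a sample of inactive features
noise = features[inactive] @ activation
print(f"noise on inactive features (std) : {noise.std():.3f}")
print(f"sqrt(S) / sqrt(d)                : {np.sqrt(S) / np.sqrt(d):.3f}")
```

With these example numbers the interference should come out around 0.03 and the accumulated noise around 0.22, consistent with the hand-wavy formulas above to within the promised order of magnitude.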

 

Boolean-ish

The other cost of superposition is that you lose expressive range for your activations, making them more like booleans than like floats. 

In the previous section, we assumed boolean features, i.e. the feature is either on (1 unit of activation + noise) or off (0 units of activation + noise), where "one unit of activation" is some constant. Since the noise is proportional to the activation, it doesn't matter how large "one unit of activation" is, as long as it's consistent between features. 

However, what if we want to allow for a range of activation values?

Let's say we have $d$ neurons, $N$ possible features, at most $S$ simultaneously active features, and at most $A$ activation amplitude. Then we need to be able to deal with noise of the level

$A\epsilon\sqrt{S}$

The number of activation levels the neural net can distinguish between is at most the max amplitude divided by the noise, i.e. roughly $\frac{A}{A\epsilon\sqrt{S}} = \frac{1}{\epsilon\sqrt{S}}$.

Any more fine-grained distinction will be overwhelmed by the noise.

The closer we get to maxing out $S$ and $N$, the smaller the signal-to-noise ratio gets, meaning we can distinguish between fewer and fewer activation levels, making the encoding more and more boolean-ish.
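As a rough, illustrative calculation (my own numbers, not from the post), plugging the noise estimate from the sparsity section into this amplitude-over-noise ratio shows how quickly the number of usable levels collapses as $S$ grows:

```python
import numpy as np

# Illustrative numbers only (not from the post): how many activation levels
# survive the interference noise for a given feature budget?
d, N = 1_000, 10_000                 # neurons, possible features
eps = np.sqrt(np.log(N) / d)         # per-unit interference, ~0.1 (hand-wavy bound from above)
for S in (5, 20, 50, 100):           # simultaneously active features
    levels = 1 / (eps * np.sqrt(S))  # max amplitude divided by the noise level
    print(f"S = {S:3d}: ~{levels:.1f} distinguishable activation levels")
```

Already at a few dozen simultaneously active features, the representation can barely distinguish "on" from "off", which is the boolean-ish regime described above.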

This does not necessarily mean the network encodes values in discrete steps. Feature encodings should probably still be seen as inhabiting a continuous range, but with reduced range and precision (except in the limit of maximum superposition, when all feature values are boolean). This is similar to how floats for most intents and purposes should be seen as continuous numbers, but with limited precision. Only here, the limited precision is due to noise instead of encoding precision.

My guess is that the reason you can reduce the float precision of NNs without suffering much inference loss [citation needed] is that noise levels, not encoding precision, are the limiting factor.
 

Compounding noise

In a multi-layer neural network, it's likely that the noise will grow with each layer, unless this is solved by some error correction mechanism. There is probably a way for NNs to deal with this, but I currently don't know what this mechanism would be, or how much of the NN's activation space and computation would have to be allocated to it.
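Here is a toy NumPy simulation of that concern (my own sketch under strong simplifying assumptions, not a claim about real networks): each "layer" decodes every feature value from the current activation vector and re-encodes it into a fresh set of random feature directions. With no cleanup, the interference picked up at each step is itself re-encoded and snowballs; a crude error-correction step, rounding decoded values to 0 or 1, keeps the error bounded:

```python
import numpy as np

# Toy model of compounding noise (illustrative sketch only; values are arbitrary).
rng = np.random.default_rng(1)
d, N, S, n_layers = 1_000, 10_000, 10, 6   # neurons, features, active features, layers

def random_feature_directions():
    F = rng.normal(size=(N, d))
    return F / np.linalg.norm(F, axis=1, keepdims=True)

true_values = np.zeros(N)
true_values[rng.choice(N, size=S, replace=False)] = 1.0    # S boolean features switched on

for cleanup in (False, True):
    F = random_feature_directions()
    x = true_values @ F                                    # encode into neuron space
    print(f"\nwith cleanup = {cleanup}")
    for layer in range(1, n_layers + 1):
        decoded = F @ x                                    # read off every feature (+ interference)
        err = np.abs(decoded - true_values).max()
        print(f"  layer {layer}: max decoding error = {err:.2f}")
        if cleanup:
            decoded = (decoded > 0.5).astype(float)        # crude error correction
        F = random_feature_directions()                    # next layer's feature directions
        x = decoded @ F                                     # re-encode, noise and all if no cleanup
```

The rounding step here stands in for whatever cleanup mechanism a trained network might implement; the point is only that without some such mechanism the error grows layer by layer, and any real mechanism will itself consume neurons and computation.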

I think that figuring out the cost of having to do this error correction is very relevant for whether or not we should expect superposition to be common in neural networks.

 

In practice I still think superposition is a win (probably)

In theory, you get to use fewer bits of information in the superposition framework. The reason is that you only get to use a ball inside neuron activation space (or the interference gets too large) instead of the full hypervolume.

However, I still think superposition lets you store more information in most practical applications. A lot of information about the world is more boolean-ish than float-ish. A lot of information about your current situation will be sparse, i.e. most things that could be present are not present.

The big unknown is the issue of compounding noise. I don't know the answer to this, but I know others are working on it.

 

Acknowledgement 

Thanks to Lucius Bushnaq, Steven Byrnes and Robert Cooper for helpful comments on the draft of this post.

 

 

  1. ^

    In the Johnson–Lindenstrauss lemma, $\epsilon$ is the error in the length of vectors, not the error in orthogonality; however, for small $\epsilon$, they should be similar.

    Doing the math more carefully, we find that

    $|\cos\theta| \le \frac{2\epsilon}{1-\epsilon}$

    where $\theta$ is the angle between two almost orthogonal features.

    This is a worst-case scenario. I have not calculated the typical case, but I expect it to be somewhat smaller, though still the same order of magnitude, which is why I feel OK with using just $\epsilon$ for the typical error in this blogpost.




AI Safety Camp 2024

Published on November 18, 2023 10:37 AM GMT

AI Safety Camp connects you with a research lead to collaborate on a project – to see where your work could help ensure future AI is safe.

Apply before December 1, to collaborate online from January to April 2024.

 

We value diverse backgrounds. Many roles, but definitely not all, require some knowledge of at least one of: AI safety, mathematics, or machine learning.

Some skills requested by various projects:

  • Art, design, photography
  • Humanistic academics
  • Communication
  • Marketing/PR
  • Legal expertise
  • Project management
  • Interpretability methods
  • Using LLMs
  • Coding
  • Math
  • Economics
  • Cybersecurity
  • Reading scientific papers
  • Know scientific methodologies
  • Think and work independently
  • Familiarity with the AI risk research landscape

 

Projects

To not build uncontrollable AI
Projects to restrict corporations from recklessly scaling the training and uses of ML models. Given controllability limits.
 

Everything else
Diverse other projects, including technical control of AGI in line with human values.
 

Please write your application with the research lead of your favorite project in mind. Research leads will directly review applications this round. We organizers will only assist when a project receives an overwhelming number of applications.

 

        Apply now      

 

Apply if you…

  1. want to consider and try out roles for helping ensure future AI functions safely;
  2. are able to explain why and how you would contribute to one or more projects;
  3. previously studied a topic or trained in skills that can bolster your new team’s progress; 
  4. can join weekly team calls and block out 5 hours of work each week from January to April 2024.

 

Timeline

Applications

By 1 Dec:  Apply.  Fill in the questions doc and submit it through the form.

Dec 1-22:  Interviews.  You may receive an email about an interview from one or more of the research leads whose projects you applied for.

By 28 Dec:  Final decisions.  You will definitely know if you are admitted. Hopefully we can tell you sooner, but we pinky-swear we will by 28 Dec.
 

Program 

Jan 13-14:  Opening weekend.  First meeting with your teammates and one-on-one chats.

Jan 15 – Apr 28:  Research is happening. Teams meet weekly, and plan their own work hours.

April 25-28:  Final presentations spread over four days.

Afterwards

For as long as you want:  Some teams keep working together after the official end of AISC.

When you start the project, we recommend that you don’t make any commitment beyond the official length of the program. However if you find that you work well together as a team, we encourage you to keep going even after AISC is officially over.

 


First virtual edition – a spontaneous collage
 

 

Team structure

Every team will have:

  • one Research Lead (RL)
  • one Team Coordinator (TC)
  • other team members

All team members are expected to work at least 5 hours per week on the project (this number can be higher for specific projects), which includes joining weekly team meetings, and communicating regularly with other team members about their work.

Research Lead (RL)

The RL is the person behind the research proposal. They will guide the research project, and keep track of relevant milestones. When things inevitably don’t go as planned (this is research after all) the RL is in charge of setting the new course.

The RL is part of the research team and will be contributing to research the same as everyone else on the team.

Team Coordinator (TC)

The TC is the ops person of the team. If you are the TC, then you are in charge of making sure meetings are scheduled, checking in with individuals on their task progress, etc.

The role of the TC is important but not expected to take too much time (except for project management-heavy teams). Most of the time, the TC will act like a regular team member contributing to the research, same as everyone else on the team.

Each project proposal states whether they are looking for someone like you to take on this role.

Other team members

Other team members will work on the project under the guidance of the RL and the TC. Team members will be selected based on relevant skills, understandings and commitments to contribute to the research project.

 

        Apply now      

 


Questions?

Check out our frequently asked questions; you may find the answer there.

  • For questions on a project, please contact the research lead. Find their contact info at the bottom of their project doc.
     
  • For questions about the camp in general, or if you can’t reach the specific research lead, please email [email protected].
    May take 5 days for organizers to reply.

 

We are fundraising!

Organizers are volunteering this round, since we had to freeze our salaries. This is not sustainable. To make next editions happen, consider making a donation. For larger amounts, feel free to email Remmelt.

 

        Donate      




Projects I would like to see (possibly at AI Safety Camp)

Published on September 27, 2023 9:27 PM GMT

I recently discussed with my AISC co-organiser Remmelt some possible project ideas I would be excited to see at the upcoming AISC, and I thought these would be valuable to share more widely.

Thanks to Remmelt for helpful suggestions and comments.

 

What is AI Safety Camp?

AISC in its current form is primarily a structure to help people find collaborators. If you are a research lead, we give your project visibility and help you recruit a team. If you are a regular participant, we match you up with a project you can help with.

I want to see more good projects happening. I know there is a lot of unused talent wanting to help with AI safety. If you want to run one of these projects, it doesn't matter to me if you do it as part of AISC or independently, or as part of some other program. The purpose of this post is to highlight these projects as valuable things to do, and to let you know AISC can support you, if you think what we offer is helpful.

 

Project ideas

These are not my after-long-consideration top picks of the most important things to do, just some things I think would be net positive if someone did them. I typically don't spend much cognitive effort on absolute rankings anyway, since I think personal fit is more important for ranking your personal options.

I don't claim originality for anything here. It's possible there is existing work on one or several of these topics that I'm not aware of. Please share links in the comments if you know of such work.

 

Is substrate-needs convergence inevitable for any autonomous system, or is it preventable with sufficient error correction techniques?

This can be done as an adversarial collaboration (see below) but doesn't have to be.

The risk from substrate-needs convergence can be summarised as follows:

  1. If AI is complex enough to self-sufficiently maintain its components, natural selection will sneak in. 
  2. This would select for components that cause environmental conditions needed for artificial self-replication.
  3. An AGI will necessarily be complex enough. 

Therefore natural selection will push the system towards self-replication, and therefore it is not possible for an AGI to be stably aligned with any other goal. Note that this line of reasoning does not necessitate that the AI will come to represent self-replication as its goal (although that is a possible outcome), only that natural selection will push it towards this behaviour.

I'm simplifying and skipping over a lot of steps! I don't think there currently is a great writeup of the full argument, but if you're interested you can read more here or watch this talk by Remmelt or reach out to me or Remmelt. Remmelt has a deeper understanding of the arguments for substrate-needs convergence than me, but my communication style might be better suited for some people.

I think substrate-needs convergence is pointing at a real risk. I don't know yet if the argument (which I summarised above) proves that building an AGI that stays aligned is impossible, or if it points to one more challenge to be overcome. Figuring out which of these is the case seems very important. 

I've talked to a few people about this problem, and identified what I think is the main crux: how well can you execute error correction mechanisms?

When Forrest Landry and Anders Sandberg discussed substrate-needs convergence, they ended up with a similar crux, but unfortunately did not have time to address it. Here's a recording of their discussion; however, Landry's mic breaks about 20 minutes in, which makes it hard to hear him from that point onward.

 

Any alignment-relevant adversarial collaboration

What are adversarial collaborations?
See this SSC post for an explanation: Call for Adversarial Collaborations | Slate Star Codex

Possible topics:

  • For and against some alignment plan. Maybe yours?
  • Is alignment of superhuman systems possible or not?

I expect this type of project to be most interesting if both sides already have strong reasons for believing the side they are advocating for. My intuition says that different frames will favour different conclusions, and that you will miss one or more important frame if either or both of you start from a weak conviction. The most interesting conversation will come from taking a solid argument for A and another solid argument for not-A, and finding a way for these perspectives to meet.

I think AISC can help find good matches. The way I suggest doing this is that one person (the AISC Research Lead) lays out their position in their project proposal. Then we post this for everyone to see. Then, when we open up for team member applications, anyone who disagrees can submit their application to join this project. Possibly you can have more than one person defending and attacking the position in question, and you can also add a moderator to the team if that seems useful.

However, if the AISC structure seems a bit overkill or just not the right fit for what you want to do in particular, there are other options too. For example you're invited to post ideas in the comments of this post.

Haven't there already been several AI Safety debates?
Yes, and those have been interesting. But also, doing an adversarial collaboration as part of AISC is a longer time commitment than most of these debates, which will allow you to go deeper. I'm sure there have also been long conversations in the past, which continue back and forth over months, and I'm sure many of those have been useful too. Let's have more!

 

What capability thresholds along what dimensions should we never cross?

This is a project for people who think that alignment is not possible, or at least not tractable in the next couple of decades. I'd be extra interested to see someone work on this from the perspective of risk due to substrate-needs convergence, or at least taking this risk into account, since this is an underexplored risk.

If alignment is not possible and we have to settle for less than god-like AI, then where do we draw the boundary for safe AI capabilities? What capability thresholds along what dimensions should we never cross? 

Karl suggested something similar here: Where are the red lines for AI?

 

A taxonomy of: What end-goal are “we” aiming for?

In what I think of as “classical alignment research” the end goal is a single aligned superintelligent AI, which will solve all our future problems, including defending us against any future harmful AIs. But the field of AI safety has broadened a lot since then. For example, there is much more effort going into coordination now. But is the purpose of regulation and other coordination just to slow down AI so we have time to solve alignment, so we can build our benevolent god later on? Or are we aiming for a world where humans stay in control? I expect different people and different projects to have different end-goals in mind. However, this isn’t talked about much, so I don’t know.

It is likely that some of the disagreement around alignment is based on different agendas aiming for different things. I think it would be good for the AI safety community to have an open discussion about this. However, the first step should not be to argue about who is right or wrong, but just to map out what end-goals different people and groups have in mind.

In fact, I don’t think consensus on what the end-goal should be is necessarily something we want at this point. We don’t know yet what is possible. It’s probably good for humanity to keep our options open, which means different people preparing the path for different options. I like the fact that different agendas are aiming at different things. But I think the discourse and understanding could be improved by more common knowledge about who is aiming for what.




Apply to lead a project during the next virtual AI Safety Camp

Published on September 13, 2023 1:29 PM GMT

Do you have AI Safety research ideas that you would like others to work on? Is there a project you want to do and you want help finding a team to work with you? AI Safety Camp could be the solution for you!

Summary

AI Safety Camp Virtual is a 3-month long online research program from January to April 2024, where participants form teams to work on pre-selected projects. We want you to suggest the projects!

If you have an AI Safety project idea and some research experience, apply to be a Research Lead.

If accepted, we offer some assistance to develop your idea into a plan suitable for AI Safety Camp. When project plans are ready, we open up team member applications. You get to review applications for your team, and select who joins as a team member. From there, it’s your job to guide work on your project.

Your project is totally in your hands. We, Linda and Remmelt, are just there at the start.

Who is qualified?
We require that you have some previous research experience. If you are at least 1 year into a PhD or if you have completed an AI Safety research program (such as a previous AI Safety Camp, Refine or SERI MATS), or done a research internship with an AI Safety org, then you are qualified already. Other research experience can count too.

More senior researchers are of course also welcome, as long as you think our format of leading an online team inquiring into your research questions suits you and your research.

 

        Apply here      

 

If you are unsure, or have any questions, you are welcome to contact us.

Choosing project idea(s)

AI Safety Camp is about ensuring future AI is safe. This round, we split work into two areas:

  1. To not build uncontrollable AI
    Focussed work toward restricting corporate-AI scaling. Given reasons why ‘AGI’ cannot be controlled sufficiently (in time) to stay safe.
  2. Everything else
    Open to any other ideas, including any work toward controlling/value-aligning AGI.

We welcome diverse projects!  Last round, we accepted 14 projects – including in Theoretical research, Machine learning experiments, Deliberative design, Governance, and Communication.

If you already have an idea for what project you would like to lead, that’s great. Apply with that one!

You don’t need to come up with an original idea though. What matters is that you understand the idea you want to work on, and why. If you base your proposal on someone else’s idea, make sure to cite them.

Primary reviewers:

  1. Remmelt reviews uncontrollability-focussed projects.
  2. Linda reviews everything else.

We will also ask for assistance from previous Research Leads, and up to a handful of other trusted people, to review and suggest improvements to your project proposals.

You can submit as many project proposals as you want. However, we will not let you lead more than two projects, and we don’t recommend leading more than one. 

Use this template to describe each of your project proposals. We want one document per proposal.

Team structure

Every team will have:

  • one Research Lead 
  • one Team Coordinator 
  • other team members

To make progress on your project, every team member is expected to work at least 5 hours per week (however the RL can choose to favour people who can put in more time, when selecting their team). This includes time joining weekly team meetings, and communicating regularly (between meetings) with other team members about their work.

Research Lead (RL)

The RL suggests one or several research topics. If a group forms around one of their topics, the RL will guide the project, and keep track of relevant milestones. When things inevitably don’t go as planned (this is research after all) the RL is in charge of setting the new course.

The RL is part of the team and will be contributing to project work, same as everyone else on the team.

Team Coordinator (TC)

The TC is the ops person of the team. They are in charge of making sure meetings are scheduled, checking in with individuals on task progress, etc.

The job of the TC is important, but not expected to take much time (except for project management-heavy teams). Most of the time, the TC will act like a regular team member contributing to the research, same as everyone else on the team.

TC and RL can be the same person.

Other team members

Other team members will work on the project under the leadership of the RL and the TC. Team members will be selected based on relevant skills, understandings and commitments to contribute to the project.

Team formation and timeline

Applications for this camp will open in two stages:

  • Now – Nov 3 is for RLs to suggest project ideas. Selected RLs then get support developing their project plans. Next we publish the project plans, and open applications for other team members.   
  • Nov 10 – Dec 22 is for all other team members. Potential participants will apply to join specific projects they want to work on. RLs are expected to select applicants for their project(s), and interview potential team members.

RL applications

  • October 6:  Application deadline for RLs.
    If you apply in time you are guaranteed an interview and help refining your proposal, before we make a decision.
  • October 11:  Deadline for late RL applications.
    If you apply late, we cannot guarantee help with your proposal. However, you do improve your chances by being less late.
  • November 3:  Deadline for refined proposals.

Team member applications:

  • November 10:  Accepted proposals are posted on the AISC website. Application to join teams open.
  • December 1:  Application to join teams closes. 
  • December 22:  Deadline for RLs to choose their team. 

Program 

  • Jan 13 – 14: Opening weekend
  • Jan 15 – Apr 28:  Research is happening.
    Teams meet weekly, and plan their own work hours.
  • April 25 – 28:  Final presentations

Afterwards

  • For as long as you want:  Some teams keep working together after the official end of AISC.
    When starting out we recommend that you don’t make any commitment beyond the official length of the program. However if you find that you work well together as a team, we encourage you to keep going even after AISC is officially over.

Application process for RL

As part of the RL application process we will help you improve your project plan, mainly through comments on your document. How much support we can provide depends on the number of applications we get. However, everyone who applies on time (before October 6th) is guaranteed at least one 1-on-1 call with someone on the AISC team, to discuss your proposal. 

Your application will not be judged based on your initial proposal, but on the refined proposal, after you have had the opportunity to respond to our feedback. The final deadline for improving your proposal is November 3rd.

Your RL application will be judged based on:

  1. Theory of impact
    What is the theory of impact of your project? Here we are asking about the relevance of your project work for reducing large-scale risks of AI development and deployment. If your project succeeds, can you tell us how this makes the world safer?
  2. Project plan and fit for AISC
    Do you have a well-thought-out plan for your project? How well does your plan fit the format of AISC? Is the project something that can be done by a remote team over 3 months? If your project is very ambitious, maybe you want to pick out a smaller sub-goal as the aim of AISC?
  3. Downside risk
    What are the downside risks of your projects and what is your plan to mitigate any such risk? The most common risk for AI safety projects is that your project might accelerate AI capabilities. If we think your project will enhance capabilities more than safety, we will not accept it.
  4. You as Research Lead
    Do we believe that you have the required skills and commitment to the project, and enough time to spend on this, in order to follow through? If we are going to promote your project and help you recruit a team to join you, we need to know that you will not let your team down. 

 

Applications are open now for Research Leads

 

Stipends

Stipends are limited this round.
For participants, we have $99K left in our stipend pot, which, spread over ~60 participants, would mean that we can pay out $1K per team member and $1.5K per research lead, for anyone who opts in.

For the rest, we are cash-strapped. We cannot reimburse software subscription or cloud compute costs.

We froze our salary funds from FTX. We organisers are volunteering our time because we think it matters.

Do you want to be a Research Lead?

If you have a project idea and you are willing to lead or guide a team working on this idea, you should apply to be RL.

We don’t expect a fully formed research plan! If we think your idea is suitable for AISC, we can help you to improve it.

        Apply here      

If you are unsure, or have any further questions, you are welcome to contact us.


